PhD student Ari Gewirtz presents Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues with Will Townes.

In this preprint, we combine latent Dirichlet allocation + Structure model to find associated genetic variants and RNA-seq read counts. We added telescoping to allow one (donor’s) genotype to be paired with many tissue samples from that donor, instead of a one-to-one pairing.

When applied to genotypes and gene expression from ten tissues from the GTEx v8 data, we find that many of the latent factors are tissue specific, although this varies by tissue.

We map cis-eQTLs in every relevant latent factor (intrachromosomal genes and SNPs), a total of 53,358 across ten tissues, compared with ~6% of interchromosomal factors yielding trans-eQTLs. We compared against known GTEx cis-eQTLs and found mostly known but some new cis-eQTLs.

We find a number of trans-eQTLs with broad effects across the genome, perhaps because the PEER-correction performed in univariate analyses removes these signals.

We also find that some components correspond well with the inferred cell type proportions of specific cell types in the GTEx bulk RNA-sequencing data.

Lots of opportunities to look for associations in one-vs-many count data! Feedback welcome. Try the TBLDA method out on your data.