Can public online databases serve as a source of phenotypic information for Cannabis genetic association studies?

Matthew L. Aardema, Rob DeSalle

Research output: Contribution to journal › Article › peer-review


The use of Cannabis is gaining greater social acceptance for its beneficial medicinal and recreational uses. With this acceptance has come new opportunities for crop management, selective breeding, and the potential for targeted genetic manipulation. However, as an agricultural product, Cannabis lags far behind other domesticated plants in knowledge of the genes and genetic variation that influence plant traits of interest such as growth form and chemical composition. Despite this lack of information, there are substantial publicly available resources that document phenotypic traits believed to be associated with particular Cannabis varieties. Such databases could be a valuable resource for developing a greater understanding of genes underlying phenotypic variation if combined with appropriate genetic information. To test this potential, we collated phenotypic data from information available through multiple online databases. We then produced a Cannabis SNP database from 845 strains to examine genome-wide associations in conjunction with our assembled phenotypic traits. Our goal was not to locate Cannabis-specific genetic variation that correlates with phenotypic variation as such, but rather to examine the potential utility of these databases more broadly for future, explicit genome-wide association studies (GWAS), either in stand-alone analyses or to complement other types of data. For this reason, we examined a very broad array of phenotypic traits. In total, we performed 201 distinct association tests using web-derived phenotype data appended to 290 uniquely named Cannabis strains. Our results indicated that chemical phenotypes, such as tetrahydrocannabinol (THC) and cannabidiol (CBD) content, may have sufficiently high-quality information available through web-based sources to allow for genetic association inferences.
In many cases, variation in chemical traits correlated with genetic variation in or near biologically reasonable candidate genes, including several not previously implicated in Cannabis chemical variation. As with chemical phenotypes, we found that publicly available data on growth traits such as height, area of growth, and floral yield may be precise enough for use in future association studies. In contrast, phenotypic information for subjective traits such as taste, physiological effect, neurological effect, and medicinal use appeared less reliable. These results are consistent with the high degree of subjectivity for such trait data found on internet databases, and suggest that future work on these important but less easily quantifiable characteristics of Cannabis may require dedicated, controlled phenotyping.

Original language: English
Article number: e0247607
Journal: PLoS ONE
Issue number: 2 February
State: Published - Feb 2021

