TY - JOUR
T1 - HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes
AU - Xiong, Wenwei
AU - He, Limei
AU - Lai, Jinsheng
AU - Dooner, Hugo K.
AU - Du, Chunguang
PY - 2014/7/15
Y1 - 2014/7/15
N2 - Transposons make up the bulk of eukaryotic genomes, but are difficult to annotate because they evolve rapidly. Most of the unannotated portion of sequenced genomes is probably made up of various divergent transposons that have yet to be categorized. Helitrons are unusual rolling circle eukaryotic transposons that often capture gene sequences, making them of considerable evolutionary importance. Unlike other DNA transposons, Helitrons do not end in inverted repeats or create target site duplications, so they are particularly challenging to identify. Here we present HelitronScanner, a two-layered local combinational variable (LCV) tool for generalized Helitron identification that represents a major improvement over previous identification programs based on DNA sequence or structure. HelitronScanner identified 64,654 Helitrons from a wide range of plant genomes in a highly automated way. We tested HelitronScanner's predictive ability in maize, a species with highly heterogeneous Helitron elements. LCV scores for the 5? and 3? termini of the predicted Helitrons provide a primary confidence level and element copy number provides a secondary one. Newly identified Helitrons were validated by PCR assays or by in silico comparative analysis of insertion site polymorphism among multiple accessions. Many new Helitrons were identified in model species, such as maize, rice, and Arabidopsis, and in a variety of organisms where Helitrons had not been reported previously to our knowledge, leading to a major upward reassessment of their abundance in plant genomes. HelitronScanner promises to be a valuable tool in future comparative and evolutionary studies of this major transposon superfamily.
AB - Transposons make up the bulk of eukaryotic genomes, but are difficult to annotate because they evolve rapidly. Most of the unannotated portion of sequenced genomes is probably made up of various divergent transposons that have yet to be categorized. Helitrons are unusual rolling circle eukaryotic transposons that often capture gene sequences, making them of considerable evolutionary importance. Unlike other DNA transposons, Helitrons do not end in inverted repeats or create target site duplications, so they are particularly challenging to identify. Here we present HelitronScanner, a two-layered local combinational variable (LCV) tool for generalized Helitron identification that represents a major improvement over previous identification programs based on DNA sequence or structure. HelitronScanner identified 64,654 Helitrons from a wide range of plant genomes in a highly automated way. We tested HelitronScanner's predictive ability in maize, a species with highly heterogeneous Helitron elements. LCV scores for the 5? and 3? termini of the predicted Helitrons provide a primary confidence level and element copy number provides a secondary one. Newly identified Helitrons were validated by PCR assays or by in silico comparative analysis of insertion site polymorphism among multiple accessions. Many new Helitrons were identified in model species, such as maize, rice, and Arabidopsis, and in a variety of organisms where Helitrons had not been reported previously to our knowledge, leading to a major upward reassessment of their abundance in plant genomes. HelitronScanner promises to be a valuable tool in future comparative and evolutionary studies of this major transposon superfamily.
KW - Algorithm
KW - Bioinformatic analysis
KW - Computational tool
KW - Transposition
UR - http://www.scopus.com/inward/record.url?scp=84904310867&partnerID=8YFLogxK
U2 - 10.1073/pnas.1410068111
DO - 10.1073/pnas.1410068111
M3 - Article
C2 - 24982153
AN - SCOPUS:84904310867
SN - 0027-8424
VL - 111
SP - 10263
EP - 10268
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 28
ER -