InsertionMapper: A pipeline tool for the identification of targeted sequences from multidimensional high throughput sequencing data

Wenwei Xiong, Limei He, Yubin Li, Hugo K. Dooner, Chunguang Du

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Background: The advent of next-generation high-throughput technologies has revolutionized whole genome sequencing, yet some experiments require sequencing only targeted regions of the genome from a very large number of samples. These regions can be amplified by PCR and sequenced by next-generation methods using a multidimensional pooling strategy. However, there is at present no generalized tool available for the computational analysis of target-enriched NGS data from multidimensional pools.

Results: Here we present InsertionMapper, a pipeline tool for the identification of targeted sequences from multidimensional high throughput sequencing data. InsertionMapper consists of four independently working modules: Data Preprocessing, Database Modeling, Dimension Deconvolution and Element Mapping. We illustrate InsertionMapper with an example from our project 'New reverse genetics resources for maize', which aims to sequence-index a collection of 15,000 independent insertion sites of the transposon Ds in maize. Identified sequences are validated by PCR assays. This pipeline tool is applicable to similar scenarios that require analysis of the very large output of short reads produced in NGS experiments on targeted genome sequences.

Conclusions: InsertionMapper has proven effective for the identification of target-enriched sequences from multidimensional high throughput sequencing data. With adjustable parameters and experiment configurations, this tool can save considerable computational effort for biologists interested in identifying their sequences of interest within the huge output of modern DNA sequencers. InsertionMapper is freely accessible at https://sourceforge.net/p/insertionmapper and http://bo.csam.montclair.edu/du/insertionmapper.
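The abstract does not describe how the Dimension Deconvolution module works internally, so the following is only a minimal sketch of the general idea behind multidimensional pooling, not InsertionMapper's actual implementation. It assumes each sample sits at one coordinate of a grid, that one pool is sequenced per index along each dimension, and that a sequence detected in exactly one pool per dimension can be assigned to the sample at that intersection. The function name `deconvolve`, the data layout and the sample names are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def deconvolve(pool_hits, layout):
    """Assign sequences to samples from the pools in which they were seen.

    pool_hits: dict of sequence id -> set of (dimension, pool_index) pairs.
    layout:    dict of coordinate tuple (one pool index per dimension)
               -> sample name.
    Returns (resolved, ambiguous): sequences with exactly one candidate
    sample, and sequences with zero or several candidates.
    """
    n_dims = len(next(iter(layout)))
    resolved, ambiguous = {}, {}
    for seq, hits in pool_hits.items():
        # Group the observed pool indices by dimension.
        by_dim = defaultdict(set)
        for dim, idx in hits:
            by_dim[dim].add(idx)
        # A sequence missing from an entire dimension cannot be placed.
        if any(dim not in by_dim for dim in range(n_dims)):
            ambiguous[seq] = []
            continue
        # Every combination of observed indices is a candidate coordinate.
        coords = product(*(sorted(by_dim[d]) for d in range(n_dims)))
        candidates = [layout[c] for c in coords if c in layout]
        if len(candidates) == 1:
            resolved[seq] = candidates[0]
        else:
            ambiguous[seq] = candidates
    return resolved, ambiguous


# Hypothetical 2 x 2 x 2 layout: eight samples, six pools in total.
layout = {c: "plant_%d%d%d" % c for c in product(range(2), repeat=3)}
hits = {
    "seqA": {(0, 1), (1, 0), (2, 1)},          # one pool per dimension
    "seqB": {(0, 0), (0, 1), (1, 0), (2, 0)},  # two pools in dimension 0
    "seqC": {(0, 0), (1, 1)},                  # never seen in dimension 2
}
resolved, ambiguous = deconvolve(hits, layout)
print(resolved)
print(ambiguous)
```

In this toy run, `seqA` resolves to a single sample, while `seqB` and `seqC` are left unassigned. A real pipeline would also need to handle sequencing errors and read-count thresholds, which this sketch ignores.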

Original language: English
Article number: 679
Journal: BMC Genomics
Volume: 14
Issue number: 1
DOI: 10.1186/1471-2164-14-679
State: Published - 4 Oct 2013

Fingerprint

Genome
Zea mays
Reverse Genetics
Polymerase Chain Reaction
Databases
Technology
DNA

Keywords

  • Multidimensional pooling
  • Next-generation sequencing
  • Sequence identification
  • Target enrichment

Cite this

@article{ee353fec95c74c34a910b80f350006ea,
title = "InsertionMapper: A pipeline tool for the identification of targeted sequences from multidimensional high throughput sequencing data",
abstract = "Background: The advent of next-generation high-throughput technologies has revolutionized whole genome sequencing, yet some experiments require sequencing only of targeted regions of the genome from a very large number of samples. These regions can be amplified by PCR and sequenced by next-generation methods using a multidimensional pooling strategy. However, there is at present no available generalized tool for the computational analysis of target-enriched NGS data from multidimensional pools. Results: Here we present InsertionMapper, a pipeline tool for the identification of targeted sequences from multidimensional high throughput sequencing data. InsertionMapper consists of four independently working modules: Data Preprocessing, Database Modeling, Dimension Deconvolution and Element Mapping. We illustrate InsertionMapper with an example from our project 'New reverse genetics resources for maize', which aims to sequence-index a collection of 15,000 independent insertion sites of the transposon Ds in maize. Identified sequences are validated by PCR assays. This pipeline tool is applicable to similar scenarios requiring analysis of the tremendous output of short reads produced in NGS sequencing experiments of targeted genome sequences. Conclusions: InsertionMapper is proven efficacious for the identification of target-enriched sequences from multidimensional high throughput sequencing data. With adjustable parameters and experiment configurations, this tool can save great computational effort to biologists interested in identifying their sequences of interest within the huge output of modern DNA sequencers. InsertionMapper is freely accessible at https://sourceforge.net/p/insertionmapper and http://bo.csam.montclair.edu/du/insertionmapper.",
keywords = "Multidimensional pooling, Next-generation sequencing, Sequence identification, Target enrichment",
author = "Wenwei Xiong and Limei He and Yubin Li and Dooner, {Hugo K.} and Chunguang Du",
year = "2013",
month = "10",
day = "4",
doi = "10.1186/1471-2164-14-679",
language = "English",
volume = "14",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central Ltd.",
number = "1",

}

InsertionMapper: A pipeline tool for the identification of targeted sequences from multidimensional high throughput sequencing data. / Xiong, Wenwei; He, Limei; Li, Yubin; Dooner, Hugo K.; Du, Chunguang.

In: BMC Genomics, Vol. 14, No. 1, 679, 04.10.2013.

TY - JOUR

T1 - InsertionMapper

T2 - A pipeline tool for the identification of targeted sequences from multidimensional high throughput sequencing data

AU - Xiong, Wenwei

AU - He, Limei

AU - Li, Yubin

AU - Dooner, Hugo K.

AU - Du, Chunguang

PY - 2013/10/4

Y1 - 2013/10/4

N2 - Background: The advent of next-generation high-throughput technologies has revolutionized whole genome sequencing, yet some experiments require sequencing only of targeted regions of the genome from a very large number of samples. These regions can be amplified by PCR and sequenced by next-generation methods using a multidimensional pooling strategy. However, there is at present no available generalized tool for the computational analysis of target-enriched NGS data from multidimensional pools. Results: Here we present InsertionMapper, a pipeline tool for the identification of targeted sequences from multidimensional high throughput sequencing data. InsertionMapper consists of four independently working modules: Data Preprocessing, Database Modeling, Dimension Deconvolution and Element Mapping. We illustrate InsertionMapper with an example from our project 'New reverse genetics resources for maize', which aims to sequence-index a collection of 15,000 independent insertion sites of the transposon Ds in maize. Identified sequences are validated by PCR assays. This pipeline tool is applicable to similar scenarios requiring analysis of the tremendous output of short reads produced in NGS sequencing experiments of targeted genome sequences. Conclusions: InsertionMapper is proven efficacious for the identification of target-enriched sequences from multidimensional high throughput sequencing data. With adjustable parameters and experiment configurations, this tool can save great computational effort to biologists interested in identifying their sequences of interest within the huge output of modern DNA sequencers. InsertionMapper is freely accessible at https://sourceforge.net/p/insertionmapper and http://bo.csam.montclair.edu/du/insertionmapper.

KW - Multidimensional pooling

KW - Next-generation sequencing

KW - Sequence identification

KW - Target enrichment

UR - http://www.scopus.com/inward/record.url?scp=84884967275&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-14-679

DO - 10.1186/1471-2164-14-679

M3 - Article

C2 - 24090499

AN - SCOPUS:84884967275

VL - 14

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 679

ER -