Consistency of metagenomic assignment programs in simulated and real data (2014)

Koldo Garcia-Etxebarria, Marc Garcia-Garcerà, and Francesc Calafell

BACKGROUND Metagenomics is the genomic study of uncultured environmental samples, and it has been greatly facilitated by the advent of shotgun-sequencing technologies. One of the main focuses of metagenomics is the discovery of previously uncultured microorganisms, which makes assigning sequences to a particular taxon both a challenge and a crucial step. Several methods have recently been developed for this task, based on different approaches such as sequence composition or sequence similarity. Sequence composition methods can assign every read in a dataset; however, they have seen limited use in metagenomics, and their performance on real data has been little studied. In this work, we assess the consistency of three different methods (BLAST + Lowest Common Ancestor, Phymm, and Naïve Bayesian Classifier) in assigning real and simulated sequence reads.

RESULTS In both real and simulated data, BLAST + Lowest Common Ancestor (BLAST + LCA), Phymm, and the Naïve Bayesian Classifier consistently assign a larger number of reads at higher taxonomic levels than at lower levels. However, discrepancies between the methods increase at lower taxonomic levels. In simulated data, assignments on which all three methods agreed showed greater precision than assignments based on Phymm or the Naïve Bayesian Classifier alone (the BLAST + LCA algorithm performed best). In addition, assignment consistency in real data increased with sequence read length, in agreement with previously published simulation results.

CONCLUSIONS Using and combining different approaches is advisable when assigning metagenomic reads. Although sensitivity may be reduced, reliability can be increased by keeping only the reads consistently assigned to the same taxon by at least two methods, and by training the programs using all available information.
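The conclusions recommend keeping only reads that at least two methods assign to the same taxon. The Python sketch below illustrates that rule together with a basic lowest-common-ancestor step over a read's BLAST hits. It is not the authors' implementation: the lineage-tuple representation, the rank list, the function names, the tie rule and the example lineages are all assumptions made for illustration, and the sketch omits the hit filtering (for example by alignment score) that a practical BLAST + LCA pipeline would normally apply first.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a lineage is a tuple of taxon names ordered
# from the highest rank down, truncated where the assignment stops.
Lineage = Tuple[str, ...]

# Illustrative rank order; index i of a lineage is assumed to sit at RANKS[i].
RANKS = ("superkingdom", "phylum", "class", "order", "family", "genus", "species")


def lowest_common_ancestor(hit_lineages: Sequence[Lineage]) -> Lineage:
    """LCA of a read's BLAST hits: the longest prefix shared by all hit lineages."""
    lca = []
    # zip stops at the shortest lineage, so the LCA is never deeper than any hit.
    for taxa in zip(*hit_lineages):
        if all(t == taxa[0] for t in taxa):
            lca.append(taxa[0])
        else:
            break
    return tuple(lca)


def taxon_at_rank(lineage: Lineage, rank: str) -> Optional[str]:
    """Taxon at `rank`, or None if the method did not resolve the read that deep."""
    depth = RANKS.index(rank)
    return lineage[depth] if depth < len(lineage) else None


def consensus_assignment(
    assignments: Sequence[Lineage], rank: str, min_agree: int = 2
) -> Optional[str]:
    """Keep the taxon that at least `min_agree` methods assign at `rank`.

    Returns None (read left unassigned) when no taxon reaches the threshold
    or when two taxa tie for the most votes (an assumed tie rule).
    """
    votes = Counter(
        taxon
        for taxon in (taxon_at_rank(a, rank) for a in assignments)
        if taxon is not None
    )
    if not votes:
        return None
    ranked = votes.most_common(2)
    taxon, count = ranked[0]
    if count < min_agree or (len(ranked) > 1 and ranked[1][1] == count):
        return None
    return taxon


# Made-up example read: two BLAST hits plus hypothetical Phymm and
# Naive Bayesian Classifier outputs.
blast_lca = lowest_common_ancestor([
    ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"),
    ("Bacteria", "Firmicutes", "Bacilli", "Bacillales"),
])
phymm = ("Bacteria", "Firmicutes", "Bacilli", "Bacillales", "Staphylococcaceae")
nbc = ("Bacteria", "Proteobacteria", "Gammaproteobacteria")

print(blast_lca)  # ('Bacteria', 'Firmicutes', 'Bacilli')
for rank in ("phylum", "class", "order"):
    print(rank, consensus_assignment([blast_lca, phymm, nbc], rank))
# phylum Firmicutes
# class Bacilli
# order None
```

In this made-up read the methods agree down to class but not below, so the consensus call is lost at the order level: a small illustration of the trade-off between sensitivity and reliability described in the conclusions.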

MeSH terms:
D008810 Mice, Inbred C57BL
D000465 Algorithms
D000818 Animals
D001499 Bayes Theorem
D012867 Skin
D015203 Reproducibility of Results
D016678 Genome
D051379 Mice
D056186 Metagenomics
