Computational methods for ab initio and comparative gene finding. 2010

Ernesto Picardi, and Graziano Pesole
Dipartimento di Biochimica e Biologia Molecolare E Quagliariello, University of Bari, Bari, Italy.

High-throughput DNA sequencing is increasing the amount of public complete genomes even though a precise gene catalogue for each organism is not yet available. In this context, computational gene finders play a key role in producing a first and cost-effective annotation. Nowadays a compilation of gene prediction tools has been made available to the scientific community and, despite the high number, they can be divided into two main categories: (1) ab initio and (2) evidence based. In the following, we will provide an overview of main methodologies to predict correct exon-intron structures of eukaryotic genes falling in such categories. We will take into account also new strategies that commonly refine ab initio predictions employing comparative genomics or other evidence such as expression data. Finally, we will briefly introduce metrics to in house evaluation of gene predictions in terms of sensitivity and specificity at nucleotide, exon, and gene levels as well.

UI MeSH Term Description Entries
D007438 Introns Sequences of DNA in the genes that are located between the EXONS. They are transcribed along with the exons but are removed from the primary gene transcript by RNA SPLICING to leave mature RNA. Some introns code for separate genes. Intervening Sequences,Sequences, Intervening,Intervening Sequence,Intron,Sequence, Intervening
D008390 Markov Chains A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system. Markov Process,Markov Chain,Chain, Markov,Chains, Markov,Markov Processes,Process, Markov,Processes, Markov
D005091 Exons The parts of a transcript of a split GENE remaining after the INTRONS are removed. They are spliced together to become a MESSENGER RNA or other functional RNA. Mini-Exon,Exon,Mini Exon,Mini-Exons
D006801 Humans Members of the species Homo sapiens. Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 Algorithms A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Algorithm
D000818 Animals Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. Animal,Metazoa,Animalia
D015203 Reproducibility of Results The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results. Reliability and Validity,Reliability of Result,Reproducibility Of Result,Reproducibility of Finding,Validity of Result,Validity of Results,Face Validity,Reliability (Epidemiology),Reliability of Results,Reproducibility of Findings,Test-Retest Reliability,Validity (Epidemiology),Finding Reproducibilities,Finding Reproducibility,Of Result, Reproducibility,Of Results, Reproducibility,Reliabilities, Test-Retest,Reliability, Test-Retest,Result Reliabilities,Result Reliability,Result Validities,Result Validity,Result, Reproducibility Of,Results, Reproducibility Of,Test Retest Reliability,Validity and Reliability,Validity, Face
D015233 Models, Statistical Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc. Probabilistic Models,Statistical Models,Two-Parameter Models,Model, Statistical,Models, Binomial,Models, Polynomial,Statistical Model,Binomial Model,Binomial Models,Model, Binomial,Model, Polynomial,Model, Probabilistic,Model, Two-Parameter,Models, Probabilistic,Models, Two-Parameter,Polynomial Model,Polynomial Models,Probabilistic Model,Two Parameter Models,Two-Parameter Model
D017422 Sequence Analysis, DNA A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis. DNA Sequence Analysis,Sequence Determination, DNA,Analysis, DNA Sequence,DNA Sequence Determination,DNA Sequence Determinations,DNA Sequencing,Determination, DNA Sequence,Determinations, DNA Sequence,Sequence Determinations, DNA,Analyses, DNA Sequence,DNA Sequence Analyses,Sequence Analyses, DNA,Sequencing, DNA
D057225 Data Mining Use of sophisticated analysis tools to sort through, organize, examine, and combine large sets of information. Text Mining,Mining, Data,Mining, Text

Related Publications

Ernesto Picardi, and Graziano Pesole
January 2012, Frontiers in genetics,
Ernesto Picardi, and Graziano Pesole
April 2000, Genome research,
Ernesto Picardi, and Graziano Pesole
January 2003, Methods of biochemical analysis,
Ernesto Picardi, and Graziano Pesole
April 2020, The journal of physical chemistry. A,
Ernesto Picardi, and Graziano Pesole
August 2007, Physical chemistry chemical physics : PCCP,
Ernesto Picardi, and Graziano Pesole
May 2018, Chimia,
Ernesto Picardi, and Graziano Pesole
January 2019, PeerJ. Computer science,
Ernesto Picardi, and Graziano Pesole
October 2002, Bioinformatics (Oxford, England),
Ernesto Picardi, and Graziano Pesole
July 2022, Journal of molecular modeling,
Ernesto Picardi, and Graziano Pesole
March 2016, Interdisciplinary sciences, computational life sciences,
Copied contents to your clipboard!