Relative mutation rates of each nucleotide for another estimated from allele frequency spectra at human gene loci.
2009

Leeyoung Park
Natural Science Research Institute, Yonsei University, 134 Shinchon-Dong, Seodaemun-Ku, Seoul, Korea. lypark@yonsei.ac.kr

This study comprehensively examines the rate at which each base mutates to another in human gene loci. In contrast to most previous efforts, which were based on divergence data from untranscribed regions, the present study applies the basic theory of the reversible recurrent mutation model to large-scale, high-quality re-sequencing data from public databases of gene loci. Population mutation parameters (4Nν and 4Nμ) are obtained for each pair of base substitutions. The estimated parameters show good strand reversal symmetry, supporting the existence of mutation-drift equilibrium. Analysis of specific gene regions, including mRNA, coding sequence (CDS), 5'-untranslated region (5'-UTR), 3'-UTR and intron, shows clear differences in the mutation rates of each base for another depending on the location of the base in question. Results from analyses that take the adjacent bases into account exhibit excellent strand reversal symmetry, confirming that the identity of an adjacent base influences mutation rates. The CpG to TpG (or CpG to CpA) substitution is found at a rate approximately seven-fold higher than the reverse transition in intron regions due to cytosine deamination, but the effect is strongly reduced in mRNA regions and almost entirely lost in 5'-UTRs. However, judging from the overall increase in transitions at sites other than CpGs and the proportion of CpGs in the total sequence, CpG methylation is not the main factor responsible for the increased rate of transitions as compared with transversions. After adjusting average mutation rates for sequence composition, no substitution bias is found between A+T and C+G, indicating base composition equilibrium in human gene loci. Population differences are also identified between groups of people of African and European descent, presumably due to past population histories.
By applying the basic theory of population genetics to re-sequenced data, this study contributes new, detailed information regarding mutations in human gene regions.
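
The two ideas the abstract leans on, an equilibrium allele frequency spectrum governed by a pair of population mutation parameters and strand reversal symmetry, can be illustrated with a short sketch. It is not the author's estimation procedure. It assumes the textbook result that, under reversible recurrent mutation between two alleles at mutation-drift equilibrium, the population frequency of an allele follows a Beta distribution whose two shape parameters are the scaled mutation rates (4N times each per-generation rate), so that counts in a sample of n chromosomes follow a beta-binomial distribution. The names `theta_to`, `theta_from`, `beta_binomial_pmf` and `reverse_complement_substitution` are hypothetical and introduced only for illustration.

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, theta_to, theta_from):
    """Probability of seeing k copies of an allele among n sampled chromosomes.

    Assumption (textbook two-allele equilibrium, not taken from the paper):
    the population frequency x of the allele is Beta(theta_to, theta_from),
    where theta_to is the scaled rate of mutations that create the allele and
    theta_from is the scaled rate of mutations that remove it.
    """
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_p = (log_choose
             + log_beta(k + theta_to, n - k + theta_from)
             - log_beta(theta_to, theta_from))
    return math.exp(log_p)

def expected_spectrum(n, theta_to, theta_from):
    """Expected allele frequency spectrum: probabilities for k = 0..n copies."""
    return [beta_binomial_pmf(k, n, theta_to, theta_from) for k in range(n + 1)]

_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Reverse complement of a short DNA string such as 'TG'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))

def reverse_complement_substitution(sub):
    """Map a substitution written 'XY>ZW' to its opposite-strand equivalent."""
    before, after = sub.split(">")
    return reverse_complement(before) + ">" + reverse_complement(after)

if __name__ == "__main__":
    # Illustrative parameter values only; they are not estimates from the study.
    spectrum = expected_spectrum(n=10, theta_to=0.3, theta_from=0.7)
    print(sum(spectrum))
    # CpG to TpG on one strand reads as CpG to CpA on the other strand.
    print(reverse_complement_substitution("CG>TG"))
```

Under strand reversal symmetry, parameters estimated for a substitution and for its opposite-strand equivalent (for example `A>G` and `T>C`) should agree, which is the consistency check the abstract reports.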

UI | MeSH Term | Description | Entries
D008957 | Models, Genetic | Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. | Genetic Models,Genetic Model,Model, Genetic
D009154 | Mutation | Any detectable and heritable change in the genetic material that causes a change in the GENOTYPE and which is transmitted to daughter cells and to succeeding generations. | Mutations
D009711 | Nucleotides | The monomeric units from which DNA or RNA polymers are constructed. They consist of a purine or pyrimidine base, a pentose sugar, and a phosphate group. (From King & Stansfield, A Dictionary of Genetics, 4th ed) | Nucleotide
D005787 | Gene Frequency | The proportion of one particular allele in the total of all ALLELES for one genetic locus in a breeding POPULATION. | Allele Frequency,Genetic Equilibrium,Equilibrium, Genetic,Allele Frequencies,Frequencies, Allele,Frequencies, Gene,Frequency, Allele,Frequency, Gene,Gene Frequencies
D005828 | Genetics, Population | The discipline studying genetic composition of populations and effects of factors such as GENETIC SELECTION, population size, MUTATION, migration, and GENETIC DRIFT on the frequencies of various GENOTYPES and PHENOTYPES using a variety of GENETIC TECHNIQUES. | Population Genetics
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D001482 | Base Composition | The relative amounts of the PURINES and PYRIMIDINES in a nucleic acid. | Base Ratio,G+C Composition,Guanine + Cytosine Composition,G+C Content,GC Composition,GC Content,Guanine + Cytosine Content,Base Compositions,Base Ratios,Composition, Base,Composition, G+C,Composition, GC,Compositions, Base,Compositions, G+C,Compositions, GC,Content, G+C,Content, GC,Contents, G+C,Contents, GC,G+C Compositions,G+C Contents,GC Compositions,GC Contents,Ratio, Base,Ratios, Base
D001483 | Base Sequence | The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. | DNA Sequence,Nucleotide Sequence,RNA Sequence,DNA Sequences,Base Sequences,Nucleotide Sequences,RNA Sequences,Sequence, Base,Sequence, DNA,Sequence, Nucleotide,Sequence, RNA,Sequences, Base,Sequences, DNA,Sequences, Nucleotide,Sequences, RNA
D015894 | Genome, Human | The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs. | Human Genome,Genomes, Human,Human Genomes
D016366 | Open Reading Frames | A sequence of successive nucleotide triplets that are read as CODONS specifying AMINO ACIDS and begin with an INITIATOR CODON and end with a stop codon (CODON, TERMINATOR). | ORFs,Protein Coding Region,Small Open Reading Frame,Small Open Reading Frames,sORF,Unassigned Reading Frame,Unassigned Reading Frames,Unidentified Reading Frame,Coding Region, Protein,Frame, Unidentified Reading,ORF,Open Reading Frame,Protein Coding Regions,Reading Frame, Open,Reading Frame, Unassigned,Reading Frame, Unidentified,Region, Protein Coding,Unidentified Reading Frames
