Fundamental asymmetry of insertions and deletions in genomes size evolution. 2019

Yang He, Suyan Tian, and Pu Tian
School of Life Sciences, Jilin University, 2699 Qianjin Street, Changchun 130012, China.

The origin of large genomes, which underlies the long-standing "C-value enigma", is only partially explained by selfish DNA. We investigated insertions and deletions (indels) of nucleotides and their relevance to the size evolution of random biological sequences (RBS) and genomes. Using a probabilistic model of RBS based on the size evolution of expandable sites in a hypothetical perfect genome, we found that insertion bias engenders an exponential increase of average RBS size. When combined with existing large genome segments that are not subject to selection pressure (e.g. selfish DNA), such insertion bias results in explosive genome expansion and therefore helps explain the "C-value enigma" alongside selfish DNA. This increase of RBS size is caused by the fundamental asymmetry of indels: insertions create more available insertion sites, whereas deletions leave fewer deletable nucleotides. In qualitative agreement with the size distribution of known genomes, the tails of RBS size distributions decay exponentially, with larger RBS segments being less probable. Unsurprisingly, a slight deletion bias (a higher deletion probability) results in a slow decrease of average RBS size and may lead to the eventual vanishing of RBS. Contrary to intuition, strictly balanced insertion and deletion results in a linearly increasing rather than a fixed RBS size. Nonetheless, this linear increase of average RBS size with time is small in magnitude; it is consequently not influential in genome size evolution and certainly not a major contributor to the "C-value enigma". Our model suggested that insertion bias of nucleotides may provide an explanation for large genomes that is complementary to selfish DNA. The fundamental indel asymmetry applies to all forms of genomic insertions and deletions. A long-lasting exponential increase of genome size would impose energy and material requirements that are impossible to sustain. We therefore concluded that if explosively accelerating expansion caused by a significant effective insertion bias occurred in any surviving species, it must have occurred sporadically. Our model also provided an explanation for the observed proportional evolution of genome size.
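
The three regimes described in the abstract (insertion bias, strict balance, deletion bias) can be illustrated with a minimal sketch. This is not the authors' model. It assumes a toy per-site scheme in which a segment of n nucleotides offers n + 1 insertion sites but only n deletable nucleotides, and it iterates only the expected size. The function name `mean_size_trajectory` and the parameters `p_ins` and `p_del` are hypothetical, chosen for illustration.

```python
def mean_size_trajectory(n0, p_ins, p_del, steps):
    """Iterate the expected segment size under a toy per-site indel scheme.

    Assumption (not taken from the paper): a segment of n nucleotides has
    n + 1 insertion sites and n deletable nucleotides, and each site or
    nucleotide is hit independently with probability p_ins or p_del per step.
    """
    sizes = [float(n0)]
    for _ in range(steps):
        n = sizes[-1]
        # Expected gain p_ins * (n + 1) minus expected loss p_del * n.
        sizes.append(n + p_ins * (n + 1) - p_del * n)
    return sizes


if __name__ == "__main__":
    # Illustrative probabilities only; they are not fitted to any genome.
    regimes = [
        ("insertion bias", 0.0011, 0.0010),
        ("balanced", 0.0010, 0.0010),
        ("deletion bias", 0.0010, 0.0011),
    ]
    for label, p_ins, p_del in regimes:
        traj = mean_size_trajectory(100, p_ins, p_del, 10_000)
        print(f"{label:15s} start={traj[0]:.1f} end={traj[-1]:.1f}")
```

Under these assumptions the expected change per step is (p_ins - p_del) * n + p_ins. The extra insertion site therefore gives a constant drift of p_ins even when the two probabilities are equal, which reproduces the linear increase; a positive difference compounds into exponential growth. With a deletion bias, the mean in this toy version shrinks toward a small floor of p_ins / (p_del - p_ins) rather than to zero, so it should not be read as reproducing the paper's vanishing result.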

UI MeSH Term Description Entries
D008962 Models, Theoretical Theoretical representations that simulate the behavior or activity of systems, processes, or phenomena. They include the use of mathematical equations, computers, and other electronic equipment. Experimental Model,Experimental Models,Mathematical Model,Model, Experimental,Models (Theoretical),Models, Experimental,Models, Theoretic,Theoretical Study,Mathematical Models,Model (Theoretical),Model, Mathematical,Model, Theoretical,Models, Mathematical,Studies, Theoretical,Study, Theoretical,Theoretical Model,Theoretical Models,Theoretical Studies
D011897 Random Allocation A process involving chance used in therapeutic trials or other research endeavor for allocating experimental subjects, human or animal, between treatment and control groups, or among treatment groups. It may also apply to experiments on inanimate objects. Randomization,Allocation, Random
D006801 Humans Members of the species Homo sapiens. Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000818 Animals Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. Animal,Metazoa,Animalia
D001483 Base Sequence The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. DNA Sequence,Nucleotide Sequence,RNA Sequence,DNA Sequences,Base Sequences,Nucleotide Sequences,RNA Sequences,Sequence, Base,Sequence, DNA,Sequence, Nucleotide,Sequence, RNA,Sequences, Base,Sequences, DNA,Sequences, Nucleotide,Sequences, RNA
D016254 Mutagenesis, Insertional Mutagenesis where the mutation is caused by the introduction of foreign DNA sequences into a gene or extragenic sequence. This may occur spontaneously in vivo or be experimentally induced in vivo or in vitro. Proviral DNA insertions into or adjacent to a cellular proto-oncogene can interrupt GENETIC TRANSLATION of the coding sequences or interfere with recognition of regulatory elements and cause unregulated expression of the proto-oncogene resulting in tumor formation. Gene Insertion,Insertion Mutation,Insertional Activation,Insertional Mutagenesis,Linker-Insertion Mutagenesis,Mutagenesis, Cassette,Sequence Insertion,Viral Insertional Mutagenesis,Activation, Insertional,Activations, Insertional,Cassette Mutagenesis,Gene Insertions,Insertion Mutations,Insertion, Gene,Insertion, Sequence,Insertional Activations,Insertional Mutagenesis, Viral,Insertions, Gene,Insertions, Sequence,Linker Insertion Mutagenesis,Mutagenesis, Linker-Insertion,Mutagenesis, Viral Insertional,Mutation, Insertion,Mutations, Insertion,Sequence Insertions
D016415 Sequence Alignment The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms. Sequence Homology Determination,Determination, Sequence Homology,Alignment, Sequence,Alignments, Sequence,Determinations, Sequence Homology,Sequence Alignments,Sequence Homology Determinations
D017384 Sequence Deletion Deletion of sequences of nucleic acids from the genetic material of an individual. Deletion Mutation,Deletion Mutations,Deletion, Sequence,Deletions, Sequence,Mutation, Deletion,Mutations, Deletion,Sequence Deletions
D054643 INDEL Mutation A mutation named with the blend of insertion and deletion. It refers to a length difference between two ALLELES where it is unknowable if the difference was originally caused by a SEQUENCE INSERTION or by a SEQUENCE DELETION. If the number of nucleotides in the insertion/deletion is not divisible by three, and it occurs in a protein coding region, it is also a FRAMESHIFT MUTATION. INDELs Mutation,Insertions-Deletions Mutation,Insertion-Deletion Mutation,INDEL Mutations,INDELs Mutations,Insertion Deletion Mutation,Insertion-Deletion Mutations,Insertions Deletions Mutation,Insertions-Deletions Mutations,Mutation, INDEL,Mutation, INDELs,Mutation, Insertion-Deletion,Mutation, Insertions-Deletions
D059646 Genome Size The amount of DNA (or RNA) in one copy of a genome. Genome Sizes,Size, Genome,Sizes, Genome
