Diploid genome reconstruction of Ciona intestinalis and comparative analysis with Ciona savignyi. 2007

Jong Hyun Kim, Michael S Waterman, and Lei M Li
Department of Computer Science, Yonsei University, Seoul, Republic of Korea. jonghkim@usc.edu

A main goal of genome sequencing projects is to determine a haploid consensus sequence, even when clone libraries are constructed from homologous chromosomes. However, haplotypes can be inferred from genome assemblies by examining phase conservation in sequenced reads. In this study, we infer haplotypes, that is, a diploid consensus sequence, from the genome assembly of Ciona intestinalis. The Ciona intestinalis genome is an ideal resource for haplotype inference because of its high polymorphism rate (1.2%). The haplotype estimation scheme consists of polymorphism detection followed by phase estimation, and the core step of our method is a Gibbs sampling procedure. Mate-pair information from clone inserts sequenced at both ends is exploited to provide long-range continuity. We estimate the polymorphism rate of Ciona intestinalis to be 1.2% and 1.5% under two different polymorphism counting schemes. The distribution of heterozygosity number is well fit by a compound Poisson distribution. The N50 length of haplotype segments is 37.9 kb in our assembly, whereas the N50 scaffold length of the Ciona intestinalis assembly is 190 kb. We also infer diploid gene sequences from haplotype segments. According to our reconstruction, 85.4% of predicted gene sequences are continuously covered by single haplotype segments. On a simulated data set, our results indicate 97% accuracy in haplotype estimation. We conduct a comparative analysis with Ciona savignyi and discover interesting patterns of conserved DNA elements in chordates.
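The N50 figures quoted above (37.9 kb for haplotype segments, 190 kb for scaffolds) follow the usual assembly convention: N50 is the length L such that segments of length at least L together cover at least half of the total assembled length. The sketch below illustrates that standard definition only; the function name `n50` and the example lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of segment lengths.

    N50 is the largest length L such that segments of length >= L
    together account for at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one segment length")
    ordered = sorted(lengths, reverse=True)  # longest segments first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical segment lengths in kb (illustrative only, not study data).
segments_kb = [50, 40, 30, 20, 10]
print(n50(segments_kb))
```

Because the longest segments are accumulated first, a few long segments can dominate the statistic, so a haplotype N50 well below the scaffold N50 indicates that phase is typically lost several times within one scaffold.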

UI | MeSH Term | Description | Entries
D002938 | Ciona intestinalis | Vase or tube shaped TUNICATES with a cosmopolitan distribution. | Ciona robusta,Vase Tunicate,Yellow Sea Squirt,Sea Squirt, Yellow,Sea Squirts, Yellow,Vase Tunicates,Yellow Sea Squirts
D003198 | Computer Simulation | Computer-based representation of physical systems and phenomena such as chemical processes. | Computational Modeling,Computational Modelling,Computer Models,In silico Modeling,In silico Models,In silico Simulation,Models, Computer,Computerized Models,Computer Model,Computer Simulations,Computerized Model,In silico Model,Model, Computer,Model, Computerized,Model, In silico,Modeling, Computational,Modeling, In silico,Modelling, Computational,Simulation, Computer,Simulation, In silico,Simulations, Computer
D004171 | Diploidy | The chromosomal constitution of cells, in which each type of CHROMOSOME is represented twice. Symbol: 2N or 2X. | Diploid,Diploid Cell,Cell, Diploid,Cells, Diploid,Diploid Cells,Diploidies,Diploids
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal,Metazoa,Animalia
D001483 | Base Sequence | The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence. | DNA Sequence,Nucleotide Sequence,RNA Sequence,DNA Sequences,Base Sequences,Nucleotide Sequences,RNA Sequences,Sequence, Base,Sequence, DNA,Sequence, Nucleotide,Sequence, RNA,Sequences, Base,Sequences, DNA,Sequences, Nucleotide,Sequences, RNA
D014561 | Urochordata | A subphylum of chordates intermediate between the invertebrates and the true vertebrates. It includes the Ascidians. | Ascidia,Tunicata,Ascidiacea,Ascidians,Sea Squirts,Tunicates,Urochordates,Ascidian,Sea Squirt,Squirt, Sea,Tunicate,Urochordate
D014644 | Genetic Variation | Genotypic differences observed among individuals in a population. | Genetic Diversity,Variation, Genetic,Diversity, Genetic,Diversities, Genetic,Genetic Diversities,Genetic Variations,Variations, Genetic
D014714 | Vertebrates | Animals having a vertebral column, members of the phylum Chordata, subphylum Craniata comprising mammals, birds, reptiles, amphibians, and fishes. | Vertebrate
D016384 | Consensus Sequence | A theoretical representative nucleotide or amino acid sequence in which each nucleotide or amino acid is the one which occurs most frequently at that site in the different sequences which occur in nature. The phrase also refers to an actual sequence which approximates the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by conserved sequences. | Consensus Sequences,Sequence, Consensus,Sequences, Consensus
D016678 | Genome | The genetic complement of an organism, including all of its GENES, as represented in its DNA, or in some cases, its RNA. | Genomes
