Faster sequence homology searches by clustering subsequences. 2015

Shuji Suzuki, Masanori Kakuta, Takashi Ishida, and Yutaka Akiyama
Graduate School of Information Science and Engineering, Tokyo Institute of Technology and Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology, Tokyo 152-8550, Japan.

BACKGROUND Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. RESULTS We developed a fast homology search method based on database subsequence clustering and implemented it as GHOSTZ. The method clusters similar subsequences from a database so that seed search and ungapped extension run efficiently, using the triangle inequality to reduce the number of alignment candidates. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. On metagenomic data, GHOSTZ was ∼2.2-2.8 times faster than RAPSearch and ∼185-261 times faster than BLASTX. AVAILABILITY The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ CONTACT akiyama@cs.titech.ac.jp SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
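The candidate reduction described in the abstract can be illustrated with a small sketch. This is not the GHOSTZ implementation: it assumes fixed-length subsequences, Hamming distance as the metric, and a simple greedy clustering, and the names `hamming`, `cluster`, and `search` are hypothetical. It only shows how the triangle inequality lets a search skip whole clusters, or single members, without computing their distance to the query.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster(subseqs: List[str], radius: int) -> Dict[str, List[Tuple[str, int]]]:
    """Greedy clustering (an assumption, not the paper's procedure).

    Each subsequence joins the first representative within `radius`;
    otherwise it becomes a new representative. The distance from each
    member to its representative is stored for later pruning.
    """
    clusters: Dict[str, List[Tuple[str, int]]] = {}
    for s in subseqs:
        for rep in clusters:
            d = hamming(rep, s)
            if d <= radius:
                clusters[rep].append((s, d))
                break
        else:
            clusters[s] = [(s, 0)]
    return clusters


def search(
    query: str,
    clusters: Dict[str, List[Tuple[str, int]]],
    radius: int,
    threshold: int,
) -> Tuple[List[str], int]:
    """Return members within `threshold` of `query` and the number of
    distance computations that were actually performed."""
    hits: List[str] = []
    computed = 0
    for rep, members in clusters.items():
        d_qr = hamming(query, rep)
        computed += 1
        # Triangle inequality: d(q, m) >= d(q, rep) - d(rep, m) >= d_qr - radius.
        # If even this lower bound exceeds the threshold, skip the cluster.
        if d_qr - radius > threshold:
            continue
        for member, d_rm in members:
            if member == rep:
                d_qm = d_qr  # already computed above
            elif abs(d_qr - d_rm) > threshold:
                continue  # per-member lower bound rules this member out
            else:
                d_qm = hamming(query, member)
                computed += 1
            if d_qm <= threshold:
                hits.append(member)
    return hits, computed
```

Calling `cluster` once on the database subsequences and then `search` for each query returns the same hits as comparing the query against every subsequence, while typically computing fewer distances when the clusters are tight.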

UI | MeSH Term | Description | Entries
D008889 | Military Personnel | Persons including soldiers involved with the armed forces. | Air Force Personnel,Armed Forces Personnel,Army Personnel,Coast Guard,Marines,Navy Personnel,Sailors,Soldiers,Submariners,Military,Force Personnel, Air,Personnel, Air Force,Personnel, Armed Forces,Personnel, Army,Personnel, Military,Personnel, Navy,Sailor,Soldier,Submariner
D008969 | Molecular Sequence Data | Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories. | Sequence Data, Molecular,Molecular Sequencing Data,Data, Molecular Sequence,Data, Molecular Sequencing,Sequencing Data, Molecular
D011381 | Programming Languages | Specific languages used to prepare computer programs. | Language, Programming,Languages, Programming,Programming Language
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D000595 | Amino Acid Sequence | The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION. | Protein Structure, Primary,Amino Acid Sequences,Sequence, Amino Acid,Sequences, Amino Acid,Primary Protein Structure,Primary Protein Structures,Protein Structures, Primary,Structure, Primary Protein,Structures, Primary Protein
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal,Metazoa,Animalia
D012984 | Software | Sequential operating programs and data which instruct the functioning of a digital computer. | Computer Programs,Computer Software,Open Source Software,Software Engineering,Software Tools,Computer Applications Software,Computer Programs and Programming,Computer Software Applications,Application, Computer Software,Applications Software, Computer,Applications Softwares, Computer,Applications, Computer Software,Computer Applications Softwares,Computer Program,Computer Software Application,Engineering, Software,Open Source Softwares,Program, Computer,Programs, Computer,Software Application, Computer,Software Applications, Computer,Software Tool,Software, Computer,Software, Computer Applications,Software, Open Source,Softwares, Computer Applications,Softwares, Open Source,Source Software, Open,Source Softwares, Open,Tool, Software,Tools, Software
D012987 | Soil | The unconsolidated mineral or organic matter on the surface of the earth that serves as a natural medium for the growth of land plants. | Peat,Humus,Soils
D016000 | Cluster Analysis | A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. | Clustering,Analyses, Cluster,Analysis, Cluster,Cluster Analyses,Clusterings
