A comparative study of machine learning algorithms applied to predictive toxicology data mining. 2007

Daniel C Neagu, Gongde Guo, Paul R Trundle, and Mark T D Cronin
Department of Computing, University of Bradford, Bradford, UK. D.Neagu@bradford.ac.uk

This paper reports the results of a comparative study of widely used machine learning algorithms applied to predictive toxicology data mining. The algorithms were chosen for their representativeness and diversity, and were extensively evaluated on seven toxicity data sets taken from real-world applications. We highlight results from a visual analysis of how different descriptors correlate with the class values of chemical compounds, and of how the range of chosen descriptors relates to the performance of the machine learning algorithms. Some interesting findings relating to the data and the quality of the models are presented: for example, no specific algorithm appears best for all seven toxicity data sets, and up to five descriptors are sufficient to create classification models with good accuracy for each toxicity data set. We suggest that, for a specific data set, model accuracy is affected by both the feature selection method and the model development technique. Models built with too many or too few descriptors are undesirable, and finding the optimal feature subset appears at least as important as selecting appropriate algorithms with which to build a final model.
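The interplay between descriptor-subset size and algorithm choice described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (not the seven toxicity data sets), the two classifiers (k-nearest neighbours and nearest centroid) and the mean-difference filter are stand-ins rather than the algorithms or feature selection methods used in the study, and all function names are hypothetical. Descriptor ranking is repeated inside each leave-one-out fold so that the held-out compound does not influence the selection.

```python
import random
import statistics


def make_data(n=60, n_desc=8, n_informative=2, seed=0):
    """Synthetic stand-in data: only the first n_informative descriptors depend on the class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_desc)]
        X.append(row)
        y.append(label)
    return X, y


def rank_descriptors(X, y):
    """Simple filter: rank descriptors by class-mean difference scaled by overall spread."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        m0 = statistics.mean(v for v, c in zip(col, y) if c == 0)
        m1 = statistics.mean(v for v, c in zip(col, y) if c == 1)
        spread = statistics.pstdev(col) or 1.0
        scores.append((abs(m1 - m0) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training compounds (binary labels 0/1)."""
    nearest = sorted((sq_dist(r, x), c) for r, c in zip(train_X, train_y))[:k]
    return int(2 * sum(c for _, c in nearest) > k)


def centroid_predict(train_X, train_y, x):
    """Assign the class whose mean descriptor vector is closest."""
    dists = {}
    for label in (0, 1):
        rows = [r for r, c in zip(train_X, train_y) if c == label]
        centroid = [statistics.mean(col) for col in zip(*rows)]
        dists[label] = sq_dist(centroid, x)
    return min(dists, key=dists.get)


def loo_accuracy(predict, X, y, n_keep):
    """Leave-one-out accuracy using the top n_keep descriptors, re-ranked in every fold."""
    correct = 0
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        keep = rank_descriptors(tr_X, tr_y)[:n_keep]
        project = lambda row: [row[j] for j in keep]
        pred = predict([project(r) for r in tr_X], tr_y, project(X[i]))
        correct += pred == y[i]
    return correct / len(X)


def compare(X, y, sizes=(1, 2, 8)):
    """Accuracy for every (algorithm, subset size) pair."""
    algos = {"kNN": knn_predict, "centroid": centroid_predict}
    return {(name, m): loo_accuracy(fn, X, y, m)
            for name, fn in algos.items() for m in sizes}


if __name__ == "__main__":
    X, y = make_data()
    for (name, m), acc in sorted(compare(X, y).items()):
        print(f"{name:9s} descriptors={m}  accuracy={acc:.2f}")
```

Running the script prints one accuracy per algorithm and subset size, which makes it easy to see whether the ranking of algorithms changes as descriptors are added or removed. On real toxicity data the filter, the classifiers, and the validation scheme would all need to be chosen to suit the data set.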

UI | MeSH Term | Description | Entries
D010636 | Phenols | Benzene derivatives that include one or more hydroxyl groups attached to the ring structure. |
D011237 | Predictive Value of Tests | In screening and diagnostic tests, the probability that a person with a positive test is a true positive (i.e., has the disease), is referred to as the predictive value of a positive test; whereas, the predictive value of a negative test is the probability that the person with a negative test does not have the disease. Predictive value is related to the sensitivity and specificity of the test. | Negative Predictive Value, Positive Predictive Value, Predictive Value Of Test, Predictive Values Of Tests, Negative Predictive Values, Positive Predictive Values, Predictive Value, Negative, Predictive Value, Positive
D011784 | Quail | Common name for two distinct groups of BIRDS in the order GALLIFORMES: the New World or American quails of the family Odontophoridae and the Old World quails in the genus COTURNIX, family Phasianidae. | Quails
D003621 | Daphnia | A diverse genus of minute freshwater CRUSTACEA, of the suborder CLADOCERA. They are a major food source for both young and adult freshwater fish. | Daphnias
D003627 | Data Interpretation, Statistical | Application of statistical procedures to analyze specific observed or assumed facts from a particular study. | Data Analysis, Statistical, Data Interpretations, Statistical, Interpretation, Statistical Data, Statistical Data Analysis, Statistical Data Interpretation, Analyses, Statistical Data, Analysis, Statistical Data, Data Analyses, Statistical, Interpretations, Statistical Data, Statistical Data Analyses, Statistical Data Interpretations
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D000818 | Animals | Unicellular or multicellular, heterotrophic organisms, that have sensation and the power of voluntary movement. Under the older five kingdom paradigm, Animalia was one of the kingdoms. Under the modern three domain model, Animalia represents one of the many groups in the domain EUKARYOTA. | Animal, Metazoa, Animalia
D001185 | Artificial Intelligence | Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. | AI (Artificial Intelligence), Computer Reasoning, Computer Vision Systems, Knowledge Acquisition (Computer), Knowledge Representation (Computer), Machine Intelligence, Computational Intelligence, Acquisition, Knowledge (Computer), Computer Vision System, Intelligence, Artificial, Intelligence, Computational, Intelligence, Machine, Knowledge Representations (Computer), Reasoning, Computer, Representation, Knowledge (Computer), System, Computer Vision, Systems, Computer Vision, Vision System, Computer, Vision Systems, Computer
D001516 | Bees | Insect members of the superfamily Apoidea, found almost everywhere, particularly on flowers. About 3500 species occur in North America. They differ from most WASPS in that their young are fed honey and pollen rather than animal food. | Apidae, Apis, Apis mellifera, Apis mellifica, European Honey Bee, Honey Bee Drone, Bee, Bee, European Honey, Drone, Honey Bee, European Honey Bees, Honey Bee Drones, Honey Bee, European
D014337 | Trout | Various fish of the family SALMONIDAE, usually smaller than salmon. They are mostly restricted to cool clear freshwater. Some are anadromous. They are highly regarded for their handsome colors, rich well-flavored flesh, and gameness as an angling fish. The genera Salvelinus, Salmo, and ONCORHYNCHUS have been introduced virtually throughout the world. | Chars, Salvelinus, Char
