General purpose computer-assisted clustering and conceptualization (2011)

Justin Grimmer and Gary King
Department of Political Science, Stanford University, Encina Hall West, 616 Serra Street, Palo Alto, CA 94305, USA.

We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimizes a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an "insightful" or "useful" way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given dataset (along with millions of other solutions we add based on combinations of existing clusterings) and enable a user to explore and interact with it, quickly revealing or prompting useful or insightful conceptualizations. In addition, although it is uncommon to do so in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal for specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than expert human coders or many existing fully automated methods.
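The abstract does not name the distance that turns the collection of clusterings into a metric space. One standard choice for comparing two partitions of the same objects is Meilă's variation of information, VI(A, B) = H(A | B) + H(B | A) = H(A) + H(B) - 2I(A; B), which is a true metric on partitions and ignores how clusters happen to be labeled. The sketch below assumes VI as the metric; it is an illustration, not a description of the authors' implementation, and the function names (`variation_of_information`, `distance_matrix`, `nearest`) and the toy labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def variation_of_information(a, b):
    """Variation of information between two partitions, in nats.

    Assumption: VI is used as the metric on partitions. `a` and `b` are
    equal-length sequences of cluster labels, one label per object.
    VI is 0 exactly when both partitions group the objects identically,
    regardless of which label names each method happened to use.
    """
    if len(a) != len(b):
        raise ValueError("both partitions must label the same objects")
    n = len(a)
    count_a = Counter(a)          # cluster sizes in A
    count_b = Counter(b)          # cluster sizes in B
    joint = Counter(zip(a, b))    # overlap sizes |A_x intersect B_y|
    vi = 0.0
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / n
        # Each overlap adds -p_xy * [log p(x | y) + log p(y | x)].
        vi -= p_xy * (math.log(n_xy / count_b[y]) + math.log(n_xy / count_a[x]))
    return vi


def distance_matrix(partitions):
    """Symmetric matrix of pairwise VI distances between clusterings."""
    k = len(partitions)
    d = [[0.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        d[i][j] = d[j][i] = variation_of_information(partitions[i], partitions[j])
    return d


def nearest(d, names, i, m=2):
    """Hypothetical helper: names of the m clusterings closest to clustering i."""
    others = sorted((d[i][j], names[j]) for j in range(len(names)) if j != i)
    return [name for _, name in others[:m]]


if __name__ == "__main__":
    # Toy labels for six objects from three hypothetical clustering methods.
    names = ["method_a", "method_b", "method_c"]
    clusterings = [
        [0, 0, 1, 1, 2, 2],
        [1, 1, 0, 0, 2, 2],  # same grouping as method_a, labels permuted
        [0, 0, 0, 1, 1, 1],
    ]
    d = distance_matrix(clusterings)
    print(d[0][1])                    # 0.0: identical partitions
    print(round(d[0][2], 3))          # log(3) - log(2)/3, about 0.868
    print(nearest(d, names, 0, m=1))  # ['method_b']
```

A distance matrix like `d` could then be embedded in two dimensions (for example with multidimensional scaling) so that a user can browse neighboring clusterings; that projection and the interactive exploration step are not shown here.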

UI | MeSH Term | Description | Entry Terms
D008490 | Medical Informatics | The field of information science concerned with the analysis and dissemination of medical data through the application of computers to various aspects of health care and medicine. | Clinical Informatics; Medical Computer Science; Medical Information Science; Computer Science, Medical; Health Informatics; Health Information Technology; Informatics, Clinical; Informatics, Medical; Information Science, Medical; Health Information Technologies; Informatics, Health; Information Technology, Health; Medical Computer Sciences; Medical Information Sciences; Science, Medical Computer; Technology, Health Information
D010363 | Pattern Recognition, Automated | In INFORMATION RETRIEVAL, machine-sensing or identification of visible patterns (shapes, forms, and configurations). (Harrod's Librarians' Glossary, 7th ed) | Automated Pattern Recognition; Pattern Recognition System; Pattern Recognition Systems
D002965 | Classification | The systematic arrangement of entities in any field into categories classes based on common characteristics such as properties, morphology, subject matter, etc. | Systematics; Taxonomy; Classifications; Taxonomies
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D001185 | Artificial Intelligence | Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language. | AI (Artificial Intelligence); Computer Reasoning; Computer Vision Systems; Knowledge Acquisition (Computer); Knowledge Representation (Computer); Machine Intelligence; Computational Intelligence; Acquisition, Knowledge (Computer); Computer Vision System; Intelligence, Artificial; Intelligence, Computational; Intelligence, Machine; Knowledge Representations (Computer); Reasoning, Computer; Representation, Knowledge (Computer); System, Computer Vision; Systems, Computer Vision; Vision System, Computer; Vision Systems, Computer
D016000 | Cluster Analysis | A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. | Clustering; Analyses, Cluster; Analysis, Cluster; Cluster Analyses; Clusterings
