Clustering multivariate time series using Hidden Markov Models. 2014

Shima Ghassempour, Federico Girosi, and Anthony Maeder
School of Computing, Engineering and Mathematics, University of Western Sydney, Campbelltown, NSW 2751, Australia. shima.ghassempour@gmail.com.

In this paper we describe an algorithm for clustering multivariate time series whose variables take both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs): we first map each trajectory into an HMM, then define a suitable distance between HMMs, and finally cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab. It may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge and are therefore accessible to a wide range of researchers.
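The three steps named in the abstract (one HMM per trajectory, a distance between HMMs, clustering from a distance matrix) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes a single categorical variable, takes the per-trajectory HMMs as already fitted (the parameters below are made-up toy values), uses a symmetrised log-likelihood gap as one plausible HMM distance, and uses plain average-linkage merging as the distance-matrix clustering step. All function and variable names are hypothetical.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for discrete emissions."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik

def hmm_distance(model_a, seq_a, model_b, seq_b):
    """Assumed distance: symmetrised per-observation log-likelihood gap."""
    d_ab = (log_likelihood(seq_a, *model_a) - log_likelihood(seq_a, *model_b)) / len(seq_a)
    d_ba = (log_likelihood(seq_b, *model_b) - log_likelihood(seq_b, *model_a)) / len(seq_b)
    return 0.5 * (d_ab + d_ba)

def agglomerative(dist, k):
    """Average-linkage merging on a distance matrix until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return [sorted(c) for c in clusters]

def two_state_model(stay, correct):
    """Toy 2-state HMM as (start, transition, emission); values are illustrative."""
    return ([0.5, 0.5],
            [[stay, 1 - stay], [1 - stay, stay]],
            [[correct, 1 - correct], [1 - correct, correct]])

# Step 1 (assumed done elsewhere): one fitted HMM per trajectory.
seqs = [
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],  # long runs
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # long runs
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # alternating
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # alternating
]
models = [
    two_state_model(0.90, 0.9),
    two_state_model(0.85, 0.8),
    two_state_model(0.10, 0.9),
    two_state_model(0.20, 0.8),
]

# Step 2: pairwise distance matrix between the HMMs.
n = len(models)
dist = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        dist[i][j] = dist[j][i] = hmm_distance(models[i], seqs[i], models[j], seqs[j])

# Step 3: cluster the HMMs from the distance matrix alone.
clusters = agglomerative(dist, k=2)
print(clusters)
```

Because only the distance matrix reaches the clustering step, any matrix-based method (hierarchical clustering, k-medoids, and so on) could be substituted for the merging loop. Handling the mixed categorical and continuous variables described in the abstract would require a richer emission model than this single-symbol sketch.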

UI MeSH Term Description Entries
D008390 Markov Chains A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system. Markov Process,Markov Chain,Chain, Markov,Chains, Markov,Markov Processes,Process, Markov,Processes, Markov
D003695 Delivery of Health Care The concept concerned with all aspects of providing and distributing health services to a patient population. Delivery of Dental Care,Health Care,Health Care Delivery,Health Care Systems,Community-Based Distribution,Contraceptive Distribution,Delivery of Healthcare,Dental Care Delivery,Distribution, Non-Clinical,Distribution, Nonclinical,Distributional Activities,Healthcare,Healthcare Delivery,Healthcare Systems,Non-Clinical Distribution,Nonclinical Distribution,Activities, Distributional,Activity, Distributional,Care, Health,Community Based Distribution,Community-Based Distributions,Contraceptive Distributions,Deliveries, Healthcare,Delivery, Dental Care,Delivery, Health Care,Delivery, Healthcare,Distribution, Community-Based,Distribution, Contraceptive,Distribution, Non Clinical,Distributional Activity,Distributions, Community-Based,Distributions, Contraceptive,Distributions, Non-Clinical,Distributions, Nonclinical,Health Care System,Healthcare Deliveries,Healthcare System,Non Clinical Distribution,Non-Clinical Distributions,Nonclinical Distributions,System, Health Care,System, Healthcare,Systems, Health Care,Systems, Healthcare
D006801 Humans Members of the species Homo sapiens. Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 Algorithms A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. Algorithm
D015233 Models, Statistical Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc. Probabilistic Models,Statistical Models,Two-Parameter Models,Model, Statistical,Models, Binomial,Models, Polynomial,Statistical Model,Binomial Model,Binomial Models,Model, Binomial,Model, Polynomial,Model, Probabilistic,Model, Two-Parameter,Models, Probabilistic,Models, Two-Parameter,Polynomial Model,Polynomial Models,Probabilistic Model,Two Parameter Models,Two-Parameter Model
D015999 Multivariate Analysis A set of techniques used when variation in several variables are studied simultaneously. In statistics, multivariate analysis is interpreted as any analytic method that allows simultaneous study of two or more dependent variables. Analysis, Multivariate,Multivariate Analyses
D016000 Cluster Analysis A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. Clustering,Analyses, Cluster,Analysis, Cluster,Cluster Analyses,Clusterings
