An Integrated Cluster Detection, Optimization, and Interpretation Approach for Financial Data. 2022

Tie Li, Gang Kou, Yi Peng, and Philip S Yu

In many financial applications, such as fraud detection, reject inference, and credit evaluation, detecting clusters automatically is critical because it reveals subpatterns in the data that can be used to infer users' behaviors and identify potential risks. Because human behavior is complex and social environments change, the distributions of financial data are usually complex, and it is challenging to find clusters and interpret them reasonably. The goal of this study is to develop an integrated approach that detects clusters in financial data and optimizes the scope of those clusters so that they can be easily interpreted. Specifically, we first propose a new cluster quality evaluation criterion, which is free from large-scale computation and can guide base clustering algorithms such as k-means to detect hyperellipsoidal clusters adaptively. Then, we design a new solver for a revised support vector data description model, which efficiently refines the centroids and scopes of the detected clusters. The refinement makes the clusters tighter, so that the data in each cluster share greater similarities and the clusters can be easily interpreted with eigenvectors. Experiments on ten financial datasets show that the proposed algorithm can efficiently find a reasonable number of clusters. The proposed approach is suitable for large-scale financial datasets whose features are meaningful, and it is also applicable to financial mining tasks such as data distribution interpretation and anomaly detection.
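
The abstract says that tightened clusters "can be easily interpreted with eigenvectors" but does not spell out how. One common reading, offered here only as a minimal sketch and not as the authors' method, is to look at the leading eigenvector of a cluster's covariance matrix: its largest components indicate which features vary together most strongly inside the cluster. The helper names (`covariance_matrix`, `leading_eigenpair`), the feature names, and the synthetic data are all hypothetical, and plain power iteration stands in for whatever eigen-solver the paper uses.

```python
import math
import random


def mean_vector(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(dim)]


def covariance_matrix(points):
    """Sample covariance matrix (divides by n - 1) of the given points."""
    n = len(points)
    mu = mean_vector(points)
    dim = len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for p in points:
        d = [p[j] - mu[j] for j in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += d[a] * d[b]
    return [[c / (n - 1) for c in row] for row in cov]


def leading_eigenpair(matrix, iterations=200):
    """Approximate the dominant eigenvalue and unit eigenvector of a
    symmetric matrix by power iteration.

    Assumption: the start vector is not orthogonal to the dominant
    eigenvector; an uneven start vector makes that unlikely in practice.
    """
    dim = len(matrix)
    v = [1.0 / (i + 1) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: every direction is equivalent
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)
    )
    return eigenvalue, v


# Hypothetical cluster with two features; the second tracks the first.
feature_names = ["transaction_amount", "account_balance"]
rng = random.Random(0)
cluster = []
for _ in range(200):
    amount = rng.gauss(0.0, 2.0)
    balance = 0.5 * amount + rng.gauss(0.0, 0.2)
    cluster.append([amount, balance])

eigenvalue, direction = leading_eigenpair(covariance_matrix(cluster))
for name, weight in zip(feature_names, direction):
    print(f"{name}: {weight:+.3f}")
```

In this reading, a feature with a large absolute weight dominates the cluster's main axis of variation, which is only informative when the features themselves are meaningful, as the abstract requires.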

UI | MeSH Term | Description | Entries
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens,Man (Taxonomy),Human,Man, Modern,Modern Man
D000465 | Algorithms | A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task. | Algorithm
D016000 | Cluster Analysis | A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both. | Clustering,Analyses, Cluster,Analysis, Cluster,Cluster Analyses,Clusterings
