Phenotype prediction based on genome-wide DNA methylation data. 2014

Thomas Wilhelm
Theoretical Systems Biology, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK. Thomas.wilhelm@ifr.ac.uk.

BACKGROUND: DNA methylation (DNAm) has important regulatory roles in many biological processes and diseases. It is the only epigenetic mark with a clear mechanism of mitotic inheritance and the only one easily available on a genome scale. Aberrant cytosine-phosphate-guanine (CpG) methylation has been discussed in the context of disease aetiology, especially cancer. CpG hypermethylation of promoter regions is often associated with silencing of tumour suppressor genes, and hypomethylation with activation of oncogenes. Supervised principal component analysis (SPCA) is a popular machine learning method. However, in a recent application to phenotype prediction from DNAm data, SPCA was inferior to the specific method EVORA.

RESULTS: We present Model-Selection-SPCA (MS-SPCA), an enhanced version of SPCA. MS-SPCA applies several models that perform well on the training data to the test data and selects the very best models for final prediction based on parameters of the test data. We have applied MS-SPCA to phenotype prediction from genome-wide DNAm data. CpGs used for prediction are selected by quantifying three features of their methylation: average methylation difference, methylation variation difference and methylation-age correlation. We analysed four independent case-control datasets that correspond to different stages of cervical cancer: (i) cases that are currently cytologically normal but will later develop neoplastic transformations, (ii, iii) cases showing neoplastic transformations and (iv) cases with confirmed cancer. The first dataset was split into several smaller case-control datasets, with samples either human papillomavirus (HPV) positive or negative. We demonstrate that cytologically normal HPV+ and HPV- samples contain DNAm patterns that are associated with later neoplastic transformations. We present evidence that DNAm patterns exist in cytologically normal HPV- samples that (i) predispose to neoplastic transformations after HPV infection and (ii) predispose to HPV infection itself. MS-SPCA performs significantly better than EVORA.

CONCLUSIONS: MS-SPCA can be applied to many classification problems. Additional improvements could include the use of more than one principal component (PC), with automatic selection of the optimal number of PCs. We expect that MS-SPCA will be useful for analysing recent, larger DNAm datasets to predict future neoplastic transformations.
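
The basic SPCA idea that the abstract builds on can be illustrated with a short sketch. This is not the author's MS-SPCA implementation: the model-selection step on the test data is not reproduced, and the abstract does not say how the three CpG features are combined. The function names (`cpg_feature_scores`, `spca_fit`, `spca_score`), the synthetic data and the choice to rank CpGs by mean methylation difference alone are assumptions made for illustration. Only the first principal component is used, as the abstract's conclusions imply.

```python
import numpy as np

def cpg_feature_scores(beta, is_case, age):
    """Per-CpG scores for the three features named in the abstract.

    beta: (samples x CpGs) methylation values in [0, 1]
    is_case: boolean vector, True for cases
    age: numeric vector of sample ages
    """
    cases, controls = beta[is_case], beta[~is_case]
    mean_diff = cases.mean(axis=0) - controls.mean(axis=0)
    var_diff = cases.var(axis=0, ddof=1) - controls.var(axis=0, ddof=1)
    # Pearson correlation of each CpG with age
    beta_c = beta - beta.mean(axis=0)
    age_c = age - age.mean()
    denom = np.sqrt((beta_c ** 2).sum(axis=0) * (age_c ** 2).sum())
    age_corr = (age_c @ beta_c) / np.where(denom > 0, denom, 1.0)
    return mean_diff, var_diff, age_corr

def spca_fit(beta_train, is_case_train, scores, n_keep):
    """Keep the n_keep top-scoring CpGs and take their first PC."""
    keep = np.argsort(-np.abs(scores))[:n_keep]
    mu = beta_train[:, keep].mean(axis=0)
    x = beta_train[:, keep] - mu
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    pc1 = vt[0]
    # The sign of a PC is arbitrary; orient it so cases score higher.
    s = x @ pc1
    if s[is_case_train].mean() < s[~is_case_train].mean():
        pc1 = -pc1
    return keep, mu, pc1

def spca_score(beta, keep, mu, pc1):
    """Project samples onto the first PC of the selected CpGs."""
    return (beta[:, keep] - mu) @ pc1

# Synthetic data (assumption): 60 samples, 200 CpGs, signal in CpGs 0..4.
rng = np.random.default_rng(0)
n, p = 60, 200
beta = np.clip(rng.normal(0.5, 0.05, size=(n, p)), 0.0, 1.0)
is_case = np.arange(n) % 2 == 0
beta[is_case, :5] += 0.2
age = rng.uniform(20, 60, size=n)

train, test = slice(0, 40), slice(40, 60)
mean_diff, var_diff, age_corr = cpg_feature_scores(
    beta[train], is_case[train], age[train])
keep, mu, pc1 = spca_fit(beta[train], is_case[train], mean_diff, n_keep=5)
test_scores = spca_score(beta[test], keep, mu, pc1)
is_case_test = is_case[test]
```

The feature scores are computed on the training samples only, so the test samples play no part in CpG selection. A higher `test_scores` value indicates a sample that looks more case-like along the first PC.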

UI | MeSH Term | Description | Entries
D010641 | Phenotype | The outward appearance of the individual. It is the product of interactions between genes, and between the GENOTYPE and the environment. | Phenotypes
D002583 | Uterine Cervical Neoplasms | Tumors or cancer of the UTERINE CERVIX. | Cancer of Cervix, Cancer of the Cervix, Cancer of the Uterine Cervix, Cervical Cancer, Cervical Neoplasms, Cervix Cancer, Cervix Neoplasms, Neoplasms, Cervical, Neoplasms, Cervix, Uterine Cervical Cancer, Cancer, Cervical, Cancer, Cervix, Cancer, Uterine Cervical, Cervical Cancer, Uterine, Cervical Cancers, Cervical Neoplasm, Cervical Neoplasm, Uterine, Cervix Neoplasm, Neoplasm, Cervix, Neoplasm, Uterine Cervical, Uterine Cervical Cancers, Uterine Cervical Neoplasm
D005260 | Female | | Females
D006801 | Humans | Members of the species Homo sapiens. | Homo sapiens, Man (Taxonomy), Human, Man, Modern, Modern Man
D015894 | Genome, Human | The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs. | Human Genome, Genomes, Human, Human Genomes
D018899 | CpG Islands | Areas of increased density of the dinucleotide sequence cytosine--phosphate diester--guanine. They form stretches of DNA several hundred to several thousand base pairs long. In humans there are about 45,000 CpG islands, mostly found at the 5' ends of genes. They are unmethylated except for those on the inactive X chromosome and some associated with imprinted genes. | CpG Clusters, CpG-Rich Islands, Cluster, CpG, Clusters, CpG, CpG Cluster, CpG Island, CpG Rich Islands, CpG-Rich Island, Island, CpG, Island, CpG-Rich, Islands, CpG, Islands, CpG-Rich
D019175 | DNA Methylation | Addition of methyl groups to DNA. DNA methyltransferases (DNA methylases) perform this reaction using S-ADENOSYLMETHIONINE as the methyl group donor. | DNA Methylations, Methylation, DNA, Methylations, DNA
D025341 | Principal Component Analysis | Mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. | Analyses, Principal Component, Analysis, Principal Component, Principal Component Analyses
D027383 | Papillomaviridae | A family of small, non-enveloped DNA viruses infecting birds and most mammals, especially humans. They are grouped into multiple genera, but the viruses are highly host-species specific and tissue-restricted. They are commonly divided into hundreds of papillomavirus "types", each with specific gene function and gene control regions, despite sequence homology. Human papillomaviruses are found in the genera ALPHAPAPILLOMAVIRUS; BETAPAPILLOMAVIRUS; GAMMAPAPILLOMAVIRUS; and MUPAPILLOMAVIRUS. |
D030361 | Papillomavirus Infections | Neoplasms of the skin and mucous membranes caused by papillomaviruses. They are usually benign but some have a high risk for malignant progression. | HPV Infection, Human Papillomavirus Infection, HPV Infections, Human Papillomavirus Infections, Papillomavirus Infection, Papillomavirus Infection, Human, Papillomavirus Infections, Human
