Bayesian fluorescence in situ hybridisation signal classification (2004)

Boaz Lerner
Pattern Analysis & Machine Learning Lab, Department of Electrical & Computer Engineering, Ben-Gurion University, Beer-Sheva, Israel. boaz@ee.bgu.ac.il

Previous research has indicated the significance of accurate classification of fluorescence in situ hybridisation (FISH) signals for the detection of genetic abnormalities. Based on well-discriminating features and a trainable neural network (NN) classifier, a previous system enabled highly accurate classification of valid signals and artefacts of two fluorophores. However, since this system employed several features that can be considered independent, the naive Bayesian classifier (NBC) is suggested here as an alternative to the NN. The NBC independence assumption permits the decomposition of the high-dimensional likelihood of the model for the data into a product of one-dimensional probability densities. The naive independence assumption, together with the Bayesian methodology, allows the NBC to predict a posteriori probabilities of class membership from estimated class-conditional densities in a closed and simple form. Since the probability densities are the only parameters of the NBC, the misclassification rate of the model is determined exclusively by the quality of the density estimation. Densities are estimated by three methods: single Gaussian estimation (SGE; a parametric method), a Gaussian mixture model assuming spherical covariance matrices (GMM; a semi-parametric method) and kernel density estimation (KDE; a non-parametric method). For low-dimensional densities, the GMM generally outperforms the KDE, which tends to overfit the training set at the cost of reduced generalisation capability. However, the GMM loses some accuracy when modelling higher-dimensional densities, because the assumption of spherical covariance matrices is violated when dependent features are added to the set. Compared with these two methods, the SGE provides inferior and the NN superior performance. However, the NBC avoids the intensive training and optimisation required by the NN, which demand extensive resources and experimentation. Therefore, by supporting both classifiers, the system enables a trade-off between the performance of the NN and the simplicity of implementation of the NBC.
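As a minimal sketch of the approach described above (not the paper's actual implementation), the following Python code builds a naive Bayesian classifier whose per-feature class-conditional densities can be estimated by any of the three methods named in the abstract: SGE, GMM or KDE. The class and function names, hyperparameters and synthetic data are illustrative assumptions; in one dimension the spherical-covariance restriction on the GMM is vacuous, so a standard mixture fit is used.

# Illustrative sketch only: a naive Bayesian classifier with pluggable
# one-dimensional density estimators (SGE, GMM, KDE). Data, labels and
# hyperparameters below are assumed for demonstration.
import numpy as np
from scipy.stats import norm, gaussian_kde
from sklearn.mixture import GaussianMixture

def fit_density(x, method, n_components=3):
    """Fit a 1-D density p(x) to the samples x and return it as a callable."""
    if method == "sge":   # parametric: a single Gaussian
        mu, sigma = x.mean(), x.std(ddof=1)
        return lambda t: norm.pdf(t, mu, sigma)
    if method == "gmm":   # semi-parametric: Gaussian mixture model
        gm = GaussianMixture(n_components=n_components).fit(x.reshape(-1, 1))
        return lambda t: np.exp(gm.score_samples(np.asarray(t, float).reshape(-1, 1)))
    if method == "kde":   # non-parametric: Gaussian kernel density estimate
        kde = gaussian_kde(x)
        return lambda t: kde(np.atleast_1d(t))
    raise ValueError(f"unknown method: {method}")

class NaiveBayesClassifier:
    """P(c | x) proportional to P(c) * prod_j p(x_j | c): the independence
    assumption turns the high-dimensional class-conditional likelihood
    into a product of one-dimensional densities."""

    def fit(self, X, y, method="gmm"):
        self.classes_ = np.unique(y)
        self.log_priors_ = {c: np.log(np.mean(y == c)) for c in self.classes_}
        self.densities_ = {c: [fit_density(X[y == c, j], method)
                               for j in range(X.shape[1])]
                           for c in self.classes_}
        return self

    def predict_proba(self, X):
        # Sum per-feature log-densities for numerical stability, then
        # normalise to obtain a posteriori probabilities of class membership.
        log_joint = np.column_stack([
            self.log_priors_[c] + sum(np.log(d(X[:, j]) + 1e-300)
                                      for j, d in enumerate(self.densities_[c]))
            for c in self.classes_])
        log_joint -= log_joint.max(axis=1, keepdims=True)
        post = np.exp(log_joint)
        return post / post.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

if __name__ == "__main__":
    # Synthetic two-class data standing in for 'artefact' vs 'valid signal'.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 4)),
                   rng.normal(1.5, 1.0, (200, 4))])
    y = np.array([0] * 200 + [1] * 200)
    for method in ("sge", "gmm", "kde"):
        clf = NaiveBayesClassifier().fit(X, y, method=method)
        print(method, "training accuracy:", (clf.predict(X) == y).mean())

Because the estimated densities are the classifier's only parameters, swapping the method argument changes nothing but the density-estimation step, which mirrors the SGE/GMM/KDE comparison made in the abstract.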

MeSH Terms

D011336 Probability
  The study of chance processes or the relative frequency characterizing a chance process.
  Entries: Probabilities

D002863 Chromogenic Compounds
  Colorless, endogenous or exogenous pigment precursors that may be transformed by biological mechanisms into colored compounds; used in biochemical assays and in diagnosis as indicators, especially in the form of enzyme substrates. Synonym: chromogens (not to be confused with pigment-synthesizing bacteria also called chromogens).
  Entries: Chromogenic Compound; Chromogenic Substrate; Chromogenic Substrates; Compound, Chromogenic; Compounds, Chromogenic; Substrate, Chromogenic; Substrates, Chromogenic

D006801 Humans
  Members of the species Homo sapiens.
  Entries: Homo sapiens; Man (Taxonomy); Human; Man, Modern; Modern Man

D000465 Algorithms
  A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task.
  Entries: Algorithm

D001499 Bayes Theorem
  A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result.
  Entries: Bayesian Analysis; Bayesian Estimation; Bayesian Forecast; Bayesian Method; Bayesian Prediction; Analysis, Bayesian; Bayesian Approach; Approach, Bayesian; Approachs, Bayesian; Bayesian Approachs; Estimation, Bayesian; Forecast, Bayesian; Method, Bayesian; Prediction, Bayesian; Theorem, Bayes

D016008 Statistical Distributions
  The complete summaries of the frequencies of the values or categories of a measurement made on a group of items, a population, or other collection of data. The distribution tells either how many or what proportion of the group was found to have each value (or each range of values) out of all the possible values that the quantitative measure can have.
  Entries: Distribution, Statistical; Distributions, Statistical; Statistical Distribution

D016011 Normal Distribution
  Continuous frequency distribution of infinite range. Its properties are as follows: 1, continuous, symmetrical distribution with both tails extending to infinity; 2, arithmetic mean, mode, and median identical; and 3, shape completely determined by the mean and standard deviation.
  Entries: Gaussian Distribution; Distribution, Gaussian; Distribution, Normal; Distributions, Normal; Normal Distributions

D016013 Likelihood Functions
  Functions constructed from a statistical model and a set of observed data which give the probability of that data for various values of the unknown model parameters. Those parameter values that maximize the probability are the maximum likelihood estimates of the parameters.
  Entries: Likelihood Ratio Test; Maximum Likelihood Estimates; Estimate, Maximum Likelihood; Estimates, Maximum Likelihood; Function, Likelihood; Functions, Likelihood; Likelihood Function; Maximum Likelihood Estimate; Test, Likelihood Ratio

D016477 Artifacts
  Any visible result of a procedure which is caused by the procedure itself and not by the entity being analyzed. Common examples include histological structures introduced by tissue processing, radiographic images of structures that are not naturally present in living tissue, and products of chemical reactions that occur during analysis.
  Entries: Artefacts; Artefact; Artifact

D016571 Neural Networks, Computer
  A computer architecture, implementable in either hardware or software, modeled after biological neural networks. Like the biological system in which the processing capability is a result of the interconnection strengths between arrays of nonlinear processing nodes, computerized neural networks, often called perceptrons or multilayer connectionist models, consist of neuron-like units. A homogeneous group of units makes up a layer. These networks are good at pattern recognition. They are adaptive, performing tasks by example, and thus are better for decision-making than are linear learning machines or cluster analysis. They do not require explicit programming.
  Entries: Computational Neural Networks; Connectionist Models; Models, Neural Network; Neural Network Models; Neural Networks (Computer); Perceptrons; Computational Neural Network; Computer Neural Network; Computer Neural Networks; Connectionist Model; Model, Connectionist; Model, Neural Network; Models, Connectionist; Network Model, Neural; Network Models, Neural; Network, Computational Neural; Network, Computer Neural; Network, Neural (Computer); Networks, Computational Neural; Networks, Computer Neural; Networks, Neural (Computer); Neural Network (Computer); Neural Network Model; Neural Network, Computational; Neural Network, Computer; Neural Networks, Computational; Perceptron