Abreu, G C G [1]; Pinheiro, A [3]; Drummond, R D [3]; Camargo, S R [3]; Menossi, M [3]
[1] Biostatistik, Faculty of Agricultural Sciences, Aarhus University
[2] Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, Aarhus University
[3] unknown
DNA arrays have been a rich source of data for studying genomic expression in a wide variety of biological systems. Gene clustering is a widely used paradigm for assessing the significance of a gene (or group of genes). However, most gene clustering techniques are applied to cDNA array data without a corresponding measure of statistical error. We propose a technique, easy to implement and simple to use, that applies bootstrap re-sampling to evaluate the statistical error of the nodes produced by clustering based on self-organizing maps (SOM). Comparisons between SOM and parametric clustering are presented for simulated data as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM that improves the false discovery ratio of differentially expressed genes. Matlab code and supplementary material are freely available at https://ipe.cbmeg.unicamp.br/pub/abreu.gcg. An R implementation is in progress.
Advances and Applications in Statistics, 2010, Vol 14, Issue 2, p. 191-204
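
The abstract describes using bootstrap re-sampling to attach an error measure to SOM nodes, and the idea can be illustrated with a minimal Python sketch. This is not the authors' Matlab code, and the details are assumptions made for illustration: each gene is taken to have several replicate expression profiles, the bootstrap resamples those replicates with replacement, and the error of a node is the average fraction of bootstrap draws in which its genes land in a different node. The function names (`train_som`, `bootstrap_node_error`, `make_toy_data`), the one-dimensional map, and the toy data are all hypothetical.

```python
import math
import random


def mean_profile(replicates):
    """Average replicate profiles (equal-length lists) coordinate-wise."""
    n = len(replicates)
    return [sum(col) / n for col in zip(*replicates)]


def best_node(weights, x):
    """Index of the SOM node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def train_som(profiles, n_nodes, epochs=50, seed=0):
    """Train a tiny one-dimensional SOM with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    # Initialise node weights from randomly chosen input profiles.
    weights = [list(p) for p in rng.sample(profiles, n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac                          # learning rate decays
        radius = max(n_nodes / 2.0 * frac, 0.5)    # neighbourhood shrinks
        for x in rng.sample(profiles, len(profiles)):  # shuffled pass
            win = best_node(weights, x)
            for j, w in enumerate(weights):
                h = math.exp(-((j - win) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += rate * h * (x[d] - w[d])
    return weights


def bootstrap_node_error(data, n_nodes=2, n_boot=200, seed=0):
    """Estimate how often genes leave their node under resampling.

    Assumption (not taken from the paper): replicates of each gene are
    resampled with replacement and re-assigned to a fixed reference SOM.
    """
    rng = random.Random(seed)
    genes = list(data)
    weights = train_som([mean_profile(data[g]) for g in genes], n_nodes, seed=seed)
    home = {g: best_node(weights, mean_profile(data[g])) for g in genes}
    moved = {g: 0 for g in genes}
    for _ in range(n_boot):
        for g in genes:
            resampled = rng.choices(data[g], k=len(data[g]))
            if best_node(weights, mean_profile(resampled)) != home[g]:
                moved[g] += 1
    gene_error = {g: moved[g] / n_boot for g in genes}
    node_error = {}
    for j in range(n_nodes):
        members = [g for g in genes if home[g] == j]
        if members:  # empty nodes get no error estimate
            node_error[j] = sum(gene_error[g] for g in members) / len(members)
    return home, gene_error, node_error


def make_toy_data(seed=1):
    """Hypothetical data: 4 replicates x 3 conditions per gene."""
    rng = random.Random(seed)

    def reps(centres):
        return [[c + rng.gauss(0.0, 0.2) for _ in range(3)] for c in centres]

    data = {f"low{i}": reps([0.0] * 4) for i in range(3)}
    data.update({f"high{i}": reps([5.0] * 4) for i in range(3)})
    data["ambiguous"] = reps([0.0, 0.0, 5.0, 5.0])  # inconsistent replicates
    return data


toy = make_toy_data()
home, gene_error, node_error = bootstrap_node_error(toy)
for gene in toy:
    print(gene, "node", home[gene], "bootstrap error", gene_error[gene])
print("per-node error:", node_error)
```

In this toy setting, genes with consistent replicates should keep their node in nearly every bootstrap draw, while the gene with conflicting replicates changes node in a sizeable share of draws and raises the error of whichever node it was assigned to. Keeping the reference map fixed is a simplification chosen for brevity; retraining the SOM on every bootstrap sample would need an extra step to match nodes across maps.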