1 Human Genetics, Department of Clinical Research, Det Sundhedsvidenskabelige Fakultet, SDU
2 Department of Clinical Research, Det Sundhedsvidenskabelige Fakultet, SDU
3 Epidemiology, Biostatistics and Biodemography, Department of Public Health, Det Sundhedsvidenskabelige Fakultet, SDU
4 Human Genetics, Department of Clinical Research, Det Sundhedsvidenskabelige Fakultet, SDU
5 Epidemiology, Biostatistics and Biodemography, Department of Public Health, Det Sundhedsvidenskabelige Fakultet, SDU
Microarrays are a powerful technology used extensively for gene expression analysis. Several platforms are available, but lack of standardization makes it challenging to compare and integrate data. Furthermore, batch-related biases within datasets are common but often left uncorrected. We analyzed the same 234 breast cancers on two different microarray platforms. One dataset contained known batch effects associated with the fabrication procedure used. The aim was to assess the importance of correcting for systematic batch effects when integrating data from different platforms. We demonstrate the importance of detecting batch effects and show how tools such as ComBat can be used to overcome such systematic variation and unmask essential biological signals. Batch adjustment was particularly valuable for detecting subtle differences in gene expression. Furthermore, our results show that proper adjustment is essential for integrating gene expression data obtained from multiple sources. We show that high-variance genes are reproducibly expressed across platforms, making them particularly well suited as biomarkers and for building gene signatures, as exemplified by prediction of estrogen-receptor status and molecular subtypes. In conclusion, the study emphasizes the importance of using proper batch adjustment methods when integrating data across different batches and platforms.
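To illustrate the general idea behind batch adjustment, the following minimal sketch applies a per-gene location and scale correction: each batch is centered on the overall mean and rescaled to the pooled within-batch standard deviation. This is a simplified stand-in and not ComBat itself, which additionally uses empirical Bayes shrinkage of the batch parameters and can preserve covariates of interest. The function name `center_scale_by_batch` and the toy values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean

def center_scale_by_batch(values, batches):
    """Adjust one gene's expression values for batch location and scale.

    Simplified illustration only (no empirical Bayes, no covariates).
    values  : expression values for a single gene, one per sample
    batches : batch label for each sample, same length as values
    """
    grand_mean = mean(values)
    groups = {}
    for i, b in enumerate(batches):
        groups.setdefault(b, []).append(i)

    # Per-batch mean and (population) standard deviation.
    stats = {}
    within_ss = 0.0
    for b, idx in groups.items():
        b_mean = mean(values[i] for i in idx)
        ss = sum((values[i] - b_mean) ** 2 for i in idx)
        stats[b] = (b_mean, sqrt(ss / len(idx)))
        within_ss += ss

    # Pooled within-batch standard deviation used as the common target scale.
    pooled_sd = sqrt(within_ss / len(values))

    adjusted = list(values)
    for b, idx in groups.items():
        b_mean, b_sd = stats[b]
        for i in idx:
            # Standardize within the batch; constant batches map to the grand mean.
            z = (values[i] - b_mean) / b_sd if b_sd > 0 else 0.0
            adjusted[i] = grand_mean + z * pooled_sd
    return adjusted

# Toy example: batch "B" is shifted upward by 3 units relative to batch "A".
expression = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
batch_labels = ["A", "A", "A", "B", "B", "B"]
adjusted = center_scale_by_batch(expression, batch_labels)
```

After adjustment both batches share the same mean and spread. Note that a correction this simple would also remove any biological difference that is confounded with batch, which is one reason dedicated methods that model covariates are preferred in practice.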