1 Department of Molecular Biology and Genetics - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Science and Technology, Aarhus University
2 Department of Animal Genetics and Breeding, China Agricultural University
3 Department of Molecular Biology and Genetics - Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Science and Technology, Aarhus University
This study investigated the imputation accuracy of different methods, considering both the minor allele frequency and the relatedness between individuals in the reference and test data sets. Two data sets from the combined population of Swedish and Finnish Red Cattle were used to test the influence of these factors on the accuracy of imputation. Data set 1 consisted of 2,931 reference bulls and 971 test bulls, and was used for validation of imputation from 3,000 markers (3K) to 54,000 markers (54K). Data set 2 contained 341 bulls in the reference set and 117 in the test set, and was used for validation of imputation from 54K to high density [777,000 markers (777K)]. Both test sets were divided into 4 groups according to their relationship to the reference population. Five imputation methods (Beagle, IMPUTE2, findhap, AlphaImpute, and FImpute) were used in this study. Imputation accuracy was measured as the allele correct rate and as the correlation between imputed and true genotypes. Accuracy was lower when imputing from 3K to 54K than from 54K to 777K. Across imputation methods, the allele correct rates ranged from 93.5 to 97.1% when imputing from 3K to 54K, and from 97.1 to 99.3% when imputing from 54K to 777K. When imputing from 3K to 54K, IMPUTE2 and Beagle gave higher accuracies and were more robust under various conditions than the other 3 methods. When imputing from 54K to high density, the accuracy of FImpute was similar to that of Beagle and IMPUTE2, and higher than that of the remaining 2 methods. The results also showed that a closer relationship between the test set and the reference set led to a higher accuracy for all methods. In addition, as the minor allele frequency decreased, the allele correct rate increased, whereas the correlation coefficient decreased.
The results indicate that Beagle and IMPUTE2 provide the most robust and accurate imputation, but considering computing time and memory usage, FImpute is a viable alternative.
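The two accuracy measures named above, and the reason they can move in opposite directions as the minor allele frequency falls, can be illustrated with a minimal sketch. The abstract does not give the formulas, so the following are assumptions: genotypes are coded as 0, 1, or 2 copies of one allele; the allele correct rate is taken as the share of alleles imputed correctly (a genotype that is off by one counts as one correct allele out of two); and the correlation is the Pearson correlation between true and imputed genotype codes. The function names and the toy genotype vectors are hypothetical and are not data from the study.

```python
from math import sqrt


def allele_correct_rate(true_genotypes, imputed_genotypes):
    """Share of alleles imputed correctly, with genotypes coded 0/1/2.

    Assumed definition: each genotype carries 2 alleles, and the number
    of wrong alleles is the absolute difference between the two codes.
    """
    if not true_genotypes or len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must be non-empty and equal in length")
    correct = sum(2 - abs(t - g) for t, g in zip(true_genotypes, imputed_genotypes))
    return correct / (2 * len(true_genotypes))


def genotype_correlation(true_genotypes, imputed_genotypes):
    """Pearson correlation between true and imputed genotype codes."""
    if not true_genotypes or len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must be non-empty and equal in length")
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_g = sum(imputed_genotypes) / n
    cov = sum((t - mean_t) * (g - mean_g)
              for t, g in zip(true_genotypes, imputed_genotypes))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_g = sum((g - mean_g) ** 2 for g in imputed_genotypes)
    if var_t == 0 or var_g == 0:
        return float("nan")  # undefined when either vector has no variation
    return cov / sqrt(var_t * var_g)


# Toy marker with a rare minor allele: 2 heterozygotes among 20 animals.
rare_true = [0] * 18 + [1, 1]
rare_imputed = [0] * 17 + [1, 1, 0]  # two single-allele errors

# Toy marker with a common minor allele, also with two single-allele errors.
common_true = [0] * 5 + [1] * 10 + [2] * 5
common_imputed = [1] + [0] * 4 + [1] * 10 + [2] * 4 + [1]

for name, t, g in [("rare", rare_true, rare_imputed),
                   ("common", common_true, common_imputed)]:
    print(name, round(allele_correct_rate(t, g), 3),
          round(genotype_correlation(t, g), 3))
```

In this toy example both markers have the same allele correct rate (0.95), but the correlation is about 0.44 for the rare marker and about 0.89 for the common one. With a rare allele, most genotypes are homozygous for the major allele, so a method can score a high correct rate largely by predicting the major allele, while the few errors fall on the animals that carry nearly all of the genotype variance.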
Journal of Dairy Science, 2013, Vol 96, Issue 7, p. 4666-4677