Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, have never been formally analyzed. For the first time, we provide an analysis of its composition and create a machine-readable index of artist names and song titles, identifying nearly all excerpts. We also catalog numerous problems with its integrity, including replications, mislabelings, and distortions.
Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies, 2012, pp. 7-12