Mølgaard, Lasse Lohilahti (2); Hansen, Lars Kai (3); Larsen, Jan (3)
(1) Department of Informatics and Mathematical Modeling, Technical University of Denmark; (2) Department of Applied Mathematics and Computer Science, Technical University of Denmark; (3) Copenhagen Center for Health Technology, Center, Technical University of Denmark
Wikipedia has significant potential in music information retrieval research. In this work we analyze the link structure of the musical Wikipedia. Wikipedia links differ in certain ways from links on the Web at large: there is an overabundance of internal links, links are generated automatically, and they may even be used maliciously to promote certain topics. Wikipedia has recently been analyzed using methods from Web and text mining; however, the fact that its link structure differs from the Web's makes this approach questionable. To better understand the link structure, and specifically to test how consistent links are with page content, we perform Probabilistic Latent Semantic Analysis (PLSA) to extract topics from Wikipedia articles. The PLSA model is used to quantify how articles are related, and the resulting PLSA-based document similarity is then used to evaluate the semantic relevance of the actual links. Our analysis highlights the diversity of Wikipedia links, and we conclude that semantic analysis could be a useful tool for Wikipedia.
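The pipeline described in the abstract (fit PLSA, compare articles in topic space, score each link) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_plsa`, `cosine`, and `score_links`, the toy count matrix, the small smoothing constant, and the choice of cosine similarity between the per-article topic distributions P(z|d) are all hypothetical, since the abstract does not specify the similarity measure or the fitting details.

```python
import math
import random


def fit_plsa(counts, n_topics, n_iter=50, seed=0, eps=1e-12):
    """Fit PLSA with plain EM on a document-term count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(z|d) for each document and P(w|z) for each topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        # eps is an assumed smoothing constant that avoids division by zero.
        total = sum(row) + eps * len(row)
        return [(x + eps) / total for x in row]

    # Random initialisation of both sets of multinomials.
    p_z_d = [normalize([rng.random() for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0.0:
                    continue
                for z in range(n_topics):
                    r = n * post[z] / norm
                    acc_z_d[d][z] += r
                    acc_w_z[z][w] += r
        # M-step: renormalise the expected counts.
        p_z_d = [normalize(row) for row in acc_z_d]
        p_w_z = [normalize(row) for row in acc_w_z]
    return p_z_d, p_w_z


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def score_links(links, p_z_d):
    """Score each (source, target) link by the similarity of its endpoints."""
    return {(s, t): cosine(p_z_d[s], p_z_d[t]) for s, t in links}


# Toy data (hypothetical): articles 0-1 share one vocabulary, 2-3 another.
toy_counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 3],
    [0, 0, 0, 2, 1, 3],
]
toy_links = [(0, 1), (0, 3)]  # one within-group link, one across groups

topics_per_doc, words_per_topic = fit_plsa(toy_counts, n_topics=2)
for link, score in score_links(toy_links, topics_per_doc).items():
    print(link, round(score, 3))
```

In a sketch like this, a link whose endpoints have similar topic distributions receives a score near 1 and a link between topically unrelated articles a score near 0, which is one way to read "semantic relevance" of a link. EM for PLSA depends on its random initialisation, so scores on real data would need several restarts or a held-out check before being trusted.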