We present PubSearch, a hybrid heuristic scheme for re-ranking academic papers retrieved from standard digital libraries such as the ACM Portal. The scheme is based on the hierarchical combination of a custom implementation of the term frequency heuristic, a time-depreciated citation score and a graph-theoretic computed score that relates the paper’s index terms with each other. We designed and developed a meta-search engine that submits user queries to standard digital repositories of academic publications and re-ranks the repository results using the hierarchical heuristic scheme. We evaluate our proposed reranking scheme via user feedback against the results of ACM Portal on a total of 58 different user queries specified from 15 different users. The results show that our proposed scheme significantly outperforms ACM Portal in terms of retrieval precision as measured by most common metrics in Information Retrieval including Normalized Discounted Cumulative Gain (NDCG), Expected Reciprocal Rank (ERR) as well as a newly introduced lexicographic rule (LEX) of ranking search results. In particular, PubSearch outperforms ACM Portal by more than 77% in terms of ERR, by more than 11% in terms of NDCG, and by more than 907.5% in terms of LEX. We also re-rank the top-10 results of a subset of the original 58 user queries produced by Google Scholar, Microsoft Academic Search, and ArnetMiner; the results show that PubSearch compares very well against these search engines as well. The proposed scheme can be easily plugged in any existing search engine for retrieval of academic publications.
Information Processing and Management, 2013, Vol 49, Issue 6, p. 1326-1343
Academic search; Search and retrieval; Heuristic document re-ranking