Influence-driven diffusion of information is a fundamental process in social networks. Learning the latent variables of such process, i.e., the influence strength along each link, is a central question towards understanding the structure and function of complex networks, modeling information cascades, and developing applications such as viral marketing. Motivated by modern microblogging platforms, such as twitter, in this paper we study the problem of learning influence probabilities in a data-stream scenario, in which the network topology is relatively stable and the challenge of a learning algorithm is to keep up with a continuous stream of tweets using a small amount of time and memory. Our contribution is a number of randomized approximation algorithms, categorized according to the available space (superlinear, linear, and sublinear in the number of nodes n) and according to dierent models (landmark and sliding window). Among several results, we show that we can learn influence probabilities with one pass over the data, using O(n log n) space, in both the landmark model and the sliding-window model, and we further show that our algorithm is within a logarithmic factor of optimal. For truly large graphs, when one needs to operate with sublinear space, we show that we can still learn influence probabilities in one pass, assuming that we restrict our attention to the most active users. Our thorough experimental evaluation on large social graph demonstrates that the empirical performance of our algorithms agrees with that predicted by the theory.
Kdd '13 Proceedings of the 19th Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 2013, p. 275-283
streaming, influence probabilities, social networks