1 Department of Computer Science, Faculty of Science, Københavns Universitet
2 The APL Section, Department of Computer Science, Faculty of Science, Københavns Universitet
3 Microsoft Research
4 Rutgers - The State University of New Jersey, New Brunswick
5 Tel Aviv University
6 AT&T Labs–Research
7 Rutgers - The State University of New Jersey, New Brunswick
8 Tel Aviv University
9 Department of Computer Science, Faculty of Science, Københavns Universitet
Abstract
Statistical summaries of IP traffic are at the heart of network operation and are used to recover aggregate information on subpopulations of flows. It is therefore important to collect the most accurate and informative summaries possible given the router's resource constraints. A summarization algorithm, such as Cisco's sampled NetFlow, is applied to IP packet streams that consist of multiple interleaved IP flows. We develop sampling algorithms and unbiased estimators that address sources of inefficiency in current methods. First, we design tunable algorithms, whereas currently a single parameter (the sampling rate) controls utilization of both memory and processing/access speed, so it has to be set according to the bottleneck resource. Second, we make better use of the memory hierarchy, which involves exporting partial summaries to slower storage during the measurement period.
Journal of Computer and System Sciences, 2014, Vol 80, Issue 7, p. 1214-1244
NetFlow; Data streams; Random sampling; IP flows; Subpopulation queries; Flow size distribution