Our previous work demonstrated the possibility of distinguishing several groups of traffic with accuracy of over 99%. Today, most of the traffic is generated by web browsers, which provide different kinds of services based on the HTTP protocol: web browsing, file downloads, audio and voice streaming through third-party plugins, etc. This paper suggests and evaluates two approaches to distinguish various types of HTTP traffic based on the content: distributed among volunteers' machines and centralized running in the core of the network. We also assess the accuracy of the centralized classifier for both the HTTP traffic and mixed HTTP/non-HTTP traffic. In the latter case, we achieved the accuracy of 94%. Finally, we provide graphical characteristics of different kinds of HTTP traffic.
Ieee Symposium on Computers and Communications (iscc), 2012, p. 882-887