Googling the Internet: Unconstrained Endpoint Profiling • Ionut Trestian, Supranamaya Ranjan, Aleksandar Kuzmanovic, Antonio Nucci • Reviewed by Lee Young Soo
Introduction • Obtaining ‘raw’ packet traces from operational networks can be very hard. • Accurately classifying traffic online at high speeds is an inherently hard problem.
Unconstrained Endpoint Profiling • Introduces a novel methodology that applies when: • No operational traces are available • Packet-level traces are available • Sampled flow-level traces are available • Internet access trend analysis for four world regions.
Methodology • Rule Generation • Query Google using a sample ‘seed set’ of random IP addresses from networks in four world regions. • Select the top N keywords that can be meaningfully used for endpoint classification.
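The keyword-selection step can be sketched as a document-frequency ranking over the hit texts returned for the seed IPs. This is a minimal sketch under assumptions: the function name `top_keywords`, the tokenizer, and the `STOPWORDS` set are all hypothetical and are not the authors' exact procedure.

```python
import re
from collections import Counter

# Hypothetical stopword list; the slides do not specify one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "by"}

def top_keywords(hit_texts, n):
    """Rank words by how many hit texts they appear in; return the top n."""
    doc_freq = Counter()
    for text in hit_texts:
        # Lowercase alphabetic tokens only, so IP addresses and port numbers drop out.
        words = set(re.findall(r"[a-z]+", text.lower()))
        doc_freq.update(w for w in words if w not in STOPWORDS)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]

hits = [
    "forum post by user at 1.2.3.4",
    "proxy list: 1.2.3.4 port 8080",
    "forum thread 5.6.7.8 posted",
    "open proxy 9.9.9.9 forum",
]
print(top_keywords(hits, 2))  # ['forum', 'proxy']
```

Counting each word at most once per hit text keeps a single verbose page from dominating the ranking.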
Methodology • Web Classifier • Rapid URL search • Hit text search • Example URL: www.robtex.com/dns/32.net.ru.html
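The two-stage web classifier can be sketched as a cheap check of each hit's URL, with a fallback scan of the longer hit text. In this minimal sketch, `classify_hit` and the keyword-to-tag mapping `RULES` are hypothetical stand-ins for the output of rule generation.

```python
from urllib.parse import urlparse

# Hypothetical keyword -> tag rules, standing in for the rule-generation output.
RULES = {
    "proxy": "proxy server",
    "forum": "forum",
    "spam": "malicious",
}

def classify_hit(url, hit_text, rules=RULES):
    """Return (tag, stage) for one search hit, or (None, None) if nothing matches."""
    # Search results often omit the scheme; add one so urlparse fills netloc.
    parsed = urlparse(url if "://" in url else "http://" + url)
    url_part = (parsed.netloc + parsed.path).lower()
    # Stage 1: rapid URL search (short string, cheap to scan).
    for keyword, tag in rules.items():
        if keyword in url_part:
            return tag, "url"
    # Stage 2: fall back to searching the full hit text.
    text = hit_text.lower()
    for keyword, tag in rules.items():
        if keyword in text:
            return tag, "hit text"
    return None, None

print(classify_hit("www.example-proxylist.com/list.html", ""))  # ('proxy server', 'url')
```

Checking the URL first lets most hits be resolved without scanning the full text.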
Methodology • IP Tagging • URL-based tagging • General hit-text-based tagging • Hit-text-based tagging for forums • Post date and username in the vicinity of the IP address => forum user • Presence of keywords such as http://, ftp://, ppstream://, mms:// => HTTP share, FTP share, streaming node
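The "vicinity" rules can be sketched as a fixed character window around each occurrence of the IP address in a hit text. This is a minimal sketch under assumptions:

- The scheme-to-tag mapping follows the slide.
- The window size, the date regex (`DATE_RE`), and the username heuristic (`USER_RE`) are hypothetical.
- The function name `tag_ip_in_hit` is hypothetical.

```python
import re

# Scheme prefixes from the slide, mapped to tags.
SCHEME_TAGS = {
    "http://": "http share",
    "ftp://": "ftp share",
    "ppstream://": "streaming node",
    "mms://": "streaming node",
}
# Hypothetical heuristics for a post date and a username near the IP.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{2,4}\b")
USER_RE = re.compile(r"\b(?:posted by|user(?:name)?)\b", re.IGNORECASE)

def tag_ip_in_hit(hit_text, ip, window=60):
    """Return the set of tags inferred from text within `window` chars of `ip`."""
    # Lookarounds stop "10.0.0.7" from matching inside "110.0.0.7" or "10.0.0.70".
    ip_re = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\.?\d)")
    tags = set()
    for match in ip_re.finditer(hit_text):
        start = max(0, match.start() - window)
        vicinity = hit_text[start:match.end() + window]
        lowered = vicinity.lower()
        for scheme, tag in SCHEME_TAGS.items():
            if scheme in lowered:
                tags.add(tag)
        # Forum rule: both a post date and a username cue near the IP.
        if DATE_RE.search(vicinity) and USER_RE.search(vicinity):
            tags.add("forum user")
    return tags

print(tag_ip_in_hit("Posted by alice on 2008-03-14 from 10.0.0.7", "10.0.0.7"))  # {'forum user'}
```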
Methodology • Examples • 184.108.40.206 (inforum.insite.com): URL-based tagging • 220.127.116.11 (ttzai.com): hit-text-based tagging for forums
Information Sources • Web logs • Proxy logs • Forums • Malicious lists • Server lists • P2P communication
Evaluation • When No Traces Are Available • When Packet-Level Traces Are Available • When Sampled Traces Are Available
When No Traces Are Available • The unconstrained endpoint profiling approach is applied to a subset of the IP ranges belonging to the four ISPs shown in the table above.
When Packet-Level Traces Are Available • Collect the most popular 5% of IP addresses in the trace and tag them by applying the methodology. • Use these tags to classify the traffic flows.
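The two steps above can be sketched as: rank endpoints by popularity, keep the top 5%, then label each flow by the tag of a popular endpoint it touches. This is a minimal sketch under assumptions:

- Popularity is measured here as the number of flows an IP appears in. The slides do not say which metric the authors used.
- Flows are plain (source IP, destination IP) pairs.
- The mapping `ip_tags` is assumed to come from the web-based tagging.
- The function names are hypothetical.

```python
import math
from collections import Counter

def popular_endpoints(flows, fraction=0.05):
    """Return the top `fraction` of IPs ranked by the number of flows they appear in."""
    counts = Counter()
    for src, dst in flows:
        counts[src] += 1
        counts[dst] += 1
    k = max(1, math.ceil(len(counts) * fraction))
    # Ties broken by IP string so the result is deterministic.
    ranked = sorted(counts, key=lambda ip: (-counts[ip], ip))
    return ranked[:k]

def classify_flows(flows, ip_tags, fraction=0.05):
    """Label each flow with the tag of a popular, tagged endpoint (destination first)."""
    popular = set(popular_endpoints(flows, fraction))
    labels = []
    for src, dst in flows:
        tag = None
        for ip in (dst, src):
            if ip in popular and ip in ip_tags:
                tag = ip_tags[ip]
                break
        labels.append(tag)
    return labels

flows = [("10.0.0.%d" % i, "1.1.1.1") for i in range(30)] + [("10.0.0.1", "2.2.2.2")]
ip_tags = {"1.1.1.1": "web server", "2.2.2.2": "ftp server"}
labels = classify_flows(flows, ip_tags)
```

In this example, 2.2.2.2 is tagged but falls outside the top 5%, so its flow stays unlabeled.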
When Sampled Traces Are Available • Due to sampling, too little data remains in the trace, so the graphlets approach simply does not work. • Popular endpoints are still present in the trace despite sampling.
When Sampled Traces Are Available • The endpoint approach remains largely unaffected by sampling.
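Why popular endpoints survive sampling can be made concrete with a short calculation. Assume independent per-packet sampling at rate $p$; this is a simplification, since real sampled NetFlow deployments differ in detail. An endpoint seen in $k$ packets then appears in the sampled trace with probability $1 - (1 - p)^k$. The function name is hypothetical.

```python
def presence_probability(packets, rate):
    """Probability that an endpoint with `packets` packets appears at least once
    in a trace sampled independently per packet at `rate` (0 < rate <= 1)."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return 1.0 - (1.0 - rate) ** packets

# With 1-in-100 sampling, a single-packet endpoint almost always disappears,
# while an endpoint with 1000 packets is almost certainly still present.
rare = presence_probability(1, 0.01)        # about 0.01
popular = presence_probability(1000, 0.01)  # above 0.9999
```

Graphlet-style methods need many flows per host to be retained, whereas endpoint tagging only needs the popular IP itself to remain visible.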
Endpoint Profiling • Endpoint Clustering • Clustering has been applied in networking before, e.g., the AutoClass algorithm. • Input to the endpoint clustering algorithm: a set of tagged IP addresses from a region’s network.
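Endpoint clustering can be illustrated by encoding each tagged IP as a binary vector over a tag vocabulary and grouping similar vectors. This sketch swaps in plain k-means with farthest-point initialization as a simple stand-in. It is not AutoClass, which is a Bayesian mixture-model classifier. The tag vocabulary `TAGS` and all function names are hypothetical.

```python
# Hypothetical tag vocabulary produced by the IP-tagging step.
TAGS = ["browsing", "chat", "mail", "p2p", "gaming"]

def to_vector(tags):
    return [1.0 if t in tags else 0.0 for t in TAGS]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centroids(points, k):
    # Farthest-point initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, iterations=20):
    centroids = init_centroids(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def cluster_endpoints(ip_tags, k):
    ips = sorted(ip_tags)
    assignment = kmeans([to_vector(ip_tags[ip]) for ip in ips], k)
    clusters = {}
    for ip, c in zip(ips, assignment):
        clusters.setdefault(c, []).append(ip)
    return list(clusters.values())

ip_tags = {
    "10.0.0.1": {"browsing"}, "10.0.0.2": {"browsing"}, "10.0.0.3": {"browsing", "chat"},
    "10.0.0.4": {"p2p"}, "10.0.0.5": {"p2p", "gaming"},
}
clusters = cluster_endpoints(ip_tags, 2)
```

In this toy input, the browsing-oriented hosts and the P2P-oriented hosts end up in separate clusters.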
Endpoint Profiling • Browsing alone, browsing and chat, and browsing and mail appear to be the most common behaviors.
Endpoint Profiling • Traffic Locality
Conclusion • UEP • Accurately predicts application and protocol usage trends when no network traces are available. • Dramatically outperforms state-of-the-art classification tools when packet traces are available. • Retains high classification capability when sampled flow-level traces are available. • Profiles endpoints residing in four different world regions, revealing: • Network applications and protocols used in these regions. • Characteristics of endpoint classes that share similar access patterns. • Clients’ locality properties.