Shane Alcock's Blog

22 Sep 2011

On the first of September this year, the New Zealand Government's Copyright Amendment Act (more colloquially known as the "Skynet law") came into effect. Briefly, the Act promises harsh penalties for Internet users who download copyrighted content illegally, culminating in the cancellation of their Internet account. This law unsurprisingly received a lot of media attention in New Zealand and there were conflicting accounts as to whether the law was having any effect on traffic levels (http://arstechnica.com/tech-policy/news/2011/09/nz-traffic-down-as-three...).

I therefore decided that this called for a quick spot of Internet measurement. I used a passive monitor that we have located inside the core network of a New Zealand ISP to capture several days' worth of traffic traces in early September. I've now started running the traces through an analysis program based on libprotoident to investigate the application protocols being used by the ISP's customers, with a particular focus on P2P (which is what the Act is targeting).

The first graphs I produced turned out to be very interesting. This graph shows the inbound (i.e. originating from hosts outside the ISP's customer ranges) traffic mix for the September trace set, broken down by application category.

As a comparison, this graph shows the same traffic mix for a trace set captured from the same ISP in January this year.

We see that the proportion of traffic that is P2P (the orange segment) has decreased quite noticeably in the September dataset compared with earlier in the year. It is hard to say for certain whether this is a direct consequence of the new law, but this is a promising result nonetheless. Certainly it is enough to encourage me to start looking into this a bit further - expect more updates soon as I get more results!
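The application-category breakdown behind these graphs boils down to summing bytes per category and expressing each category as a share of the total. A minimal stdlib sketch of that aggregation step (the flow records here are hypothetical, and this is not the libprotoident-based analysis program itself):

```python
from collections import Counter

def traffic_mix(flows):
    """Aggregate (category, bytes) flow records into a percentage
    breakdown of total traffic, like the application mix graphed above."""
    totals = Counter()
    for category, nbytes in flows:
        totals[category] += nbytes
    overall = sum(totals.values())
    return {cat: 100.0 * n / overall for cat, n in totals.items()}

# Hypothetical flow records: (category, bytes observed)
flows = [("P2P", 600), ("Web", 900), ("Streaming", 300), ("P2P", 200)]
mix = traffic_mix(flows)
```

Comparing the resulting percentages between two trace sets (rather than raw byte counts) is what makes the January and September graphs directly comparable despite different capture durations.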

19 Sep 2011

Released libtrace 3.0.12 on Monday.

Continued running performance tests for the libtrace paper. Downloaded a few pcap traces from the MAWI archive and found that libtrace was surprisingly slow at processing them. After doing some profiling, I realised the problem was actually with my test program, which was not the most efficient. Because the processing thread was so slow, it was not spending enough time reading data that the decompression thread had written into the I/O buffer. This meant that the buffer filled up and the decompression came to a halt temporarily.

Fixing the test program solved the problem, but this raises an interesting point: the effectiveness of libtrace I/O is directly tied to the user's ability to write an efficient program. It'd be nice if this was not such a major factor.
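The stall described above can be sketched with a toy producer/consumer pair (purely illustrative - this is not libtrace's actual I/O code): a fixed-size queue stands in for the I/O buffer, and a deliberately slow consumer forces the "decompression" thread to halt whenever the buffer fills.

```python
import queue
import threading
import time

buf = queue.Queue(maxsize=4)  # stand-in for the fixed-size I/O buffer
stalls = 0

def decompressor():
    """Producer: mimics the decompression thread filling the buffer."""
    global stalls
    for block in range(16):
        while True:
            try:
                buf.put(block, timeout=0.01)
                break
            except queue.Full:
                stalls += 1  # buffer full: decompression halts temporarily
    buf.put(None)  # sentinel: no more blocks

t = threading.Thread(target=decompressor)
t.start()
while (item := buf.get()) is not None:
    time.sleep(0.02)  # deliberately slow per-block "processing"
t.join()
```

With the consumer this slow, the producer spends most of its time blocked on a full buffer - exactly the situation an inefficient analysis program creates for the decompression thread.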

Started looking into sFlow on Friday, with an eye towards developing a way to read and write sFlow raw packet records (possibly with libtrace).

Went to Auckland on Wednesday morning to chat with Jason and Josh from Vineyard Networks. Very interesting meeting and definitely looks like there will be some opportunities to work together in the future, especially if we have students keen on getting involved in application identification.

12 Sep 2011

Libtrace 3.0.12 has been released.

This release adds a new tool called tracetopends which can be used to identify the endpoints that are contributing the most traffic in a trace. We've also improved the general performance of the protocol decoding code and fixed a few obscure bugs in that area as well.

The full list of changes in this release can be found in the libtrace ChangeLog.

You can download the new version of libtrace from the libtrace website.

12 Sep 2011

Continued working on the new libtrace paper. Most of the content is now complete and I am now running a whole series of performance tests using my various test programs. The results so far are pretty promising - libtrace is looking like a clear winner in terms of runtime.

Added a new tool to libtrace called tracetopends which reports the busiest endpoints in a given trace, where an endpoint can be a MAC, IPv4 or IPv6 address. Hopefully it will make Chris happy.
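The core accounting is straightforward: tally bytes per endpoint and report the heaviest hitters. A stdlib sketch of that idea (illustrative only - not tracetopends' actual code, and the packet records are hypothetical):

```python
from collections import Counter

def top_endpoints(packets, n=2):
    """Rank endpoints by bytes sent - a sketch of the kind of
    accounting tracetopends performs over a trace."""
    sent = Counter()
    for endpoint, nbytes in packets:
        sent[endpoint] += nbytes
    return sent.most_common(n)

# Hypothetical per-packet records: (source endpoint, wire bytes)
packets = [("10.0.0.1", 1500), ("10.0.0.2", 40),
           ("10.0.0.1", 1500), ("2001:db8::1", 800)]
busiest = top_endpoints(packets)
```

Because the key is just a string here, the same tally works whether the endpoint is a MAC, IPv4 or IPv6 address.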

Started doing some prep for a new libtrace release to go with the paper. Updated the ChangeLog and documented the new tools properly.

05 Sep 2011

Managed to spend most of my week listening to talks:
* Matthew's practice talks on Tuesday
* Honours conference on Wednesday
* Matthew's first interview talk + more practicing on Thursday
* Matthew's second interview talk and the other competing candidates on Friday

Also, I attempted to improve the performance of my scapy program without any success. It still takes over 2 days to process a single day of Waikato trace!

Started working on version 3 of the libtrace paper - this one is going to be a lot longer and more explicit in describing the issues with trace processing that are solved by libtrace.

Reviewed a paper on modeling YouTube traffic for Computer Communications. Most of the paper replicated the analysis I had done in my own YouTube paper and the eventual model struck me as being overly simplistic, so I was not able to recommend it for publication.

29 Aug 2011

Finished fixing up my longitudinal study graphs, although I did discover that the direction tagging was the wrong way around for one of the datasets. Have re-run that analysis and updated the graphs accordingly.

Returned to my evaluation of other trace processing libraries. Managed to write a libnetdude program that replicated the results produced by the other programs, although it still cannot read from any sort of pipe without segfaulting. Also had to write my own IPv6 protocol plugin because libnetdude does not provide one. Tested it with an uncompressed pcap trace - it was the slowest of all the C libraries, despite not needing to decompress.

Started working with the Python library Scapy. Annoyingly, Scapy does not provide any mechanism for retrieving the header at a specific layer - instead, you have to check for the presence of the specific protocol header you're interested in. Scapy has also proved to be incredibly slow - I cannot believe anyone would use it for analysing anything except the most trivially small trace sets.

22 Aug 2011

Updated most of the webpages for the longitudinal study to include new graphs for the ISP data - http://www.wand.net.nz/~salcock/longitude/ . There are still a few missing or broken graphs, but most of it is there now.

Started developing the libnetdude version of the scan analysis program. Seems libnetdude doesn't support reading from stdin, which is going to make reading my compressed ERF trace tricky...

At home sick from Tuesday to Friday.

15 Aug 2011

Determined that the reason libpcap was outperforming libtrace when running the scan analysis was because we were CPU-bound rather than IO-bound. This meant that the faster IO of libtrace was not providing enough gain to cancel out the overhead of the libtrace function calls (compared with the direct pointer manipulation I was doing in my pcap program).

As a result, I decided to also test the libraries by doing a simple packet/byte count for each TCP and UDP port which would turn out to be IO-bound instead. In this case, libtrace was much faster than libpcap. Also implemented the two analyses using libcoral and ruby-libtrace. Libcoral was both slower and required more LOC than libtrace for both tests. Ruby-libtrace required less code for the port count (but more for the scan study, as I needed to write bindings for the flow management library I was using) but was waaaayyyy slower to run.

Finally finished running the longitudinal analysis on the various ISP traces and started working on adding the resulting graphs to my webpages. Decided that the ISP C time series graphs would be best done by plotting each year separately with an X-axis defined by the date minus the year, e.g. http://www.wand.net.nz/~salcock/longitude/graphs/icmp/icmp_in_ispc.png . This, of course, involved reworking a decent chunk of my graph generation scripts...
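Mapping each timestamp to "date minus the year" amounts to splitting it into a year (which selects the series) and a day-of-year (which becomes the shared Jan-Dec X coordinate). A minimal stdlib sketch of that split (an assumption about the approach, not my actual graph scripts):

```python
from datetime import datetime, timezone

def year_and_day(ts):
    """Split a UNIX timestamp into (year, day-of-year) so each year
    can be plotted as its own series on a shared Jan-Dec X-axis."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.year, dt.timetuple().tm_yday

# 1 Feb 2011 00:00 UTC
year, day = year_and_day(1296518400)
```

Grouping the measurements by the year component and plotting against the day component then lines the years up on top of each other for direct comparison.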

08 Aug 2011

Short week this week, as I was in Wellington on Thursday and Friday.

Managed to get Bro running and producing results that I could replicate with a libtrace program. Found that Bro was tracking TCP state incorrectly - it would often describe a TCP flow as having been established and closed correctly when, in fact, no SYNs had been observed at all. Reported the bug to the Bro team and decided to use my own state classifications from now on.
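The essence of the check Bro appeared to get wrong is that a flow should only count as established if a SYN was actually seen. A deliberately simplified classifier along those lines (a sketch of the idea, not Bro's logic or my actual classifier; flag names are illustrative):

```python
def classify_flow(flags_seen):
    """Classify a TCP flow from the set of flags observed for it.
    A flow only counts as established if a SYN was actually seen."""
    if "SYN" not in flags_seen:
        return "no-handshake-observed"
    if "FIN" in flags_seen or "RST" in flags_seen:
        return "established-and-closed"
    return "established"

# A flow with no SYN must not be reported as established and closed.
state = classify_flow(["ACK", "ACK", "FIN"])
```

Under this rule, a flow whose handshake predates the start of the capture is flagged as unobserved rather than being misreported as a complete connection.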

Wrote a libpcap program that was equivalent to the libtrace program to compare the performance of the two. Surprisingly, the "zcat | dagconvert | libpcap" run was quite a bit faster than the "libtrace" equivalent. Profiled the libtrace program and managed to find a couple of opportunities for speeding things up, mostly through increased caching. The libpcap program is still slightly faster now, but the gap has closed significantly.

01 Aug 2011

Started getting some results from processing various Auckland and ISP traces - found one or two bugs along the way, so some re-processing has been necessary again.

Finished automating the graphing part of the analysis.

Continued working on an AS-level analysis for the trace data. Reading the routeviews BGP data is still not going well - it works in the general case but sooner or later you end up hitting a record or update that doesn't make sense and the whole thing segfaults.

Received reviews for the rejected libtrace paper. In response, I've started looking into replicating the simple Allman / Paxson study that originally used Bro to extract the required packet and flow properties. The current plan is to replicate the study using each of the packet processing libraries mentioned by the reviewers as equivalent to libtrace and prove once and for all that those libraries are nowhere near as good as libtrace.