Libprotoident is a library that performs application layer protocol identification for flows. Unlike many techniques that require capturing the entire packet payload, only the first four bytes of payload sent in each direction, the size of the first payload-bearing packet in each direction and the TCP or UDP port numbers for the flow are used by libprotoident. Libprotoident features a very simple API that is easy to use, enabling developers to quickly write code that can make use of the protocol identification rules present in the library without needing to know anything about the applications they are trying to identify.
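As a rough illustration of how little per-flow state that description implies, here is a minimal Python sketch. It is not libprotoident's API or its rule set: the `FlowSummary` fields and the `toy_identify` rules are hypothetical names and examples chosen only to show the kind of inputs involved.

```python
from dataclasses import dataclass


@dataclass
class FlowSummary:
    """Per-flow inputs as described above (field names are illustrative)."""
    payload_out: bytes  # first four payload bytes sent by the initiator
    payload_in: bytes   # first four payload bytes sent by the responder
    size_out: int       # size of the first payload-bearing packet, outbound
    size_in: int        # size of the first payload-bearing packet, inbound
    server_port: int
    client_port: int
    transport: str      # "tcp" or "udp"


def toy_identify(flow: FlowSummary) -> str:
    """Toy rules for illustration only, NOT libprotoident's real rules."""
    if not flow.payload_out and not flow.payload_in:
        return "No Payload"
    if flow.transport == "tcp":
        # Both SSH endpoints open with an "SSH-" version banner.
        if flow.payload_out == b"SSH-" and flow.payload_in == b"SSH-":
            return "SSH"
        # An HTTP request starts with its method name.
        if flow.payload_out in (b"GET ", b"POST", b"HEAD"):
            return "HTTP"
    return "Unknown"
```

Real rules also lean on the packet sizes and port numbers, which this toy version carries around but does not use.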
Continued digging into the unknown traffic in the day-long Waikato trace I captured last week. Diminishing returns are starting to really kick in now, but I've still managed to add another 9 new protocols (including SPDY) and improved the rules for a further 8.
Worked on a series of scripts to process the results of running the Plateau detector using a variety of different possible configurations (e.g. history and trigger buffer sizes, sensitivity thresholds etc). The aim is to find the optimal set of parameters based on the ground truth we already have. Of course, some parameter combinations are going to produce events that we have never seen before so I've also had to write code to find these events and generate suitable graphs so I can use my web-app to quickly manually classify them appropriately.
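A parameter sweep of this kind could be sketched roughly as follows. This is a hypothetical outline in Python, not the actual scripts: `run_detector` stands in for whatever runs the Plateau detector with one configuration, and `ground_truth` is assumed to map event ids to True (real event) or False (insignificant).

```python
from itertools import product


def sweep(run_detector, ground_truth, grid):
    """Score every parameter combination against the labelled events.

    run_detector(**params) -> iterable of event ids (hypothetical callable)
    ground_truth: dict of event id -> True / False
    grid: dict of parameter name -> list of values to try
    """
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        detected = set(run_detector(**params))
        hits = sum(1 for e in detected if ground_truth.get(e) is True)
        false_alarms = sum(1 for e in detected if ground_truth.get(e) is False)
        # Events with no label yet still need to be classified by hand.
        unlabelled = sorted(e for e in detected if e not in ground_truth)
        results.append((params, hits, false_alarms, unlabelled))
    return results
```

The `unlabelled` list is the part that feeds back into manual classification, since those events cannot be scored until someone has looked at them.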
Spent a fair bit of time helping Yindong with his experiments.
Continued working on adding new rules to libprotoident based on unknown flows seen with the new Waikato capture. Since getting access to fresh traffic, I've added 12 new protocols and improved the rules for another 13 existing ones.
Some of the more notable protocols that I've added are QUIC, SPDY, WeChat, Git and Speedtest. Also added a rule for the AMP throughput test, as this is one of the biggest contributors of "Unknown" traffic.
Captured a full weekday of traffic to use as a basis for working out how regularly we can take permanent captures and what sort of duration we can reasonably expect to capture for. A single day is around 116 GB (snapped and compressed). To put this in context, ~100 days of similar capture from 2007 was 491 GB -- a little over 4 days' worth of traffic now.
More work on the dashboard this week:
* added the ability to remove "common" events from the recent event list and made the graphs collapsible.
* added a table that shows the most frequently occurring events in the past day, e.g. "increased latency from A to B (ipv4)".
* polished up some of the styling on the dashboard and moved the dashboard-specific CSS (of which there is now quite a lot) into its own separate file.
Started thinking about how to include loss-related events in the event groups, as these are ignored at the moment.
The new capture point came online on Wednesday, so the rest of my week was spent playing with the packet captures. This involved:
* learning to operate EndaceVision.
* installing wdcap on the vDAG VM.
* adding the ability to anonymise only the local network in wdcap.
* performing a short test capture.
* getting BSOD working again, which required the application of a little "in-flow" packet sampling to run smoothly.
* running libprotoident against the test capture to see what new rules I can add.
Brad managed to track down a newer video card for quarterpounder, so now BSOD is up and running again.
Added Meena's lpicollector to our github so now I can finally deprecate the lpi_live tool that comes with libprotoident. Spent a bit of time updating some documentation and reworking the example client scripts so that everything is a bit easier to use. Also fixed a couple of memory bugs that I may have introduced last time I worked on the collector.
Continued working with the new event groups. Found a problem where I was incorrectly preferring shorter AS path segments over longer ones when determining whether I could remove a group for being redundant. Having fixed that, many event groups now cover several ASNs so I've redesigned the event list on the dashboard to be better at displaying multiple AS names.
The source code for both BSOD and Meenakshee Mungro's reliable libprotoident collector has been added to the WAND github page. Developers can freely clone these projects and make their own modifications or additions to the source code, while keeping up with any changes that we make between releases.
This is the first time we have released the libprotoident collector under the GPLv3 license. This project is a replacement for the lpi_live tool included with libprotoident, which should now be considered deprecated.
We're also more than happy to consider pull requests for code that adds useful features to either project.
WAND on GitHub
Finished updating NNTSC to deal with traceroute data. The new QueryBuilder code should make query construction a bit less convoluted within the NNTSC dbselect module. Everything seems to work OK in basic testing, so it's now just a matter of migrating over one of our production setups and seeing what breaks.
Continued working through the events on amp.wand.net.nz, looking at events for streams that fall in the 25-100ms and the 300+ms ranges. Results still look very promising overall. Tried to fix another common source of insignificant events (namely a single very large spike that moves our mean so much that subsequent "normal" measurements are treated as slightly abnormal due to their distance from the new mean) but without any tangible success.
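The spike problem can be shown with made-up numbers. The sketch below only illustrates the effect; the median is included purely for contrast and is not a claim about what netevmon does or about a fix that worked.

```python
from statistics import mean, median

# Made-up latency history in ms: steady at 30 ms with one very large spike.
history = [30.0] * 19 + [3000.0]
normal = 30.0

# The single spike drags the mean up to 178.5 ms, so a perfectly ordinary
# 30 ms measurement now sits a long way from the "centre" of the history.
mean_distance = abs(normal - mean(history))

# The median of the same history is still 30 ms.
median_distance = abs(normal - median(history))

print(mean_distance, median_distance)
```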
Moved libtrace and libprotoident from svn to git and put the repositories up on github. This should make the projects more accessible, particularly to the increasing number of people who want to add support for various formats and protocols. It should also make life easier for me when it comes to pushing out bug fixes to people having specific problems and merging in code contributed by our users.
The source code for both our libtrace and libprotoident libraries is now available on GitHub. Developers can freely clone these projects and make their own modifications or additions to the source code, while keeping up with any changes that we make between releases.
We're also more than happy to consider pull requests for code that adds useful features or support for new protocols / trace formats to our libraries.
Look out for more of our open-source projects to make their way onto GitHub soon!
Started going through all the NNTSC exporting code and replacing any instances of blocking sends with non-blocking alternatives. This should ultimately make both NNTSC and netevmon more stable when processing large amounts of historical data. It is also proving a good opportunity to tidy up some of this code, which had gotten a little ropey with all the hacking done on it leading up to NZNOG.
Spent a decent chunk of my week catching up on various support requests. Had two separate people email about issues with BSOD on Friday.
Wrote a draft version of this year's libtrace assignment for 513. I've changed it quite a bit from last year's, based on what the students managed to achieve last year. The assignment itself should require a bit more work this time around, but should be easily doable in just C rather than requiring the additional learning curve of the STL. It should also be much harder to just rip off the examples :)
Read through the full report on a study into traffic classifier accuracy that evaluated libprotoident along with a bunch of other classifiers ( http://vbn.aau.dk/files/179043085/TBU_Extended_dpi_report.pdf ). Pleased to see that libprotoident did extremely well in the cases where it would be expected to do well, i.e. non-web applications.
Managed to write libprotoident rules for a couple of new applications, WeChat and Funshion. Released a new version of libprotoident (2.0.7).
Added support for the AMP DNS test to NNTSC, netevmon and amp-web. Wrote a new detector that looks for changes in response codes, e.g. the DNS response going from NOERROR to REFUSED or some other error state. This should also be useful for the HTTP test in the future.
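The core idea of a response-code detector can be sketched in a few lines of Python. This is a simplified assumption about how such a check might look, not the netevmon detector itself, which probably needs extra logic (e.g. to avoid flapping); `response_code_changes` is a hypothetical name.

```python
def response_code_changes(samples):
    """Return (timestamp, old_code, new_code) for every change in code.

    samples: iterable of (timestamp, response_code) in time order.
    """
    events = []
    previous = None
    for timestamp, code in samples:
        if previous is not None and code != previous:
            events.append((timestamp, previous, code))
        previous = code
    return events
```

Because it only compares codes for equality, the same function would work unchanged on HTTP status codes.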
Fixed a bug in the ChangepointDetector where it wasn't dealing well with streams that featured large values (i.e. >100,000). Also spent a bit more time tweaking the Plateau detector, mainly dealing with problems that show up when either the mean or the standard deviation is very small.
This release adds support for 14 new protocols including League of Legends, WhatsApp, Funshion, Minecraft, Kik and Viber. A new category for Caching has also been added.
A further 13 protocols have had their rules refined and improved including Steam, BitTorrent UDP, RDP, RTMP and Pando.
This release also fixes the bug where flows were erroneously being classified as No Payload, despite payload being present.
The full list of changes can be found in the libprotoident ChangeLog.
Short week due to remaining in Aus for a holiday after LCN.
Upon my return, I spent a bit of time trying to capture traffic for WhatsApp and other mobile messaging services. I had earlier found some flows that were possibly WhatsApp in some traffic I had captured before going away and wanted to confirm it.
It turned out to be a bit trickier to get this traffic than originally anticipated. WhatsApp required a mobile phone number to register an account so we needed to acquire a couple of new 2degrees SIM cards and receive the confirmation text messages on them. Also, the Android VM that we had created for this purpose wouldn't install WhatsApp because the image was intended for a tablet rather than a phone so we had to use BlueStacks instead.
I also captured traffic for Kik, another similar application, and found that we were erroneously classifying Kik traffic as Apple Push notifications as they both use SSL on port 5223. Fortunately, some very subtle differences in the SSL handshake allowed me to write a rule that could reliably identify Kik traffic. Also tried to capture GroupMe traffic but could not reliably receive the text message required to register an account.
Spent most of Friday going over events reported by the Plateau detector in netevmon and made a number of tweaks which should hopefully make it quicker to pick up on obvious changes in latency time series as well as more reliable than before.
Spent most of the week in Sydney attending the LCN 2013 conference. Gave my presentation in the Workshop on Network Measurement to little fanfare.
Learned a few things at the conference:
* Named Data Networking exists and some people are taking it seriously: http://named-data.net/ . My first thoughts were that we've had enough trouble getting people to adopt a new IP version, let alone a system that completely changes how routers work.
* Lots of people are still using ns-2 to validate their research.
* The bar for publication is pretty low in some conferences / workshops, as long as you do something "innovative".
Spent most of the week preparing for my Sydney trip. Wrote the talk I will be presenting this coming Thursday and gave a practice rendition on Friday.
The rest of my time was spent fixing minor issues in Cuz -- trying not to break anything major before I go away for a week. Replaced the bad SQLAlchemy code in the ampy netevmon engine with some psycopg2 code, which should make us slightly more secure. Also tweaked some of the event display stuff on the dashboard so that useful information is displayed in a sensible format, i.e. fewer '|' characters all over the place.
Had a useful meeting with Lightwire on Wednesday. Was pleased to hear that their general impression of our software is good; we will start working towards making it more useful to them over the summer.
Open-source payload-based traffic classifiers are frequently used as a source of ground truth in the traffic classification research field. However, there have been no comprehensive studies that provide evidence that the classifications produced by these software tools are sufficiently accurate for this purpose. In this paper, we present the results of an investigation into the accuracy of four open-source traffic classifiers (L7 Filter, nDPI, libprotoident and tstat) using packet traces captured while using a known selection of common Internet applications, including streaming video, Steam and World of Warcraft. Our results show that nDPI and libprotoident provide the highest accuracy among the evaluated traffic classifiers, whereas L7 Filter is unreliable and should not be used as a source of ground truth.
To be published at the Workshop on Network Measurements (WNM 2013), October 2013.
Copyright (C) IEEE 2013.
Made a number of minor changes to my paper on open-source traffic classifiers in response to reviewer comments.
Modified the NNTSC exporter to inform clients of the frequency of the datapoints it was returning in response to a historical data request. This allows ampy to detect missing data and insert None values appropriately, which will create a break in the time series graphs rather than drawing a straight line between the points either side of the area covered by the missing data. Calculating the frequency was a little harder than anticipated, as not every stream records a measurement frequency (and that frequency may change, e.g. if someone modifies the amp test schedule) and the returned values may be binned anyway, at which point the original frequency is not suitable for determining whether a measurement is missing.
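The gap-marking step on the client side might look something like the sketch below. It is an assumption about the general approach rather than the ampy code: `fill_gaps` is a hypothetical helper, and it takes the frequency reported by the exporter as given.

```python
def fill_gaps(points, frequency):
    """Insert a (timestamp, None) marker wherever data appears to be missing.

    points: list of (timestamp, value) sorted by timestamp
    frequency: expected seconds between consecutive datapoints
    """
    filled = []
    prev_ts = None
    for ts, value in points:
        if prev_ts is not None and ts - prev_ts > frequency:
            # One None is enough to make the graph break the line here.
            filled.append((prev_ts + frequency, None))
        filled.append((ts, value))
        prev_ts = ts
    return filled
```

For example, with a 60 second frequency the points at 0, 60 and 240 gain a single None marker at 120, so the graph shows a break rather than a straight line from 60 to 240.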
Added support for the remaining LPI metrics to NNTSC, ampy and amp-web. We are now drawing graphs for packet counts, flow counts (both new and peak concurrent) and users (both active and observed), in addition to the original byte counts. Not detecting any events on these yet, as these metrics are very different to what we have at the moment so a bit of thought will have to go into which detectors we should use for each metric.
Added support for the Libprotoident byte counters that we have been collecting from the red cable network to netevmon, ampy and amp-web. Now we can visualise the different protocols being used on the network and receive event alerts whenever someone does something out of the ordinary.
Replaced the dropdown list code in amp-web with a much nicer object-oriented approach. This should make it a lot easier to add dropdown lists for future NNTSC collections.
Managed to get our Munin graphs showing data using a Mbps unit. This was trickier than anticipated, as Munin sneakily divides the byte counts it gets from SNMP by its polling interval but this isn't very prominently documented. It took a little while for myself, Cathy and Brad to figure out why our numbers didn't match those being reported by the original Munin graphs.
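The unit conversion itself is simple once it is clear which quantity you are holding. A small Python sketch, assuming the raw SNMP values are cumulative byte counters and that Munin's stored values are already bytes per second (function names are hypothetical):

```python
def mbps_from_counters(prev_bytes, curr_bytes, interval_secs):
    """Mbps from two raw byte counter readings taken interval_secs apart."""
    bytes_per_sec = (curr_bytes - prev_bytes) / interval_secs
    return bytes_per_sec * 8 / 1_000_000


def mbps_from_rate(bytes_per_sec):
    """Mbps from a value that has ALREADY been divided by the interval."""
    return bytes_per_sec * 8 / 1_000_000
```

Dividing an already-divided value by the polling interval a second time gives numbers that are too small by a factor of the interval, which is the sort of mismatch described above.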
Chased down and fixed a libtrace bug where converting a trace from any ERF format (including legacy) to PCAP would result in horrendously broken timestamps on Mac OS X. It turned out that the __BYTE_ORDER macro doesn't exist on BSD systems and so we were erroneously treating the timestamps as big endian regardless of what byte order the machine actually had.
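To show why a wrong byte-order assumption wrecks the timestamps, here is a Python sketch of decoding an ERF timestamp, which is a 64-bit little-endian fixed-point value (seconds in the upper 32 bits, binary fraction in the lower 32 bits). This only illustrates the symptom; the actual libtrace fix was in C and concerned host byte order detection. `erf_seconds` is a hypothetical helper.

```python
import struct


def erf_seconds(raw: bytes, byteorder: str = "<") -> float:
    """Decode an 8-byte ERF timestamp into seconds since the epoch.

    byteorder "<" is the correct little-endian reading; ">" mimics what
    happens when the bytes are interpreted in the wrong order.
    """
    (ts,) = struct.unpack(byteorder + "Q", raw)
    return (ts >> 32) + (ts & 0xFFFFFFFF) / 2**32


raw = struct.pack("<Q", (1375000000 << 32) | 0x80000000)
print(erf_seconds(raw))        # correct reading
print(erf_seconds(raw, ">"))   # wrong byte order: nonsense timestamp
```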
Migrated wdcap and the LPI collector to use the new libwandevent3.
Changed the NNTSC exporter to create a separate thread for each client rather than trying to deal with them all asynchronously. This alleviates the problem where a single client could request a large amount of history and prevent anyone else from connecting to the exporter until that request was served. Also made NNTSC and netevmon behave more robustly when a data source disappears -- rather than halting, they will now periodically try to reconnect so I don't have to restart everything from scratch when I want to apply changes to one component.
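The thread-per-client pattern can be sketched as follows. This is a generic Python illustration under simplified assumptions, not the NNTSC exporter: `serve_client` and `accept_loop` are hypothetical names, and the "history" is just a list of values written one per line.

```python
import socket
import threading


def serve_client(conn, history):
    """Send the whole history to one client, then close the connection."""
    with conn:
        for point in history:
            conn.sendall(f"{point}\n".encode())


def accept_loop(listener, history, max_clients):
    """Give each accepted client its own thread so that one large
    request cannot hold up anybody else."""
    threads = []
    for _ in range(max_clients):
        conn, _addr = listener.accept()
        thread = threading.Thread(
            target=serve_client, args=(conn, history), daemon=True
        )
        thread.start()
        threads.append(thread)
    return threads
```

The trade-off is one thread of overhead per client, which is usually acceptable when the number of simultaneous clients is small.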
Finally, my paper on comparing the accuracy of various open-source traffic classifiers was accepted for WNM 2013. There are a few minor nits to possibly tidy up but it shouldn't require too much work to get camera-ready.
There is currently an increasing demand for accurate and reliable traffic classification techniques. Libprotoident is a library developed at the WAND Network Research Group (WAND) that uses four bytes of payload for the classification of flows. Testing has shown that Libprotoident achieves similar classification accuracy to other approaches, while also being more efficient in terms of speed and memory usage. However, the primary weakness of Libprotoident is that it lacks the visualisation component required to encourage adoption of the library.
This report describes the implementation of a reliable real-time collector for Libprotoident that will form the back-end component to support a web-based visualisation of the statistics produced by the library. The collector has been designed and implemented to support the classification of flows and exporting of application usage statistics to multiple clients over the network in separate threads, whilst operating asynchronously so as to achieve high performance when measuring multi-gigabit networks.
Spent a little time reviewing my old YouTube paper in preparation for discussing it in 513.
Tracked down and fixed a few outstanding bugs in my new and improved anomaly_ts. The main problem was with my algorithm for keeping a running update of the median -- I had a rather obscure bug when inserting a new value that was between the two values I was averaging to calculate the median that was causing all sorts of problems.
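One common way to keep a running median is with two heaps, sketched below in Python. This is a textbook approach shown for illustration and is not necessarily how anomaly_ts does it; in particular it only handles insertions, not removing old values from a window. The case of a new value landing between the two middle values is handled by the rebalancing step.

```python
import heapq


class RunningMedian:
    """Running median over inserted values using two heaps."""

    def __init__(self):
        self._low = []   # max-heap (values negated) holding the lower half
        self._high = []  # min-heap holding the upper half

    def add(self, value):
        if self._low and value > -self._low[0]:
            heapq.heappush(self._high, value)
        else:
            heapq.heappush(self._low, -value)
        # Rebalance so that low has the same size as high, or one more.
        if len(self._low) > len(self._high) + 1:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        elif len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no values added yet")
        if len(self._low) > len(self._high):
            return -self._low[0]
        # Even count: average the two middle values.
        return (-self._low[0] + self._high[0]) / 2
```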
Added an API to ampy for querying the event database. This will hopefully allow us to add little event markers on our time series graphs. Also integrated my code for querying data for Munin time series into ampy.
Churned out a revised version of my L7 filter paper for the IEEE Workshop on Network Measurements. I have repositioned the paper as an evaluation of open-source payload-based traffic classifiers rather than a critique of L7 filter. I also spent a fair chunk of time replacing my nice pass-fail system for representing results with the exact accuracy numbers because apparently reviewers found the former confusing.
Tried to continue my work in tidying up and releasing various trace sets, but ran into some problems with my rsyncs being flooded out over the faculty network. This was quite a nuisance so we need to be more careful in future about how we move traces around (despite it not really being our fault!).
Very short week this week, but managed to get a few little things sorted.
Added a new dataparser to NNTSC for reading the RRDs used by Munin, a program that Brad is using to monitor the switches in charge of our red cables. The data in these RRDs is a lot noisier than smokeping data, so it will be interesting to see how our anomaly detection goes with that data. Also finally got the AMP data actually being exported to our anomaly detector - the glue program that converted NNTSC data into something that can be read by anomaly_ts wasn't parsing AMP records properly.
Spent a bit of time working on adding some new rules to libprotoident to identify previously unknown traffic in some traces sent to me by one of our users.
Spent Friday afternoon talking with Brian Trammell about some mutual interests, in particular passive measurement of TCP congestion window state and large-scale measurement data collection, storage and access. In terms of the latter, it looks like many of the design decisions we have reached with NNTSC are very similar to those that he had reached with mPlane (albeit mPlane is a fair bit more ambitious than what we are doing) which I think was pretty reassuring for both sides. Hopefully we will be able to collaborate more in this space, e.g. developing translation code to make our data collection compatible with mPlane.
Exporting from NNTSC is now back to a functional state and the whole event detection chain is back online. Added table and view descriptions for more complicated AMP tests; traceroute, http2 and udpstream are now all present. Hopefully we can get new AMP collecting and reporting data for these tests soon so we can test whether it actually works!
Had some user-sourced libtrace patches come in, so spent a bit of time integrating these into the source tree and testing the results. One simply cleans up the libpacketdump install directory to not create as many useless or unused files (e.g. static libraries and versioned library symlinks). The other adds support for the OpenBSD loopback DLT, which is actually a real nuisance because OpenBSD isn't entirely consistent with other OSes as to the values of some DLTs.
Helped Nathan with some TCP issues that Lightwire were seeing on a link. Was nice to have an excuse to bust out tcptrace again...
Looks like my L7 Filter paper is going to be rejected. Started thinking about ways in which it can be reworked to be more palatable, maybe present it as a comparative evaluation of open-source traffic classifiers instead.