Libprotoident is a library that performs application layer protocol identification for flows. Unlike many techniques that require capturing the entire packet payload, only the first four bytes of payload sent in each direction, the size of the first payload-bearing packet in each direction and the TCP or UDP port numbers for the flow are used by libprotoident. Libprotoident features a very simple API that is easy to use, enabling developers to quickly write code that can make use of the protocol identification rules present in the library without needing to know anything about the applications they are trying to identify.
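The matching approach can be illustrated with a toy rule. This is a hedged sketch of the *idea* only: the function names, the HTTP pattern and the flow structure below are invented for illustration and are not libprotoident's actual C API or rule set.

```python
import struct

# Toy illustration of payload-signature matching in the style libprotoident
# uses: a rule sees only the first four payload bytes in each direction,
# the first payload sizes and the port numbers. Names here are invented.

def first_four(payload: bytes) -> int:
    """Pack the first four payload bytes into a big-endian integer."""
    return struct.unpack("!I", payload[:4].ljust(4, b"\x00"))[0]

def match_http(fwd4: int, rev4: int, fwd_port: int, rev_port: int) -> bool:
    # "GET " in one direction answered by "HTTP" is a classic HTTP flow.
    GET = first_four(b"GET ")
    HTTP = first_four(b"HTTP")
    return (fwd4 == GET and rev4 == HTTP) or (rev4 == GET and fwd4 == HTTP)

flow = {
    "fwd4": first_four(b"GET /index.html"),
    "rev4": first_four(b"HTTP/1.1 200 OK"),
    "fwd_port": 49152, "rev_port": 80,
}
print(match_http(flow["fwd4"], flow["rev4"], flow["fwd_port"], flow["rev_port"]))
# True
```

The real library combines hundreds of such rules with payload-size and port heuristics, but each rule works from the same small set of per-flow fields.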
Continued working on wdcap4. The overall structure is in place and I'm now adding and testing features one at a time. So far, I've got snapping, direction tagging, VLAN stripping and BPF filtering all working. Checksum validation is working for IP and TCP; just need to test it for other protocols.
Still adding and updating protocols in libprotoident. The biggest win this week was being able to identify Shuijing (Crystal): a protocol for operating a CDN using P2P.
Helped Brendon roll out the latest develop code for ampsave, NNTSC, ampy and amp-web to skeptic. This brings skeptic in line with what is running on prophet and will allow us to upgrade the public amplets without their measurement data being rejected.
Noticed a bug in my Plateau parameter evaluation which meant that Time Series Variability changes were being included in the set of Plateau events. Removing those meant that my results were a lot saner. The best set of parameters now gives an 83% precision rating and the average delay is now below 5 minutes. Started on a similar analysis for the next detector -- the Changepoint detector.
Continued updating libprotoident. I've managed to capture a few days of traffic from the University now, so that is introducing some new patterns that weren't present in my previous dataset. Added new rules for MongoDB, DOTA2, Line and BMDP.
Still having problems with long duration captures being interrupted, either by the DAG dropping packets or by the RT protocol FIFO filling up. This prompted me to start working on WDCap4: the parallel libtrace edition. It's a complete re-write from scratch so I am taking the time to carefully consider every feature that currently exists in WDCap and deciding whether we actually need it or whether we can do it better.
Made a video demonstrating BSOD with the current University capture point. The final cut can be seen at https://www.youtube.com/watch?v=kJlDY0XvbA4
Alistair King got in touch and requested that libwandio be separated from libtrace so that he can release projects that use libwandio without having libtrace as a dependency as well. With his help, this was pretty straightforward so now libwandio has a separate download page on the WAND website.
Continued my investigation into optimal Plateau detector parameters. Used my web-app to classify ~230 new events in a morning (less than 5 of which qualified as significant) and merged those results back into my original ground truth. Re-ran the analysis comparing the results for each parameter configuration against the updated ground truth. I've now got an "optimal" set of parameters, although the optimal parameters still only achieve 55% precision and 60% recall.
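Scoring one parameter configuration against the ground truth comes down to counting true and false positives. A minimal sketch of that calculation (the set-membership matching here is a simplification; the real analysis would need a timing tolerance when deciding whether a detected event matches a ground-truth event):

```python
def precision_recall(detected, ground_truth):
    """Score one detector configuration against manually classified events.

    detected and ground_truth are sets of event identifiers; real events
    would need fuzzy time matching rather than exact set membership.
    """
    tp = len(detected & ground_truth)   # events we found that are real
    fp = len(detected - ground_truth)   # reported but not significant
    fn = len(ground_truth - detected)   # significant but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

truth = {"ev1", "ev2", "ev3", "ev4", "ev5"}
found = {"ev1", "ev2", "ev3", "bogus"}
print(precision_recall(found, truth))  # (0.75, 0.6)
```

Sweeping this over every configuration and picking the best trade-off is what the analysis scripts automate.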
Poked around at some more unknown flows while waiting for the Plateau analysis to run. Managed to identify some new BitTorrent and eMule clients and also added two new protocols: BDMP and Trion games.
Continued digging into the unknown traffic in the day-long Waikato trace I captured last week. Diminishing returns are starting to really kick in now, but I've still managed to add another 9 new protocols (including SPDY) and improved the rules for a further 8.
Worked on a series of scripts to process the results of running the Plateau detector using a variety of different possible configurations (e.g. history and trigger buffer sizes, sensitivity thresholds, etc.). The aim is to find the optimal set of parameters based on the ground truth we already have. Of course, some parameter combinations are going to produce events that we have never seen before, so I've also had to write code to find these events and generate suitable graphs so I can use my web-app to quickly manually classify them appropriately.
Spent a fair bit of time helping Yindong with his experiments.
Continued working on adding new rules to libprotoident based on unknown flows seen with the new Waikato capture. Since getting access to fresh traffic, I've added 12 new protocols and improved the rules for another 13 existing ones.
Some of the more notable protocols that I've added are QUIC, SPDY, WeChat, Git and Speedtest. Also added a rule for the AMP throughput test, as this is one of the biggest contributors of "Unknown" traffic.
Captured a full weekday of traffic to use as a basis for working out how regularly we can take permanent captures and what sort of duration we can reasonably expect to capture for. A single day is around 116 GB (snapped and compressed). To put this in context, ~100 days of similar capture from 2007 was 491 GB -- a little over 4 days' worth of traffic now.
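The comparison with the 2007 dataset works out as follows:

```python
# Back-of-the-envelope comparison of capture volumes.
gb_per_day_now = 116      # snapped and compressed, one weekday in the new capture
gb_2007_total = 491       # ~100 days of similar capture from 2007
days_equivalent = gb_2007_total / gb_per_day_now
print(round(days_equivalent, 1))  # 4.2 -- a little over 4 days at today's rates
```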
More work on the dashboard this week:
* added the ability to remove "common" events from the recent event list and made the graphs collapsible.
* added a table that shows the most frequently occurring events in the past day, e.g. "increased latency from A to B (ipv4)".
* polished up some of the styling on the dashboard and moved the dashboard-specific CSS (of which there is now quite a lot) into its own separate file.
Started thinking about how to include loss-related events in the event groups, as these are ignored at the moment.
The new capture point came online on Wednesday, so the rest of my week was spent playing with the packet captures. This involved:
* learning to operate EndaceVision.
* installing wdcap on the vDAG VM.
* adding the ability to anonymise only the local network in wdcap.
* performing a short test capture.
* getting BSOD working again, which required the application of a little "in-flow" packet sampling to run smoothly.
* running libprotoident against the test capture to see what new rules I can add.
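Anonymising only the local network amounts to a prefix test before rewriting each address. A rough sketch of the idea, using a hypothetical local prefix and a stand-in mask (wdcap's actual implementation is in C and uses a proper anonymisation scheme, not this XOR):

```python
import ipaddress

# Hypothetical local prefix; the real capture point's prefix differs.
LOCAL_NET = ipaddress.ip_network("10.0.0.0/8")

def maybe_anonymise(addr: str) -> str:
    """Rewrite only addresses inside the local network.

    The XOR mask below is a placeholder for a real prefix-preserving
    anonymisation scheme such as CryptoPAn.
    """
    ip = ipaddress.ip_address(addr)
    if ip not in LOCAL_NET:
        return addr                    # external addresses left untouched
    masked = int(ip) ^ 0x00FFFFFF      # keep the /8, scramble host bits
    return str(ipaddress.ip_address(masked))

print(maybe_anonymise("10.1.2.3"))    # local address: rewritten
print(maybe_anonymise("192.0.2.1"))   # external address: unchanged
```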
Brad managed to track down a newer video card for quarterpounder, so now BSOD is up and running again.
Added Meena's lpicollector to our github so now I can finally deprecate the lpi_live tool that comes with libprotoident. Spent a bit of time updating some documentation and reworking the example client scripts so that everything is a bit easier to use. Also fixed a couple of memory bugs that I may have introduced last time I worked on the collector.
Continued working with the new event groups. Found a problem where I was incorrectly preferring shorter AS path segments over longer ones when determining whether I could remove a group for being redundant. Having fixed that, many event groups now cover several ASNs so I've redesigned the event list on the dashboard to be better at displaying multiple AS names.
The source code for both BSOD and Meenakshee Mungro's reliable libprotoident collector has been added to the WAND github page. Developers can freely clone these projects and make their own modifications or additions to the source code, while keeping up with any changes that we make between releases.
This is the first time we have released the libprotoident collector under the GPLv3 license. This project is a replacement for the lpi_live tool included with libprotoident, which should now be considered deprecated.
We're also more than happy to consider pull requests for code that adds useful features to either project.
WAND on GitHub
Finished updating NNTSC to deal with traceroute data. The new QueryBuilder code should make query construction a bit less convoluted within the NNTSC dbselect module. Everything seems to work OK in basic testing, so it's now just a matter of migrating over one of our production setups and seeing what breaks.
Continued working through the events on amp.wand.net.nz, looking at events for streams that fall in the 25-100ms and the 300+ms ranges. Results still look very promising overall. Tried to fix another common source of insignificant events (namely a single very large spike that moves our mean so much that subsequent "normal" measurements are treated as slightly abnormal due to their distance from the new mean) but without any tangible success.
Moved libtrace and libprotoident from svn to git and put the repositories up on github. This should make the projects more accessible, particularly to the increasing number of people who want to add support for various formats and protocols. It should also make life easier for me when it comes to pushing out bug fixes to people having specific problems and merging in code contributed by our users.
The source code for both our libtrace and libprotoident libraries is now available on GitHub. Developers can freely clone these projects and make their own modifications or additions to the source code, while keeping up with any changes that we make between releases.
We're also more than happy to consider pull requests for code that adds useful features or support for new protocols / trace formats to our libraries.
Look out for more of our open-source projects to make their way onto GitHub soon!
Started going through all the NNTSC exporting code and replacing any instances of blocking sends with non-blocking alternatives. This should ultimately make both NNTSC and netevmon more stable when processing large amounts of historical data. It is also proving a good opportunity to tidy up some of this code, which had gotten a little ropey with all the hacking done on it leading up to NZNOG.
Spent a decent chunk of my week catching up on various support requests. Had two separate people email about issues with BSOD on Friday.
Wrote a draft version of this year's libtrace assignment for 513. I've changed it quite a bit from last year's, based on what the students managed to achieve last year. The assignment itself should require a bit more work this time around, but should be easily doable in just C rather than requiring the additional learning curve of the STL. It should also be much harder to just rip off the examples :)
Read through the full report on a study into traffic classifier accuracy that evaluated libprotoident along with a bunch of other classifiers (http://vbn.aau.dk/files/179043085/TBU_Extended_dpi_report.pdf). Pleased to see that libprotoident did extremely well in the cases where it would be expected to do well, i.e. non-web applications.
Managed to write libprotoident rules for a couple of new applications, WeChat and Funshion. Released a new version of libprotoident (2.0.7).
Added support for the AMP DNS test to NNTSC, netevmon and amp-web. Wrote a new detector that looks for changes in response codes, e.g. the DNS response going from NOERROR to REFUSED or some other error state. This should also be useful for the HTTP test in the future.
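The response-code detector only needs to remember the last code seen for each stream and emit an event on a transition. A minimal sketch of the idea (the class and method names are illustrative, not netevmon's actual detector interface):

```python
class RcodeChangeDetector:
    """Flag transitions in DNS response codes, e.g. NOERROR -> REFUSED.

    Illustrative only; netevmon's real detector plumbing differs.
    """
    def __init__(self):
        self.last = {}   # stream id -> last observed response code

    def process(self, stream, rcode):
        prev = self.last.get(stream)
        self.last[stream] = rcode
        if prev is not None and prev != rcode:
            return f"rcode changed from {prev} to {rcode}"
        return None      # first measurement, or no change

det = RcodeChangeDetector()
print(det.process("dns-stream-1", "NOERROR"))   # None (first measurement)
print(det.process("dns-stream-1", "NOERROR"))   # None (no change)
print(det.process("dns-stream-1", "REFUSED"))   # rcode changed from NOERROR to REFUSED
```

The same shape would carry over to HTTP status codes, which is why it should be reusable for the HTTP test.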
Fixed a bug in the ChangepointDetector where it wasn't dealing well with streams that featured large values (i.e. >100,000). Also spent a bit more time tweaking the Plateau detector, mainly dealing with problems that show up when either the mean or the standard deviation are very small.
This release adds support for 14 new protocols including League of Legends, WhatsApp, Funshion, Minecraft, Kik and Viber. A new category for Caching has also been added.
A further 13 protocols have had their rules refined and improved including Steam, BitTorrent UDP, RDP, RTMP and Pando.
This release also fixes the bug where flows were erroneously being classified as No Payload, despite payload being present.
The full list of changes can be found in the libprotoident ChangeLog.
Short week due to remaining in Aus for a holiday after LCN.
Upon my return, I spent a bit of time trying to capture traffic for WhatsApp and other mobile messaging services. I had earlier found some flows that were possibly WhatsApp in some traffic I had captured before going away and wanted to confirm it.
It turned out to be a bit trickier to get this traffic than originally anticipated. WhatsApp required a mobile phone number to register an account so we needed to acquire a couple of new 2degrees SIM cards and receive the confirmation text messages on them. Also, the Android VM that we had created for this purpose wouldn't install WhatsApp because the image was intended for a tablet rather than a phone so we had to use BlueStacks instead.
I also captured traffic for Kik, another similar application, and found that we were erroneously classifying Kik traffic as Apple Push notifications as they both use SSL on port 5223. Fortunately, some very subtle differences in the SSL handshake allowed me to write a rule that could reliably identify Kik traffic. Also tried to capture GroupMe traffic but could not reliably receive the text message required to register an account.
Spent most of Friday going over events reported by the Plateau detector in netevmon and made a number of tweaks which should hopefully make it quicker to pick up on obvious changes in latency time series as well as more reliable than before.
Spent most of the week in Sydney attending the LCN 2013 conference. Gave my presentation in the Workshop on Network Measurement to little fanfare.
Learned a few things at the conference:
* Named Data Networking exists and some people are taking it seriously: http://named-data.net/ . My first thoughts were that we've had enough trouble getting people to adopt a new IP version, let alone a system that completely changes how routers work.
* Lots of people are still using ns-2 to validate their research.
* The bar for publication is pretty low in some conferences / workshops, as long as you do something "innovative".
Spent most of the week preparing for my Sydney trip. Wrote the talk I will be presenting this coming Thursday and gave a practice rendition on Friday.
The rest of my time was spent fixing minor issues in Cuz -- trying not to break anything major before I go away for a week. Replaced the bad SQLAlchemy code in the ampy netevmon engine with some psycopg2 code, which should make us slightly more secure. Also tweaked some of the event display stuff on the dashboard so that useful information is displayed in a sensible format, i.e. fewer '|' characters all over the place.
Had a useful meeting with Lightwire on Wednesday. Was pleased to hear that their general impression of our software is good and will start working towards making it more useful to them over the summer.
Open-source payload-based traffic classifiers are frequently used as a source of ground truth in the traffic classification research field. However, there have been no comprehensive studies that provide evidence that the classifications produced by these software tools are sufficiently accurate for this purpose. In this paper, we present the results of an investigation into the accuracy of four open-source traffic classifiers (L7 Filter, nDPI, libprotoident and tstat) using packet traces captured while using a known selection of common Internet applications, including streaming video, Steam and World of Warcraft. Our results show that nDPI and libprotoident provide the highest accuracy among the evaluated traffic classifiers, whereas L7 Filter is unreliable and should not be used as a source of ground truth.
To be published at the Workshop on Network Measurements (WNM 2013), October 2013.
Copyright (C) IEEE 2013.
Made a number of minor changes to my paper on open-source traffic classifiers in response to reviewer comments.
Modified the NNTSC exporter to inform clients of the frequency of the datapoints it was returning in response to a historical data request. This allows ampy to detect missing data and insert None values appropriately, which will create a break in the time series graphs rather than drawing a straight line between the points either side of the area covered by the missing data. Calculating the frequency was a little harder than anticipated: not every stream records a measurement frequency (and that frequency may change, e.g. if someone modifies the amp test schedule), and the returned values may be binned anyway, at which point the original frequency is not suitable for determining whether a measurement is missing.
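Given the frequency reported by the exporter, spotting a gap is a matter of comparing consecutive timestamps and padding with None where the spacing is too large. A simplified sketch of the idea (ampy's real logic also has to cope with binned data and changing schedules):

```python
def insert_gaps(points, freq, tolerance=1.5):
    """Pad a time series with None where measurements are missing.

    points is a list of (timestamp, value) tuples; any gap larger than
    tolerance * freq is treated as missing data, so the graph shows a
    break instead of interpolating a straight line across it.
    """
    if not points:
        return []
    out = [points[0]]
    for prev, cur in zip(points, points[1:]):
        if cur[0] - prev[0] > tolerance * freq:
            out.append((prev[0] + freq, None))   # break the line here
        out.append(cur)
    return out

series = [(0, 1.0), (60, 1.1), (300, 1.2)]  # measurements missing between 60 and 300
print(insert_gaps(series, freq=60))
# [(0, 1.0), (60, 1.1), (120, None), (300, 1.2)]
```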
Added support for the remaining LPI metrics to NNTSC, ampy and amp-web. We are now drawing graphs for packet counts, flow counts (both new and peak concurrent) and users (both active and observed), in addition to the original byte counts. Not detecting any events on these yet, as these metrics are very different to what we have at the moment so a bit of thought will have to go into which detectors we should use for each metric.
Added support for the Libprotoident byte counters that we have been collecting from the red cable network to netevmon, ampy and amp-web. Now we can visualise the different protocols being used on the network and receive event alerts whenever someone does something out of the ordinary.
Replaced the dropdown list code in amp-web with a much nicer object-oriented approach. This should make it a lot easier to add dropdown lists for future NNTSC collections.
Managed to get our Munin graphs showing data using a Mbps unit. This was trickier than anticipated, as Munin sneakily divides the byte counts it gets from SNMP by its polling interval but this isn't very prominently documented. It took a little while for myself, Cathy and Brad to figure out why our numbers didn't match those being reported by the original Munin graphs.
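The trap is that Munin already divides counter deltas by the polling interval before graphing, so the values it plots are bytes per second, not bytes per poll. Converting a raw SNMP counter delta to Mbps therefore needs the interval applied exactly once; a sketch of the arithmetic:

```python
def counter_to_mbps(prev_bytes, cur_bytes, interval_secs):
    """Convert an SNMP octet-counter delta into megabits per second.

    Munin performs the division by the polling interval itself, which
    is why raw counter maths can appear not to match its graphs.
    """
    bytes_per_sec = (cur_bytes - prev_bytes) / interval_secs
    return bytes_per_sec * 8 / 1_000_000

# 300-second poll interval, 3.75 GB transferred in that window:
print(counter_to_mbps(0, 3_750_000_000, 300))  # 100.0 Mbps
```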
Chased down and fixed a libtrace bug where converting a trace from any ERF format (including legacy) to PCAP would result in horrendously broken timestamps on Mac OS X. It turned out that the __BYTE_ORDER macro doesn't exist on BSD systems and so we were erroneously treating the timestamps as big endian regardless of what byte order the machine actually had.
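The portable fix is to avoid host byte-order macros entirely and decode with an explicit-endianness conversion. ERF stores its timestamp as a little-endian 64-bit fixed-point value (high 32 bits are Unix seconds, low 32 bits a binary fraction); sketched here in Python's struct notation rather than libtrace's actual C code:

```python
import struct

def erf_timestamp_to_unix(raw8: bytes) -> float:
    """Decode an ERF timestamp portably.

    Using an explicit little-endian format ('<Q') works on any host,
    sidestepping macros like __BYTE_ORDER that don't exist on BSD
    systems -- which is what tripped up libtrace on Mac OS X.
    """
    ts = struct.unpack("<Q", raw8)[0]
    seconds = ts >> 32
    fraction = (ts & 0xFFFFFFFF) / 2**32
    return seconds + fraction

raw = struct.pack("<Q", (1234567890 << 32) | (1 << 31))  # half a second past
print(erf_timestamp_to_unix(raw))  # 1234567890.5
```

The equivalent C fix is to use explicit byte-swapping helpers rather than compile-time byte-order tests.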
Migrated wdcap and the LPI collector to use the new libwandevent3.
Changed the NNTSC exporter to create a separate thread for each client rather than trying to deal with them all asynchronously. This alleviates the problem where a single client could request a large amount of history and prevent anyone else from connecting to the exporter until that request was served. Also made NNTSC and netevmon behave more robustly when a data source disappears -- rather than halting, they will now periodically try to reconnect so I don't have to restart everything from scratch when I want to apply changes to one component.
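The thread-per-client model can be sketched in a few lines: each connection gets its own worker draining its own queue, so one client's huge history request no longer starves everyone else. This is a hypothetical structure for illustration, not NNTSC's actual exporter code:

```python
import queue
import threading

class Exporter:
    """Thread-per-client fan-out: a slow consumer only stalls itself."""
    def __init__(self):
        self.clients = []

    def add_client(self, sink):
        q = queue.Queue()
        t = threading.Thread(target=self._serve, args=(q, sink), daemon=True)
        t.start()
        self.clients.append((q, t))

    def publish(self, item):
        # Enqueue for every client; a full or slow client blocks nobody else.
        for q, _ in self.clients:
            q.put(item)

    def _serve(self, q, sink):
        while True:
            item = q.get()
            if item is None:        # shutdown sentinel
                return
            sink.append(item)

exp = Exporter()
a, b = [], []
exp.add_client(a)
exp.add_client(b)
for i in range(3):
    exp.publish(i)
exp.publish(None)               # sentinel stops the workers
for _, t in exp.clients:
    t.join()
print(a, b)  # [0, 1, 2] [0, 1, 2]
```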
Finally, my paper on comparing the accuracy of various open-source traffic classifiers was accepted for WNM 2013. There's a few minor nits to possibly tidy up but it shouldn't require too much work to get camera-ready.
There is currently an increasing demand for accurate and reliable traffic classification techniques. Libprotoident is a library developed at the WAND Network Research Group (WAND) that uses four bytes of payload for the classification of flows. Testing has shown that Libprotoident achieves similar classification accuracy to other approaches, while also being more efficient in terms of speed and memory usage. However, the primary weakness of Libprotoident is that it lacks the visualisation component required to encourage adoption of the library.
This report describes the implementation of a reliable real-time collector for Libprotoident that will form the back-end component to support a web-based visualisation of the statistics produced by the library. The collector has been designed and implemented to support the classification of flows and exporting of application usage statistics to multiple clients over the network in separate threads, whilst operating asynchronously so as to achieve high performance when measuring multi-gigabit networks.