Project Members

Shane Alcock admin

Libtrace

Libtrace is a library for both capturing and processing packet traces. It supports a variety of common trace formats, including pcap, ERF, live DAG capture, native Linux and BSD sockets, TSH and legacy ERF formats. Libtrace also supports reading and writing using several different compression formats, including gzip, bzip2 and lzo. Libtrace uses a multi-threaded approach for decompressing and compressing trace files to improve trace processing performance on multi-core CPUs.

The libtrace API provides functions for accessing the headers in a packet directly,
up to and including the transport header.

Libtrace can also output packets using any supported output trace format, including
pcap, ERF, DAG transmit and native sockets.

Libtrace is bundled with several tools for performing common trace processing and analysis tasks. These include tracesplit, tracemerge, traceanon, tracepktdump and tracereport (amongst others).

17 Feb 2014

Continued redevelopment of the NNTSC exporting code to be more robust and reliable. Replaced the live data pipes used by the dataparsers to push live data to the exporter with a RabbitMQ queue, which seems to be working well.

Modified the way that subscribing to streams worked to try and solve a problem we were having where data that arrived while historical data was being queried was not being pushed out to interested clients. Now, we store any live data that arrives for streams that are still being queried and push that out as soon as we get an indication from the query thread that the query has finished.
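The hold-and-flush behaviour described above can be sketched roughly as follows. This is an illustration only, not the NNTSC code: the `StreamBuffer` class, its method names and the `send` callback are all hypothetical.

```python
from collections import defaultdict


class StreamBuffer:
    """Holds back live measurements for streams whose history query is
    still running. All names here are hypothetical, not NNTSC's own."""

    def __init__(self, send):
        self.send = send                   # callable(stream_id, measurement)
        self.querying = set()              # streams with a query in flight
        self.pending = defaultdict(list)   # held-back live data per stream

    def start_query(self, stream_id):
        self.querying.add(stream_id)

    def on_live_data(self, stream_id, measurement):
        if stream_id in self.querying:
            # History is not finished yet: store the live data for later.
            self.pending[stream_id].append(measurement)
        else:
            self.send(stream_id, measurement)

    def on_query_finished(self, stream_id):
        # The query thread has signalled completion: flush in arrival order.
        self.querying.discard(stream_id)
        for measurement in self.pending.pop(stream_id, []):
            self.send(stream_id, measurement)
```

Note that this sketch shares the limitation described in the next paragraph: it cannot recover a measurement that was neither committed when the query ran nor delivered as live data afterwards.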

Unfortunately, we can still miss historical measurements if they haven't been committed to the database at the time when the final query begins. This often crops up if netevmon is resubscribing after NNTSC has been restarted, resulting in us missing out on the last historical measurement before the subscribe message arrives. Still looking for an elegant way to solve this one.

Added a version check message to the NNTSC protocol. This message is sent by the server as soon as a client connects and the client API has been updated to require its internal version to match the one received from the server. If not, the client stops and prints a message telling the user to update their client API. This should be helpful to the students who were previously getting weird broken behaviour with no apparent explanation whenever I made an API change to the production NNTSC on prophet.
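A minimal sketch of the client-side check, assuming the version arrives as a simple string. The version value, the exception name and the function name are made up for illustration and are not part of the real NNTSC client API.

```python
CLIENT_API_VERSION = "1.0"   # hypothetical value, for illustration only


class VersionMismatch(Exception):
    """Raised when the server and client API versions differ."""


def check_server_version(server_version, client_version=CLIENT_API_VERSION):
    # The server sends its version as soon as the client connects; the
    # client refuses to continue unless it matches its own version exactly.
    if server_version != client_version:
        raise VersionMismatch(
            "server is version %s but this client API is version %s; "
            "please update your client API" % (server_version, client_version)
        )
```

Failing loudly at connect time is the point of the design: a mismatch produces one clear message instead of confusing behaviour later on.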

Chased down a build issue with libtrace on FreeBSD 10. Turns out we had made the dist tarball with an old version of libtool which was stupidly written to never build shared libraries if the OS matched FreeBSD 1* (because FreeBSD 1.X didn't support shared libraries). Easy enough to fix, I just have to remember to make the libtrace distribution on something other than Debian Squeeze. Will start working on a new libtrace release in the near future so I don't keep getting emails from FreeBSD users.

10 Feb 2014

Started going through all the NNTSC exporting code and replacing any instances of blocking sends with non-blocking alternatives. This should ultimately make both NNTSC and netevmon more stable when processing large amounts of historical data. It is also proving a good opportunity to tidy up some of this code, which had gotten a little ropey with all the hacking done on it leading up to NZNOG.
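For illustration, a non-blocking send usually looks something like the sketch below: anything the kernel will not accept right now is queued and retried later, instead of stalling the sender. The `NonBlockingSender` class is hypothetical and is not taken from the NNTSC exporter.

```python
import socket


class NonBlockingSender:
    """Queues outgoing bytes and writes only what the socket will accept.
    A hypothetical helper, not the actual NNTSC exporter code."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.backlog = bytearray()

    def send(self, data):
        self.backlog += data
        return self.flush()

    def flush(self):
        """Write as much as possible; return the number of bytes still queued."""
        while self.backlog:
            try:
                sent = self.sock.send(self.backlog)
            except BlockingIOError:
                break   # kernel buffer is full; retry once the socket is writable
            del self.backlog[:sent]
        return len(self.backlog)
```

In a real event loop, `flush()` would be called again whenever `select` or `poll` reports the socket as writable.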

Spent a decent chunk of my week catching up on various support requests. Had two separate people email about issues with BSOD on Friday.

Wrote a draft version of this year's libtrace assignment for 513. I've changed it quite a bit from last year's, based on what the students managed to achieve last year. The assignment itself should require a bit more work this time around, but should be easily doable in just C rather than requiring the additional learning curve of the STL. It should also be much harder to just rip off the examples :)

Read through the full report on a study into traffic classifier accuracy that evaluated libprotoident along with a bunch of other classifiers ( http://vbn.aau.dk/files/179043085/TBU_Extended_dpi_report.pdf ). Pleased to see that libprotoident did extremely well in the cases where it would be expected to do well, i.e. non-web applications.

05 Sep 2013

Tidied up a lot of the javascript within amp-web. Moved all of the external scripts (i.e. stuff not developed by us) into a separate lib directory and ensured that everything used consistent and specific terminology.

Added config options to amp-web for specifying the location of the netevmon and amp meta-data databases. Previously we had assumed these were on the local machine, which proved troublesome when Brad tried to get Cuz running on warlock.

Capped the maximum range of the summary graph to prevent users from zooming out into empty space.

Fixed some byte-ordering bugs in libpacketdump's RadioTap and 802.11 header parsing on big endian architectures.

05 Aug 2013

Added support for the AMP ICMP collection to ampy and amp-web, so we are now able to plot graphs of the test data Brendon has been collecting.

Spent a decent chunk of an afternoon working through the DPDK build system with Richard S., trying to make the DPDK libraries build as position-independent code so that we can link libtrace against them nicely.

Reworked a large amount of code in amp-web to move the collection-specific code out of the core source files and into separate little modules for each collection. This means that the core code should be much easier to follow and work on. Adding support for new collections should also be simpler and require less inside knowledge of how the whole system works.

05 Jul 2013

Added support for the Libprotoident byte counters that we have been collecting from the red cable network to netevmon, ampy and amp-web. Now we can visualise the different protocols being used on the network and receive event alerts whenever someone does something out of the ordinary.

Replaced the dropdown list code in amp-web with a much nicer object-oriented approach. This should make it a lot easier to add dropdown lists for future NNTSC collections.

Managed to get our Munin graphs showing data using a Mbps unit. This was trickier than anticipated, as Munin sneakily divides the byte counts it gets from SNMP by its polling interval, which isn't very prominently documented. It took a little while for Cathy, Brad and me to figure out why our numbers didn't match those being reported by the original Munin graphs.
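The arithmetic involved is roughly as follows. This sketch assumes a 300 second polling interval purely for the example, and the function names are illustrative only.

```python
def counter_delta_to_mbps(byte_delta, interval_secs):
    """Convert a raw byte-counter difference over one poll into Mbps."""
    return byte_delta * 8 / interval_secs / 1e6


def munin_value_to_mbps(bytes_per_sec):
    """Convert a value that has already been divided by the polling
    interval (i.e. bytes per second) into Mbps."""
    return bytes_per_sec * 8 / 1e6


# 3,750,000,000 bytes in an assumed 300 second interval is 12,500,000 bytes/s.
raw_rate = counter_delta_to_mbps(3750000000, 300)
stored_rate = munin_value_to_mbps(12500000)
```

Both calls describe the same traffic and agree with each other. Dividing the already-averaged value by the interval a second time would leave the result too small by a factor of the interval.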

Chased down and fixed a libtrace bug where converting a trace from any ERF format (including legacy) to PCAP would result in horrendously broken timestamps on Mac OS X. It turned out that the __BYTE_ORDER macro doesn't exist on BSD systems and so we were erroneously treating the timestamps as big endian regardless of what byte order the machine actually had.
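To show why the byte order matters here: an ERF timestamp is a little-endian 64-bit fixed-point value, with seconds in the upper 32 bits and the fraction of a second in the lower 32 bits. The sketch below is an illustration in Python rather than the libtrace C code, and the function name is hypothetical.

```python
import struct


def erf_timestamp_to_timeval(raw):
    """Convert an 8-byte ERF timestamp into (seconds, microseconds)."""
    # "<Q" forces little-endian decoding regardless of the host byte order,
    # so nothing depends on a platform-specific byte-order macro.
    (ts,) = struct.unpack("<Q", raw)
    seconds = ts >> 32
    fraction = ts & 0xFFFFFFFF
    microseconds = (fraction * 1000000) >> 32
    return seconds, microseconds


# Example input: 1372377600 seconds plus exactly half a second.
raw = struct.pack("<Q", (1372377600 << 32) | 0x80000000)
```

Decoding the same eight bytes as big endian instead gives a completely different seconds value, which is the kind of broken timestamp described above.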

Migrated wdcap and the LPI collector to use the new libwandevent3.

Changed the NNTSC exporter to create a separate thread for each client rather than trying to deal with them all asynchronously. This alleviates the problem where a single client could request a large amount of history and prevent anyone else from connecting to the exporter until that request was served. Also made NNTSC and netevmon behave more robustly when a data source disappears -- rather than halting, they will now periodically try to reconnect so I don't have to restart everything from scratch when I want to apply changes to one component.

Finally, my paper on comparing the accuracy of various open-source traffic classifiers was accepted for WNM 2013. There are a few minor nits to possibly tidy up but it shouldn't require too much work to get camera-ready.

29 Jun 2013

Spent the week working on an Intel DPDK capture format for libtrace. This has involved a lot of trial and error: testing what works and what doesn't, finding where I'm hitting limits, etc. Capturing packets is working well; however, certain aspects such as timestamping packets and handling MAC checksums are not as straightforward. I discussed the best approach to working around these problems with Shane; we decided it was best to take the safer approach that requires the fewest modifications to DPDK itself (at least for default behavior).

28 Jun 2013

Had a week of catching up on a few jobs I had put off while getting NNTSC, netevmon and amp2 ready for the Lightwire release.

Re-worked BSOD server to use a separate thread for communicating with clients, so that the packets can be sent to clients immediately rather than waiting for a break in the input stream. Unfortunately, this hasn't stopped the bursty appearance of packets on the client like I had hoped, so this requires further investigation. I suspect the flow management inside BSOD server isn't as efficient as it could be, so I may end up replacing it with libflowmanager.

With that in mind, I've modified libflowmanager to support multiple flow expiry 'plugins', as opposed to having a single defined expiry policy that all libflowmanager programs had to use. This will allow us to replicate BSOD's old expiry policy (flows expire after 20 seconds of inactivity) if we want to, although I would probably see how it goes with the classic libflowmanager policy first.
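The plugin idea can be sketched as a flow table that delegates the expiry decision to a policy object. This is a rough illustration in Python rather than the libflowmanager C++ API; the class and method names are hypothetical, and only the 20 second idle timeout comes from the description above.

```python
class IdleTimeoutPolicy:
    """Expire any flow that has been idle for `timeout` seconds
    (20 seconds mirrors the old BSOD behaviour described above)."""

    def __init__(self, timeout=20.0):
        self.timeout = timeout

    def has_expired(self, flow, now):
        return now - flow["last_seen"] >= self.timeout


class FlowTable:
    """Tracks flows and delegates the expiry decision to a policy object."""

    def __init__(self, policy):
        self.policy = policy
        self.flows = {}

    def update(self, key, now):
        # Create the flow on first sight, then refresh its activity time.
        self.flows.setdefault(key, {"first_seen": now})["last_seen"] = now

    def expire(self, now):
        expired = [key for key, flow in self.flows.items()
                   if self.policy.has_expired(flow, now)]
        for key in expired:
            del self.flows[key]
        return expired
```

Swapping in a different expiry policy then only means passing a different object with a `has_expired` method; the flow table itself is unchanged.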

Received some bug reports for libtrace from Matt Brown as a result of Mayhem being run against the entirety of Debian. Perry had more or less patched them right away so I worked on releasing a new version of libtrace incorporating those fixes. The new release went out on Friday and also includes the rawerf fix from several weeks back. Had a few issues with both Fedora and FreeBSD that slowed down the testing process, so the release process took a bit longer than anticipated.

28 Jun 2013

Libtrace 3.0.18 has been released.

This release fixes several bugs that have been reported in 3.0.17. In particular, this release fixes several crash bugs in the libtrace tools that were reported by the Mayhem team at Carnegie Mellon University. It also addresses a rare bug where the compression auto-detection could trigger a false positive on uncompressed ERF traces by including a new format URI (rawerf:) that can be used to force libtrace to treat the traces as uncompressed. We have also tightened up the compression auto-detection somewhat to reduce the likelihood of the bug occurring.

It is highly recommended that you explicitly use the rawerf: format if you are working with large numbers of uncompressed ERF traces.

The full list of changes in this release can be found in the libtrace ChangeLog.

You can download the new version of libtrace from the libtrace website.

20 May 2013

Spent much of my week working on getting BSOD ready to be wheeled out at Open Day once again. During this process, I managed to find and fix a couple of bugs in the server that were now causing nasty crashes. I also tracked down a bug in the client where the UI elements aren't redrawn properly if the window is resized. Normally this hasn't been a big problem, but newer versions of Gnome like to try and silently resize full-screen apps and this meant that our UI was disappearing off the bottom of the screen. As an interim fix, I've disabled resizing in BSOD client but we really should be trying to handle resize events properly.

Received a bug report for libtrace about the compression detection occasionally giving a false positive for uncompressed ERF traces. This is because the ERF header has no identifying 'magic' at the start, so every now and again the first few bytes (where the timestamp is stored) end up matching the bytes we use to identify a gzip header. I've strengthened the gzip check to use an extra byte so the chance of this happening now is 1 in 16 million. I've also added a special URI format called rawerf: so users can force libtrace to treat traces as uncompressed ERF.
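As a rough illustration of where the 1 in 16 million figure comes from: a gzip stream starts with the two magic bytes 0x1f 0x8b, followed by a compression method byte that is 0x08 for deflate (RFC 1952). The sketch below assumes the extra byte being checked is that method byte, which the entry above does not actually state.

```python
GZIP_MAGIC = b"\x1f\x8b"     # ID1 and ID2 from RFC 1952
GZIP_DEFLATE = b"\x08"       # CM byte; assumed here to be the "extra byte"


def looks_like_gzip(header):
    """Return True if the first three bytes match a gzip/deflate header."""
    return header[:3] == GZIP_MAGIC + GZIP_DEFLATE


# With three fixed bytes, uniformly random leading bytes match once in
# 256 ** 3 cases, i.e. roughly 1 in 16 million.
FALSE_POSITIVE_ODDS = 256 ** 3
```

ERF timestamps are not uniformly random, so the real odds will differ somewhat, which is one reason forcing the format with rawerf: is still the safer option.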

Started working on trying to get amp-web to plot graphs of interface byte counts. I've managed to draw a line on the graph, but much of the graph styling is still using the smokeping style. I'm now looking at rewriting the javascript for the graph styling to be a bit more generic and configurable, rather than having one (mostly copied) javascript file for each of our metrics.

Friday was mostly consumed with looking after our displays at Open Day. BSOD continued to impress quite a few people and we were reasonably busy most of the day, so it seemed a worthwhile exercise.

29 Apr 2013

Finished up the 513 marking (eventually!) and released the marks to the students.

Released a new version of libtrace -- 3.0.17.

Started working on releasing some new public trace sets. Waikato 8 is now available on WITS and the DSL traffic from our 2009 ISP traces will hopefully soon follow. In the process, I found a couple of little glitches in traceanon that I was able to fix before the libtrace release.

Decided that our anomaly detection code does not handle time series that switch from constant to noisy and back again particularly well. A classic example is latency to Google: during working hours it is noisy, but it is constant at other times. We detect the switch, but only after a long time. I would like to detect this change sooner and report it as an event (although not necessarily alert on it). I've started looking into an alternative method of detecting the change in time series style based on a pair of sliding windows: one for the last hour, one for the previous 12 hours before that. It is working better, but is currently a bit too sensitive to the effect of an individual outlier.
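One way a pair of sliding windows could be compared is sketched below. This is not the actual detector: comparing standard deviations, the ratio threshold and the noise floor are all assumptions made for the sake of the example.

```python
from statistics import pstdev


def style_changed(samples, interval_secs=60, ratio=3.0, floor=0.1):
    """Compare the variability of the last hour with the 12 hours before it.

    `samples` is a list of evenly spaced measurements, oldest first.
    `ratio` and `floor` are hypothetical tuning parameters.
    """
    recent_n = 3600 // interval_secs
    history_n = 12 * 3600 // interval_secs
    if len(samples) < recent_n + history_n:
        return False    # not enough data to fill both windows yet
    recent = samples[-recent_n:]
    history = samples[-(recent_n + history_n):-recent_n]
    # The floor stops a perfectly constant window causing a division by zero.
    recent_sd = max(pstdev(recent), floor)
    history_sd = max(pstdev(history), floor)
    return recent_sd / history_sd >= ratio or history_sd / recent_sd >= ratio
```

A single large outlier in the last hour is enough to inflate the recent standard deviation and trip this check, which matches the sensitivity problem noted above; a more robust spread measure would be one way to dampen that.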

24 Apr 2013

Libtrace 3.0.17 has finally been released.

This release adds some new convenience functions to the libtrace API and fixes a number of bugs, many of which have been reported by users.

The major changes in this release are:
* Added API functions for getting the IP address from a packet as a string.
* Added API functions for calculating packet checksums at the IP and transport layers.
* Fixed major bug where the event API was not working with int: inputs.
* Fixed broken checksum calculations in tracereplay.
* Fixed bug where IP headers embedded inside ICMP messages were not being anonymised by traceanon.
* Added API support for working with ICMPv6 headers.

The full list of changes in this release can be found in the libtrace ChangeLog.

You can download the new version of libtrace from the libtrace website.

22 Apr 2013

Fixed the bugs in the anomaly_ts / eventing chain that I introduced last week. We're back reporting events again on the web dashboard.

Wrote ampy modules for retrieving smokeping and munin data from NNTSC so that Brendon could plot graphs of those time series. Doing this showed up some (more) problems in the graphing which Brendon eventually tracked down to being related to how aggregation was being performed within the NNTSC database.

Spent a large chunk of my week marking the 513 libtrace assignment. It is a much bigger class than previous years (over 30 students) so it was pretty time consuming to mark. In general, it was pleasing to see most students had gotten the basics of passive measurement worked out and hopefully they got some valuable experience from it. My biggest disappointment was how many students didn't read the instructions carefully -- especially those who missed the requirement to write original programs rather than blindly copying huge chunks of the example code.

03 Apr 2013

Exporting from NNTSC is now back to a functional state and the whole event detection chain is back online. Added table and view descriptions for more complicated AMP tests; traceroute, http2 and udpstream are now all present. Hopefully we can get new AMP collecting and reporting data for these tests soon so we can test whether it actually works!

Had some user-sourced libtrace patches come in, so spent a bit of time integrating these into the source tree and testing the results. One simply cleans up the libpacketdump install directory to not create as many useless or unused files (e.g. static libraries and versioned library symlinks). The other adds support for the OpenBSD loopback DLT, which is actually a real nuisance because OpenBSD isn't entirely consistent with other OSes as to the values of some DLTs.

Helped Nathan with some TCP issues that Lightwire were seeing on a link. Was nice to have an excuse to bust out tcptrace again...

Looks like my L7 Filter paper is going to be rejected. Started thinking about ways in which it can be reworked to be more palatable, maybe present it as a comparative evaluation of open-source traffic classifiers instead.

18 Mar 2013

Short week this week, as I spent Thursday and Friday in Wellington at the cricket.

Wrote an assignment on libtrace for 513, along with some "model" answers.

Continued reading and editing Meenakshee's report.

Had a vigorous discussion with Brendon about what he needs the NNTSC export protocol to do to support his AMP graphing needs. Turns out the protocol needs a couple of new features, namely binning/aggregation and a no-more-data indicator, which I started working on adding. So far, this has mostly involved taking some of the working code from my anomaly detector feeder program, which is an NNTSC client, and turning it into a NNTSC client API.
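The two features could look roughly like the sketch below. It is not the NNTSC protocol: aggregating each bin by its mean, the message layout and the `more` flag name are all assumptions chosen for illustration.

```python
def bin_measurements(rows, binsize):
    """Aggregate (timestamp, value) rows into fixed-width bins.

    Returns a list of (bin_start, mean_value) sorted by bin start. Using
    the mean is an assumption; other aggregation functions would also work.
    """
    bins = {}
    for ts, value in rows:
        start = ts - (ts % binsize)
        total, count = bins.get(start, (0.0, 0))
        bins[start] = (total + value, count + 1)
    return [(start, total / count)
            for start, (total, count) in sorted(bins.items())]


def history_messages(rows, binsize, chunk=2):
    """Yield history in chunks; the final message carries more=False so the
    client knows that no further data is coming."""
    binned = bin_measurements(rows, binsize)
    if not binned:
        yield {"data": [], "more": False}
        return
    for i in range(0, len(binned), chunk):
        yield {"data": binned[i:i + chunk], "more": i + chunk < len(binned)}
```

With an explicit end marker the client can tell "the query returned nothing" apart from "the query is still running", which is what a graphing front end needs before it draws.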

Put out a request to our past students for their Honours reports so that they can be published on the website. Thanks to those who have responded.

11 Mar 2013

Added a data parser module to NNTSC to process the tunnel user count data that we got from Lightwire. Managed to get the data going all the way through to the event detection program which spat out a ton of events. Spent a bit of time combing through them manually to see whether the reported events were actually worth reporting -- in a lot of cases they weren't, so I've refined the old Plateau and Mode algorithms a bit to hopefully resolve the issues. I also employed the Plunge detector on all time series types, rather than just libprotoident data, and this was useful in reporting the most interesting behaviours in the tunnel user data (i.e. all the users disappearing).

Added a couple of new features to the libtrace API. The first was the ability to ask libtrace to give you the source or destination IP address as a string. This is quite handy because normally processing IP addresses in libtrace involves messing around with sockaddrs which is not particularly n00b-friendly. The second API feature was the ability to ask libtrace to calculate the checksum at either layer 3 or 4 based on the current packet contents. This was already done (poorly) inside the tracereplay tool, but is now part of the libtrace API. This is quite useful for checksum validation or if you've modified the packet somehow (e.g. modified the IP addresses) and want to recalculate the checksum to match.
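For readers unfamiliar with what these two features compute, here is a small standalone sketch. It does not use the libtrace API: it applies the standard Internet checksum (RFC 1071) to a sample 20-byte IPv4 header and converts the source address bytes into a string. The header bytes are made-up example data.

```python
import socket


def internet_checksum(data):
    """RFC 1071 checksum: ones' complement of the ones' complement sum
    of the data taken as 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"    # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:     # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example IPv4 header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
source_ip = socket.inet_ntoa(header[12:16])
```

Validation works the same way in reverse: summing a header that already contains a correct checksum yields zero, so a non-zero result indicates corruption or a modified packet.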

Also spent a decent bit of time reading over chapters from Meenakshee's report and offering plenty of constructive criticism.

04 Mar 2013

The NNTSC export protocol is complete now and happily exports live data to any clients that have subscribed to data streams that are being collected. Using this, I've been able to get the anomaly detection tool chain working with our SmokePing data right up to the eventing phase. Fixed a minor bug in the eventing code that would result in badly-formed event groups if the events do not strictly arrive in chronological order (which can happen if you are working with multiple streams of historical data).

Fixed a few libtrace bugs this week - the main one being trace_event being broken for int: inputs. It was just a matter of the callback function being registered inside the wrong #ifdef block but took a little while to track down.

Spent the latter part of my week tidying up my libtrace slides in preparation for a week of teaching 513 later this month.

07 Jan 2013

Just a lonely two day week while everyone else was still on holiday.

Released a new version of libtrace (3.0.16) - now Richard's ring buffer code is out amongst the wide world and hopefully our users won't find too many bugs in it.

Got back into writing my paper on L7 Filter. Most of the content is there now, although I'm not entirely convinced that the way I have structured the paper is quite right. It's much more readable the way I have it now, but it looks more like a bulleted list than a technical paper.

Meenakshee's LPI collector worked pretty well running on some trace files over the break, which was pleasing. Next step is to get it working on our newly functional ISP capture point. Tested the capture point out by running some captures over the weekend - aside from a bug in the direction tagging everything looks good, so we have at least one working capture point.

03 Jan 2013

Libtrace 3.0.16 has been released.

This release includes the new ring: format which is a much more efficient version of the existing int: format. More details on how ring: works and how much better it is than int: can be found here.

People currently using int: are encouraged to give ring: a try - at best, there should be no obvious difference between the two aside from your program using a lot less CPU time. If there are problems, bugs or strange behaviour, please let us know (email contact at wand.net.nz) so we can fix it in the next release.

This release also fixes two bugs: the problems that occurred when capturing packets using 'pcapint:any' as input and writing them to disk in a different (i.e. non-pcap) format, and the double free that would occur when calling trace_destroy after using trace_event to read packets from a trace file.

The full list of changes in this release can be found in the libtrace ChangeLog.

You can download the new version of libtrace from the libtrace website.

10 Dec 2012

Libtrace:
Managed to get native BPF socket capture exporting correctly over the RT protocol. Changed the build system to make it possible to export captures taken using a native socket interface over RT to a machine running a different OS to the capture host, e.g. capture using Linux Native, export to a FreeBSD box.

WDCap:
WDCap now builds and runs on both Mac OS X and FreeBSD. Also changed the way the disk output module names files, based on some code submitted by Alistair King. You now specify your output filename format using strftime-style conversion modifiers, which offers a bit more flexibility to users rather than them being stuck with our particular file naming convention.
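To illustrate what strftime-style conversion modifiers give the user, the sketch below expands a filename template for a given capture time. WDCap itself is written in C; this Python version, the function name and the example template are illustrative only.

```python
import time


def output_filename(template, when):
    """Expand strftime-style conversions in `template` using the UTC time
    `when` (seconds since the epoch)."""
    return time.strftime(template, time.gmtime(when))


# A hypothetical template: one file per capture start time.
name = output_filename("trace-%Y%m%d-%H%M%S.erf.gz", 0)
```

Because the template is just a string, users can choose a daily layout, an hourly layout or any other naming scheme without any code changes.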

lpi_collector:
Continued working closely with Meenakshee on the new collector. Designed a binary format for exporting our collector messages called the libprotoident collector protocol (or LPICP for short).

L7 Filter:
Finished collecting traces for most of the protocols I wanted to test with L7 Filter and collated the initial results. Wrote a blog post about it (https://secure.wand.net.nz/content/case-against-l7-filter) and started working on a paper.

03 Dec 2012

Back into the swing of things this week. Continued collecting traces of various popular Internet applications to use for validating L7 Filter. So far, L7 Filter is very disappointing - it cannot even correctly classify some basic HTTP flows and often misclassifies SSL traffic as Skype.

Worked with Meenakshee to develop a proper LPI collector that we can run on passive monitors and write live application stats to a database (ideally using Nathan's code). The new collector will use libwandevent and export its results over the network rather than via stdout. To help with this, I extracted the counter / statistic management code from the old lpi_live tool and tidied it up for more general purpose use. Updated lpi_live to use the extracted code.

Spent my spare moments looking over Richard's new ring buffer code for Linux native interfaces in libtrace. In particular, my aim has been to test it in situations outside of the standard libtrace paradigm, e.g. using trace_event(), trace_copy_packet() and exporting over the RT protocol.

Alistair from CAIDA has updated libtrace and wdcap for capturing using the BSD native interface (something we never did, so the code was missing or half-assed). I've started integrating his changes back into both code-bases and will also look at the problem of decoding RT packets that were captured using a native interface that is not supported by the recipient machine, e.g. BPF packets exported to a Linux host.