Libtrace is a library for both capturing and processing packet traces. It supports a variety of common trace formats, including pcap, ERF, live DAG capture, native Linux and BSD sockets, TSH and legacy ERF formats. Libtrace also supports reading and writing using several different compression formats, including gzip, bzip2 and lzo. Libtrace uses a multi-threaded approach for decompressing and compressing trace files to improve trace processing performance on multi-core CPUs.
The libtrace API provides functions for accessing the headers in a packet directly,
up to and including the transport header.
Libtrace can also output packets using any supported output trace format, including
pcap, ERF, DAG transmit and native sockets.
Libtrace is bundled with several tools for performing common trace processing and analysis tasks. These include tracesplit, tracemerge, traceanon, tracepktdump and tracereport (amongst others).
The long-awaited libtrace 4 is now available for public consumption! This version of libtrace includes an all-new API that resulted from Richard Sanger's Parallel Libtrace project, which aimed to add to libtrace the ability to read and process packets in parallel. Libtrace can now also better leverage any native parallelism in the packet source, e.g. multiple streams on DAG, DPDK pipelines or packet fanout on Linux interfaces.

At this stage, we are considering the software to be a beta release, so we reserve the right to make any major API-breaking changes we deem necessary prior to the final release. That said, I'm fairly confident that the beta will be close to the final product. Now is a good time to try the new API and let us know if there are any problems with it, as it will be difficult to make API changes once libtrace 4 moves out of beta.
Please note that the old libtrace 3 API is still entirely intact and will continue to be supported and maintained throughout the lifetime of libtrace 4. All of your old libtrace 3 programs should still build and run happily against libtrace 4; please let us know if this turns out to not be the case so we can fix it!
Learn about the new API and how parallel libtrace works by reading the Parallel Libtrace HOWTO.
Download the beta release from the libtrace website.
Send any questions, bug reports or complaints to contact [at] wand [dot] net [dot] nz
Fixed the issues with BSD interfaces in parallel libtrace. Ended up implementing a "bucket" data structure for keeping track of buffers that contain packets read from a file descriptor. Each bucket effectively maintains a reference counter that is used to determine when libtrace has finished with all the packets stored in a buffer. When the buffer is no longer needed, it can be freed. This allows us to ensure packets are not freed or overwritten without needing to memcpy the packet out of the buffer it was read into.
Added bucket functionality to both RT and BSD interfaces. After a few initial hiccups, it seems to be working well now.
Continued testing libtrace with various operating systems / configurations. Replaced our old DAG configuration code, which relied on a deprecated API call, with code that uses the CSAPI. Just need to get some traffic onto our DAG development box so I can make sure the multiple-stream code works as expected.
Managed to add another two protocols to libprotoident: Google Hangouts and Warthunder.
Finished the parallel libtrace HOWTO guide. Pretty happy with it and hopefully it should ease the learning curve for users who want to move over to the parallel API once released.
Continued working towards the beta release of libtrace4. Started testing on my usual variety of operating systems, fixing any bugs or warnings that cropped up along the way. It looks like there are definitely some issues with using the parallel API with BSD interfaces, so that will need to be resolved before I can do the release.
Now that I've got a full week of Waikato trace, I've been occasionally looking at the output from running lpi_protoident against the whole week and seeing if there are any missing protocols I can identify and add to libprotoident. Managed to add another 6 new protocols this week, including Diablo 3 and Hearthstone.
Met with Rob and Stephen from Endace on Thursday morning and had a good discussion about how we are using the Endace probe and what we can do to get more out of it.
Fixed the bug that was causing my disk-writing wdcap to crash. The problem was a slight incongruity between the record length stored in the ERF header and the amount of packet data actually available, so we were occasionally touching bad memory. Since fixing that, wdcap has been happily capturing since Tuesday morning without any crashes or dropped packets.
Continued working towards a releasable libtrace4. Polished up the parallel tools and replaced the old rijndael implementation in traceanon with the libcrypto code from wdcap. Made sure all the examples build nicely and added a full skeleton example that implements most of the common callbacks. Fixed all the outstanding compiler warnings and made sure all the header documentation was correct and coherent.
Started developing a HOWTO guide for parallel libtrace that introduces the new API, one step at a time, using a simple example application. I'm about half-way through writing this.
Added anonymisation to the new wdcap, using libcrypto to generate the AES-encrypted bits that we use as a mask to anonymise IP addresses. This has two major benefits: 1) we don't need to maintain or support our own implementation of AES and 2) libcrypto is much more likely to be able to detect and use AES CPU instructions.
Deployed the new wdcap and libtrace on the Endace box. I've managed to get wdcap running but the client that is writing the packets to disk has a tendency to crash after an hour or two of capture, so there's still a wee way to go before it's ready. The wdcap server that is capturing off the DAG card and exporting to the client has been pretty robust.
Started on polishing up libtrace4 for a beta release. There's still quite a bit of fiddly outstanding work to do before we can release. Fixed the tick bug that I discovered a couple of weeks back and finished up the new callback API that I had proposed to Richard a few months ago and that he had mostly implemented. Ported tracertstats and tracestats to use the new API.
Continued working on the new wdcap. Wrote and tested both the disk and RT output modules. Re-designed the RT protocol to fix a number of oversights in the original design -- unfortunately this means that RT from wdcap4 is not compatible with libtrace3 and vice-versa.
Added parallel support for RT input to libtrace4 -- prior to this, RT packets were not read in a way that was thread-safe so using RT with the parallel API would result in some major races. The RT packets are now copied into memory owned by the libtrace packet before being returned to the caller; this adds an extra memcpy but ensures concurrent reads won't overwrite each other. Using a well-managed ring buffer could probably get rid of the memcpy, so I'll probably look into that once I've got everything else up and running.
Spent Wednesday at the Honours conference.
Continued working on the new WDCap. Most of my time was spent writing the packet batching code that groups processed packets into batches and then passes them off to the output threads.
Along the way I discovered a bug with parallel libtrace when using ticks with a file that causes the ordered combiner to get stuck. Spent a bit of time with Richard trying to track this one down.
Listened to our Honours students' practice talks on Thursday and Friday afternoon. Overall, I was fairly impressed with the projects and what the students had achieved and am looking forward to seeing their improved talks on Wednesday.
Continued working on wdcap4. The overall structure is in place and I'm now adding and testing features one at a time. So far, I've got snapping, direction tagging, VLAN stripping and BPF filtering all working. Checksum validation is working for IP and TCP; just need to test it for other protocols.
Still adding and updating protocols in libprotoident. The biggest win this week was being able to identify Shuijing (Crystal): a protocol for operating a CDN using P2P.
Helped Brendon roll out the latest develop code for ampsave, NNTSC, ampy and amp-web to skeptic. This brings skeptic in line with what is running on prophet and will allow us to upgrade the public amplets without their measurement data being rejected.
Noticed a bug in my Plateau parameter evaluation which meant that Time Series Variability changes were being included in the set of Plateau events. Removing those meant that my results were a lot saner. The best set of parameters now gives an 83% precision rating and the average delay is now below 5 minutes. Started on a similar analysis for the next detector -- the Changepoint detector.
Continued updating libprotoident. I've managed to capture a few days of traffic from the University now, so that is introducing some new patterns that weren't present in my previous dataset. Added new rules for MongoDB, DOTA2, Line and BMDP.
Still having problems with long duration captures being interrupted, either by the DAG dropping packets or by the RT protocol FIFO filling up. This prompted me to start working on WDCap4: the parallel libtrace edition. It's a complete re-write from scratch so I am taking the time to carefully consider every feature that currently exists in WDCap and deciding whether we actually need it or whether we can do it better.
Made a video demonstrating BSOD with the current University capture point. The final cut can be seen at https://www.youtube.com/watch?v=kJlDY0XvbA4
Alistair King got in touch and requested that libwandio be separated from libtrace so that he can release projects that use libwandio without having libtrace as a dependency as well. With his help, this was pretty straightforward so now libwandio has a separate download page on the WAND website.
Continued my investigation into optimal Plateau detector parameters. Used my web-app to classify ~230 new events in a morning (less than 5 of which qualified as significant) and merged those results back into my original ground truth. Re-ran the analysis comparing the results for each parameter configuration against the updated ground truth. I've now got an "optimal" set of parameters, although the optimal parameters still only achieve 55% precision and 60% recall.
Poked around at some more unknown flows while waiting for the Plateau analysis to run. Managed to identify some new BitTorrent and eMule clients and also added two new protocols: BDMP and Trion games.
Short week as I was on leave on Thursday and Friday.
Continued tweaking the event groups produced by netevmon. My main focus has been on ensuring that the start time for a group lines up with the start time of the earliest event in the group. When this doesn't happen, it suggests that there is an incongruity in the logic for updating events and groups based on a new observed detection. The problem now happens only rarely -- good, in that it shows I'm making progress, but also bad, in that it now takes much longer for a bad group to occur, so testing and debugging is much slower.
Spent a bit of time rewriting Yindong's python trace analysis using C++ and libflowmanager. My program was able to run much faster and use a lot less memory, which should mean that wraith won't be hosed for months while Yindong waits for his analysis to run.
Added a new API function to libtrace to strip VLAN and MPLS headers from packets. This makes the packets easier to analyse with BPF filters as you don't need to construct complicated filters to deal with the possible presence of VLAN tags that you don't care about.
Installed libtrace on the Endace probe and managed to get it happily processing packets from a virtual DAG without too much difficulty.
Packet capture is commonly used in networks to monitor the traffic that users are producing, allowing network operators to detect and monitor threats. Libtrace is a library which provides a simple programming interface for the capture and analysis of network packets.
This project extends libtrace to support the processing of network packets in parallel, improving libtrace's performance. We name the solution developed in this project parallel libtrace. An overview of parallel libtrace and the design process is presented, and the challenges encountered in the design are described in more detail. By designing a user-friendly and efficient library, we have been able to improve the performance of some libtrace applications.
Spent much of my week keeping an eye on BTM and dealing with new connections as they came online. Had a couple of false starts with the Wellington machine, as the management interface was up but was not allowing any inbound connections. This was finally sorted on Thursday night (turning the modem off and on again did the trick), so much of Friday was spent figuring out which Wellington connections were working and which were not.
A few of the BTM connections have a lot of difficulty running AMP tests to a few of the scheduled targets: AMP fails to resolve DNS properly for these targets but using dig or ping gets the right results. Did some packet captures to see what was going on: it looks like the answer record appears in the wrong section of the response and I guess libunbound doesn't deal with that too well. The problem seems to affect only connections using a specific brand of modem, so I am imagining there is some bug in the DNS cache software on the modem.
Continued tracing my NNTSC live export problem. It soon became apparent that NNTSC itself was not the problem: instead, the client was not reading data from NNTSC, causing the receive window to fill up and preventing NNTSC from sending new data. A bit of profiling suggested that the HMM detector in netevmon was potentially the problem. After disabling that detector, I was able to keep things running over the long weekend without any problems.
Fixed a libwandio bug in the LZO writer. Turns out that the "compression" can sometimes result in a larger block than the original uncompressed data, especially when doing full payload capture. In that case, you are supposed to write out the original block instead but we were mistakenly writing the compressed block.
Continued hunting for the bug in the NNTSC live exporter with mixed success. I've narrowed it down to definitely being the per-client queue that is the problem and it doesn't appear to be due to any obvious slowness inserting into the queue. Unfortunately, the problem seems to only occur once or twice a day so it takes a day before any changes or additional debugging take effect.
Went back to working on the mechanical Turk app for event detection. Finally finished a tutorial that shows most of the basic event types and how to classify them properly. Got Brendon and Brad to run through the tutorial and tweaked it according to their feedback. The biggest problem is the length of the tutorial -- it takes a decent chunk of our survey time to just run through the tutorial so I'm working on ways to speed it up a bit (as well as event classification in general). These include adding hot-keys for significance rating and using an imagemap to make the "start time" graph clickable.
Spent a decent chunk of my week trying to track down an obscure libtrace bug that affected a couple of 513 students, which would cause the threaded I/O to segfault whenever reading from a larger trace file. Replicating the bug proved quite difficult as I didn't have much info about the systems they were working with. After going through a few VMs, I eventually figured out that the bug was specific to 32-bit little-endian architectures: due to some lazy #includes, the size of an off_t varied between 4 and 8 bytes across different parts of the libwandio source code, which resulted in some very badly sized reads. The bug was found and fixed a bit too late for the affected students, unfortunately.
Continued developing code to group events by common AS path segments. Managed to add an "update tree" function to the suffix tree implementation I was using and then changed it to use ASNs rather than characters to reduce the number of comparisons required. Also developed code to query NNTSC for an AS path based on the source, destination and address family for a latency event, so all of the pieces are now in place.
In testing, I found a problem where live NNTSC exporting would occasionally fall several minutes behind the data that was being inserted into the database. Because this would only happen occasionally (and usually overnight), debugging this problem has taken a very long time. Found a potential cause in an unhandled EWOULDBLOCK on the client socket, so I've fixed that and am waiting to see if that has resolved the problem.
Did some basic testing of libtrace 4 for Richard, mainly trying to build it on the various OS's that we currently support. This has created a whole bunch of extra work for him due to the various ways in which pthreads are implemented on different systems. Wrote my first parallel libtrace program on Friday -- there was a bit of a learning curve but I got it working in the end.
Tested and released a new libtrace version (3.0.22). The plan is that this will be the last release of libtrace3 (barring urgent bug fixes) and Richard and I can now focus on turning parallel libtrace into the first libtrace4 release.
Continued working on adding detector configuration to netevmon. Everything seems to be in place and working now, so will look to extend the configuration to cover other aspects of netevmon (e.g. eventing or anomalyfeed) in the near future.
Spent a bit of time on Friday refactoring some of Meena's eventing code to hopefully perform a bit better and avoid attempts to insert duplicate events into the database. The code currently catches the duplicate exception and continues on, but it would be much nicer if we weren't relying on the database to tell us that the event already exists.
Libtrace 3.0.22 has been released today.
This is (hopefully) the final release of libtrace version 3, as we are now turning our attention to preparing to release libtrace 4 a.k.a. 'Parallel Libtrace'.
This release includes the following changes / fixes:
* Added protocol decoding support for GRE and VXLAN.
* DPDK format now supports 1.7.1 and 1.8.0 versions of DPDK.
* DAG format now supports DAG 5.2 libraries.
* Fixed degraded performance in the ring: format that was introduced in 3.0.21.
* DAG dropped packet count no longer includes packets observed while libtrace was not using the DAG card.
* Fixed bad PCI addressing in DPDK format.
* libwandio now reports an error when reading from a truncated gzip-compressed file, so it is now consistent with zlib-based tools.
The full list of changes in this release can be found in the libtrace ChangeLog.
You can download the new version of libtrace from the libtrace website.
Generated some fresh DS probabilities based on the results of the new DistDiff detector. Turns out it isn't quite as good as I was originally hoping (credibility is around 56%) but we'll see how the additional detector pans out for us in the long run.
Started adding a proper configuration system to netevmon, so that we can easily enable and tweak the individual detectors via a config file (as opposed to the parameters all being hard-coded). I'm using libyaml to parse the config file itself, using Brendon's AMP schedule parsing code as an example.
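As a rough sketch, the sort of file this enables might look like the following. The option names here are hypothetical, not the actual netevmon schema:

```yaml
# Hypothetical example -- real option names may differ.
detectors:
  plateau:
    enabled: yes
    sensitivity: 5
  changepoint:
    enabled: no
  hmm:
    enabled: yes
    max_delay: 300
```

The appeal of YAML here is that the same libyaml-based parsing approach already used for AMP schedules carries over directly.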
Spent a day looking into DUCK support for libtrace, since the newest major DAG library release had changed both the duckinf structure and the name and value of the ioctl to read it. When trying to test my changes, I found that we had broken DUCK in both libtrace and wdcap so managed to get all that working sensibly again. Whether this was worthwhile is a bit debatable, seeing as we don't really capture DUCK information anymore and nobody else had noticed this stuff was broken for months :)
Continued the painful process of migrating my python prototype for mode detection over to C++ for inclusion in netevmon. Managed to get the embedded R portion working correctly, which should be the trickiest part.
Spent a bit of time with our new libtrace testbed, getting the DAG 7.5G2s configured and capturing correctly. Ran into some problems getting the card to steer packets captured on each interface into separate stream buffers, as the firmware we are currently running doesn't appear to support steering.
Finished and submitted my PAM paper, after incorporating some feedback from Richard.
Fixed a minor libwandio bug where it was not giving any indication that a gzipped file was truncated early and content was missing.
Managed to get a new version of the amplet code from Brendon installed on my test amplet. Set up a full schedule of tests and found a few bugs that I reported back to the developer. By the end of the week, we were getting closer to having a full set of tests working properly -- just one or two outstanding bugs in the traceroute test.
Got netevmon running again on the test NNTSC. Noticed that we are getting a lot of false positives for the changepoint and mode detectors for test targets that are hosted on Akamai. This is because the series is fluctuating between two latency values and the detectors get confused as to which of the values is "normal" -- whenever it switches between them, we get an erroneous event. Added a new time series type to combat this: multimodal, where the series has 2 or 3 clear modes that it is always switching between. Multimodal series will not run the changepoint or mode detectors, but I hope to add a special multimode detector that alerts if a new and different mode appears (or an old mode disappears).