Shane Alcock's Blog

08 Apr 2013

Very short week this week, but managed to get a few little things sorted.

Added a new data parser to NNTSC for reading the RRDs used by Munin, a program that Brad is using to monitor the switches in charge of our red cables. The data in these RRDs is a lot noisier than Smokeping data, so it will be interesting to see how our anomaly detection copes with it. Also finally got AMP data being exported to our anomaly detector - the glue program that converts NNTSC data into something anomaly_ts can read wasn't parsing AMP records properly.
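
As a rough illustration of what the parser deals with, here is a minimal sketch that pulls a time series out of an RRD by shelling out to rrdtool's fetch command - the real NNTSC module may work quite differently, and the path handling and consolidation function here are assumptions:

    # Sketch: read (timestamp, values) rows from an RRD via the rrdtool CLI.
    import subprocess

    def fetch_rrd(path, start, end, cf="AVERAGE"):
        out = subprocess.check_output(
            ["rrdtool", "fetch", path, cf,
             "--start", str(start), "--end", str(end)],
            universal_newlines=True)
        lines = out.strip().split("\n")
        names = lines[0].split()          # data source names from the header
        rows = []
        for line in lines[2:]:            # skip the header and blank line
            ts, _, values = line.partition(":")
            vals = [None if "nan" in v.lower() else float(v)
                    for v in values.split()]
            rows.append((int(ts), dict(zip(names, vals))))
        return rows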

Spent a bit of time working on adding some new rules to libprotoident to identify previously unknown traffic in some traces sent to me by one of our users.

Spent Friday afternoon talking with Brian Trammell about some mutual interests, in particular passive measurement of TCP congestion window state and large-scale measurement data collection, storage and access. In terms of the latter, it looks like many of the design decisions we have reached with NNTSC are very similar to those he had reached with mPlane (albeit mPlane is a fair bit more ambitious than what we are doing), which I think was pretty reassuring for both sides. Hopefully we will be able to collaborate more in this space, e.g. developing translation code to make our data collection compatible with mPlane.

03 Apr 2013

Exporting from NNTSC is now back to a functional state and the whole event detection chain is back online. Added table and view descriptions for more complicated AMP tests; traceroute, http2 and udpstream are now all present. Hopefully we can get new AMP collecting and reporting data for these tests soon so we can test whether it actually works!

Had some user-sourced libtrace patches come in, so spent a bit of time integrating these into the source tree and testing the results. One simply cleans up the libpacketdump install directory so that it doesn't create as many useless or unused files (e.g. static libraries and versioned library symlinks). The other adds support for the OpenBSD loopback DLT, which is a real nuisance because OpenBSD isn't entirely consistent with other OSes as to the values of some DLTs.
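
For the curious, the clash looks roughly like this (values as I understand them from the libpcap headers, so treat this as an illustration rather than a reference):

    # On most systems DLT_RAW is 12, but OpenBSD uses 12 for its loopback
    # DLT and 14 for DLT_RAW, so the same number means different things
    # depending on where the trace was captured.
    DLT_NAMES = {
        "openbsd": {12: "LOOP", 14: "RAW"},
        "default": {12: "RAW", 108: "LOOP"},
    }

    def dlt_name(value, platform="default"):
        table = DLT_NAMES.get(platform, DLT_NAMES["default"])
        return table.get(value, "UNKNOWN(%d)" % value)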

Helped Nathan with some TCP issues that Lightwire were seeing on a link. Was nice to have an excuse to bust out tcptrace again...

Looks like my L7 Filter paper is going to be rejected. Started thinking about ways it could be reworked to be more palatable, perhaps presenting it as a comparative evaluation of open-source traffic classifiers instead.

25 Mar 2013

Turns out that once again, the current design of NNTSC didn't quite meet all of the requirements for storing AMP data. The more complicated traceroute and HTTP tests needed multiple tables for storing their results, which wasn't quite going to work with the "one stream table, one data table" design I had implemented.

Managed to come up with a new design that will hopefully satisfy Brendon while still allowing for a consistent querying approach. Implemented the data collection side of this, including creating tables for the traceroute test. This was a bit trickier than planned, because SQLAlchemy doesn't natively support views and the traceroute view itself was rather complicated.
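
SQLAlchemy has no first-class CREATE VIEW construct, so the usual workaround, and roughly the shape of what I ended up doing, is to attach raw DDL to the metadata so the view is created after the tables it depends on. The view body below is a simplified stand-in, not the real traceroute view:

    from sqlalchemy import MetaData, DDL, event

    meta = MetaData()
    # ... Table definitions for data_amp_traceroute etc. go here ...

    # Run this DDL whenever the metadata's tables are created, so the
    # view is built once its underlying table exists.
    event.listen(meta, "after_create", DDL("""
        CREATE VIEW vw_amp_traceroute AS
        SELECT stream_id, timestamp, hop_count
        FROM data_amp_traceroute
    """))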

Currently working on updating the exporting side to use id numbers to identify collections rather than names, since there is no longer any guarantee that the data will be located in a table called "data_" + module + module_subtype.
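
In sketch form, the exporter now resolves a collection id through a registry instead of assuming the naming convention (the ids and table names below are hypothetical):

    # Hypothetical registry mapping collection ids to their data tables.
    COLLECTIONS = {
        1: {"module": "amp", "subtype": "icmp",
            "datatable": "data_amp_icmp"},
        2: {"module": "amp", "subtype": "traceroute",
            "datatable": "data_amp_traceroute"},
    }

    def datatable_for(collection_id):
        return COLLECTIONS[collection_id]["datatable"]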

Also spent a fair bit of time reading over Meenakshee's report and covering it in red pen. Pretty happy with how it is coming together.

18 Mar 2013

Short week this week, as I spent Thursday and Friday in Wellington at the cricket.

Wrote an assignment on libtrace for 513, along with some "model" answers.

Continued reading and editing Meenakshee's report.

Had a vigorous discussion with Brendon about what he needs the NNTSC export protocol to do to support his AMP graphing needs. Turns out the protocol needs a couple of new features, namely binning/aggregation and a no-more-data indicator, which I started working on adding. So far, this has mostly involved taking some of the working code from my anomaly detector feeder program, which is an NNTSC client, and turning it into an NNTSC client API.
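
To give an idea of the shape that API is taking, here is an illustrative sketch from the caller's side - the class and method names are mine, not the real NNTSC API:

    # Illustrative client API: binning is expressed as a bin size in
    # seconds, and the new no-more-data indicator surfaces as a None
    # sentinel from read_result().
    class NNTSCClient(object):
        def connect(self, host, port):
            raise NotImplementedError

        def subscribe(self, stream_id, start, end, binsize=300):
            """Request data for [start, end]; end=None means live data."""
            raise NotImplementedError

        def read_result(self):
            """Return the next batch of (binned) measurements, or None
            once the server signals there is no more data."""
            raise NotImplementedError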

Put out a request to our past students for their Honours reports so that they can be published on the website. Thanks to those who have responded.

11 Mar 2013

Added a data parser module to NNTSC to process the tunnel user count data that we got from Lightwire. Managed to get the data going all the way through to the event detection program which spat out a ton of events. Spent a bit of time combing through them manually to see whether the reported events were actually worth reporting -- in a lot of cases they weren't, so I've refined the old Plateau and Mode algorithms a bit to hopefully resolve the issues. I also employed the Plunge detector on all time series types, rather than just libprotoident data, and this was useful in reporting the most interesting behaviours in the tunnel user data (i.e. all the users disappearing).

Added a couple of new features to the libtrace API. The first was the ability to ask libtrace to give you the source or destination IP address as a string. This is quite handy because normally processing IP addresses in libtrace involves messing around with sockaddrs, which is not particularly n00b-friendly. The second was the ability to ask libtrace to calculate the checksum at either layer 3 or layer 4 based on the current packet contents. This was already done (poorly) inside the tracereplay tool, but is now part of the libtrace API proper. It is quite useful for checksum validation, or if you've modified the packet somehow (e.g. changed the IP addresses) and want to recalculate the checksum to match.
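
The checksum functions themselves are C, but the calculation underneath is just the standard ones' complement Internet checksum (RFC 1071). A minimal Python rendition for illustration (to checksum an IPv4 header, the header's checksum field must be zeroed first):

    def internet_checksum(data):
        # Sum the data as big-endian 16-bit words, folding carries back
        # into the low 16 bits, then take the ones' complement.
        buf = bytearray(data)
        if len(buf) % 2:
            buf.append(0)                 # pad to an even length
        total = 0
        for i in range(0, len(buf), 2):
            total += (buf[i] << 8) | buf[i + 1]
            total = (total & 0xffff) + (total >> 16)
        return ~total & 0xffff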

Also spent a decent bit of time reading over chapters from Meenakshee's report and offering plenty of constructive criticism.

04 Mar 2013

The NNTSC export protocol is complete now and happily exports live data to any clients that have subscribed to data streams that are being collected. Using this, I've been able to get the anomaly detection tool chain working with our SmokePing data right up to the eventing phase. Fixed a minor bug in the eventing code that would result in badly-formed event groups if the events do not strictly arrive in chronological order (which can happen if you are working with multiple streams of historical data).

Fixed a few libtrace bugs this week - the main one being that trace_event was broken for int: inputs. It was just a matter of the callback function being registered inside the wrong #ifdef block, but it took a little while to track down.

Spent the latter part of my week tidying up my libtrace slides in preparation for a week of teaching 513 later this month.

25 Feb 2013

The new NNTSC now supports the LPI collector and installs nicely. Still waiting on Brendon to get his AMP message decoding code finished to his satisfaction, and that will be the next thing to get working on the data collection side.

Also started developing a new data query / export mechanism that allows clients to connect and register their interest in particular data streams to receive ongoing live data as it is collected by NNTSC. The old approach for this involved the client explicitly stating the ID numbers for the streams they wanted data for, which was pretty suboptimal because it required knowledge that should really be internal to the database.

The other problem is that, now that we have a different data table layout for each type of data stream, we need to inform clients about that structure and how to interpret the data they receive.

All of this meant that I've had to design and implement an entirely new protocol for NNTSC data export. It's a request/response-based protocol: the client can request the list of collections (i.e. the different data stream types), the details of a specific collection (i.e. how the stream and data tables are laid out) and the list of streams associated with a given collection. It can then subscribe to a given stream, giving a start and end time for the period required. If that period includes historical data, the data is immediately extracted from the database and sent. If it includes future data, the client and stream id are remembered so that new data can be sent to the client as it rolls into NNTSC.
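
In rough sketch form, the exchange looks something like this (the actual message types and wire format are NNTSC internals; these constants and values are stand-ins):

    # Request types in the export protocol, as described above.
    REQUEST_COLLECTIONS = 0   # -> list of collections (data stream types)
    REQUEST_SCHEMA = 1        # -> stream/data table layout for a collection
    REQUEST_STREAMS = 2       # -> streams belonging to a collection
    SUBSCRIBE = 3             # -> historical data now, live data as it arrives

    subscribe_msg = {
        "type": SUBSCRIBE,
        "stream": 42,             # hypothetical stream id
        "start": 1356998400,      # historical portion, sent immediately
        "end": None,              # no end: keep sending live measurements
    }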

Currently at the stage where we're getting historical data out of the database and preparing to send it to the client, with live data being the next main task.

18 Feb 2013

The development of NNTSC took another dramatic turn this week. After conferring with Brendon, we realised that the current design of the data storage tables was not going to support the level of querying and analysis that he wanted for AMP data. This spurred me to quickly write up a prototype for a new NNTSC from scratch that allowed each different data collection method to specify exactly how the data table should look. This means that instead of having one unified data table with the inflexible schema of (stream id, timestamp, data value), we now have an AMP ICMP test data table that is (stream id, timestamp, pkt size, rtt, loss, error code, error type) and a Smokeping data table that is (stream_id, timestamp, uptime, loss, median, ping1, ... ping20).
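
Expressed as SQLAlchemy table definitions, which is roughly how the new NNTSC lets each collection describe its own data table (the column types here are my guesses), the two layouts look like:

    from sqlalchemy import (MetaData, Table, Column, Integer, BigInteger,
                            Float, Boolean)

    meta = MetaData()

    amp_icmp = Table("data_amp_icmp", meta,
        Column("stream_id", Integer, nullable=False),
        Column("timestamp", BigInteger, nullable=False),
        Column("packet_size", Integer),
        Column("rtt", Integer),
        Column("loss", Boolean),
        Column("error_code", Integer),
        Column("error_type", Integer),
    )

    smokeping = Table("data_smokeping", meta,
        Column("stream_id", Integer, nullable=False),
        Column("timestamp", BigInteger, nullable=False),
        Column("uptime", Float),
        Column("loss", Integer),
        Column("median", Float),
        # one column per individual ping result, ping1 .. ping20
        *[Column("ping%d" % i, Float) for i in range(1, 21)]
    )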

We've also done away with the central queue and simply given each data parser its own connection to our database. This fixes a problem I was having where trying to read data from a file too fast was causing the queue to fill up and run the machine out of RAM.

Smokeping data collection is now working with the new NNTSC, so I now need to write the data parsing modules for each of the other input sources we used to support as well as re-do all the nice installation script stuff I had done for the previous version of NNTSC.

11 Feb 2013

Made some significant modifications to the structure of NNTSC so that it can be packaged and installed nicely. It is no longer dependent on scripts or config files being in specific locations, and it handles configuration errors robustly rather than crashing out with a Python exception. Still got a few bugs and tidy-ups to do, particularly relating to processes hanging around even after the main collector is killed.

Managed to get some tunnel user counts from Scott at Lightwire to run through the event detection code. Added a new module to NNTSC for parsing the data, but have not quite got the data into the database for processing yet.

Spent a decent chunk of time helping Meenakshee write and practice her talk for Thursday. Once the talk was done, we got back into the swing of development by fixing some obvious problems with the current collector.

04 Feb 2013

Made a few modifications to Brendon's detectors which make them perform better across a variety of AMP time series. In particular, the Plateau detector no longer uses a fixed percentage of the trigger buffer mean as its event threshold - instead it uses several standard deviations from the history buffer mean. Also fixed some problems where, once inside an event, every subsequent measurement similar to those that triggered the event was also treated as anomalous. This is a problem when the "event" is actually the time series moving to a new normality: our algorithm just kept us in the event state the whole time!
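
The gist of the threshold change, condensed into a sketch (buffer management is glossed over and the choice of k here is arbitrary):

    from math import sqrt

    def plateau_triggered(history, trigger, k=4.0):
        # Trigger when the mean of the recent "trigger" buffer sits more
        # than k standard deviations away from the history buffer mean,
        # rather than a fixed percentage away from it.
        n = float(len(history))
        mean = sum(history) / n
        stddev = sqrt(sum((x - mean) ** 2 for x in history) / n)
        trigger_mean = sum(trigger) / float(len(trigger))
        return abs(trigger_mean - mean) > k * stddev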

Once I was happy with that, got the eventing code up and running against the events reported by the anomaly detection stage. Had to make a couple of modifications to the protocol used to communicate between the two to get it working properly (there were some hard-coded entries in Brendon's database that needed to be inserted in a more automated way). Tried to get the graphing / visualisation stuff going after that, but there are quite a few issues there so that may have to wait a bit.

Started looking into packaging and documenting the usage of all the tools in the chain that we've now got working. First up was Nathan's code, which is proving a bit tricky so far because a) it's Python, so no autotools, and b) his code is rather reliant on other scripts being in certain locations relative to the script being run.

Added another protocol to libprotoident: League of Legends.