
Storage of Network Monitoring and Measurement Data

I am designing and building a system for storing and retrieving large amounts of network measurement and monitoring data. The system needs to be flexible enough to handle a wide range of data, such as polled data and flow data, while being fast enough to cope with high rates of live network data. The end goal is to provide this information to an anomaly detection algorithm that can detect changes in the network and alert system administrators to the exact problem, as well as presenting the information using graphs via a web interface.

29 Jul 2013

Table partitioning is now up and running inside NNTSC, and all of the existing data has been migrated over to partitioned tables.

Enabled per-user tracking in the LPI collector and updated Cuz to deal with multiple users sensibly. Changed the LPI collector to not export counters that have a value of zero -- the client now detects which protocols were missing counters and inserts zeroes accordingly. Also changed NNTSC to only create LPI streams when the time series has a non-zero value occur, which avoids the problem of creating hundreds of streams per user which are entirely zero because the user never uses that protocol.
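
The zero-suppression approach described above can be sketched as follows. This is an illustrative reconstruction, not the actual LPI client code: the protocol list and function name are invented, but the idea is the same -- the collector only exports non-zero counters, so the client restores zeroes for any known protocol missing from a report.

```python
# Hypothetical protocol list; the real LPI collector tracks many more.
KNOWN_PROTOCOLS = ["HTTP", "DNS", "BitTorrent", "SSH"]

def fill_missing_counters(exported):
    """Return a complete counter dict, inserting zeroes for protocols
    whose counters the collector suppressed because they were zero."""
    return {proto: exported.get(proto, 0) for proto in KNOWN_PROTOCOLS}

# Only HTTP and DNS saw traffic, so only they were exported.
counters = fill_missing_counters({"HTTP": 1200, "DNS": 88})
```

The same check (does a time series ever contain a non-zero value?) is what gates stream creation on the NNTSC side.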

Added ability to query NNTSC for a list of streams that had been added since a given stream was created. This is needed to allow ampy to keep up to date with streams that have been added since the connection to NNTSC was first made. This is not an ideal solution as it adds an extra database query to many ampy operations, but I'm hoping to come up with something better soon.

Revisited and thoroughly documented the ShewhartS-based event detection code in netevmon. In the process, I made a couple of tweaks that should reduce the number of 'unimportant' events that we have been getting.

22 Jul 2013

Somewhat disrupted week this week, due to illness.

Replaced the per-collection templates for the graph pages with a single template that uses TAL to automatically add the right dropdowns for the collection being shown on that page. Added callback code to allow proper switching between LPI metrics when browsing the graphs -- it isn't perfect, but it wasn't worth putting too much effort into when we're probably going to completely change the graph selection method at some point.

Added code to ampy to query data from the AMP ICMP test. Also added an API function that returns details about all of the streams associated with a collection -- this will be used to populate the matrix with just one request rather than having to make a request for every stream.

Worked on getting NNTSC to use table partitioning so that we can avoid having to select from massive, unwieldy data tables. It seems to be working well with my test database, but the big challenge is to migrate the existing 'production' database over to a partitioned setup.
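
The core of time-based partitioning is routing each row to a child table covering its timestamp range. A minimal sketch of that routing, with invented table names and a weekly partition size (the real NNTSC schema and partition interval may differ):

```python
def partition_for(base_table, timestamp, size=86400 * 7):
    """Map a timestamp to a weekly child-table name, so inserts and
    time-bounded queries can target one small partition instead of a
    single massive data table."""
    start = (timestamp // size) * size   # partition's starting timestamp
    return "%s_p%d" % (base_table, start)

name = partition_for("data_amp_icmp", 1374537600)
```

Migrating an existing table then amounts to creating the child tables and moving each row into the partition this function names.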

15 Jul 2013

Made a number of minor changes to my paper on open-source traffic classifiers in response to reviewer comments.

Modified the NNTSC exporter to inform clients of the frequency of the datapoints it returns in response to a historical data request. This allows ampy to detect missing data and insert None values appropriately, which creates a break in the time series graphs rather than drawing a straight line between the points on either side of the missing data. Calculating the frequency was a little harder than anticipated: not every stream records a measurement frequency (and that frequency may change, e.g. if someone modifies the AMP test schedule), and the returned values may be binned anyway, at which point the original frequency is not suitable for determining whether a measurement is missing.
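
The gap-filling idea can be sketched like this (function name and the 1.5x slack factor are illustrative, not the actual ampy code): given the frequency reported by the exporter, insert a None marker wherever consecutive datapoints are further apart than expected, so the plotting library breaks the line instead of interpolating.

```python
def insert_gaps(points, frequency):
    """points: list of (timestamp, value) tuples in time order.
    frequency: expected seconds between measurements.
    Returns points with (ts, None) markers inserted wherever the gap
    between consecutive measurements is too large."""
    result = []
    for i, (ts, val) in enumerate(points):
        if i > 0:
            prev_ts = points[i - 1][0]
            # Allow some slack before declaring a measurement missing.
            if ts - prev_ts > frequency * 1.5:
                result.append((prev_ts + frequency, None))
        result.append((ts, val))
    return result

# One measurement missing between t=60 and t=300 at 60s frequency.
series = insert_gaps([(0, 5.0), (60, 5.2), (300, 4.9)], 60)
```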

Added support for the remaining LPI metrics to NNTSC, ampy and amp-web. We are now drawing graphs for packet counts, flow counts (both new and peak concurrent) and users (both active and observed), in addition to the original byte counts. We're not detecting any events on these yet, as these metrics are very different from what we have at the moment, so a bit of thought will have to go into which detectors we should use for each metric.

05 Jul 2013

Added support for the Libprotoident byte counters that we have been collecting from the red cable network to netevmon, ampy and amp-web. Now we can visualise the different protocols being used on the network and receive event alerts whenever someone does something out of the ordinary.

Replaced the dropdown list code in amp-web with a much nicer object-oriented approach. This should make it a lot easier to add dropdown lists for future NNTSC collections.

Managed to get our Munin graphs showing data in Mbps. This was trickier than anticipated, as Munin sneakily divides the byte counts it gets from SNMP by its polling interval, but this isn't very prominently documented. It took a little while for myself, Cathy and Brad to figure out why our numbers didn't match those being reported by the original Munin graphs.
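
In other words, the stored values are already a rate in bytes per second, so converting to Mbps is just a matter of multiplying by 8 and dividing by a million -- dividing by the polling interval again (the mistake the hidden division invites) gives numbers that are far too small. A one-line sketch:

```python
def bytes_per_sec_to_mbps(byte_rate):
    """Munin stores SNMP octet counters divided by the polling interval,
    i.e. already as bytes/second. Convert that rate to megabits/second."""
    return byte_rate * 8 / 1_000_000

rate = bytes_per_sec_to_mbps(1_250_000)  # 1.25 MB/s worth of traffic
```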

Chased down and fixed a libtrace bug where converting a trace from any ERF format (including legacy) to PCAP would result in horrendously broken timestamps on Mac OS X. It turned out that the __BYTE_ORDER macro doesn't exist on BSD systems and so we were erroneously treating the timestamps as big endian regardless of what byte order the machine actually had.

Migrated wdcap and the LPI collector to use the new libwandevent3.

Changed the NNTSC exporter to create a separate thread for each client rather than trying to deal with them all asynchronously. This alleviates the problem where a single client could request a large amount of history and prevent anyone else from connecting to the exporter until that request was served. Also made NNTSC and netevmon behave more robustly when a data source disappears -- rather than halting, they will now periodically try to reconnect so I don't have to restart everything from scratch when I want to apply changes to one component.

Finally, my paper comparing the accuracy of various open-source traffic classifiers was accepted for WNM 2013. There are a few minor nits to tidy up, but it shouldn't require too much work to get it camera-ready.

26 Jun 2013

There is currently an increasing demand for accurate and reliable traffic classification techniques. Libprotoident is a library developed at the WAND Network Research Group (WAND) that uses four bytes of payload for the classification of flows. Testing has shown that Libprotoident achieves similar classification accuracy to other approaches, while also being more efficient in terms of speed and memory usage. However, the primary weakness of Libprotoident is that it lacks the visualisation component required to encourage adoption of the library.

This report describes the implementation of a reliable real-time collector for Libprotoident that will form the back-end component to support a web-based visualisation of the statistics produced by the library. The collector has been designed and implemented to support the classification of flows and exporting of application usage statistics to multiple clients over the network in separate threads, whilst operating asynchronously so as to achieve high performance when measuring multi-gigabit networks.

Author(s): 
Meenakshee Mungro

24 Jun 2013

Added manpages to netevmon to get it ready for Debian packaging. During this process, fixed a few little oversights in the netevmon script and the existing documentation.

Re-wrote much of the NNTSC API in ampy. The main goal was to reduce the amount of duplicated code in modules for individual NNTSC collections that was better suited to a more general NNTSC API. In the process I also changed the API to only use a single "NNTSC Connection" instance rather than creating and destroying one for every AJAX request. The main benefit of this is that we don't have to ask the database about collections and streams every time we make a request now -- instead we get them once and store that info for subsequent use. This will hopefully make the graph interface feel a bit more responsive.
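
The caching behaviour can be sketched as below. Class and parameter names are hypothetical stand-ins for the real ampy API; the point is simply that the collection list is fetched once over the long-lived connection and reused for every subsequent request.

```python
class CachedNNTSCConnection:
    def __init__(self, fetch_collections):
        # fetch_collections stands in for the real NNTSC database query.
        self._fetch = fetch_collections
        self._collections = None
        self.queries = 0            # how many real queries we've made

    def get_collections(self):
        if self._collections is None:
            self._collections = self._fetch()
            self.queries += 1       # only pay this cost once
        return self._collections

conn = CachedNNTSCConnection(lambda: ["amp-icmp", "lpi-bytes"])
first = conn.get_collections()
second = conn.get_collections()     # served from the cache
```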

Updated amp-web to use the new NNTSC API in ampy. I also spent a bit of time on Friday testing the web graphs on various browsers and fixing a few of the more obvious problems. Unsurprisingly, IE 10 was the biggest source of grief.

Added a new time series type to anomaly_ts -- JitterVariance. This time series tracks the standard deviation of the latencies reported by the individual smokeping pings. Using this, I've added a new event type designed to detect when the standard deviation has moved away from being near zero, i.e. the pings have started reporting variable latency. This helps us pick up on situations where the median stays roughly the same but the variance clearly indicates some issues. It also serves as a good early indicator of upcoming Plateau or Mode events on the median latency.
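
A rough sketch of the idea (the threshold and function name are illustrative, not the netevmon implementation): compute the standard deviation across one batch of pings and fire when it moves away from near zero.

```python
import statistics

def jitter_event(pings, threshold=2.0):
    """Given one batch of smokeping ping latencies (ms), return
    (stddev, event_fired). An event fires once the spread of the
    individual pings is no longer near zero."""
    sd = statistics.pstdev(pings)
    return sd, sd > threshold

stable = jitter_event([20.1, 20.2, 20.0, 20.1])   # tight cluster
variable = jitter_event([20.0, 35.0, 21.0, 48.0]) # median similar, spread large
```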

17 Jun 2013

Finished preparing NNTSC for packaging. Wrote an init script for the NNTSC collector and ensured that all of the subprocesses are cleaned up when the main collector process is killed. Wrote some manpages, updated the other documentation and added some licensing to NNTSC before handing it off to Brendon for packaging.

Also moved towards packaging netevmon. Again, lots of messing around with daemonisation and ensuring that the monitor can be started and stopped nicely without anyone having to manually hunt down processes.

Spent the rest of my time working on the interaction between amp-web and History.js. Only one entry is placed in the history for each visited graph now and selecting a graph from the history will actually show you the right graph. Navigating to a graph via the history will also now update the dropdown lists to match the currently viewed graph. When using click and drag to explore a graph, clicking once on the graph will return to the previous zoom level (this was already present, but only worked for exploring the detailed graph, not the summary one).

10 Jun 2013

Spent most of my week working on making the various components of NNTSC and netevmon backgroundable so that they are a lot easier to run long-term. This was pretty straightforward for the C++ programs but the python scripts have been a bit trickier, especially in terms of getting the logging going to the right place.

Also fixed a few of the outstanding issues with amp-web. In particular, I fixed the problems we were having with the X-axis of the summary graph being garbled and ensured that the summary graph will always show a sensible time period based on the region shown in the detailed view. These changes also meant I could remove the summary timestamps from the page URL, which cleans that up quite a bit.

04 Jun 2013

Finished fixing the URLs in amp-web so that they are ordered sensibly and can support NNTSC streams that are defined using more than just "source" and "target". I also changed the ordering of the timestamps in the URL so that we can specify a start and end time for the detailed graph only (sensible defaults for the summary graph are meant to be chosen in this case). This is really handy when creating URLs that link to graphs showing events.

Started looking into what needed to be done to prepare NNTSC and netevmon for packaging and a possible distribution for our friends at Lightwire. Spent a decent chunk of time writing a README that should describe exactly how to get an NNTSC instance up and running.

NNTSC and netevmon both have Trac instances now, and I've added a series of tickets to each with the aim of getting a release ready for Lightwire by the end of the month.

15 Apr 2013

Another short week, due to being away on Tuesday and Wednesday.

Started writing up a decent description of the design and implementation of NNTSC, which would hopefully make for a decent blog post. It also means that the entire thing is stored somewhere other than in my head...

Revisited the eventing side of our anomaly detection process. Had a long but eventually productive discussion with Brendon about what information needs to be stored in the events database to be able to support the visualisation side. We decided that, given the NNTSC query mechanism, events should have information about the collection and stream that they belong to so that we can easily filter them based on those parameters. We used to use "source" and "destination" for this, but streams are defined using more than just a source and destination now.

Updated anomalyfeed, anomaly_ts and eventing to support the new info that needs to be exported all the way to the eventing program. In the process, I moved eventing into the anomaly_ts source tree (because they shared some common header files) and wrangled automake into building them properly as separate tools. Got to the stage where everything was building happily, but not running so good :(

08 Apr 2013

Very short week this week, but managed to get a few little things sorted.

Added a new dataparser to NNTSC for reading the RRDs used by Munin, a program that Brad is using to monitor the switches in charge of our red cables. The data in these RRDs is a lot noisier than smokeping data, so it will be interesting to see how our anomaly detection goes with that data. Also finally got the AMP data actually being exported to our anomaly detector - the glue program that converted NNTSC data into something that can be read by anomaly_ts wasn't parsing AMP records properly.

Spent a bit of time working on adding some new rules to libprotoident to identify previously unknown traffic in some traces sent to me by one of our users.

Spent Friday afternoon talking with Brian Trammell about some mutual interests, in particular passive measurement of TCP congestion window state and large-scale measurement data collection, storage and access. In terms of the latter, it looks like many of the design decisions we have reached with NNTSC are very similar to those that he had reached with mPlane (albeit mPlane is a fair bit more ambitious than what we are doing), which I think was pretty reassuring for both sides. Hopefully we will be able to collaborate more in this space, e.g. by developing translation code to make our data collection compatible with mPlane.

03 Apr 2013

Exporting from NNTSC is now back to a functional state and the whole event detection chain is back online. Added table and view descriptions for the more complicated AMP tests: traceroute, http2 and udpstream are now all present. Hopefully we can get new AMP collecting and reporting data for these tests soon so we can test whether it actually works!

Had some user-sourced libtrace patches come in, so spent a bit of time integrating these into the source tree and testing the results. One simply cleans up the libpacketdump install directory so that it doesn't create as many useless or unused files (e.g. static libraries and versioned library symlinks). The other adds support for the OpenBSD loopback DLT, which is a real nuisance because OpenBSD isn't entirely consistent with other operating systems as to the values of some DLTs.

Helped Nathan with some TCP issues that Lightwire were seeing on a link. Was nice to have an excuse to bust out tcptrace again...

Looks like my L7 Filter paper is going to be rejected. Started thinking about ways in which it could be reworked to be more palatable, perhaps presenting it as a comparative evaluation of open-source traffic classifiers instead.

25 Mar 2013

Turns out that once again, the current design of NNTSC didn't quite meet all of the requirements for storing AMP data. The more complicated traceroute and HTTP tests needed multiple tables for storing their results, which wasn't quite going to work with the "one stream table, one data table" design I had implemented.

Managed to come up with a new design that will hopefully satisfy Brendon while still allowing for a consistent querying approach. Implemented the data collection side of this, including creating tables for the traceroute test. This was a bit trickier than planned, because SQLAlchemy doesn't natively support views and also the traceroute view was rather complicated.

Currently working on updating the exporting side to use id numbers to identify collections rather than names, since there is no longer any guarantee that the data will be located in a table called "data_" + module + module_subtype.

Also spent a fair bit of time reading over Meenakshee's report and covering it in red pen. Pretty happy with how it is coming together.

18 Mar 2013

Short week this week, as I spent Thursday and Friday in Wellington at the cricket.

Wrote an assignment on libtrace for 513, along with some "model" answers.

Continued reading and editing Meenakshee's report.

Had a vigorous discussion with Brendon about what he needs the NNTSC export protocol to do to support his AMP graphing needs. It turns out the protocol needs a couple of new features, namely binning/aggregation and a no-more-data indicator, which I started working on adding. So far, this has mostly involved taking some of the working code from my anomaly detector feeder program, which is an NNTSC client, and turning it into an NNTSC client API.

Put out a request to our past students for their Honours reports so that they can be published on the website. Thanks to those who have responded.

12 Mar 2013

Despite the limitations of current network monitoring tools, there has been little investigation into providing a viable alternative. Network operators need high-resolution data over long time periods to make informed decisions about their networks. Current solutions discard data or do not provide it in a practical format. This report explores the development of a new solution to address these problems.

Author(s): 
Nathan Overall

11 Mar 2013

Added a data parser module to NNTSC to process the tunnel user count data that we got from Lightwire. Managed to get the data going all the way through to the event detection program which spat out a ton of events. Spent a bit of time combing through them manually to see whether the reported events were actually worth reporting -- in a lot of cases they weren't, so I've refined the old Plateau and Mode algorithms a bit to hopefully resolve the issues. I also employed the Plunge detector on all time series types, rather than just libprotoident data, and this was useful in reporting the most interesting behaviours in the tunnel user data (i.e. all the users disappearing).
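
A Plunge-style check can be sketched roughly as follows (parameters and names are illustrative, not the netevmon detector): fire when a series that normally sits well above zero suddenly drops to the floor, which is exactly the "all the users disappearing" case in the tunnel data.

```python
def plunge_detected(history, latest, min_mean=10.0, floor=1.0):
    """history: recent values for the time series; latest: newest value.
    Only fire if the series was clearly non-zero beforehand and the
    newest value has collapsed to (near) zero."""
    if not history:
        return False
    mean = sum(history) / len(history)
    return mean >= min_mean and latest <= floor

fired = plunge_detected([120, 118, 125, 119], 0)    # users vanished
quiet = plunge_detected([120, 118, 125, 119], 117)  # business as usual
```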

Added a couple of new features to the libtrace API. The first was the ability to ask libtrace to give you the source or destination IP address as a string. This is quite handy because normally processing IP addresses in libtrace involves messing around with sockaddrs which is not particularly n00b-friendly. The second API feature was the ability to ask libtrace to calculate the checksum at either layer 3 or 4 based on the current packet contents. This was already done (poorly) inside the tracereplay tool, but is now part of the libtrace API. This is quite useful for checksum validation or if you've modified the packet somehow (e.g. modified the IP addresses) and want to recalculate the checksum to match.
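
For reference, the layer-3 checksum calculation is the standard RFC 1071 ones'-complement sum; here it is sketched in Python for illustration (libtrace itself implements this in C): sum the header as 16-bit words with the checksum field zeroed, fold the carries back in, and complement.

```python
def ip_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum over a header with its checksum
    field set to zero. Returns the 16-bit checksum value."""
    if len(header) % 2:
        header += b"\x00"               # pad odd-length input
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A handy validation property: running the sum over a header that already contains a correct checksum yields zero.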

Also spent a decent bit of time reading over chapters from Meenakshee's report and offering plenty of constructive criticism.

04 Mar 2013

The NNTSC export protocol is complete now and happily exports live data to any clients that have subscribed to data streams that are being collected. Using this, I've been able to get the anomaly detection tool chain working with our SmokePing data right up to the eventing phase. Fixed a minor bug in the eventing code that would result in badly-formed event groups if the events do not strictly arrive in chronological order (which can happen if you are working with multiple streams of historical data).

Fixed a few libtrace bugs this week - the main one being trace_event being broken for int: inputs. It was just a matter of the callback function being registered inside the wrong #ifdef block but took a little while to track down.

Spent the latter part of my week tidying up my libtrace slides in preparation for a week of teaching 513 later this month.

25 Feb 2013

The new NNTSC now supports the LPI collector and installs nicely. Still waiting on Brendon to get his AMP message decoding code finished to his satisfaction and that will be the next thing to get working on the data collection side.

Also started developing a new data query / export mechanism that allows clients to connect and register their interest in particular data streams to receive ongoing live data as it is collected by NNTSC. The old approach for this involved the client explicitly stating the ID numbers for the streams they wanted data for, which was pretty suboptimal because it required knowledge that should really be internal to the database.

The other problem is that, now that we have a different data table layout for each type of data stream, we need to inform clients about that structure and how to interpret the data they receive.

All of this meant that I've had to design and implement an entirely new protocol for NNTSC data export. It's a request/response based protocol - the client can request the list of collections (i.e. the different data stream types), details about a specific collection (i.e. how the stream and data tables are laid out) and the list of streams associated with a given collection. It can then subscribe to a given stream, giving a start and end time for the time period required. If the time period includes historical data, that is immediately extracted from the database and sent. If the time period includes future data, the client and stream id is remembered so that new data can be sent to the client as it rolls into NNTSC.
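
The subscribe step is the interesting part of the protocol, and can be sketched like this (data layout, names and the in-memory "database" are invented for illustration; the real NNTSC wire format differs): matching history is returned immediately, and if the requested period extends into the future the client is remembered so live data can be forwarded as it arrives.

```python
import time

# stream id -> list of (timestamp, value) rows already collected
STREAMS = {10: [(100, 5.0), (160, 5.1)]}
subscribers = {}   # stream id -> clients to forward live data to

def handle_subscribe(client, stream, start, end, now=None):
    """Serve the historical part of a subscription immediately; register
    the client for live export if the period extends into the future."""
    now = time.time() if now is None else now
    history = [(ts, v) for ts, v in STREAMS[stream] if start <= ts <= end]
    if end > now:
        subscribers.setdefault(stream, []).append(client)
    return history

# Period runs into the future: history now, live data later.
hist = handle_subscribe("client-1", 10, 0, 10**9, now=200)
```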

Currently at the stage where we're getting historical data out of the database and preparing to send it to the client, with live data being the next main task to do.

30 Oct 2012

Well the year is pretty much over now. Just submitting the last few assignments.

The 520 report hand-in went well. I gauged the time well and happily submitted on time. Although my word count was lower than everyone else's, I feel the reduction in words was made up for by a stronger focus and keeping my arguments concrete. I guess I'll find out if that was wise in due course.

It's been an awesome year being part of WAND and everything that entails. Thanks to everyone involved and good luck in the future.

21 Oct 2012

Been working pretty much constantly on my report for the past few weeks now. At this point I have a bit of tidying up to do before I write my conclusion and hand it in on Tuesday. Overall I'm pretty happy with the report. I've received really good feedback from Brendon and Scott on how to improve and clean up my writing, and I'm on schedule to finish on time.