Finished reformatting the data to remove some mess and unnecessary
layers of nesting that had crept in while trying different things. It
should now be set up to properly represent multiple lines, split up
or grouped however the backend chooses. Updated all
the tests to use the new data format.
Spent an afternoon with Shane and Brad designing how we are going to
represent graphs with multiple lines, in a way that will let us merge
and split data series based on how the user wants to view the data.
Tidied up the autogenerated colours for the smokeping graphs to use
consistent series colours across the summary and detail views, while
also being able to use the default smokeping colouring if there is only
a single series being plotted.
I added multiple table support to RouteFlow, and am now trying to add my 591 work on top of that, but it is taking longer than I expected.
Having multiple tables does simplify the structure and fixes most of the interface issues I had with the older version of this code, but despite this I am still having a lot of problems getting it to work.
The new psycopg2-based query system was generally working well but using significant amounts of memory. This turned out to be due to the default cursor being client-side, which meant that the entire result was being sent to the querier at once and stored in memory. I changed the large data queries to use a server-side cursor which immediately solved the memory problem. Instead, results are now shipped to the client in small chunks as needed -- since the NNTSC database and exporter process are typically located on the same host, this is not likely to be problematic.
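To illustrate the fix, here is a minimal sketch of chunked fetching. The named (server-side) cursor is real psycopg2 behaviour; the `fetch_in_chunks` helper and the cursor/table names are my own illustration and work with any DB-API cursor.

```python
# With psycopg2, giving the cursor a name makes it server-side, so
# results stay on the server until fetched:
#   cur = conn.cursor(name="nntsc_history")   # hypothetical name
#   cur.itersize = 2000   # rows per round trip when iterating

def fetch_in_chunks(cursor, chunk_size=1000):
    """Yield rows from a DB-API cursor in small batches, so the whole
    result set is never held in client memory at once."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield row
```

Because the exporter and the database usually share a host, the extra round trips per chunk cost little compared to the memory saved.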
Netevmon now tries to use the measurement frequency reported by NNTSC for the historical data wherever possible, rather than guesstimating the frequency from the time difference between the first two measurements. The previous approach was failing badly with our new one-stream-per-tested-address approach for AMP, as individual addresses were often tested intermittently. If there is no historical data, a new algorithm is used that simply finds the smallest time difference between consecutive measurements among the first N and uses that.
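The fallback algorithm can be sketched as follows; the function name and the default N are my own choices, not the actual netevmon implementation.

```python
def estimate_frequency(timestamps, n=10):
    """Guess the measurement frequency as the smallest gap between
    consecutive timestamps among the first n measurements."""
    window = sorted(timestamps[:n])
    gaps = [later - earlier
            for earlier, later in zip(window, window[1:])
            if later > earlier]
    return min(gaps) if gaps else None
```

Taking the minimum gap rather than the mean keeps a single long outage (e.g. an address that skipped several test rounds) from inflating the estimate.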
Changed the table structure for storing AMP traceroute data. The previous method was causing too many problems and required too much special treatment to query efficiently. In the end, we decided to bite the bullet and re-design the whole thing, at the cost of all of the traceroute data we had collected over the past few months (actually, it is still there but would be painful to convert over to the new format).
Had a long but fruitful meeting with Brendon and Brad where we worked out a 'view' system for describing what streams should be displayed on a graph. Users will be able to create and customise their own views and share them easily with other users. Stream selections will be described using expressions rather than explicitly listing stream ids as it is now (although listing specific streams will still be possible).
This will allow us to create a graph showing a single line aggregating all streams that match the expression: "collection=amp-icmp AND source=ampz.waikato.ac.nz AND destination=www.google.com AND family=ipv4". Our view could also include a second line for IPv6. By using expressions, we can have the view automatically update to include new streams that match the criteria after the view was created, e.g. new Google addresses.
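A toy matcher shows how such an expression could select streams; the parser below handles only flat `key=value AND ...` clauses and the dict-based stream records are hypothetical, not the real NNTSC schema.

```python
def parse_view_expression(expr):
    """Parse a flat 'key=value AND key=value' selection expression
    into a dict of required attributes (no OR/nesting support)."""
    criteria = {}
    for clause in expr.split(" AND "):
        key, _, value = clause.partition("=")
        criteria[key.strip()] = value.strip()
    return criteria

def matching_streams(streams, expr):
    """Return the streams whose attributes satisfy every clause."""
    criteria = parse_view_expression(expr)
    return [s for s in streams
            if all(s.get(key) == value for key, value in criteria.items())]
```

Because matching is evaluated against the current stream set each time, a stream added after the view was created (a new Google address, say) is picked up automatically.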
Finished all of my chapters this week and have had them reviewed, just a matter now of tidying up a few places and then I'll be done.
I gave the final version of my thesis to the printers so they can do the hardbound version.
Moved the multiple series line graphs back to using the smokegraph
module, but with colouring based on the series rather than indicating
loss. This appears to work well for the smaller data series that I've
tested on, though I have yet to get a sensibly aggregated set of data
for those graphs with very large numbers of streams.
The new graphs with arbitrary numbers of data series were triggering
event labels on mouseover for almost every series except the first,
which I have now fixed: only a dummy series triggers mouse events, so
the graph doesn't try to display information about every single data
point. Profiling also revealed many extraneous loops and event checks
that could be avoided by properly disabling events on the summary
graph as well.
Also spent some time reading and critiquing honours reports, not long to go!
I have been monitoring the Caida run of the non-5-tuple field analysis and have started to download the completed warts sets.
A dump of load balancer IDs and next hops was included in the data analysis of the warts files from Caida and PlanetLab. A Perl program using this data was written to count load balancers found by only one of the vantage points. This is to help determine how effective our coverage is.
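The counting step is essentially a set-membership tally; a Python sketch of the same idea (the original is Perl, and these vantage point and ID names are invented):

```python
from collections import Counter

def seen_by_only_one(observations):
    """observations maps a vantage point name to the set of load
    balancer IDs it found. Returns how many load balancers were
    observed by exactly one vantage point."""
    counts = Counter()
    for lb_ids in observations.values():
        counts.update(lb_ids)
    return sum(1 for c in counts.values() if c == 1)
```

A high count of single-vantage-point load balancers suggests the vantage points complement each other rather than duplicating coverage.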
Data from the fourth scamper run has been processed and compared to the first run. These results have been incorporated into the conference slides. Furthermore, the introduction of the slide set has been extended and more graphs have been added. Some pruning has also been carried out.
The large analysis of Caida runs on wraith is still running. This should provide information about how many new load balancers are added with each new vantage point. This is using quite a lot of memory as the data from all the vantage points is held at the same time, so wraith is ideal for the purpose.
Over the last two weeks, I have been working on the TEntropy detector.
During the first week, I used anomaly_ts and anomaly_feed and produced output for a number of different streams using a combination of different metrics, string lengths, sliding window sizes, and range delimiters. After producing strings for each sliding window sample, a Python script calls the external t_Entropy function with each string as a parameter to obtain the average t-entropy for the string, and pipes the output to a file. I then wrote another Python script to produce a Gnuplot script for time-series graphs so that I could inspect the results. At this point it was apparent that the t-entropy detector was a feasible option, so the next step was to implement the actual t-entropy calculations within Netevmon.
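The string-construction stage described above can be sketched like this; the delimiter scheme, alphabet, and function names are my own illustration of the general technique, not the actual anomaly_ts code, and the entropy call itself is omitted.

```python
def values_to_string(values, delimiters, alphabet="abcdefghijklmnop"):
    """Map each metric value to a character according to which range
    (bounded by the sorted delimiters) it falls into."""
    chars = []
    for v in values:
        bucket = sum(1 for d in delimiters if v >= d)
        chars.append(alphabet[bucket])
    return "".join(chars)

def sliding_windows(values, size, step=1):
    """Yield successive windows of measurements, each of which is
    converted into a string for the entropy calculation."""
    for start in range(0, len(values) - size + 1, step):
        yield values[start:start + size]
```

Each window's string would then be handed to the external t_Entropy function; the choice of delimiters controls how sensitive the resulting strings are to small changes in the metric.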
Spent last week going over the T_Entropy library I found, Fast Low Memory T-transform (flott), which computes the T-complexity of a string, from which the t-entropy is in turn derived. Unfortunately, the library consists of around a dozen .c and header files, which made it somewhat tricky to determine which parts I would need. I spent around three days reading the source code and trying to understand it before starting to add the necessary bits to a new detector. I found the function that calculates the actual t-complexity, t-information and t-entropy values, and have been working on duplicating those calculations. However, a number of other initialisation functions are required before those values can be calculated, so I will have to look into them at some point.
Also had a bunch of marking to do, so couldn't spend all week working on the flott adaptation.
So I did some tests on how counters are affected by changing flows in Open vSwitch. There is a bug in earlier versions of OVS that would have ruined everything, but fortunately it's fixed in the latest version. It also seems that every now and then one packet will not get counted if you change flows. This happens less than once in a million packets though, so I can't imagine it being a big deal.
Other than that I have been adding the changes from my 591 project onto the latest release of vandervecken and working out the best approach to adding support for multiple tables to RFServer.
Finished migrating our database query code in NNTSC from SQLAlchemy to psycopg2.
Released libwandevent 3.0 and updated netevmon to use it instead of the deprecated libwandevent 2 API.
Continued to be stymied by performance bottlenecks when querying large amounts of historical data from NNTSC using netevmon. The problems all stem from attempts to export live data at the same time breaking down, which eventually caused data collection to block while waiting to write live data to the exporter. Because data collection was blocked, no new data was being collected or written to the database.
The first new problem I found was (surprisingly) caused by our trigger function that writes new data into the right partition. Because there is no "CREATE IF NOT EXISTS" for triggers in postgres, we were dropping the trigger and then re-creating it whenever we switched to a new partition. However, you can't drop a trigger from a table without having an exclusive lock on the table. If the table is under heavy query load (e.g. from netevmon) then the DROP TRIGGER command will block until the querying ends. The solution was reasonably straightforward -- check the metadata tables for the existence of the trigger and only create it if it doesn't exist.
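A minimal sketch of the check, assuming psycopg2-style parameter binding; the function, trigger, and table names are invented for illustration, but the `pg_trigger` catalog lookup is standard Postgres.

```python
def ensure_partition_trigger(cur, table, trigger_name, create_sql):
    """Create the partition-routing trigger only if it is missing,
    instead of DROP TRIGGER + CREATE TRIGGER. DROP TRIGGER needs an
    exclusive lock on the table, so it blocks behind any long-running
    queries against that table."""
    cur.execute(
        "SELECT 1 FROM pg_trigger "
        "WHERE tgname = %s AND tgrelid = %s::regclass",
        (trigger_name, table))
    if cur.fetchone() is None:
        cur.execute(create_sql)
```

Checking the catalog needs no lock at all, so switching partitions no longer contends with netevmon's history queries.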
The other problem was that our select queries were happening in the same thread as the reading of live data from the exporter. Despite last week's improvements, the queries can still take a little while and live data was building up while the query was taking place. Furthermore, we were only reading one message from the live data queue before returning to querying so we would never catch up once we fell behind. To fix this, I've implemented a worker thread pool for performing the select queries instead so we can export live data while a query is ongoing.
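The shape of that worker pool might look something like this; the class name, pool size, and method names are my own, a sketch of the pattern rather than the actual NNTSC code.

```python
from concurrent.futures import ThreadPoolExecutor

class QueryDispatcher:
    """Run historical select queries on worker threads so the main
    thread can keep draining the live-data queue while they run."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit_history_query(self, queryfunc, *args):
        # Returns a Future immediately; the caller collects the result
        # later, without ever blocking live export on a slow query.
        return self._pool.submit(queryfunc, *args)
```

The main loop can then read every pending live message in one pass (rather than one per iteration) and check the outstanding Futures for completed query results.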