Attempting to run netevmon against a decent quantity of historical data has been causing significant performance problems and even preventing NNTSC from processing and storing new measurements. After a bit of hackish profiling, I realised that the biggest problem was the time taken to query for traceroute data. Unlike most of the other existing data tables, the traceroute data is spread across three tables which are joined to create a view that we query from.
Unfortunately, the join was not smart enough to recognise that the traceroute test ids it was looking for all fell within a certain set of table partitions. Instead, it would sequentially scan millions of rows across all of the test tables. After a lot of messing around with the SQL used to create the view, I found that the best approach was to instead use a stored procedure that works out which test ids fall within the queried time period and returns a table constructed using constraints on those test ids as well as the timestamp and stream ids.
This got the query time for several weeks' worth of data down from 12 seconds to 2 seconds. The next problem was using the procedure within SQLAlchemy in place of a "data table", as SQLAlchemy treats the returned table as a Result object rather than a Table object. This meant that there weren't any Column objects available for us to operate on, e.g. to apply aggregation functions for generating graph data.
At this point, it became apparent that SQLAlchemy was more of a hindrance than a benefit and I decided we would be better off replacing it with the much simpler but more intuitive psycopg2, at least for the database querying side of NNTSC. Spent the remainder of my week writing and testing the new query code.
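As a rough illustration of the direction (the procedure name and parameters here are hypothetical, not the actual NNTSC schema), the psycopg2 side reduces to building a parameterised call to the new procedure and running it with an ordinary cursor, rather than querying the view through an ORM:

```python
# Sketch only: get_traceroute_data is a placeholder name for the
# procedure described above, which constrains the join by test id,
# timestamp and stream id before returning a table.
def build_query(stream_ids, start, end):
    # Parameters are passed to psycopg2 separately, never formatted
    # into the SQL string, so they are escaped safely.
    sql = "SELECT * FROM get_traceroute_data(%s, %s, %s)"
    return sql, (stream_ids, start, end)

# The query would then be run with a psycopg2 cursor, e.g.:
#   cur.execute(*build_query([1, 2], 1367798400, 1368403200))
#   rows = cur.fetchall()
```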
Continued work on the changepoint detection algorithm.
Finished tidying up the code, commenting and optimising. Pushed the detector to the git repository.
For now the sensitivity seems reasonable, and in my testing events are lining up closely with those from the other detectors. Extremely noisy data sets still produce what visually look like false positives; however, these are most likely valid detections.
Finished running the tests early this week, so I have started drafting the evaluation chapter of my report. Handed off a couple of my draft chapters for review so that I can start to refactor them.
Still working on the tEntropy detector, but have made good progress this week. Ironed out the bugs that I found, and now have output in the correct format. I then spent a great deal of time collecting output for 8 different streams, each with different character bin sizes and string lengths. Also wrote a Python script which takes the output files for the different streams (which include the string used for the entropy measurements) and passes them to an external script that calculates an average t-entropy measurement for each timestamp. So, I now have a set of output files with entropy values that need to be plotted to determine which combination of string length and character bin size is optimal.
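The averaging step can be sketched as follows (a minimal version, assuming each stream's output has already been parsed into a timestamp-to-entropy mapping; the real files also carry the measurement string):

```python
from collections import defaultdict

def average_entropy(parsed_streams):
    """Average per-timestamp t-entropy values across streams.

    parsed_streams: a list of {timestamp: entropy} dicts, one per
    stream (an assumed intermediate format, not the raw file layout).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for stream in parsed_streams:
        for ts, value in stream.items():
            totals[ts][0] += value   # running sum for this timestamp
            totals[ts][1] += 1       # number of streams contributing
    return {ts: total / count for ts, (total, count) in totals.items()}
```

Timestamps missing from some streams are simply averaged over the streams that do report them.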
After a brief look at a couple of graphs, it seemed that a greater string length (50) had no benefit over a smaller one (20). The patterns were practically identical for each string length, which implies that the additional computational cost of calculating the t-entropy of 50 characters for every single timestamp is not worth it.
Data collection on CAIDA and PlanetLab has been monitored, and data has been downloaded as each run completed.
Programs have been written to analyse the data from the first half of the 'per flow like fields' collection on Caida. Results have been added to a draft slide set for the students conference.
The Internet simulator has finished its most recent runs.
I set up an SNMP poller to check the consistency of packet counts at the two ends of a link, and it looks fairly promising. I'm currently setting up an OpenFlow system for some more extensive testing.
I've also been looking at the latest versions of RouteFlow to get an idea of what has changed since I last worked on it, and investigating what could be done to add multiple table support and the other features needed for what I am doing.
Continued work on the change-point detection algorithm.
Found the issue causing the loss of probability mass: extreme outlying points have zero probability of belonging to any of the previous runs. For now I am treating these as change points.
Refactored the normal distribution into its own class.
The fix for the lost probability mass increased the sensitivity of the algorithm again. So I'm now keeping copies of the run distribution before adding a new datapoint, which can be reverted to if the threshold on the required number of consecutive datapoints isn't reached.
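The snapshot-and-revert idea can be sketched like this (illustrative names and a simplified run state, not the real detector, which tracks a full run-length distribution):

```python
import copy

class RunSnapshot:
    """Keep a copy of the current run state before accepting suspect
    datapoints, so it can be reverted if the required number of
    consecutive outliers isn't reached."""

    def __init__(self, required=3):
        self.required = required
        self.run = []        # datapoints in the current run
        self.saved = None    # snapshot taken at the first outlier
        self.outliers = 0    # consecutive outliers seen so far

    def add(self, value, is_outlier):
        """Return True when a change point is confirmed."""
        if is_outlier:
            if self.saved is None:
                # first suspect point: snapshot the pre-outlier state
                self.saved = copy.deepcopy(self.run)
            self.outliers += 1
            self.run.append(value)
            if self.outliers >= self.required:
                # enough consecutive outliers: confirm and start a
                # new run containing only the outlying points
                self.run = self.run[-self.outliers:]
                self.saved = None
                self.outliers = 0
                return True
            return False
        if self.saved is not None:
            # threshold not reached: revert to the snapshot
            self.run = self.saved
            self.saved = None
        self.outliers = 0
        self.run.append(value)
        return False
```

A lone spike is absorbed back into the old run, while a sustained shift of `required` points is declared a change point.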
Started tidying up the code and optimising.
Started to mock up an interface to the graphs that would allow multiple
data series to be shown at once and hopefully still be fairly simple to
select what to display. Had a good look around the bootstrap library to
see what it is capable of. Will continue with this once we have the
capability to plot multiple series on a graph.
Moved on to making use of the new datastreams available once Shane split
apart those that had the same test parameters (including destination
name) but different target addresses. Updated the matrix to be able to
display summaries of groups of streams while also being able to display
information on individual ones. While looking at the tooltip graph
generation I found it to be very slow for targets with multiple
addresses - turns out the aggregated queries were being separated at the
last moment so we were hitting the same data in the database once per
series. Got this mostly aggregated again in those situations where the
query durations are similar.
Added a simple method to fetch AMP schedule files over HTTP. Currently
runs on startup, but will be scheduled to run regularly. Will hopefully
not be too much work to move to HTTPS and use information in the client
certificates to serve different files to different clients.
Changed the stream definition for both the AMP ICMP and AMP traceroute collections in NNTSC to include the address that was tested to. This means that we can more easily analyse the behaviour of specific paths and show each one as a separate line on our graphs.
Added an LRU-based detector to netevmon, mainly for use with the traceroute data. The detector maintains an LRU of values that it has seen recently (e.g. hop counts) and creates an event whenever it has to add a new value to the LRU. This will also be used to check for changes in the full path returned by the traceroute test.
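The core of such a detector is small; a sketch along the lines described above (illustrative names and window size, not the netevmon implementation):

```python
from collections import OrderedDict

class LRUDetector:
    """Remember the last few distinct values seen and raise an event
    whenever a value not currently in the LRU arrives."""

    def __init__(self, size=5):
        self.size = size
        self.lru = OrderedDict()   # keys in least- to most-recent order

    def process(self, value):
        """Return True if this value triggers an event."""
        if value in self.lru:
            self.lru.move_to_end(value)    # refresh recency, no event
            return False
        self.lru[value] = True             # unseen value: event
        if len(self.lru) > self.size:
            self.lru.popitem(last=False)   # evict least recently used
        return True
```

Note that the first few values always trigger events while the LRU warms up, so some suppression of events at startup would be needed in practice.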
Continued working on the script to implement pre-session end warnings. Started on the report.