I have been completely revising RFServer, getting rid of most of the remnants of MongoDB to hopefully end up with something a lot simpler to deal with.
This is mostly because I noticed something like four thousand bugs in the code from my 591 for dealing with network partitions. Hopefully this will help reduce that number a little.
Monitored Caida and Planetlab as more warts data was produced. Caida is collecting data with an updated driver for measuring the efficiency of several methods of choosing the flow ID when running MDA traceroute.
Data from the second round of non-flow-ID field analysis has been processed to raw data and then summarised using a spreadsheet. There seem to be a small number of routers that can segregate packets to different successor nodes based on a number of non-standard fields (not the classic 5-tuple).
The slide set for the PhD conference has been updated. More graphs with error bars showing 95% confidence intervals have been produced, including a graph of the percentage of paths that contain a particular type of load balancer; this was done for the three usual packet types.
Fixed a couple of bugs in the event grouping code that meant it was
running much slower than it should when groups got large. It should now
be a lot smarter about excluding attributes from the grouping process if
there is no way that using them could result in better groups.
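The pruning idea above can be sketched in a few lines. This is a hypothetical illustration (the real grouping code is not shown here): an attribute whose value is the same for every event, or different for every event, cannot split the events into better groups, so it can be excluded before grouping begins.

```python
# Hypothetical sketch of attribute pruning before event grouping.
# An attribute with one distinct value (constant) or one value per
# event (unique) cannot produce better groups, so skip it.

def useful_attributes(events, attributes):
    """Return only the attributes worth grouping on: more than one
    distinct value, but fewer distinct values than there are events."""
    useful = []
    for attr in attributes:
        values = {ev.get(attr) for ev in events}
        if 1 < len(values) < len(events):
            useful.append(attr)
    return useful

events = [
    {"source": "amp1", "target": "google", "metric": "latency"},
    {"source": "amp1", "target": "yahoo", "metric": "latency"},
    {"source": "amp1", "target": "google", "metric": "loss"},
]
# "source" is constant across all events, so only the others remain.
print(useful_attributes(events, ["source", "target", "metric"]))
```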
Had a good meeting with Lightwire on Wednesday and got good feedback
about our software. Spent some time talking with Nathan trying to fix
issues they were having with it, and putting together
packages/instructions so that they can install AMP alongside their other
monitoring. This is looking more complicated than it should be, so we will have to see how much of it can be taken care of in pre/post-install scripts. Most of the work is in setting up the server though, so it only needs to be done once.
Spent most of the week preparing for my Sydney trip. Wrote the talk I will be presenting this coming Thursday and gave a practice rendition on Friday.
The rest of my time was spent fixing minor issues in Cuz -- trying not to break anything major before I go away for a week. Replaced the bad SQLAlchemy code in the ampy netevmon engine with some psycopg2 code, which should make us slightly more secure. Also tweaked some of the event display stuff on the dashboard so that useful information is displayed in a sensible format, i.e. fewer '|' characters all over the place.
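The security gain here comes from DB-API parameter binding: values are passed to the driver separately from the SQL text instead of being interpolated into the query string. A runnable stand-in using sqlite3 (same DB-API 2.0 pattern; psycopg2 uses `%s` placeholders where sqlite3 uses `?`, and the table and column names below are made up):

```python
# Parameter binding keeps values out of the SQL text, so a malicious
# value cannot alter the query structure. sqlite3 is used here only so
# the sketch runs without a database server; psycopg2 works the same
# way with %s placeholders.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (stream_id INTEGER, magnitude REAL)")
cur.execute("INSERT INTO events VALUES (?, ?)", (42, 3.5))

stream_id = 42
# The driver binds stream_id; it is never spliced into the string.
cur.execute("SELECT magnitude FROM events WHERE stream_id = ?", (stream_id,))
row = cur.fetchone()
print(row[0])  # 3.5
```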
Had a useful meeting with Lightwire on Wednesday. Was pleased to hear that their general impression of our software is good and will start working towards making it more useful to them over the summer.
I am submitting (or by now have submitted) my hardbound thesis.
Finished off gathering field testing data on Caida. Began the next run, which tests the efficiency of several flow ID selection modes. The driver previously used to collect this data type on Yoyo needed to be updated.
More work has been done on the slide set for the conference. More statistical tests have been carried out and more graphs have been produced from the raw data. Further pruning of the slide set has been carried out.
Managed to get a working implementation of Flott that performs the initialisation and calculations needed to obtain the t-entropy of a given string! It took longer than expected, though: I was right about which objects and functions I would need from the original source code, but I missed a number of lines in different places, so the tokens and values used in the calculations were incorrect and the output was wrong. I spent many, many hours adding debugging output to my implementation and to the original code after each iteration, comparing the results to figure out what had gone wrong. That got me a t-entropy value very close to the original program's output; going over the original code once more revealed a scaling factor I had missed, which fixed the last issue.
Over the next week, the plan is to refactor the code and finalise it for addition to Netevmon.
Finished reformatting the data to remove some mess and unnecessary
layers of nesting that had crept in while trying different things. It
should now be set up to deal properly with representing multiple lines,
split up or grouped by however the backend wants to do so. Updated all
the tests to use the new data format.
Spent an afternoon with Shane and Brad designing how we are going to
represent graphs with multiple lines, in a way that will let us merge
and split data series based on how the user wants to view the data.
Tidied up the autogenerated colours for the smokeping graphs to use
consistent series colours across the summary and detail views, while
also being able to use the default smokeping colouring if there is only
a single series being plotted.
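One way to get consistent colours across the summary and detail views is to derive each colour deterministically from the series name, so both views compute the same colour independently. This is a hypothetical sketch of that idea, not the actual Cuz code:

```python
# Hypothetical sketch: hash the series name to a stable hue, so the
# same series gets the same colour in every view without any shared
# state between the summary and detail graphs.
import colorsys
import hashlib

def series_colour(name):
    """Map a series name to a stable RGB hex colour via its hash."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    hue = digest[0] / 255.0                        # stable hue in [0, 1]
    r, g, b = colorsys.hsv_to_rgb(hue, 0.65, 0.85)
    return "#%02x%02x%02x" % (int(r * 255), int(g * 255), int(b * 255))

# The same name always yields the same colour, in any view.
assert series_colour("ampz-to-google") == series_colour("ampz-to-google")
print(series_colour("ampz-to-google"))
```

A single-series graph can simply bypass this function and use the default smokeping colouring instead.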
I added multiple table support to RouteFlow, and am now trying to add my 591 work on top of that, but it is taking longer than I expected.
Having multiple tables does simplify the structure of things and fixes most of the interface issues I had with the older version of this code, but in spite of this I am still having a lot of problems getting it all to work.
The new psycopg2-based query system was generally working well but using significant amounts of memory. This turned out to be due to the default cursor being client-side, which meant that the entire result was being sent to the querier at once and stored in memory. I changed the large data queries to use a server-side cursor which immediately solved the memory problem. Instead, results are now shipped to the client in small chunks as needed -- since the NNTSC database and exporter process are typically located on the same host, this is not likely to be problematic.
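The change amounts to giving the cursor a name. In psycopg2, a named cursor is server-side: PostgreSQL holds the result set and streams rows to the client in `itersize`-sized chunks. A sketch of the pattern, with illustrative table and column names (the real NNTSC schema is not shown here):

```python
# Sketch of a server-side cursor query. `conn` is assumed to be an
# open psycopg2 connection; table/column names are illustrative.

def stream_history(conn, stream_id, chunk=2000):
    """Yield rows for one stream without holding the full result
    set in client memory."""
    # Passing a name makes this a server-side cursor; the default
    # unnamed cursor would ship the entire result to the client at
    # once and keep it in memory.
    cur = conn.cursor(name="history_export")
    cur.itersize = chunk                 # rows per network round trip
    cur.execute(
        "SELECT timestamp, value FROM data_amp_icmp WHERE stream_id = %s",
        (stream_id,),
    )
    for row in cur:                      # fetched lazily, chunk by chunk
        yield row
    cur.close()

print(callable(stream_history))  # True
```

The trade-off is an extra round trip per chunk, which is why co-locating the NNTSC database and the exporter keeps this cheap.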
Netevmon now tries to use the measurement frequency reported by NNTSC for the historical data wherever possible, rather than guesstimating the frequency from the time difference between the first two measurements. The previous approach was failing badly with our new one-stream-per-tested-address approach for AMP, as individual addresses were often tested intermittently. If there is no historical data, a new algorithm is used that simply finds the smallest difference in the first N measurements and uses that.
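The fallback is simple enough to sketch in a few lines (function and parameter names here are invented for illustration):

```python
# Sketch of the no-history fallback: the smallest gap between
# consecutive timestamps in the first N measurements is taken as the
# estimated measurement frequency.

def estimate_frequency(timestamps, n=10):
    """Estimate frequency as the smallest consecutive gap in the
    first n timestamps; None if there are not enough points."""
    window = sorted(timestamps[:n])
    gaps = [b - a for a, b in zip(window, window[1:]) if b > a]
    return min(gaps) if gaps else None

# An intermittently tested address: the first two points are 300s
# apart, but the smallest gap recovers the true 60s frequency.
print(estimate_frequency([0, 300, 360, 420, 720, 780]))  # 60
```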
Changed the table structure for storing AMP traceroute data. The previous method was causing too many problems and required too much special treatment to query efficiently. In the end, we decided to bite the bullet and re-design the whole thing, at the cost of all of the traceroute data we had collected over the past few months (actually, it is still there but would be painful to convert over to the new format).
Had a long but fruitful meeting with Brendon and Brad where we worked out a 'view' system for describing what streams should be displayed on a graph. Users will be able to create and customise their own views and share them easily with other users. Stream selections will be described using expressions rather than explicitly listing stream ids as it is now (although listing specific streams will still be possible).
This will allow us to create a graph showing a single line aggregating all streams that match the expression: "collection=amp-icmp AND source=ampz.waikato.ac.nz AND destination=www.google.com AND family=ipv4". Our view could also include a second line for IPv6. By using expressions, we can have the view automatically update to include new streams that match the criteria after the view was created, e.g. new Google addresses.
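To make the idea concrete, here is an illustrative sketch (not the planned implementation) of matching stream metadata against an AND-of-equalities expression like the one above; new streams are picked up automatically because matching happens at display time:

```python
# Illustrative sketch of expression-based stream selection for views.
# Only simple "key=value AND key=value" expressions are handled here.

def parse_expression(expr):
    """Turn 'a=b AND c=d' into a dict of required key/value pairs."""
    terms = (t.split("=", 1) for t in expr.split(" AND "))
    return {key.strip(): value.strip() for key, value in terms}

def matching_streams(streams, expr):
    """Return the streams whose metadata satisfies every term."""
    required = parse_expression(expr)
    return [s for s in streams
            if all(s.get(k) == v for k, v in required.items())]

streams = [
    {"collection": "amp-icmp", "source": "ampz.waikato.ac.nz",
     "destination": "www.google.com", "family": "ipv4"},
    {"collection": "amp-icmp", "source": "ampz.waikato.ac.nz",
     "destination": "www.google.com", "family": "ipv6"},
]
expr = ("collection=amp-icmp AND source=ampz.waikato.ac.nz "
        "AND destination=www.google.com AND family=ipv4")
print(len(matching_streams(streams, expr)))  # 1
```

If a new stream for another Google address were added later with matching metadata, re-evaluating the expression would include it without the view being edited.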