Continued working on the new WDCap. Most of my time was spent writing the packet batching code that groups processed packets into batches and then passes them off to the output threads.
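The batching scheme described above can be sketched roughly as follows. This is a minimal illustration, not WDCap's actual implementation: the names (`batch_packets`, `BATCH_SIZE`) and the use of a simple queue are my assumptions, standing in for whatever structure the output threads really consume.

```python
from queue import Queue

BATCH_SIZE = 10  # hypothetical batch size


def batch_packets(packets, output_queue, batch_size=BATCH_SIZE):
    """Group processed packets into fixed-size batches and hand each
    full batch to the queue consumed by the output threads."""
    batch = []
    for pkt in packets:
        batch.append(pkt)
        if len(batch) >= batch_size:
            output_queue.put(batch)
            batch = []
    if batch:
        # flush the final partial batch so no packets are lost
        output_queue.put(batch)
```

Batching amortises the per-item synchronisation cost of the handoff: the output threads take one lock per batch rather than one per packet.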
Along the way I discovered a bug in parallel libtrace where using ticks with a file input causes the ordered combiner to get stuck. Spent a bit of time with Richard trying to track this one down.
Listened to our Honours student's practice talks on Thursday and Friday afternoon. Overall, was fairly impressed with the projects and what the students had achieved and am looking forward to seeing their improved talks on Wednesday.
Wrote a simple program to try to capture packet traces of long-running HTTP tests and started it running. Captured a few successfully from the local test amplets, but it doesn't deal well with the short intervals between tests on the real amplet I want to test. Put a couple of traces into a HAR viewer, which showed most of the delay in that case being connection delay to a number of related servers. It doesn't work with HTTPS and doesn't quite show the detail I want, so I may need to write some more tooling of my own.
Fixed the Debian packaging to include all the Python scripts in the server package, which took a lot longer than it should have (no one ever mentions the --single-version-externally-managed option to setuptools).
Made a few other minor packaging updates to init scripts, man pages,
sample configs etc. Brought the Centos packaging scripts up to date as well.
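For reference, the setuptools option mentioned above is passed at install time. A sketch of the relevant debian/rules fragment is below; the package path and record filename are illustrative, not the actual packaging:

```make
# debian/rules (sketch): force setuptools to do a plain distutils-style
# install so the .py files land in the staged package tree rather than
# in an egg that the package build then misses.
override_dh_auto_install:
	python setup.py install --root=debian/tmp \
	    --single-version-externally-managed --record=installed-files.txt
```

Note that `--single-version-externally-managed` requires `--record` to be supplied as well.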
Worked with Shane to get skeptic updated to a more recent version of the
server-side amp code. It should now be able to process data from more
recent amplet clients (we are getting close to being able to upgrade the
current NZ mesh).
Looked into string similarity measures such as the Jaccard index and Dice's coefficient to use as a metric for comparing events. Then looked into how to display the information, and started building a website with Django that displays window similarity; once a window is selected, it is broken down into its constituent events for further examination. Also examined Google's diff-match-patch API for visualising the difference between two strings.
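The two measures mentioned above are straightforward to compute over token sets. A minimal sketch (treating each input as a set of elements; how events are tokenised in practice is not specified here):

```python
def jaccard(a, b):
    """Jaccard index: |A & B| / |A | B| over the two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are conventionally identical
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice's coefficient: 2|A & B| / (|A| + |B|) over the two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

Both range over [0, 1]; Dice weights the overlap more heavily, so it is always at least as large as Jaccard on the same pair.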
With the new data from Brad, which comes from multiple machines, I made a script to parse it into a database for use alongside the site. The aim is to get this finished in time for the presentation.
I have created a script for generating some random data based on the data supplied by Brad. This is for development purposes, so I can implement certain features that I otherwise could not. I have added an `application` field (containing HTTP, DNS, etc.) and a `flows` field. I will also add some IP addresses of common applications (YouTube, Gmail) to make the data look like it came from a router acting as a default gateway to the Internet. Controlling the database also lets me reduce the number of entries to improve the application's performance during the presentations.
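A generator of this kind can be sketched as below. The field names and value ranges are my assumptions for illustration, not the schema of Brad's data; a fixed seed keeps development runs reproducible.

```python
import random

APPLICATIONS = ["HTTP", "DNS", "HTTPS", "SMTP"]  # hypothetical set


def random_record(rng):
    """One synthetic flow-summary record, roughly the shape described
    above (an application label, a flow count, a private source IP)."""
    return {
        "application": rng.choice(APPLICATIONS),
        "flows": rng.randint(1, 500),
        "src_ip": "10.0.0.%d" % rng.randint(1, 254),
    }


def generate(n, seed=0):
    """Generate n records deterministically for a given seed."""
    rng = random.Random(seed)
    return [random_record(rng) for _ in range(n)]
```

Shrinking `n` is then a one-line way to keep the demo database small, as described above.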
Responded to critique and updated the chapters: "Analysis of fields other than the classic five tuple" and "Detection of black holes in load balancers". Changes included improved description of tables and interpretation of figures. The chapters were reread once the changes were made.
Continued working on wdcap4. The overall structure is in place and I'm now adding and testing features one at a time. So far, I've got snapping, direction tagging, VLAN stripping and BPF filtering all working. Checksum validation is working for IP and TCP; just need to test it for other protocols.
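The checksum validation mentioned above rests on the standard RFC 1071 ones'-complement sum used by IP, TCP and UDP (for TCP/UDP the sum also covers a pseudo-header, omitted here). A minimal sketch, not wdcap4's code:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit words.
    A header whose stored checksum field is correct sums to 0."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF
```

Validation is therefore just "checksum the header as received and check for zero", which is why one routine serves both generation and verification.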
Still adding and updating protocols in libprotoident. The biggest win this week was being able to identify Shuijing (Crystal): a protocol for operating a CDN using P2P.
Helped Brendon roll out the latest develop code for ampsave, NNTSC, ampy and amp-web to skeptic. This brings skeptic in line with what is running on prophet and will allow us to upgrade the public amplets without their measurement data being rejected.
Wrote some more unit tests to check that AMP tests were correctly
reporting data using protocol buffers, and that the data coming out
matched what was put in.
Updated the build system to properly reflect the new requirements for
protocol buffers and Debian packaging dependencies.
Did some initial testing with individual tests to make sure that nntsc
would accept the data, and fixed a couple of issues that I found (mostly
signed vs unsigned mismatches). Ran a proper client with a full test
schedule and checked the results against existing data to make sure that
everything was working as expected.
I made some changes to the reporting of the oflops module I'm using so that it logs to a separate file. Logging now also happens at run time rather than being buffered in memory and printed at the end of the test. I've also fixed a couple of bugs and consistency issues.
I looked into issues with queuing within libfluid; as a result I've exposed the socket file descriptor and will no longer send packet-outs while the descriptor's send buffer is full.
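The "is the descriptor full?" check can be expressed as a writability test on the exposed file descriptor: if the kernel send buffer has no room, the socket stops being writable. A sketch of that check (function name is mine; the real code is C inside libfluid/oflops):

```python
import select
import socket


def can_send(sock, timeout=0.0):
    """Return True if sock is writable, i.e. its kernel send buffer
    has room; with timeout=0 this is a non-blocking poll."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)
```

Skipping packet-outs when this returns False avoids queuing messages behind a backlog the switch connection cannot drain.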
I've explored the implications of adding extra rules at a higher priority above the match, given that OpenFlow 1.3 expects the cookie from the matching rule to be included in the packet-in message. However, initial testing shows almost no overhead added to OVS, and some switches do not actually include the cookie. As such it seems to be a somewhat unfair and uninsightful test.
Brad set me up with access to patch ports, allowing me to plug different switches into different machines. I've written a Python script to run my testing and done some initial trial runs, which seem to be functioning well. I've also moved my testing to a more powerful machine, to ensure this is not a factor in the results.
I updated the chapter of my thesis on load balancing by non-classic 5-tuple fields. The question arose as to what my strongest conclusions were. I carried out a further analysis by finding all of the cases where a node seen as load balancing never appeared as non-load-balancing, i.e. there were hits but no misses at each occurrence of an interface in the trace data. These cases of non-per-packet hits with no misses were summarised and presented as stronger evidence of this type of load balancing than the cases where non-load-balancing occurrences of the interface were found. Two cases of load balancing on a number of non-5-tuple fields had no misses at all.
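The hits-but-no-misses filter described above amounts to tallying, per interface, how often it was classified as load balancing versus not, and keeping only interfaces with zero misses. A small sketch (the input shape is an assumption, not the thesis tooling):

```python
def strongest_cases(observations):
    """observations: iterable of (interface, is_load_balancing) pairs,
    one per occurrence of an interface in the trace data. Return the
    interfaces classified as load balancing at *every* occurrence,
    i.e. hits with no misses."""
    hits, misses = {}, {}
    for iface, is_lb in observations:
        counts = hits if is_lb else misses
        counts[iface] = counts.get(iface, 0) + 1
    return sorted(i for i in hits if i not in misses)
```

Interfaces that never miss are the strongest evidence, since no occurrence contradicts the load-balancing classification.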
Made good progress with the application this week. Still to go are location statistics, application statistics, flow counts and IP-based differentiation. I will also add an 'interval' field to my configuration file so the user can decide how long to keep flows for.
Planning to start user evaluations next week ready for the presentations.