Finished my site built with Django. First I added functionality to display a logfile on a page, both to get familiar with Django and because it can be extended later for filtering a single file and so on.
Next I implemented the 60-second window search between two files: it takes each window of the first file and compares it to the corresponding window of the second file. Specifically, it computes the similarity of each event against each other event using Dice's coefficient, then averages those scores to display the average similarity for that window.
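A minimal sketch of that scoring step (the function and variable names here are illustrative, not the actual site code): Dice's coefficient over character bigrams for each pair of events, averaged across the window.

```python
def bigrams(s):
    """Set of character bigrams of a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(a, b):
    """Dice's coefficient on character bigrams: 2|A∩B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))

def window_similarity(events_a, events_b):
    """Average Dice score over every pairing of events in two windows."""
    scores = [dice(ea, eb) for ea in events_a for eb in events_b]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `dice("night", "nacht")` shares only the bigram `ht`, giving 2/8 = 0.25.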
Next I linked each window to a separate page where the window is broken down into its individual events (each step breaks it down further), so at this stage we can compare one event to the events in the second file's window using the same similarity score.
In keeping with breaking the files down into smaller parts so these huge files can be processed in manageable steps, in the future I think it would be nice to visualise the difference between two events; Google's diff-match-patch API is a good example of showing the differences between two strings.
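The text mentions diff-match-patch; as a stand-in sketch using only the standard library, `difflib`'s opcodes can produce a similar marked-up view of the difference between two event strings (the `[-…-]`/`{+…+}` markers are my own convention, not diff-match-patch output):

```python
import difflib

def mark_diff(old, new):
    """Render the differences between two strings with inline markers:
    deletions as [-text-] and insertions as {+text+}."""
    out = []
    sm = difflib.SequenceMatcher(None, old, new)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            out.append(old[i1:i2])
        if op in ("delete", "replace"):
            out.append("[-" + old[i1:i2] + "-]")
        if op in ("insert", "replace"):
            out.append("{+" + new[j1:j2] + "+}")
    return "".join(out)
```

For example, `mark_diff("event A", "event B")` yields `event [-A-]{+B+}`.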
And of course working on my presentation for the honours conference.
Carried out corrections on the Doubletree chapters of my thesis draft, which study the IS0 simulator and the BISD simulator. IS0 was created by Tony McGregor and adapted to source windows in this research. BISD (Basic Internet Simulator of Doubletree) was created as part of this research and examines the same problem, but was designed to make use of data containing repeated destinations, which was not the case for the Team data downloaded from CAIDA. This meant that the usefulness of both stop sets could be determined together. BISD is based on the trace-by-trace warts analysis software used with scamper.
Continued working on the new WDCap. Most of my time was spent writing the packet batching code that groups processed packets into batches and then passes them off to the output threads.
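The batching pattern described above can be sketched roughly as follows. This is not WDCap's actual code (which is not Python) — just an illustration of grouping processed packets into fixed-size batches and handing them to an output thread via a queue:

```python
import queue
import threading

BATCH_SIZE = 4  # flush a batch once it holds this many packets (illustrative)

def output_worker(q, written):
    """Output thread: drain batches from the queue until a None sentinel."""
    while True:
        batch = q.get()
        if batch is None:
            break
        written.extend(batch)  # stand-in for writing the batch out

def batch_packets(packets, q):
    """Group processed packets into batches and pass them to the output queue."""
    batch = []
    for pkt in packets:
        batch.append(pkt)
        if len(batch) >= BATCH_SIZE:
            q.put(batch)
            batch = []
    if batch:        # flush any partial final batch
        q.put(batch)
    q.put(None)      # tell the output thread to stop

q = queue.Queue()
written = []
worker = threading.Thread(target=output_worker, args=(q, written))
worker.start()
batch_packets(list(range(10)), q)
worker.join()
```

Batching amortises the per-packet hand-off cost between the processing and output threads.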
Along the way I discovered a bug with parallel libtrace when using ticks with a file that causes the ordered combiner to get stuck. Spent a bit of time with Richard trying to track this one down.
Listened to our Honours student's practice talks on Thursday and Friday afternoon. Overall, was fairly impressed with the projects and what the students had achieved and am looking forward to seeing their improved talks on Wednesday.
Wrote a simple program to try to capture packet traces of long-running HTTP tests and started it running. Captured a few successfully from the local test amplets, but it doesn't deal well with the short intervals between tests on the real amplet I want to test. Put a couple of traces into a HAR viewer, which showed most of the delay in that case being connection delay to a number of related servers. It doesn't work with HTTPS and doesn't quite show the detail I want, so may need to write some more.
Fixed the Debian packaging to include all the Python scripts in the server package, which took a lot longer than it should have (no one ever mentions the --single-version-externally-managed option to setuptools). Made a few other minor packaging updates to init scripts, man pages, sample configs, etc. Brought the CentOS packaging scripts up to date as well.
Worked with Shane to get skeptic updated to a more recent version of the server-side amp code. It should now be able to process data from more recent amplet clients (we are getting close to being able to upgrade the current NZ mesh).
Looked into string similarity measures like the Jaccard index and Dice's coefficient to use as metrics for comparing events. Then looked into how to display the information, so I started making a website with Django to display window similarity; from there, once a window is selected it is broken down into its constituent events for further examination. Also examined Google's diff-match-patch API for visualising the difference between two strings.
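The two measures are closely related. A small sketch (set arguments would be the bigram sets of the events being compared; this is just the definitions, not the site code):

```python
def jaccard(x, y):
    """Jaccard index of two sets: |X∩Y| / |X∪Y|."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)

def dice_from_jaccard(j):
    """Dice's coefficient is a monotone function of Jaccard: D = 2J/(1+J),
    so the two measures always rank pairs in the same order."""
    return 2 * j / (1 + j)
```

For the bigram sets {ab, bc} and {bc, cd}, Jaccard is 1/3 and Dice is 1/2.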
With the new data from Brad, which comes from multiple machines, I made a script to parse it into a database to use in conjunction with the site. The aim is to get this finished off in time for the presentation.
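The loading side of such a script might look like this sketch using SQLite; the table and column names are purely illustrative (the actual schema isn't described above):

```python
import sqlite3

def load_events(conn, rows):
    """Insert parsed log rows into a table the site can query.
    Schema (machine, timestamp, message) is hypothetical."""
    conn.execute("""CREATE TABLE IF NOT EXISTS events (
                        machine TEXT, timestamp REAL, message TEXT)""")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # a file path in the real script
load_events(conn, [("hostA", 1.0, "link up"), ("hostB", 2.5, "link down")])
```

Using `executemany` keeps the bulk insert in a single transaction, which matters for large log files.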
I have created a script for generating some random data based on the data supplied by Brad. This is for development purposes, so I can implement certain features that I would otherwise be unable to. I have added an `application` field (containing HTTP, DNS, etc.) and a `flows` field. I will also add some IP addresses of common applications (YouTube, Gmail) to make the data look like it comes from a router acting as a default gateway to the Internet. Controlling the database also allows me to decrease the number of entries to improve the performance of the application during the presentations.
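A minimal sketch of that generator, with placeholder values throughout (the application labels, flow range and IP addresses are invented for illustration, not taken from Brad's data):

```python
import random

APPLICATIONS = ["HTTP", "DNS", "SMTP"]              # example application labels
COMMON_IPS = ["216.58.220.110", "172.217.25.37"]    # placeholder well-known addresses

def random_event(rng):
    """One synthetic log entry shaped like the real data (fields assumed)."""
    return {
        "application": rng.choice(APPLICATIONS),
        "flows": rng.randint(1, 50),
        "dest_ip": rng.choice(COMMON_IPS),
    }

rng = random.Random(42)  # seeded so development data is reproducible
events = [random_event(rng) for _ in range(100)]
```

Seeding the generator means the development database is the same on every run, which makes debugging the site much easier.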
Responded to critique and updated the chapters "Analysis of fields other than the classic five tuple" and "Detection of black holes in load balancers". Changes included improved descriptions of tables and interpretation of figures. The chapters were reread once the changes were made.
Continued working on wdcap4. The overall structure is in place and I'm now adding and testing features one at a time. So far, I've got snapping, direction tagging, VLAN stripping and BPF filtering all working. Checksum validation is working for IP and TCP; just need to test it for other protocols.
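IP and TCP checksums both use the RFC 1071 Internet checksum (one's-complement sum of 16-bit words). A minimal reference version, for illustration only (wdcap4 itself is not written in Python):

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

Validation works because summing a header that already contains its correct checksum yields 0xFFFF, so the function returns 0 for a valid header.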
Still adding and updating protocols in libprotoident. The biggest win this week was being able to identify Shuijing (Crystal), a protocol for operating a CDN using P2P.
Helped Brendon roll out the latest develop code for ampsave, NNTSC, ampy and amp-web to skeptic. This brings skeptic in line with what is running on prophet and will allow us to upgrade the public amplets without their measurement data being rejected.
Wrote some more unit tests to check that AMP tests were correctly reporting data using protocol buffers, and that the data coming out matched what was put in.
Updated the build system to properly reflect the new requirements for protocol buffers and Debian packaging dependencies.
Did some initial testing with individual tests to make sure that nntsc would accept the data, and fixed a couple of issues that I found (mostly signed vs unsigned mismatches). Ran a proper client with a full test schedule and checked the results against existing data to make sure that everything was working as expected.
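To illustrate the kind of signed vs unsigned mismatch mentioned above (this is a generic sketch, not the actual protocol buffers code): reading a signed field back as unsigned silently turns small negative values into huge positive ones.

```python
import struct

# Pack -1 as a 32-bit signed little-endian integer, then reinterpret
# the same bytes as unsigned. The mismatch produces 4294967295 instead
# of -1 -- the sort of wildly wrong value that shows up in the data.
raw = struct.pack("<i", -1)
as_unsigned = struct.unpack("<I", raw)[0]
```

Checking round-tripped values against the originals in unit tests catches this class of bug early.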
I made some changes to the reporting of the oflops module I'm using so that it logs to a separate file. Logging now happens at run time rather than being buffered in memory and printed at the end of the test. I've also fixed a couple of bugs and consistency issues.
I looked into issues with queuing within libfluid; as a result I've exposed the socket file descriptor and will no longer send packet-outs when the socket's send buffer is full.
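The writability check can be sketched like this (libfluid is C++; this Python version just shows the idea of testing the exposed file descriptor with select before sending):

```python
import select
import socket

def can_send(sock, timeout=0.0):
    """Return True if the socket's send buffer can accept more data now.
    A controller could use a check like this to skip packet-outs while
    the connection to the switch is backed up."""
    _, writable, _ = select.select([], [sock.fileno()], [], timeout)
    return bool(writable)
```

With a zero timeout the call never blocks, so it is cheap enough to run before every packet-out.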
I've explored the implications of adding extra rules at a higher priority above the match, given that OpenFlow 1.3 expects the cookie from the matching rule to be included in the packet-in message. However, initial testing seems to show almost no overhead added to OVS, and some switches do not actually include the cookie. As such it seems to be a somewhat unfair and uninsightful test.
Brad set me up with access to patch ports, allowing me to plug different switches into different machines. I've written a Python script to run my testing and done some initial trial runs, and this seems to be functioning well. I've also moved my testing to a more powerful machine to ensure it is not a factor in the results.