Libtrace 3.0.19 has been released.
The main purpose of this release is to fix a problem that prevented the libtrace 3.0.18 release from building on FreeBSD 10. A number of other minor bugs were also fixed, such as some libpacketdump decoding errors on big-endian CPUs and a bug in the ring: format that led to set_capture_length changing the wire length instead of the capture length.
This release also incorporates a patch from Martin Bligh that adds support for reading pcap traces with nanosecond timestamp resolution via the pcapfile: URI.
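For readers unfamiliar with the nanosecond pcap variant, the two flavours are told apart by the magic number at the start of the file. The following is only a sketch of that check in Python, not the code from the patch or from libtrace itself; the function name is made up for illustration.

```python
import struct

# Magic numbers from the pcap file format: the nanosecond variant uses a
# different value from the classic microsecond one.
MAGIC_USEC = 0xA1B2C3D4
MAGIC_NSEC = 0xA1B23C4D


def pcap_timestamp_resolution(header: bytes) -> str:
    """Return 'usec' or 'nsec' based on the first four bytes of a pcap file."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes of pcap header")
    # The byte order of the writing host is unknown, so try both.
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", header[:4])
        if magic == MAGIC_USEC:
            return "usec"
        if magic == MAGIC_NSEC:
            return "nsec"
    raise ValueError("not a pcap file")
```

With a nanosecond trace, the fractional part of each per-packet timestamp is then interpreted as nanoseconds rather than microseconds.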
The full list of changes in this release can be found in the libtrace ChangeLog.
You can download the new version of libtrace from the libtrace website.
Spent the past 2 weeks collecting more samples of event groups and updated the data in the spreadsheet, so I'll have a better idea of which groups have an insufficient sample size. Andrew had already finalised and entered the data for his HMMDetector for the old streams, so I made sure to include HMM events in the newer streams I analysed (afrinic, lacnic, trademe and apnic).
I also realised I had mislabelled some events detected by the Changepoint detector whenever a loss in measurements occurred, so I spent some time double-checking the events and the graphs and updating the appropriate severity value. We decided to exclude them from the detector probability values, since they are a different type of event (similar to LossDetector and Noisy-to-Constant/Constant-To-Noisy updates).
I'll collect more samples (if needed!) and update the values used by the different detectors and fusion methods, and finally move on to validating the output produced by the fusion methods next.
Slow final week, mostly spent fixing lots of minor issues here and there. I also added tooltips to the smokeping graph and did some work on improving the usefulness of information in legend tooltips. I have some extra clean up to do as I still have a few open branches, so I'll address those over the next week or so.
Wrote a report to hand off to the faculty, created a slideshow and wrote some notes for the presentation on Monday (which went very well).
The warts analysis was modified to provide data to the megatree mathematical model. Megatree involves local and distributed approaches to avoiding the mapping of the same load balancer more than once, and is based on Doubletree. In particular, subsets of the 70000-destination sets were created to produce model data for varying numbers of destinations. Regression analysis was carried out to provide model structure.
The fast mapping analysis has been updated to include a full MDA trace at the beginning and end of the data collection cycle. This is to check for route and load balancer changes. Our fast mapping protocol uses six runs of paris traceroute between the MDA runs. There is still some more debugging and design to carry out.
Spent some time working on things to help keep the amplet code clean and
tidy. Added stricter compilation options and fixed up some cases where
these triggered warnings. Started working on unit tests for amplet based
on the built in automake target "check". Wrote very simple unit tests
for the icmp and traceroute tests as well as the nametable management.
While writing the nametable unit tests I found and fixed a bug that
would limit the nametable to only a single item.
Briefly had a look at different database options available to us that
might perform better with our data than postgres. There are still
further optimisations we can make to how we store our data in postgres,
but it will be interesting to see how they compare to something like
cassandra, HBase or riak.
Continued redevelopment of the NNTSC exporting code to be more robust and reliable. Replaced the live data pipes used by the dataparsers to push live data to the exporter with a RabbitMQ queue, which seems to be working well.
Modified the way that subscribing to streams worked to try and solve a problem we were having where data that arrived while historical data was being queried was not being pushed out to interested clients. Now, we store any live data that arrives for streams that are still being queried and push that out as soon as we get an indication from the query thread that the query has finished.
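The buffering approach described above can be sketched roughly as follows. This is a simplified illustration rather than the actual NNTSC exporter code; the class and method names are hypothetical, and the real exporter is multi-threaded whereas this sketch ignores locking.

```python
from collections import defaultdict


class StreamSubscriptions:
    """Hypothetical sketch: hold back live data for streams still being queried."""

    def __init__(self, send):
        self.send = send                  # callable(stream_id, measurement)
        self.querying = set()             # streams with a historical query in flight
        self.pending = defaultdict(list)  # live data held back per stream

    def start_query(self, stream_id):
        self.querying.add(stream_id)

    def on_live_data(self, stream_id, measurement):
        if stream_id in self.querying:
            # History is still being fetched, so keep the live data for later.
            self.pending[stream_id].append(measurement)
        else:
            self.send(stream_id, measurement)

    def query_finished(self, stream_id):
        # Called once the query thread signals completion: flush what we held.
        self.querying.discard(stream_id)
        for measurement in self.pending.pop(stream_id, []):
            self.send(stream_id, measurement)
```

Holding the live data until the query finishes also means the client sees historical measurements before the newer live ones.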
Unfortunately, we can still miss historical measurements if they haven't been committed to the database at the time when the final query begins. This often crops up if netevmon is resubscribing after NNTSC has been restarted, resulting in us missing out on the last historical measurement before the subscribe message arrives. Still looking for an elegant way to solve this one.
Added a version check message to the NNTSC protocol. This message is sent by the server as soon as a client connects and the client API has been updated to require its internal version to match the one received from the server. If not, the client stops and prints a message telling the user to update their client API. This should be helpful to the students who were previously getting weird broken behaviour with no apparent explanation whenever I made an API change to the production NNTSC on prophet.
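The client-side check amounts to something like the sketch below. The version string, exception name and function name are all made up for illustration and are not taken from the NNTSC client API.

```python
PROTOCOL_VERSION = "1.0"  # hypothetical value, not the real NNTSC version


class VersionMismatch(Exception):
    """Raised when the server and client API versions differ."""


def check_server_version(server_version, client_version=PROTOCOL_VERSION):
    # The server sends its version as soon as the client connects; the
    # client refuses to continue unless it matches its own internal version.
    if server_version != client_version:
        raise VersionMismatch(
            "server speaks version %s but this client API is version %s; "
            "please update your client API" % (server_version, client_version)
        )
```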
Chased down a build issue with libtrace on FreeBSD 10. Turns out we had made the dist tarball with an old version of libtool which was stupidly written to never build shared libraries if the OS matched FreeBSD 1* (because FreeBSD 1.X didn't support shared libraries). Easy enough to fix, I just have to remember to make the libtrace distribution on something other than Debian Squeeze. Will start working on a new libtrace release in the near future so I don't keep getting emails from FreeBSD users.
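To illustrate why a pattern like this misfires, here is a small Python sketch using shell-style globbing. The exact pattern text inside the old libtool is an assumption based on the description above, and the narrower pattern is only a hypothetical example of one that would not catch FreeBSD 10.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "freebsd1*"      # assumed shape of the check in the old libtool
NARROW_PATTERN = "freebsd1.*"  # hypothetical pattern matching only 1.X


def shared_libs_disabled(host_os, pattern):
    """Return True if the host OS string matches the 'no shared libs' pattern."""
    return fnmatchcase(host_os, pattern)


# The broad pattern catches FreeBSD 10 as well as FreeBSD 1.X.
print(shared_libs_disabled("freebsd10.0", OLD_PATTERN))
print(shared_libs_disabled("freebsd10.0", NARROW_PATTERN))
```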
Sam Russel solved my vlan problem for me. The issue was I was using vlan 1 and the pronto treats it as native. I tried my tests again using vlan 2 and everything worked perfectly.
I have, however, found a bigger issue with the pronto. The stats counters are extremely inaccurate, much more so than the problems I have been having with openvswitch.
Speaking of openvswitch, Joe has been trying to fix the counter inaccuracy for me, so I have been trying to help with that. I'm starting to come to grips with how the stats reporting is performed in OVS, but I still haven't got any ideas about what is causing the problem with the counters.
A start has been made on mathematically modelling a program along the lines of doubletree to probe load balancers once. Initial attempts to model the distributions of the local and global sets are underway and some initial results have been collected.
A run of my version of fastmapping has been started. I am rechecking the validity of this approach to observing black holes in load balancers.
Looked into the DPDK format again, since a couple of new DPDK releases had been made since last time and these had broken the build process, which relied on patches. The main problems came from the addition of new NICs and features as separate libraries, and from a removed function. I've found a good solution which I expect will continue to work for future releases: DPDK has an option to compile all of the libraries into one, so instead of patches we will require a specific build command. For some reason I can't get autotools to detect the newest DPDK library, yet the older version of the library is detected correctly; I haven't figured out why.
Also started to hack together the parallel version of DPDK. I discovered I might not need to use DPDK to start the threads, but I need to look into how safe this really is.
Caught up with Perry about any suggestions and requests he had:
* Using empty tick packets to ensure every thread receives a packet at a timed interval for live (or tracetime) traces, or otherwise after a certain number of packets.
* Discussed having a message type for packets rather than a separate argument to the per packet function and details of how to best wait for either a message or a file descriptor without slowing performance.
* Link Up/Down messages
* Locking the creation of the filter is not enough because complex filters might use memory, so one filter must be compiled per thread.
* Having a separate format for every thread which would allow snaplength filters etc to be set separately (I'm still not 100% sold on this)
* Wrapping single threaded formats inside another to split the trace out
* Trace Swiss Army Knife - all tools in one
* Wanted a copy on SVN so he could see the progress so far
The plan for this week is to tidy up and get a copy into an SVN branch, get the DPDK build fix working and back into trunk, and update the wiki docs.
This week I have captured a number of traces of BitTorrent traffic, both encrypted and unencrypted for training the model.
I have also been working on pre-processing these traces to remove flows with packet loss or retransmission as these are unsuitable for training the model.
I also now have the traces sorted into flows with libflowmanager, and a list of packet sizes and arrival times associated to each flow.
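As a rough picture of the resulting data, the sketch below groups packets by a direction-independent 5-tuple and records the arrival time and size of each packet per flow. It is a plain Python illustration only: it does not use or reproduce the libflowmanager API, and the packet tuple layout is an assumption made for the example.

```python
from collections import defaultdict


def flow_key(src, sport, dst, dport, proto):
    """Build a key that is the same for both directions of a flow."""
    a, b = (src, sport), (dst, dport)
    return (proto,) + (a + b if a <= b else b + a)


def group_packets(packets):
    """Map each flow key to a list of (arrival_time, size) pairs.

    Each packet is assumed to be a tuple of
    (timestamp, src, sport, dst, dport, proto, size).
    """
    flows = defaultdict(list)
    for ts, src, sport, dst, dport, proto, size in packets:
        flows[flow_key(src, sport, dst, dport, proto)].append((ts, size))
    return dict(flows)
```

Flows that show packet loss or retransmission would be dropped from this mapping before the remaining size and timing lists are used for training.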