I have written a chapter on efficiently discovering load balancer diamond divergence points at the expense of internal diamond nodes, using a restricted set of flow IDs. There is now also a corresponding section in the discussion chapter.
I also wrote a chapter on improving the efficiency of MDA analysis by changing the way the source port flow ID is chosen. The three methods were incrementing, random and sequential bit flipping. Sequential bit flipping did not appear to offer any advantage in discovering the successor set of a load balancer sooner. However, it was hard to know what to expect, as different types of routers likely use a number of different hashing algorithms.
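The three selection strategies can be sketched roughly as follows. This is a toy illustration only: the function names, port base and counts are my own assumptions, not the actual MDA implementation.

```python
# Illustrative sketch of three source-port flow ID selection strategies.
# Base port 33434 and a 16-probe budget are assumed values.
import random

def incrementing(base=33434, count=16):
    """Use consecutive source ports: base, base+1, base+2, ..."""
    return [base + i for i in range(count)]

def random_ports(count=16, seed=0):
    """Draw source ports uniformly at random from a high port range."""
    rng = random.Random(seed)
    return [rng.randint(33434, 65535) for _ in range(count)]

def sequential_bit_flip(base=33434, count=16):
    """Flip one bit of the base port at a time, lowest bit first."""
    return [base ^ (1 << i) for i in range(count)]
```

Incrementing varies only the low-order bits, random scatters probes across the whole range, and bit flipping changes exactly one bit per probe; which one finds new successors fastest depends on the (unknown) hash function each router applies.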
Continued fine-tuning the event groupings produced by netevmon. Major changes I made include:
* When displaying groups on the dashboard, merge any event groups that have the exact same members.
* Avoid including events in a group if they don't have a reasonable D-S score, even if other similar significant events are happening at the same time. This gets rid of a lot of pointless (and probably unrelated) events from each group and also ensures groups expire promptly. This change has introduced a few flow-on effects: the insignificant events still need to be members of the group (in case they do eventually become significant) but shouldn't affect any of the group statistics -- particularly the group start time.
* Allow events that occur within one expiry period before the first event in a group to be included in that group -- threaded netevmon doesn't export events in a strict chronological order any more, so we need to be careful not to throw away out-of-order events.
* Have a fallback strategy if there is no AS path information available for a given source/destination pair (e.g. no traceroute test is scheduled, or the traceroute test has failed for some reason). Instead, we will create two groups: one for the source and one for the target.
* Polished up the styling of the dashboard and event group list and fixed a few UI issues that Brendon had suggested after looking at it on Friday.
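A minimal sketch of two of the rules above: merging groups with identical membership, and keeping insignificant events as members without letting them affect the group statistics (here, the start time). The D-S threshold and data layout are assumptions for illustration, not netevmon's actual code.

```python
class EventGroup:
    def __init__(self):
        self.members = {}        # event id -> (timestamp, D-S score)
        self.ds_threshold = 0.5  # assumed cutoff for a "reasonable" score

    def add(self, evid, ts, score):
        # Insignificant events are still stored, in case they later
        # become significant, but they won't influence statistics.
        self.members[evid] = (ts, score)

    def start_time(self):
        # Only significant events contribute to the group start time.
        times = [ts for ts, s in self.members.values()
                 if s >= self.ds_threshold]
        return min(times) if times else None

def merge_identical(groups):
    # Groups with exactly the same member set collapse into one
    # for display on the dashboard.
    seen = {}
    for g in groups:
        seen.setdefault(frozenset(g.members), g)
    return list(seen.values())
```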
Switched from using NetFlow to sFlow, since it turns out to be the most convenient way to get the information I need with the equipment available. Brad gave me some info on the switch that is exporting traffic to me, so I can differentiate between incoming and outgoing flows. I have to manually check the interfaces in my parser program to get direction information. This isn't the most flexible approach, but the versions of NetFlow and sFlow available do not support direction information and only handle incoming packets, not outgoing ones. I plan to add a configuration file specifying the interface on which packets are sent to the Internet for the network where my application is installed; my parser program will use this to ascertain which flows are outgoing.
Currently I maintain two databases: one for ingress flows and one for egress. This is a result of having to coordinate the individual flows that sFlow exports for each ingress interface. To reduce the number of entries in my databases, I will only collect information from a couple of incoming interfaces on the local network, while listening to all incoming packets on the uplink interface to the Internet.
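The planned direction logic could look something like the sketch below: the config file names the uplink interface, and the parser classifies each sampled flow by the interface it arrived on. The ifIndex value and the config format are invented for illustration.

```python
# Sketch of config-driven flow direction classification. A flow sampled
# on the uplink interface arrived from the Internet (ingress to the
# network); flows sampled on local interfaces are heading out (egress).
UPLINK_IFINDEX = 12  # would be read from the per-deployment config file

def flow_direction(input_ifindex):
    """Classify a sampled flow by its input interface index."""
    return "ingress" if input_ifindex == UPLINK_IFINDEX else "egress"
```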
Shane said altering lpicollector to export MAC addresses would be too much trouble, so I skipped that idea. I still plan on using it later to get application layer information and associate it with the flow exports produced by sFlow.
This week I will do as much of the application as possible with the data I have.
Found a paper where event logs are clustered using word vector pairs; the approach compares each pair to every other pair in the supplied logs, allowing it to cluster lines with similar parts. There is also a toolkit associated with the paper that lets you specify the input files and the support threshold for forming a pair, then outputs the clusters where the support is reached; outlier clusters can also be output. This needs further investigation to see whether it is a viable solution.
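The pair-support idea can be illustrated with a toy sketch. This is my reading of the general technique, not the paper's actual algorithm or its toolkit: count how often word pairs co-occur across lines, then cluster lines sharing a pair that meets the support threshold, leaving the rest as outliers.

```python
# Toy pair-support clustering: lines sharing a frequent word pair are
# grouped; lines with no frequent pair become outliers.
from collections import defaultdict
from itertools import combinations

def cluster_by_pairs(lines, support=2):
    pair_count = defaultdict(int)
    for line in lines:
        for pair in combinations(sorted(set(line.split())), 2):
            pair_count[pair] += 1
    clusters = defaultdict(list)
    outliers = []
    for line in lines:
        pairs = [p for p in combinations(sorted(set(line.split())), 2)
                 if pair_count[p] >= support]
        if pairs:
            # Assign the line to its most frequent qualifying pair.
            clusters[max(pairs, key=lambda p: pair_count[p])].append(line)
        else:
            outliers.append(line)
    return dict(clusters), outliers
```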
Had a meeting with Antti Puurula about possible approaches, where we discussed outputting a ranking of events into lists of safe and unsafe. We discussed how this could be evaluated with a Mean Average Precision measurement, and then a few algorithms that could be used for scoring events: clustering if the feature space can be separated, supervised learning if all the data is tagged, or his recommendation of supervised learning where the user manually updates the safe/unsafe lists and the classifier updates iteratively.
We then discussed how to integrate non-language features like timestamps, by having another algorithm such as naive Bayes handle the continuous features. This way we could identify events happening within a certain time period of one another, to tie together events across files.
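One common way to fold a continuous feature like a timestamp into a naive Bayes score is a Gaussian likelihood term, sketched below. The class means and variances are placeholders, not values learned from real data, and this is only one possible realisation of the idea discussed.

```python
# Gaussian naive Bayes handling of a continuous feature (timestamp),
# combined additively with a text classifier's log-likelihood under the
# usual naive Bayes independence assumption.
import math

def gaussian_loglik(x, mean, var):
    """Per-feature term: log N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def score(text_loglik, timestamp, class_mean, class_var):
    """Combine the text score with the timestamp likelihood."""
    return text_loglik + gaussian_loglik(timestamp, class_mean, class_var)
```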
Short week as I was off sick on Monday and Tuesday.
Spent some time looking into using a headless web testing environment
as an alternative to the current HTTP test. This would give us access
to page content and events that the current test doesn't see (due to
them being generated programmatically or obfuscated). Not all of the
headless testing software appears to give full access to the events
that I'm interested in, while some are written such that they would be
awkward to integrate into an AMP test. Currently looking at embedded
Chromium as the most likely to be useful.
Started refactoring some of the configuration parsing code in amplet to
remove some unnecessary globals and strip some cruft from the main loop
that didn't really need to be there.
Updated the website authentication to make it easy to toggle on and off,
as we don't want to protect the public site. Merged this and the rest of
the recent changes (raw data fetching etc) back into the develop branch.
Spent some time looking into what appear to be periodic MTU issues on
one of our test connections that are preventing the throughput test from
running. Confusing matters is that I'm not sure how well the route cache
deals with network namespaces -- sometimes it appears to be shared
between all connections, and sometimes it doesn't. It's possible these
symptoms would go away with a newer kernel version (the route cache was
removed, and network namespace support is better).
Brad managed to track down a newer video card for quarterpounder, so now BSOD is up and running again.
Added Meena's lpicollector to our github so now I can finally deprecate the lpi_live tool that comes with libprotoident. Spent a bit of time updating some documentation and reworking the example client scripts so that everything is a bit easier to use. Also fixed a couple of memory bugs that I may have introduced last time I worked on the collector.
Continued working with the new event groups. Found a problem where I was incorrectly preferring shorter AS path segments over longer ones when determining whether I could remove a group for being redundant. Having fixed that, many event groups now cover several ASNs so I've redesigned the event list on the dashboard to be better at displaying multiple AS names.
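The corrected redundancy check can be illustrated as follows. This is my reading of the fix, not the actual netevmon code: a group's AS path segment should only be considered redundant when it is a strict sub-segment of a longer one, so longer segments win. The ASNs are invented.

```python
# Toy illustration: a group is redundant only if its AS path segment is
# a strict contiguous sub-segment of another group's longer segment.
def is_redundant(seg, other):
    """True if seg is a strict contiguous sub-segment of a longer other."""
    if len(seg) >= len(other):
        return False
    return any(other[i:i + len(seg)] == seg
               for i in range(len(other) - len(seg) + 1))
```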
Have a copy of ubiquiOS in hand. No further progress on the project.
Looking forward to integrating ubiquiOS and improving all the low-level OS functionality (software timers, memory allocation and debug output spring to mind!).
The source code for both BSOD and Meenakshee Mungro's reliable libprotoident collector has been added to the WAND GitHub page. Developers can freely clone these projects and make their own modifications or additions to the source code, while keeping up with any changes that we make between releases.
This is the first time we have released the libprotoident collector; it is available under the GPLv3 license. The collector is a replacement for the lpi_live tool included with libprotoident, which should now be considered deprecated.
We're also more than happy to consider pull requests for code that adds useful features to either project.
WAND on GitHub
This week I've been continuing to look at OFLOPS (a testing framework for OpenFlow switches) and enquiring about hardware to test upon. Josh Bailey has put me in contact with the original OFLOPS authors, so I will work with them to figure out the best way to update OFLOPS.
I've been looking into the best library for constructing and parsing OpenFlow messages on top of a measurement platform such as OFLOPS. I'm looking for something in a low-level language such as C/C++ for performance reasons. However, I also need to support at least OpenFlow 1.0 and OpenFlow 1.3. Hopefully the differences between the two will be abstracted as much as possible, allowing the same code to create both OF 1.0 and 1.3 messages (and any future versions). Ideally the resulting code will also be concise.
I'm currently looking at libfluid, Floodlight's loxigen and OFConnect, and writing some simple test cases in each to see how easy they are to use.