I continued fixing and improving RouteFlow. I moved my multitable fastpath into the multitable RouteMod rather than having a separate option for it. With Brad's help we got this working on the Pica8; it turns out we needed to turn on combined mode, otherwise VLAN tagging can be overzealous. I found and corrected a bug in the inter-switch link (ISL) implementation for the multitable case. I also found an issue with ARP neighbour entries being added before the interface is properly up (i.e. has received a mapping packet), which results in the flow not being installed. This requires some kind of retry system, which RouteFlow currently does not have.
I also looked at OFLOPS-turbo this week. It appears some fixes and improvements have been made over the original OFLOPS, in addition to support for 10Gbit NetFPGAs. However, some aspects still need work, such as supporting newer versions of OpenFlow and reducing CPU usage; some places in the code spin on gettimeofday() rather than sleeping. It appears I might have to make some large changes to add the functionality that I want.
The IS0 thesis chapter was updated with the new validation results and checked.
The load balancer prevalence thesis chapter was updated to rely on validation from the per-destination load balancer analysis with limited flow IDs. This analysis relies on confidently finding diamond divergence points, without confidently discovering all successors or nested load balancers. It vastly reduces the amount of traffic required while still providing some key information.
A literature search for more relevant papers was carried out. Discussion and references were added to the related work thesis chapter as appropriate.
Built the classifier with WEKA using weka.classifiers.bayes.NaiveBayesMultinomialText, loading instances incrementally with the ArffLoader class rather than loading the whole data set, to help save memory.
Then used the test data set to build a ranked list to check the output. I just need to polish that and evaluate the classifier.
Kept working with Chromium to try to get complete information on object
fetch timings. It looks like I should be able to get full timing
information for every object if I can set the Timing-Allow-Origin
header. Currently stymied by the library crashing in its memory freelist
implementation when I try to modify HTTP response headers.
Had a closer look into the behaviour of wget to try to confirm some test
methodology used by some other data sources I'm looking at. Turns out
that wget actually measures only the amount of time spent reading from
the socket and ignores everything else, reporting a very misleading time.
Investigated further into some MTU issues we were seeing to confirm the
behaviour we were reporting. Something in the path only has a 1400 byte
MTU but doesn't always send Packet Too Big messages, which is causing
lots of connection failures.
Spent some time proofreading reports.
Made a new dump of up-to-date data for analysis, including all test
types this time. Spent some time talking to Ray about it.
Generated some graphs to try to show the comparison in latency between
two connections. Some connections that should be quite similar are a few
milliseconds different to the same target at the same time, but quite
consistent across all targets and all times. Connections that are known
to be different also exhibited very similar latency to some targets,
but across multiple targets and over time there are clear differences.
Spent some more time trying to get to grips with the embedded Chromium
library and how to implement my own versions of the URL fetching functions.
Another short week -- this time on leave Monday and Tuesday.
Started integrating traceroute events into the event grouping part of netevmon. Changed the focus of the path change detection away from which ASN appears at each hop; instead, we look for new transitions between ASNs as this will mean we don't trigger events when it takes 4 hops to get through REANNZ instead of the usual 3.
Developed a system for assigning traceroute events to ASN groups. PathChange events are matched with the ASNs for both the old and new path transition, e.g. a change from REANNZ->AARNET to REANNZ->Vocus will be assigned to the ["REANNZ", "AARNET", "Vocus"] groups. A TargetUnreachable event is matched with the ASNs that are now missing from the traceroute, as well as the last observed ASN. A TargetReachable event uses the same "identify common path elements" approach that latency events use (for want of a better system right now).
Fixed some more event detection and grouping bugs as I've found them. One fix was to make sure we at least create a group for the test target if the AS path for the stream does not reach the destination.
Spent some time on Friday proof-reading the BTM report.
I think I've missed two of these...
I've spent a bunch of time re-working my packet scheduler this past week. I wanted to allow for link-layer retransmission of packets, which the packet scheduler was not originally designed to do. It compiles and messages get sent; however, the upper MAC layers manage their own retransmissions, so ACK messages no longer get passed up and nothing works just yet. This is a simple case of re-working the state machine to accept the new packet states: ACK, NO ACK and NOT SENT (in the case the packet fails CSMA and never leaves the radio).
ubiquiOS (which seems to have been the topic of my last report) is well integrated and working. Before the packet scheduler re-work, I had three nodes able to simultaneously associate with the gateway node (PAN coordinator). After association, the nodes would ping the gateway and measure the response times. With a beacon period of 500ms, the response time averaged 500ms with 1% packet loss.
Adding link layer retransmission should improve that packet loss figure.
After that, I want to start working on AES. I'll hard code the keys for now, and try to install them at a sensible time. At which point, I can celebrate because that's the infrastructure I needed to start my project! I can then remove the hard coded keys, and replace them with authentication and key exchange.
Stretch goal is to add 6LoWPAN on top - which shouldn't be too difficult, as I've essentially made a pipe that correctly sized packets can be piped into.
The REST API works like a charm: FreeRADIUS can authenticate users and add OpenFlow rules to allow the customer to talk to a router.
VLANs are hard; they break everything and I'm yet to find a solution. Matching on two VLANs is still not possible, and handling a single VLAN by stripping it and sending it to a second table doesn't work either. It seems OVS can only process one VLAN per pass through the switch, which sucks.
Talked to Chris at Lightwire about his solution, which is to hack the code until it works. I'm probably not going to be able to get Lightwire IP out of him for the project, and it sounds hard. I'll have to come up with some other solution.
This week has focused on the check-paths routine for validating the IS0 simulator. I recently found that this routine was getting into an endless loop, though it did not exhibit runaway memory usage: it kept looking for a next hop, failing to find one, and trying again. It turned out that missing hop identifiers were not being read from the hops-dest file of hop information, so there were gaps in some traces when they were read into the hash tables. Another problem was that the code reading the TTL value from the log file looked for an equals sign that didn't exist, so the whole line was read as the TTL value, corrupting the hash table. Finally, the simulator message string was too short to print some of the longer messages. This also caused some errors when analysing the trace log file, as some addresses were truncated and flagged as incorrect by the program.
Short week as I was on leave on Thursday and Friday.
Continued tweaking the event groups produced by netevmon. My main focus has been on ensuring that the start time for a group lines up with the start time of the earliest event in the group. When this doesn't happen, it suggests there is an incongruity in the logic for updating events and groups based on a newly observed detection. The problem now happens only rarely -- good in that I am making progress, but bad in that it takes a lot longer for a bad group to occur, so testing and debugging is much slower.
Spent a bit of time rewriting Yindong's Python trace analysis using C++ and libflowmanager. My program ran much faster and used a lot less memory, which should mean that wraith won't be hosed for months while Yindong waits for his analysis to run.
Added a new API function to libtrace to strip VLAN and MPLS headers from packets. This makes the packets easier to analyse with BPF filters as you don't need to construct complicated filters to deal with the possible presence of VLAN tags that you don't care about.
Installed libtrace on the Endace probe and managed to get it happily processing packets from a virtual DAG without too much difficulty.