Gave up on the ordered combiner. I tried a number of different things I thought would optmise my ring buffer's performance only to be beaten by the deque with the same sorting algorithm.
I put my attention back to the ring buffer - both for TPACKET_V3 and also to fix support for TPACKET_V2 when using parallel libtrace. There is almost enough difference in the working of TPACKET_V3 to warrant a new format, however the code is much the same but with different structures. We want to keep support for TPACKET_V2, falling back if TPACKET_V3 is not available. One suggestion has been to use memcpy to overwrite the functions for linux_ring, replacing them with linux_ringv3 functions if TPACKET_V3 support is present.
Before I started implementing things, I decided to take some time to benchmark the existing implementation. linux_ring is broken in parallel libtrace, so I only benchmarked the performance of the single threaded standard libtrace. Using ostinato on a standard interface (not using DPDK) I was able to get a maximum datarate of around 430Mbits with 64 byte packets. This turned out to be enough, as I was seeing around 40% packet loss. I kept turning down the data rate until packet loss was no longer seen and this was at around 340Mbits.
Next I started building a simple packet counter program that made use of TPACKET_V3. This has not been completed yet, but I have started to understand the details of TPACKET_V3 versus TPACKET_V2.
Started work on batching packet processing from the DPDK format, or for that matter all parallel formats. Because I had previously allowed batching from single threaded formats that are split by libtrace this has not been to difficult to fit in.
However performance was lacking and I have been working on trimming the code down as much as possible and finding the bottlenecks. For this perf has been very useful.
One large factor that I noticed slowing performance was work being done on all packets in a batch at various stages in the pipeline (after they are kicked out of CPU cache) so I have been moving all processing as close to when the user reads packets.
Rewrote some of the eventing script to handle multiple streams as input and store their data independantly since it had only ever been tested with a single stream at a time. Shane introduced some new magnitude calculations and I ran anomaly_ts with the ground truth series as input to get the new magnitude scores. Plotted different versions of those values and were reasonably satisfied with the results, although the slow duration/high magnitude/FPs were still there. Took a break from playing with magnitudes and moved on to event grouping. Spent some time rethinking the grouping strategy and started working on event grouping by target with multiple series.
Got certificate request verification working, so now signed certificates
will only be sent to those clients that can prove their identity and
their right to retrieve that certificate. Requests are fingerprinted and
saved on the server to await signing.
Had a quick audit of the key/certificate management code with Brad, and
found a few places where security needs to be tightened. In particular,
careful attention needs to be paid to making sure that insecure ciphers
can't be used. Because we have full control over all the client and
server software, we can limit all communication to using only TLS >= 1.2
with a small set of strong ciphers. Spent some time investigating
exactly which ciphers we want to use, and how to enable them.
Also spent a lot of time reading the OpenSSL wiki and various
patches/changelogs to determine when certain fixes were made. The Debian
packages are fairly up to date, but are still missing some well known
changes that are needed to improve security, so I implemented them
within the AMP environment.
The IS0 simulations completed and a graph was created. This event based simulator ran a large data set and increased the number of destinations per AS to 20. This made it take much longer to run, as it used much more of the Internet for the analysis. The results make the sources window setting of 500 look like the best option compared to some larger ones, and also to no window.
I added a 'related work' chapter to my thesis and also some information on dynamic routing. In particular I commented on the apparent likelihood of black holes in load balancers. It seems likely that for OSPF IGP networks that they would last a maximum of 40 seconds, as this is how long an adjacent node will wait for a Hello packet before removing the node from its routing table.
Got Wireshark working using a python script to capture packets from the USB dongle and pipe them to Wireshark.
Set up the network so that both VMs shared the same wifi hot spot. From my phone I could access the routers web interface via WiFi connection. Been looking into 6lbr router code and 6lbr-demo code to get an idea of how to implement the 6lbr border router as a bridge and devices as a CoAP server. Next week I plan to get the devices communicating using the CoAP protocol with DTLS using contiki example code, each device in the network will act as a web server to display sensor data on request from a CoAP client (which will be my host VM using the Firefox browser and Copper add on (Copper is a protocol handler for CoAP to interact with the Internet of Things)).
Added code to NNTSC to be able to receive and parse measurements via the collectd network protocol. This will allow us to start adding support for specific collectd metrics based on the requirements of our industry partners, particularly data that is collected using SNMP.
Spent a couple of days updating the latency event ground truth to include the two new detectors. Managed to get about half-way through the streams in the data set in that time, as I had made some minor modifications to other detectors that meant their detection results had also changed.
Much of Friday was spent investigating the Changepoint detector in more detail, as it had started giving a few new false positives. Still not sure whether this is a problem with our implementation or the underlying algorithm itself so this is going to require a bit more investigation, unfortunately.
This week I spent analysing the performance of the ordered combiner used to combine the output of multiple capture streams into one (for writing to a file for example). Richard had written a deque based on a doubly linked list which I had assumed could be improved with an array backed ring buffer. However I was unable to obtain the performance seen with the deque for unknown reasons.
Optimising the ring buffer involved isolating and testing to try to improve the insertion and removal speed. Even with no dynamic memory and no modulo operations, the ring buffer performed slower than the doubly linked list.
I also came up with the idea of using a heap to sort the entries, rather than using selection sort. This will involve keeping track of the smallest entry still left in the queues to ensure we don't pull any sorted values larger than that entry. I'm not sure if I'll continue with this, as the existing solution is already very good.
This week I have spent most of my time focusing on the statistical detection methods. It is widely considered unfeasible to maintain a full flow table for measuring flow statistics as high bit rate connections would not be able to process each packet in time.
On routers, techniques such as "Bloom Filters" can be used. This uses a hash of the 4-tuple (assuming TCP only) for each flow and using it in a hash map. Samples are taken at particular intervals, and if a flow appears in two samples it is declared as an elephant.
Spike detection is also used in the TCP case, which leverages the fact that the majority of mice flows complete during their Slow Start phase.
By monitoring the outgoing buffer of a router, and carefully selecting a threshold value it can be predicted whether the link is currently hosting an elephant flow.
It can do this before the buffer becomes full, due to the fact that once the threshold is reached the next transfer will contain twice the number of packets as the previous one.
This allows the device to react before any loss occurs, and can move the next burst of packets into a lower priority buffer.
I have also begun investigating machine learning techniques, including a paper which compares the performance of a number of simple classifiers from the WEKA suite.
Rather than simply comparing prediction accuracy (which is often very similar between classifiers) it also considers the time required to make a prediction, and the time required to build the classifier.
Next week I shall continue the investigation into machine learning techinques, including papers which propose solutions to detecting protocols as these could potentially be used to predict elephant flows.
Got the USB dongle packet sniffer working with SmartRF sniffer software to detect packets on my 6LoWPAN test bed.
Installed 6LBR (boundary router) onto a VM attached to a cc2538 device acting as a slip radio - this required hardware specific code as it wasn't supported by 6lbr, as well as updating the Makefile and source files to compile with cc2538. Also fiddled around with the configuration settings (RF channel, PAN ID, drivers) of my device running 6lbr-demo example code, 6lbr router (on VM) and slip radio to get everything working together.
To check the ethernet side of my boundary router I set up another VM (host VM) which could ping to the router VM and access the 6lbr router website. This displayed information about devices on the network, packets sent and an RPL (routing protocol for low-power and lossy networks) tree (DODAG). From the host VM I can ping packets to the device on the 6LoWPAN network using the address given for the device on the 6lbr website.