Blogs

02 Mar 2018

Libtrace 4.0.3 has been released.

New features in this release include:
* A new capture format (dpdkndag:), which allows nDAG records to be intercepted and decoded using a DPDK interface.
* The message queue data structure API is now publicly exported, so it can be used in user code.
* The toeplitz hasher data structure API is now publicly exported, so it can be used in user code.
* Added a new API function: trace_get_perpkt_thread_id(), which will return the ID number of the running packet processing thread.
* Upgraded the DAG code to use the 64-bit API, so libtrace will work with large streams.

The following bug fixes are also included:
* ERF provenance records will no longer cause libtrace to halt.
* Captures from GRE tunnel interfaces should now work correctly.
* Packets captured using DPDK will no longer lose any payload after the first 1024 bytes.
* Fixed a couple of nDAG packet corruption problems.
* All key fields are now correctly initialised when doing DPDK output.
* Fixed assertion failure when libwandio has an unexpected error.

We've also further improved the performance of the nDAG format.

The full list of changes in this release can be found in the libtrace ChangeLog.

You can download the new version of libtrace from the libtrace website.

20 Feb 2018

Carrying on from last week, I finished creating a multi-link failure scenario for a FatTree topology of k=4. I then collected recovery time stats, which I have cleaned up and graphed. While collecting the recovery stats for the topology I found, and fixed, a bug that was causing the VM to crash. The simulation framework stops the pktgen traffic-generation process by sending a SIGINT to its PID. This PID was occasionally recorded incorrectly, so the framework would sometimes terminate a process critical to the system and crash the VM.
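
A simple guard against that kind of stale or wrongly recorded PID is to check what the PID currently refers to before signalling it. The sketch below is a hypothetical helper rather than the framework's actual code, and assumes a Linux /proc filesystem and a generator process whose name contains "pktgen":

```python
import os
import signal

def stop_generator(pid, expected_name='pktgen'):
    """Send SIGINT to pid, but only if it still belongs to the traffic generator."""
    try:
        with open('/proc/%d/comm' % pid) as f:
            name = f.read().strip()
    except OSError:
        return False            # the process has already exited
    if expected_name not in name:
        return False            # the recorded PID now points at something else
    os.kill(pid, signal.SIGINT)
    return True
```

Refusing to signal anything whose name doesn't match means a stale PID gets reported rather than acted on.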

Closer to the end of the week I started investigating separating the switches into their own Mininet namespaces. I found that Mininet does not support this, as Open vSwitch needs to remain exposed in order to establish a connection to the OpenFlow controller. The only way to change this behaviour would be to modify Mininet itself, which doesn't seem like a good idea at this point in time. At the end of the week I started looking at adding latency to the control channel, to better simulate real network conditions.

13 Feb 2018

Spent most of the short week trying to track down some issues that were preventing RabbitMQ shovels from connecting after an Erlang upgrade. The issue appears to be that Server Name Indication (SNI) is enabled and the SSL upgrade takes place on an already-connected socket, so only the peer address is available, not the peer name. I don't appear to be able to set SNI directly through the shovel parameters, but I can set it for the Erlang RabbitMQ client that the shovel uses.
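
The shovel runs on the Erlang AMQP client, so the actual fix lives in that client's ssl options; purely to illustrate the underlying issue, here is the equivalent situation in Python's ssl module, where the TLS upgrade happens on an already-connected socket and the server name therefore has to be supplied explicitly (the host name, address and port below are examples):

```python
import socket
import ssl

context = ssl.create_default_context()

# The TCP connection is made by address, so the TLS layer cannot infer the
# server's name from the socket itself.
raw = socket.create_connection(('192.0.2.10', 5671))

# server_hostname is what ends up in the SNI extension (and in certificate
# name checks); without it an SNI-dependent broker may reject the handshake.
tls = context.wrap_socket(raw, server_hostname='broker.example.org')
```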

13 Feb 2018

Continued integrating the AMP Chromium test more closely into the rest of the AMP framework, specifically into the build system, so that the appropriate Chromium libraries can be specified and found.

Spent most of the week in Queenstown at the NZNOG conference. Worked on Faucet briefly at the Facebook Hackathon on Wednesday, with the conference proper the rest of the week.

13 Feb 2018

Started integrating the chromium test into AMP, which means being able to run it both standalone (providing my own main function) and as part of the AMP client, and reporting data in the correct protobuf formats.

The way Chromium forks and re-executes itself repeatedly (zygote processes) caused some confusion about why argument parsing was failing, as it was passing through getopt multiple times with unexpected arguments. The test now accepts and ignores the arguments used by zygote processes, letting them pass through to headless Chromium.
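
The real parsing happens with getopt in the test's own code; as a rough illustration of the same accept-and-forward pattern, Python's argparse can keep unrecognised options (standing in for the zygote arguments here) to one side instead of treating them as errors:

```python
import argparse
import sys

parser = argparse.ArgumentParser(add_help=False)
# Options the test itself understands; the names here are just examples.
parser.add_argument('-u', '--url')
parser.add_argument('-t', '--timeout', type=int, default=30)

# parse_known_args() returns the recognised options plus everything it did not
# recognise, untouched, so the leftovers can be passed straight on to Chromium.
args, chromium_args = parser.parse_known_args(sys.argv[1:])
```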

Found that the timings available to JavaScript had improved since I last looked, and that I could now fetch the timing information for the initial page in exactly the same format as for the objects within the page, saving a heap of near-duplicate code and giving more accurate information.

13 Feb 2018

Finally managed to link the Chromium libraries with my own test runner, after adding a few more compiler flags to match those used by Chromium. After getting it linked and running, I found it would crash on the first callback made when the page completed loading, complaining about missing vector functions.

Had to rebuild my Chromium source as I had been using a debug build, which generated debug versions of some STL containers and caused crashes when other parts of the code expected regular containers. It now links and runs and outputs useful data!

13 Feb 2018

Spent the short week trying again to get a useful set of Chromium libraries that I can link my own AMP tests against. Digging into the ninja build configuration, I've extracted a list of all the libraries that go into building a headless program, as well as the build and link flags that I need to use. Still having issues with the final step where I link with my code, but it's getting much closer to working.

13 Feb 2018

Found and fixed a bug in the BGP prefix code where the sorting/comparison of prefixes wasn't using some of the attributes that are important for telling prefixes apart. Removed some unnecessary special cases and code paths when exporting routes from a peer, which makes the code less complicated and metric collection easier. Added more metrics tracking when route import/export last occurred, how long updates are taking, and so on.
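
The general shape of that kind of fix is to make the comparison key cover every attribute that distinguishes two prefixes rather than the address alone. A minimal Python sketch of the idea (not the actual BGP code, and the attributes compared there may differ):

```python
from ipaddress import ip_network

def prefix_key(prefix):
    """Sort/comparison key including all the attributes that tell prefixes apart."""
    net = ip_network(prefix)
    # Address family first, then prefix length, then the network address, so that
    # e.g. 10.0.0.0/8 and 10.0.0.0/16 (same address, different length) never compare equal.
    return (net.version, net.prefixlen, int(net.network_address))

routes = ['10.0.0.0/16', '10.0.0.0/8', '2001:db8::/32']
routes.sort(key=prefix_key)
```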

12 Feb 2018

This week I spent some time looking at larger topologies that are common in datacenter and carrier networks. I implemented a module which allows creating a dynamic k-ary FatTree topology in Mininet (common in DC networks). I then modified the controllers to find host locations in the topology dynamically. These used to be statically specified; however, with the new FatTree topology the pre-specified locations no longer worked. Host discovery uses a similar mechanism and approach to RYU's link discovery (LLDP packets), but without a liveness check mechanism, so we only use the packets to figure out where the hosts are in our topology.
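
For reference, the structure of a k-ary fat tree is compact enough to sketch directly against Mininet's Topo API: (k/2)^2 core switches, k pods each holding k/2 aggregation and k/2 edge switches, and k/2 hosts per edge switch. The sketch below is illustrative only; the naming and wiring details are not those of the module described above.

```python
from mininet.topo import Topo

class FatTreeTopo(Topo):
    """k-ary fat tree built with Mininet's Topo primitives."""

    def build(self, k=4):
        half = k // 2
        count = [0]                                  # running counter for unique switch names

        def switch():
            count[0] += 1
            return self.addSwitch('s%d' % count[0])

        cores = [switch() for _ in range(half * half)]
        hosts = 0
        for _ in range(k):                           # one iteration per pod
            aggs = [switch() for _ in range(half)]
            edges = [switch() for _ in range(half)]
            for i, agg in enumerate(aggs):
                # Aggregation switch i uplinks to its own group of k/2 core switches.
                for core in cores[i * half:(i + 1) * half]:
                    self.addLink(agg, core)
                # Full mesh between the aggregation and edge layers inside the pod.
                for edge in edges:
                    self.addLink(agg, edge)
            # k/2 hosts hang off each edge switch.
            for edge in edges:
                for _ in range(half):
                    hosts += 1
                    self.addLink(edge, self.addHost('h%d' % hosts))
```

Passing an instance of this to Mininet (e.g. Mininet(topo=FatTreeTopo(k=4), controller=RemoteController)) brings the topology up against an external OpenFlow controller.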

I then spent the remainder of the week looking at writing failure scenarios for a FatTree topology of k=4. I have worked on several diagrams that show how path splicing will behave for specific link failure scenarios, as well as the paths taken by the reactive controller. I am currently in the process of finishing off a multi-link failure scenario for the new topology, after which I will collect and process recovery stats for it.

07 Feb 2018

This week I modified the simulation framework to allow me to perform multi-link timing tests. The failure config file has been modified to allow specifying multiple prober locations. After a bit of initial planning and testing, it became clear that when dealing with multi-link failures we need to be careful when selecting the logger locations, as a poor choice can leave us unable to time a particular link failure. Multi-link failures are now performed sequentially, i.e. we take down the first link and time it, then take down the next, and so on.
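
In outline, the sequential part can be as simple as a helper along these lines (a hypothetical sketch, not the framework's code), which takes each link down in turn via Mininet's link-status control and leaves a window for the prober/logger to observe and time the recovery:

```python
import time

def fail_links_sequentially(net, links, settle=10):
    """Take down each (src, dst) switch pair on a running Mininet network,
    one at a time, so every failure can be timed in isolation."""
    for src, dst in links:
        failed_at = time.time()
        # Same mechanism the Mininet CLI uses for 'link <src> <dst> down'.
        net.configLinkStatus(src, dst, 'down')
        # ... the prober/logger observes when traffic recovers from this failure ...
        time.sleep(settle)
        print('%s-%s taken down at %.3f' % (src, dst, failed_at))
```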

I then modified the disabled multi-link failure scenario, collected recovery times for it, cleaned up and processed the stats. With this, I have completed the set of recovery time stats for all current network and failure scenarios I have created.