Continued looking through the Intel DPDK code for the best way to implement the format. Memory pools only work with caches enabled if the calling thread has a thread id assigned by DPDK. Each thread is assigned a number from 0 to 31, which is stored in TLS (the __thread storage type), and every memory pool contains a corresponding 32-element array of per-thread local caches. It would be nicer if these caches were themselves stored in TLS, but that is not possible because memory pools are dynamically created objects.
The two options here are to disable the caches and accept the performance hit (the cache memory would still be allocated), or to enable them. Disabling them seems like a very bad idea, because every packet read would then require a lock on the shared memory structure, whereas with caching the cache is refilled from the main memory in bulk transactions whenever required. So it seems best (at least for the parallel implementation) to enable the cache.
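To make the trade-off concrete, here is a minimal Python sketch of a pool with one local cache per thread id that is refilled from the shared pool in bulk. It is only a model of the idea, not DPDK's implementation or API: the class name, the method names and the lock counter are all made up for illustration, and only the 32-slot cache array comes from the description above.

```python
import threading
from collections import deque

MAX_THREADS = 32  # one cache slot per thread id (0-31), as described above


class MemPool:
    """Toy buffer pool with per-thread caches (illustrative, not DPDK code)."""

    def __init__(self, n_buffers, cache_size):
        self._lock = threading.Lock()
        self._shared = deque(range(n_buffers))  # buffers are just integers here
        self._caches = [[] for _ in range(MAX_THREADS)]
        self._cache_size = cache_size
        self.lock_acquisitions = 0  # counts trips to the shared structure

    def get(self, thread_id):
        cache = self._caches[thread_id]
        if not cache:
            # Cache miss: take the lock once and refill in bulk.
            with self._lock:
                self.lock_acquisitions += 1
                for _ in range(min(self._cache_size, len(self._shared))):
                    cache.append(self._shared.popleft())
        if not cache:
            raise MemoryError("pool exhausted")
        return cache.pop()

    def put(self, thread_id, buf):
        cache = self._caches[thread_id]
        cache.append(buf)
        if len(cache) > 2 * self._cache_size:
            # Cache overfull: return the excess to the shared pool in bulk.
            with self._lock:
                self.lock_acquisitions += 1
                while len(cache) > self._cache_size:
                    self._shared.append(cache.pop())
```

With a cache size of 8, sixteen consecutive reads by one thread touch the shared lock twice; with a cache size of 1 (roughly the uncached case) they touch it sixteen times. Note that the sketch only works if each thread passes its own unique id, which is exactly why the thread id matters.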
This brings up the second problem with the DPDK system: threads are completely handled by DPDK, including starting them. Under this system each physical core can have at most one software thread associated with it. This is an issue because the main thread is used to schedule operations rather than read packets, meaning one thread cannot be used with DPDK.
There are a couple of ways of dealing with this: simply accept this as a limitation, or create threads with the correct setup within libtrace.
So I'm working on solving all these problems by introducing a register thread function, so that we can register threads with the DPDK library. Currently this seems like it would just be a matter of setting the thread id and binding to a CPU core. This would allow caches to be enabled, the reducer thread to be used, and any other threads the user decides to add. Additionally, this would allow us to use the existing message passing system with user-created threads. In the future, any other format that requires initialisation of TLS would also be able to use this system.
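As a rough illustration of the registration idea, the Python sketch below hands each registering thread a unique id in the range 0-31 and keeps it in thread-local storage. The ThreadRegistry class and its methods are hypothetical names, not part of libtrace or DPDK, and the CPU binding step is left out entirely.

```python
import threading

MAX_THREADS = 32  # matches the 0-31 id range mentioned above


class ThreadRegistry:
    """Hands out unique per-thread ids and stores them in TLS (illustrative)."""

    def __init__(self, max_threads=MAX_THREADS):
        self._lock = threading.Lock()
        # Reverse order so that pop() hands out the lowest free id first.
        self._free_ids = list(range(max_threads - 1, -1, -1))
        self._tls = threading.local()

    def register(self):
        """Register the calling thread; registering twice returns the same id."""
        existing = getattr(self._tls, "thread_id", None)
        if existing is not None:
            return existing
        with self._lock:
            if not self._free_ids:
                raise RuntimeError("no free thread ids")
            self._tls.thread_id = self._free_ids.pop()
        return self._tls.thread_id

    def current_id(self):
        """Return the calling thread's id, or None if it never registered."""
        return getattr(self._tls, "thread_id", None)

    def unregister(self):
        """Release the calling thread's id so another thread can reuse it."""
        thread_id = getattr(self._tls, "thread_id", None)
        if thread_id is None:
            return
        with self._lock:
            self._free_ids.append(thread_id)
        self._tls.thread_id = None
```

Any thread, whether created by the library or by the user, would call register() once before touching a pool, which is the property that makes user-created threads safe to add.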
These ideas of buffer handling and batch processing seem very worthwhile experimenting with in my libtrace code's memory handling.
I finished my last assignment for the semester this week (hooray) but after a few more days without any real success I sat down with Richard Sanger for several hours and we worked through the issue I'd been having communicating with my motes on their globally routable IP addresses. The mote would not respond to neighbour discovery solicitations so we first manually added an entry into the IPv6 neighbours table on the host PC to direct traffic to the mote, and then traced through the UDP server's code to see how it was assigning IP addresses internally. The conclusion we came to was that adding the address using the ADDR_TENTATIVE type rather than ADDR_AUTOCONF allowed the server to receive packets bound for its global address (we saw that the link-local IP was assigned as tentative), but the response sent by the server was not able to reach the host PC. The response packet appears to be dropped before it is even sent out of the server because the server has no route to the host PC.
I'm still baffled by the fact that this seems to work for pretty much everyone else straight out of the box, but having had Richard look at it with me now I'm at least confident that I'm not just overlooking something completely obvious. In our trawl through the code we were specifically looking for cases where its platform implementation might have differed from others (such as the Sky mote used as an example in the exercise I originally followed) but we found nothing to be suspicious of.
I also used netcat to confirm that I could communicate with the UDP server beyond just pinging (working with the link-local address again).
In any case, I'm now satisfied that I have looked at this in sufficient depth and can continue on to implementing my RPL border router. I suspect (and very much hope) that enabling RPL will just make this a non-issue anyway.
Over the weekend I set myself up in the lab and brought in the new motes that I hadn't touched before, but I realised I was unable to flash them because their firmware was too old. This wasn't something I'd come up against with my original motes because I had been playing around with them for a while and in the process some software had upgraded them automatically. The stm32w_flasher utility is supposed to do it, but the Linux version fails, so I had to do it from my Windows machine. At least it was worth figuring this out to be able to note later.
I'll be away next week (7-14 July) spending some time in various parts of the North Island with family.
Libtrace 3.0.20 has been released today.
This release fixes several bugs that have been reported by users, adds support for LZMA compression to libwandio and adds an API function for getting the fragment offset for an IP packet.
The bugs fixed in this release are:
* Fixed broken snaplen option for ring: input.
* Fixed trace_get_source_port and trace_get_destination_port returning bogus port numbers when given a fragmented packet.
* Fixed timestamp byte ordering on big endian architectures.
* Removed assert failure if a bad compression level or method is provided when configuring an output trace. A libtrace error is raised instead.
* Fixed broken compiler feature checking in configure script. Compiler features are also detected for compilers other than gcc, e.g. clang.
* Fixed potential segfaults in OSPF libpacketdump parser if the packet is truncated midway through the OSPF header.
The OSPF bug fix unfortunately resulted in the 'len' field in the libtrace_ospf_t structure being renamed to 'ospf_len' -- if you are using libtrace to process OSPF packets, please make sure you update your code accordingly.
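For readers wondering why fragmented packets produced bogus port numbers: only the first fragment of an IPv4 datagram carries the transport header, so bytes read at the usual port position in later fragments are really payload. The Python sketch below shows the kind of check involved. It is not libtrace's code; the function name is made up, and returning None for "no port available" is a choice made for this sketch only.

```python
import struct

TCP, UDP = 6, 17


def get_source_port(ip_packet):
    """Return the source port of an IPv4 TCP/UDP packet, or None if unavailable."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None
    header_len = (ip_packet[0] & 0x0F) * 4
    if header_len < 20:
        return None
    # Bytes 6-7 hold 3 flag bits followed by the 13-bit fragment offset.
    frag_field = struct.unpack("!H", ip_packet[6:8])[0]
    if frag_field & 0x1FFF != 0:
        # Not the first fragment: there is no transport header here.
        return None
    if ip_packet[9] not in (TCP, UDP):
        return None
    if len(ip_packet) < header_len + 2:
        return None
    return struct.unpack("!H", ip_packet[header_len:header_len + 2])[0]
```

A first fragment (offset 0, "more fragments" flag set) still yields a port, while any fragment with a non-zero offset does not.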
The full list of changes in this release can be found in the libtrace ChangeLog.
You can download the new version of libtrace from the libtrace website.
Started planning the best way to approach changing the traceroute test
to be faster and more network friendly. Making it more event driven and
sending packets when we know they have left the network should help
speed up the test, rather than probing in waves and having to wait at
each TTL for all responses. Before changing the test in this way it made
sense to move from the deprecated libwandevent2 to libwandevent3, which
I did. I've also made the first few changes in the traceroute test to
use an event based approach.
Read up a bit on doubletree and had a look into how some other
traceroute implementations dealt with it. Will hopefully be able to
apply some of the ideas around stop sets to the updated traceroute test
too. Tidied up a bit more low hanging fruit in the amplet packaging and
spent some time proofreading student theses to provide feedback.
Carrying on from last week, storing a cache entry per stream turned out to be a bad idea. Some matrix meshes consist of hundreds of streams, so we spend a lot of time looking up cache entries. As a result, I rewrote the caching code to store one dictionary per collection, mapping stream ids to tuples containing the timestamps. This gets looked up once per query, so only one cache operation is required to generate a matrix.
Updating the cache when we have to query for missing values is a bit annoying. We cannot simply update the dictionary and put it back in the cache once the query is complete, because the data inserting process may have updated other cache entries with new 'most recent data' timestamps while we were fulfilling our query. Instead, we have to re-fetch the dictionary, update the one stream we're changing and then immediately store the dictionary again.
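The re-fetch, update, store pattern looks roughly like the Python sketch below. FakeCache is a hypothetical in-memory stand-in for whatever shared cache is actually used, and update_stream and the collection name in the usage note are illustrative names rather than the real ampy code.

```python
import copy


class FakeCache:
    """In-memory stand-in for a shared cache (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Return a copy, as a real cache would hand back a deserialised value.
        return copy.deepcopy(self._store.get(key))

    def set(self, key, value):
        self._store[key] = copy.deepcopy(value)


def update_stream(cache, collection, stream_id, timestamps):
    """Re-fetch the collection's dictionary, change one stream, store it at once."""
    streams = cache.get(collection) or {}
    streams[stream_id] = timestamps
    cache.set(collection, streams)
    return streams
```

If the query had instead written back the dictionary it fetched before it started, any stream the inserting process touched in the meantime would be overwritten with stale timestamps. Re-fetching just before the store keeps that window as small as possible, although in this sketch it does not make the update atomic.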
Updated ampy to no longer keep track of active streams and removed support for ACTIVE_STREAMS queries from the NNTSC protocol.
Merged Perry's lzma support into libwandio. Started working towards a new libtrace release -- managed to build and pass all tests on our various development boxes so should be able to push out a release next week.
Spent a day reading over Meenakshee's thesis. Suggested a series of mostly minor edits and changes but overall it is looking pretty good.
I spent a few days setting up the black-hole-in-load-balancer detector on PlanetLab. Two scamper drivers are used, and they need to write and read the same file; this did not work initially because of the permissions that are set automatically for a root-owned process. I eventually changed from a Linux managed service to a script under which the drivers were no longer root-owned processes and could conveniently open the files that they had created. The black hole detector looks for Paris Traceroute traces that are shorter than the original MDA traceroute and runs further Paris data collection from the second driver. I intend to run 20000 destinations from 15 sources every week or so.
The Doubletree simulator that uses MDA data as a source of simulated Traceroute data ran successfully, and was then extended to cover more factor combinations and restarted.
The Megatree simulator that uses the same data has had some many-to-few cases built in and is running again. It has run for somewhat longer than before, nearly a week so far. It seems likely that it will not be run in full but on selected factor combinations as they are developed and debugged.
Two programs to count destinations in warts data have been written and set going. One uses the MDA data and the other the CAIDA traceroute data from the website. These count destinations of traces including if the destination is repeated and how often this occurs.
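The counting itself is simple once the destination of each trace has been extracted. A minimal Python sketch, assuming the destinations are already available as a list of address strings (reading the warts files is not shown, and the function name is made up):

```python
from collections import Counter


def count_destinations(destinations):
    """Return (number of distinct destinations, {repeated destination: count})."""
    counts = Counter(destinations)
    repeated = {dst: n for dst, n in counts.items() if n > 1}
    return len(counts), repeated
```

For three traces towards 192.0.2.1, 192.0.2.2 and 192.0.2.1 again, this reports two distinct destinations, with 192.0.2.1 seen twice.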
Mostly writing. Got Brad to run a bunch of tests on the OpenStack cloud for me, but I haven't done anything with the results yet. Chapters are coming together pretty well. Who knows, I might even finish by the deadline.
Added more tests to hit some edge cases, to ensure that things such as the hashing functions are applied correctly and that the single threaded code path is sane. Part of this involved tidying up an edge case around pausing a trace when one or more threads have already finished before pause is called. Pause now only waits for threads that haven't finished, and threads that finish signal a condition so that their current state is visible.
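The pause fix can be pictured with a condition variable guarding a table of thread states. This is a Python sketch of the pattern only; the class, method and state names are invented for illustration and do not correspond to the libtrace implementation.

```python
import threading


class PauseTracker:
    """Tracks per-thread state so pause never waits on finished threads."""

    def __init__(self, n_threads):
        self._cond = threading.Condition()
        self._states = ["running"] * n_threads

    def set_state(self, index, state):
        """Called by a worker when it pauses or finishes; wakes any waiter."""
        with self._cond:
            self._states[index] = state
            self._cond.notify_all()

    def wait_for_pause(self, timeout=None):
        """Block until every thread is paused or finished.

        Returns True on success and False if the timeout expires first.
        """
        def all_stopped():
            return all(s in ("paused", "finished") for s in self._states)

        with self._cond:
            return self._cond.wait_for(all_stopped, timeout)
```

Because a finished thread counts as already stopped, a pause call made after some threads have exited returns as soon as the remaining threads report in, rather than waiting forever on threads that will never respond.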
I ran the code through valgrind and tidied up a minor memory leak. Removed the hasher thread and the call to the hashing function from the single threaded code path. Removed some configuration options that are no longer needed.
Started looking into the DPDK format again. It looks like threads will keep a local cache of free buffers for reuse, and I think this is bound to an internal DPDK thread id. This could be an issue if different threads try to free buffers without being registered with DPDK. However, this can be turned off. It sounds like extra threads can reasonably safely be introduced as long as they get a unique internal DPDK number.
I removed the DPDK restriction of one libtrace program per machine. It is now one per port, although a single running libtrace application is still limited to one port, either reading or writing (but not both).
I haven't progressed significantly further than last week. Still stuck at the moment with connecting to motes from the host PC, but after looking at the link with Wireshark (which I should have thought to do much earlier) I realised that I was receiving RPL DODAG Information Solicitations from the link-local IP of the mote, so I worked out that I can connect to the mote on its link-local address if I direct it through the tunnel interface. However, I still can't connect to the mote on its globally routable IP.
This is really frustrating since everything I've read has indicated that it should just work - except for this one mailing list thread authored by someone who appears to have an identical problem. Instead of solving the problem, the author found a working bridge in an application called uDPWS, which runs on top of Contiki. I might start by looking into this next week to see what the differences between the bridges are.
Fixed a crash when changing the name of test processes, where getopt was
being unhappy after having argv changed underneath it, despite being
given a different array to operate on after forking. Logging has also
been made more sensible, with all amp processes using a fixed prefix
rather than using the full process name.
Spent some time comparing results of the new timestamping mechanism
against iputils ping. Timestamps are looking much more stable now in all
situations. There was a consistent small offset between the amp and ping
values, which appears to mostly be due to one timestamping packets
immediately before sending them and the other immediately after.
Changing amp to record timestamps at the same time as ping removes this
offset. Testing between a pair of hosts directly connected at gigabit
gives very similar results for both approaches, with identical quartiles
and only 0.2 microseconds difference in mean.
Tidied up packaging scripts for Debian and CentOS, removing some default
configuration files that were being installed but are no longer needed.
Updated CentOS init scripts to be more similar to the new Debian ones
that allow multiple clients to be run.