I put 6lbr on the Raspberry Pi this week, following their fairly simple hello world guide, which involves connecting the Pi directly to a remote host via Ethernet. I can access the web server exposed on the 6lbr host, but 6lbr still seems unable to find or communicate with other nearby motes. I have been following the guides as closely as possible to try to eliminate extra variables. Interestingly, I first tried this on a regular network (redcables), which didn't work, so I will come back to that at some point later.
The slip-radio application used by the tunnel mote compiles fine out of the box, although I needed to hardcode its RF channel (supposedly the channel can be changed from within the web app, but the logs show that this has no effect). The 6lbr-demo application required a slight modification so that it wouldn't overflow the RAM on the device: essentially the platform needs to be added to a list of known low-resource devices, but for now I've hardcoded the low-resource values.
After manually searching through the mailing list archives, I found that one other person has tried using the stm32w chip with 6lbr and ran into similar problems. The suggested fix was to write a platform-dependent command handler, but no further progress is documented. From what I could tell, the command handler is only used to change the RF channel and PAN ID at runtime, and these should already be set to the correct values anyway. Implementing the command handler looked reasonably straightforward, so to confirm that its absence wasn't causing my problem, I implemented it myself. It appears to be working and, as expected, hasn't changed anything.
I've sent an email to the 6lbr mailing list to ask how far along stm32w support is and whether this is a known issue. Soliciting help seems like the best way forward right now!
The results from the load balancer black hole detector on Planetlab were downloaded using an automated process, and the warts analysis was also carried out automatically. The results were promising, as once again there appear to be black holes occurring in load balancers. In some cases these are very transitory, i.e. occurring in only one Paris trace. I probably need to find a way to count these different discontinuity scenarios in the ICMP-probed data.
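One simple way to start counting these scenarios is sketched below in Python. It assumes each Paris trace has already been reduced to a set of black hole identifiers (a hypothetical representation; the real warts analysis output will look different), and it treats a black hole seen in exactly one trace as transitory, following the definition above.

```python
from collections import Counter


def classify_black_holes(traces):
    """Split black holes into transitory and repeated ones.

    traces: iterable of sets, one per Paris trace, each holding hypothetical
    identifiers for the black holes observed in that trace.
    """
    traces_seen_in = Counter()
    for holes in traces:
        # Count each black hole at most once per trace.
        traces_seen_in.update(set(holes))
    transitory = {h for h, n in traces_seen_in.items() if n == 1}
    repeated = {h for h, n in traces_seen_in.items() if n > 1}
    return transitory, repeated
```

Other discontinuity scenarios could be counted the same way by giving each scenario its own identifier.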
I have written code to analyse windows of many sources. In particular, I am only using sources that occur more than twice. A window is a group of sources that run simultaneously before another group of the same size runs. This means that the Doubletree and Megatree analyses running from a given source are likely to see some of the same nodes or load balancers more than once. This is similar to the situation we expect to see in Atlas, so it is a desirable simulation. An advantage of this mode is that the runs are likely to finish fairly quickly.
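The windowing can be sketched roughly as follows. This assumes the source list is in run order and that "more than twice" means three or more occurrences; the function name and window size are illustrative only, not taken from the actual analysis code.

```python
from collections import Counter


def source_windows(sources, window_size, min_occurrences=3):
    """Group qualifying sources into consecutive windows of window_size.

    Sources occurring fewer than min_occurrences times are dropped; the rest
    keep the order in which they were first seen.
    """
    if window_size < 1:
        raise ValueError("window_size must be positive")
    counts = Counter(sources)
    # dict.fromkeys de-duplicates while preserving first-seen order.
    eligible = [s for s in dict.fromkeys(sources) if counts[s] >= min_occurrences]
    # Each slice is one window: its sources run together before the next group.
    return [eligible[i:i + window_size] for i in range(0, len(eligible), window_size)]
```

For example, with sources `a b a c a b b d d d c` and a window size of 2, `c` is dropped (it only occurs twice) and the windows are `[a, b]` then `[d]`.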
There are a number of non-event-based Doubletree and Megatree simulations running that are taking some time. Hopefully they are bug-free so that they only have to be run once. I have saved some time by writing out files of the 'many' set destination addresses that occur more than once and more than twice. This avoids a lot of sorting during analysis, when each address is added to a data array one by one.
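A sketch of the idea, assuming addresses are plain strings and the files hold one address per line (both assumptions): count occurrences once, write out the addresses that occur more than once and more than twice, and load each file back into a set so later analysis only needs a constant-time membership test instead of keeping a sorted array up to date.

```python
from collections import Counter


def repeated_addresses(addresses, more_than):
    """Return the sorted addresses that occur more than `more_than` times."""
    counts = Counter(addresses)
    return sorted(a for a, n in counts.items() if n > more_than)


def write_address_file(path, addresses):
    """Write one address per line (assumed file format)."""
    with open(path, "w") as f:
        for addr in addresses:
            f.write(addr + "\n")


def load_address_set(lines):
    """Build a set from an open file (or any iterable of lines)."""
    return {line.strip() for line in lines if line.strip()}
```

The two files would then be produced with `repeated_addresses(dests, 1)` and `repeated_addresses(dests, 2)`, so the counting cost is paid once rather than on every analysis run.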
Finished updating the traceroute test to use libwandevent3 to schedule
packets and track timeouts. The aim was to make each action
self-contained and easily understandable, to aid in adding the extra
complexity of stop sets and AS lookups later on. Modified the probing
algorithm to start partway through the path and probe forward, then
probe backwards from the initial point: probing forward takes us into
parts of the path that we likely haven't seen before, and on the way
back we can stop probing as soon as we see familiar addresses.
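The probe ordering can be illustrated with a small simulation. This is a Python sketch, not the traceroute test itself (which is event driven and built on libwandevent3): the path is a hypothetical list of hop addresses ending at the destination, and the stop set is a plain set of previously seen addresses, in the spirit of a Doubletree-style local stop set.

```python
def probe_order(path, start_ttl, stop_set):
    """Return the TTLs probed, in order, for a simulated path.

    path: hop addresses for TTL 1..len(path); the last entry is the destination.
    stop_set: addresses seen on earlier paths; updated with new discoveries.
    """
    if not 1 <= start_ttl <= len(path):
        raise ValueError("start_ttl must lie within the path")
    probed = []
    # Forward phase: from the starting TTL out to the destination.
    for ttl in range(start_ttl, len(path) + 1):
        probed.append(ttl)
    # Backward phase: towards the source, stopping at the first familiar hop.
    for ttl in range(start_ttl - 1, 0, -1):
        probed.append(ttl)
        if path[ttl - 1] in stop_set:
            break
    # Remember every address discovered so later paths can stop earlier.
    stop_set.update(path[t - 1] for t in probed)
    return probed
```

In this simulation the forward phase always reaches the destination; a real test would also need to end it on repeated timeouts.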
Spent some more time reading student theses to provide feedback.
Released libtrace 3.0.20 on Monday.
Got as much of our development environment as possible up and running again after the power outage over the weekend. There are still a few non-critical things that need some assistance from Brad that I wasn't able to get going on Monday, but they can wait until next week when we're both here.
On leave from Tuesday for the rest of the week.
The black hole detector is running, and its data sets will need to be downloaded soon. It collects load balancer data at the beginning and end of the first driver's run and at the end of the second driver's run. The second driver is only triggered for a given destination address when the Paris traceroute runs made between the MDA load balancer traceroute runs stop short of the destination. Because I am running the detector on 15 Planetlab nodes, I am looking for ways to automate the repeated download and analysis tasks.
Because Spectre and Wraith were shut down on Saturday, I restarted the scamper warts analysis simulator jobs that I had running. This did, however, allow me to incorporate some recent changes to these analyses. In particular, I have been adding analysis of windows of many sources. In this mode it is not always possible to get local results, as I am not able to use the same sources repeatedly on a consistent basis. These many sources are actually the destinations in the warts data sets, as the data is used in the reverse direction.
The scamper warts analysis based Internet simulator for Doubletree has been adapted to the data set used by the event-based Internet simulator. This data was collected by CAIDA with Paris traceroute using ICMP probes. The simulator is currently being debugged, and once this is done a run will be carried out.
It is now time for my six-monthly report. This has been drafted and uploaded to the website. Once my chief supervisor has had a look at it, it will be submitted.
Continued looking through the Intel DPDK code for the best way to implement the format. A thread ID assigned by DPDK is needed for memory pools to work when caches are enabled. Each thread is assigned a number between 0 and 31, which is stored in TLS (the __thread storage class), and every memory pool contains a corresponding 32-element array of per-thread local caches. It would be nicer if every memory pool stored these caches in TLS, but this is not possible because memory pools are dynamically created objects.
The two options here are to disable the caches and accept the performance hit (the cache memory would still be allocated), or to keep them enabled. Disabling them seems like a very bad idea, because every packet read would then require taking a lock on the shared memory structure, whereas with caching enabled the cache is refilled from the main pool in bulk transactions whenever required. So it seems best (at least for the parallel implementation) to enable the caches.
This brings up the second problem with DPDK: threads are completely managed by DPDK, including starting them. Under this system, each physical core can have at most one software thread associated with it. This is an issue because the main thread is used to schedule operations rather than read packets, meaning that one thread cannot be used with DPDK.
There are a couple of ways of dealing with this: simply accept it as a limitation, or create threads with the correct setup within libtrace.
So I'm working on solving all of these problems by introducing a register thread function. That way we can register threads with the DPDK library; currently this seems like it would just be a matter of setting the thread ID and binding the thread to a CPU core. This would allow the caches to be enabled, and would let us use the reducer thread and any other threads the user decides to add. Additionally, this would allow us to include user-created threads in the existing message passing system. In the future, any other format that requires initialisation of TLS would also be able to use this system.
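A rough sketch of what the registration step might look like, again in Python and purely illustrative: `register_thread` is a hypothetical name, not an existing DPDK or libtrace function. It hands out unique IDs in the range 0 to 31 under a lock and stores the result in thread-local storage. The CPU binding is only recorded here; in C it would be done with something like `pthread_setaffinity_np()`.

```python
import threading

MAX_THREADS = 32
_tls = threading.local()      # per-thread storage for the assigned ID
_id_lock = threading.Lock()   # protects _next_id
_next_id = 0


def register_thread(cpu=None):
    """Give the calling thread a unique ID in [0, MAX_THREADS).

    Calling it again from an already registered thread returns the same ID.
    `cpu` is only recorded; actually pinning the thread is left out.
    """
    global _next_id
    existing = getattr(_tls, "thread_id", None)
    if existing is not None:
        return existing
    with _id_lock:
        if _next_id >= MAX_THREADS:
            raise RuntimeError("no free thread IDs")
        _tls.thread_id = _next_id
        _next_id += 1
    _tls.cpu = cpu
    return _tls.thread_id
```

The same call would work for libtrace's own threads and for user-created ones, which is what would let them share the caches and the message passing system.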
These buffer handling and batch processing ideas seem well worth experimenting with in my libtrace code's memory handling.
I finished my last assignment for the semester this week (hooray), but after a few more days without any real success I sat down with Richard Sanger for several hours and we worked through the issue I'd been having communicating with my motes on their globally routable IP addresses. The mote would not respond to neighbour discovery solicitations, so we first manually added an entry to the IPv6 neighbour table on the host PC to direct traffic to the mote, and then traced through the UDP server's code to see how it was assigning IP addresses internally. We concluded that adding the address with the ADDR_TENTATIVE type rather than ADDR_AUTOCONF allowed the server to receive packets bound for its global address (we saw that the link-local IP was assigned as tentative), but the response sent by the server was not able to reach the host PC. The response packet appears to be dropped before it even leaves the server, because the server has no route to the host PC.
I'm still baffled that this seems to work for pretty much everyone else straight out of the box, but having had Richard look at it with me, I'm at least confident that I'm not just overlooking something completely obvious. In our trawl through the code we were specifically looking for places where the stm32w platform implementation might differ from others (such as the Sky mote used as an example in the exercise I originally followed), but we found nothing suspicious.
I also used netcat to check that I could communicate with the UDP server beyond just pinging it (again using the link-local address).
In any case, I'm now satisfied that I have looked at this in sufficient depth and can continue on to implementing my RPL border router. I suspect (and very much hope) that enabling RPL will just make this a non-issue anyway.
Over the weekend I set myself up in the lab and brought in the new motes that I hadn't touched before, but I realised I was unable to flash them because their firmware was too old. I hadn't come up against this with my original motes because I had been playing around with them for a while, and in the process some software had upgraded them automatically. The stm32w_flasher utility is supposed to handle the upgrade, but the Linux version fails, so I had to do it from my Windows machine. At least it was worth figuring this out so that I can document it later.
I'll be away next week (7-14 July) spending some time in various parts of the North Island with family.
Libtrace 3.0.20 has been released today.
This release fixes several bugs that have been reported by users, adds support for LZMA compression to libwandio and adds an API function for getting the fragment offset for an IP packet.
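For background on the fragment-related changes, here is a minimal Python sketch of reading the fragment offset from a raw IPv4 header. It is not libtrace's implementation, just the arithmetic from the IPv4 header format: the 13-bit offset field counts 8-byte units, and only the fragment at offset zero carries the TCP/UDP header, so reading ports from any later fragment would return arbitrary payload bytes.

```python
import struct

IP_MF = 0x2000        # "more fragments" flag
IP_OFFMASK = 0x1FFF   # 13-bit fragment offset field


def fragment_offset(ip_header):
    """Return (offset_in_bytes, more_fragments) for a raw IPv4 header."""
    (flags_frag,) = struct.unpack_from("!H", ip_header, 6)
    return (flags_frag & IP_OFFMASK) * 8, bool(flags_frag & IP_MF)


def transport_ports(ip_header, payload):
    """Return (src_port, dst_port) for TCP/UDP, or None if not available.

    Only the fragment at offset 0 starts with the transport header; any
    later fragment's payload is a continuation of the data.
    """
    offset, _ = fragment_offset(ip_header)
    if offset != 0 or len(payload) < 4:
        return None
    return struct.unpack_from("!HH", payload, 0)
```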
The bugs fixed in this release are:
* Fixed broken snaplen option for ring: input.
* Fixed trace_get_source_port and trace_get_destination_port returning bogus port numbers when given a fragmented packet.
* Fixed timestamp byte ordering on big endian architectures.
* Removed assert failure if a bad compression level or method is provided when configuring an output trace. A libtrace error is raised instead.
* Fixed broken compiler feature checking in the configure script. Compiler features are now also detected for compilers other than gcc, e.g. clang.
* Fixed potential segfaults in OSPF libpacketdump parser if the packet is truncated midway through the OSPF header.
The OSPF bug fix unfortunately resulted in the 'len' field in the libtrace_ospf_t structure being renamed to 'ospf_len'. If you are using libtrace to process OSPF packets, please make sure you update your code accordingly.
The full list of changes in this release can be found in the libtrace ChangeLog.
You can download the new version of libtrace from the libtrace website.
Started planning the best way to approach changing the traceroute test
to be faster and more network-friendly. Making it more event-driven and
sending new packets as soon as we know earlier ones have left the
network should help speed up the test, rather than probing in waves and
having to wait at each TTL for all responses. Before changing the test
in this way, it made sense to move from the deprecated libwandevent2 to
libwandevent3, which I did. I've also made the first few changes to the
traceroute test to use an event-based approach.
Read up a bit on Doubletree and had a look into how some other
traceroute implementations dealt with it. Will hopefully be able to
apply some of the ideas around stop sets to the updated traceroute test
too. Tidied up a bit more low-hanging fruit in the amplet packaging.
Spent some time proofreading student theses to provide feedback.
Carrying on from last week, storing a cache entry per stream turned out to be a bad idea. Some matrix meshes consist of hundreds of streams, so we were spending a lot of time looking up cache entries. As a result, I rewrote the caching code to store one dictionary per collection, mapping stream IDs to tuples containing the timestamps. This gets looked up once per query, so only one cache operation is required to generate a matrix.
Updating the cache when we have to query for missing values is a bit annoying. We cannot simply update the dictionary and put it back in the cache once the query is complete, because the data insertion process may have updated other streams with new 'most recent data' timestamps while we were fulfilling our query. Instead, we have to re-fetch the dictionary, update the one stream we're changing and then immediately store the dictionary again.
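The re-fetch, update, store pattern can be sketched as below. `DictCache` is a stand-in for the real cache (its `get` returns a copy, as a networked cache would) and the key format is made up for illustration. The example shows why storing the copy fetched at the start of the query would lose a concurrent update from the inserter, while re-fetching keeps it.

```python
import copy


class DictCache:
    """In-memory stand-in for the real cache; get() returns a copy."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return copy.deepcopy(self._store.get(key))

    def set(self, key, value):
        self._store[key] = copy.deepcopy(value)


def collection_key(collection):
    return "streams-%s" % collection   # hypothetical key format


def update_stream(cache, collection, stream_id, timestamps):
    """Re-fetch the collection's dictionary, change one stream, store it again."""
    key = collection_key(collection)
    streams = cache.get(key) or {}
    streams[stream_id] = timestamps
    cache.set(key, streams)
```

Re-fetching narrows the window for lost updates but does not close it completely; if that ever matters, memcached's check-and-set (`gets`/`cas`) operations would be one option to look at.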
Updated ampy to no longer keep track of active streams and removed support for ACTIVE_STREAMS queries from the NNTSC protocol.
Merged Perry's LZMA support into libwandio. Started working towards a new libtrace release: managed to build and pass all tests on our various development boxes, so should be able to push out a release next week.
Spent a day reading over Meenakshee's thesis. Suggested a series of mostly minor edits and changes but overall it is looking pretty good.