Updated the AMP dataparser in NNTSC to process more messages in a single batch before committing. This should improve speed when working through a large message backlog, as well as save on some I/O time during normal operation. This change required some modification to the way we handle disconnects and other errors: we now have to re-insert all the previously uncommitted messages, so we can't just disconnect and retry the current message.
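A minimal sketch of the batching and re-insert logic described above, not the actual NNTSC code. The `db` object and its `insert`, `commit` and `reconnect` methods are hypothetical stand-ins, as are the `batch_size` and `max_retries` parameters. The idea is that anything staged since the last commit is lost on an error, so it goes back on the front of the queue in its original order.

```python
def process_in_batches(messages, db, batch_size=50, max_retries=3):
    """Insert messages in batches, re-queueing uncommitted ones on error.

    `db` is a hypothetical object providing insert(msg), commit() and
    reconnect(); a ConnectionError stands in for any recoverable failure.
    """
    pending = list(messages)
    uncommitted = []   # messages staged since the last successful commit
    retries = 0
    while pending or uncommitted:
        try:
            while pending and len(uncommitted) < batch_size:
                db.insert(pending[0])
                # Only move the message once the insert has succeeded.
                uncommitted.append(pending.pop(0))
            db.commit()
            uncommitted = []
            retries = 0
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            db.reconnect()
            # Staged work was lost with the connection, so replay it
            # ahead of the messages we had not reached yet.
            pending = uncommitted + pending
            uncommitted = []
```

A larger batch size means fewer commits, but more messages to replay after a failure.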
Tried to bring our database cursor management in line with suggested best practice, i.e. closing cursors whenever we're done with them.
Improved exporting performance by limiting frequency calculations to the first 200 rows and using a RealDictCursor rather than a DictCursor to fetch query results. The RealDictCursor means we don't need to convert results into dictionaries ourselves -- they are already in the right format so we can avoid touching most rows by simply chucking them straight into our result.
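As a rough illustration of the sampling idea, the sketch below estimates a measurement frequency from only the first 200 rows. It assumes that "frequency" means the most common gap between consecutive timestamps and that each row is a plain dictionary with a `timestamp` key (the shape a RealDictCursor would return); the function name and the key are hypothetical.

```python
from collections import Counter
from itertools import islice


def estimate_frequency(rows, sample_size=200):
    """Return the most common timestamp gap among the first rows.

    Assumes rows are dictionaries with a numeric 'timestamp' key, sorted
    by time. Only `sample_size` rows are examined, however many exist.
    """
    sample = list(islice(rows, sample_size))
    gaps = Counter(
        later["timestamp"] - earlier["timestamp"]
        for earlier, later in zip(sample, sample[1:])
    )
    if not gaps:
        return None   # fewer than two rows, nothing to measure
    return gaps.most_common(1)[0][0]
```

The remaining rows can then be appended to the result untouched, since they are already dictionaries.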
Spent some time helping Meena write a script to batch-process her event data. This should allow us to easily repeat her event grouping and significance calculations using various parameters without requiring manual intervention. Found a few bugs along the way which have now been fixed.
Was planning to work the short week between Easter and Anzac Day but fell ill with a cold instead.
I've been looking more deeply into Contiki this week in an attempt to become more familiar with the code. One issue I found was that Contiki only seems to support one of the five buttons on the MB950 (application board) out of the box, so I wanted to see if it were possible to fix this. It seems as though some effort has gone into writing the code such that it can be extended to support more buttons easily (for example, the button that is supported can be interchanged with any one of the other buttons), but it stops short of actually implementing multiple buttons, which is really strange. In any case, I should stop getting held up with this and move on to more relevant things. Over the next few days I will start playing with IPv6 communication and then revisit some of the initial research I did to determine which direction to take the project.
Just been writing code to set up the tests and investigating how to programmatically generate packet loss in a way that will give me the best accuracy in terms of timing.
I haven't been able to find a better option than just calling tc from within Python with os.system. I am unsure how much that is going to affect the level of accuracy, or how I can determine how much it affects the accuracy. So that is a bit of a concern, but I am soldiering on in the meantime.
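One possible way to get a handle on the timing question is to run tc through subprocess and time the call itself, which at least bounds how long the rule took to be applied. This is only a sketch: the helper names are hypothetical, the netem `loss` rule is one common way of generating packet loss, and actually running it needs root privileges and a real interface name.

```python
import subprocess
import time


def netem_loss_command(interface, loss_percent, action="add"):
    """Build the argument list for a tc netem packet loss rule."""
    return ["tc", "qdisc", action, "dev", interface,
            "root", "netem", "loss", f"{loss_percent}%"]


def timed_run(argv, runner=subprocess.run):
    """Run a command and return (result, seconds taken).

    The elapsed time is an upper bound on the delay between asking for
    the rule and tc returning; `runner` can be swapped out for testing.
    """
    start = time.monotonic()
    result = runner(argv, check=True)
    elapsed = time.monotonic() - start
    return result, elapsed
```

For example, `timed_run(netem_loss_command("eth0", 5))` would apply 5% loss on a hypothetical eth0 and report how long the call took. Passing an argument list avoids the extra shell that os.system starts.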
I also helped Brad with our new SDN some more. It won't stay up for more than a couple of minutes, because there seem to be issues with the RouteFlow rules interfering with the hidden rules (which have higher priorities than the RouteFlow rules, and therefore should not be interfered with). The upshot is that when the neighbour resolution on the switches times out, it can't re-resolve the address of the controller, so it loses connection. It's a bit weird.
Put together some tests for the data structures I've been using within the parallel libtrace implementation. Tidied up a lot of compiler warnings.
Added a global flag to the library to track if the parallel API is being used, to provide backwards compatibility with existing code. The only place where this is a problem is destroying packets after the trace has already been destroyed. Now the existing tests pass again.
Finished purging the last of the SQLAlchemy code from NNTSC. Once that was working, I was able to create a new class hierarchy for our database code to reduce the amount of duplicate code and ensure that we handle error cases consistently across all query types.
Split insertion operations across two different transactions: one for stream-related operations and one for measurement results. This allows us to commit new streams and data tables without having to commit any data results, which is an important step towards better synchronisation between the database and the messages in the Rabbit queue.
Spent a lot of time tracking down and fixing various error cases that were not being caught and handled within NNTSC. A lot of this work was focused on ensuring that no data was lost or duplicated after recovering from an error or a database restart, especially given our attempts to move towards committing less often.
Migrated the prophet development database over to the new NNTSC schema on Thursday. Generally things went pretty smoothly and we are now turning our attention to migrating skeptic and the live website as soon as possible.
Tidied up some arbitrarily sized buffers in the icmp test to be the
actual size required for the data. Accidentally made them too small, so
fixed that and then wrote some more unit tests to cover the
sending/receiving of data and buffer management. Also updated the icmp
test to be able to short circuit the loss wait timeout once all data has
been accounted for - previously it was always waiting a minimum of
200ms, even if all responses had been received.
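The short-circuit behaviour can be illustrated with a small Python sketch (the test itself is not written in Python, and every name here is hypothetical). The loop gives up either when the loss timeout expires or as soon as nothing is outstanding, whichever comes first.

```python
import time


def wait_for_responses(outstanding, receive, loss_timeout=0.2,
                       clock=time.monotonic):
    """Wait for replies, returning early once all are accounted for.

    `outstanding` is a set of sequence numbers still awaited and
    `receive(timeout)` is a hypothetical callable returning a sequence
    number, or None if nothing arrived in time. Returns whatever is
    still missing when the wait ends.
    """
    outstanding = set(outstanding)
    deadline = clock() + loss_timeout
    while outstanding:              # empty set: stop without waiting
        remaining = deadline - clock()
        if remaining <= 0:
            break                   # timeout: the rest count as lost
        seq = receive(remaining)
        if seq is not None:
            outstanding.discard(seq)
    return outstanding
```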
Spent some time examining query logs from the newly migrated test
database on prophet to see where slowdowns were now occurring. Found and
fixed a simple case where we were over-querying for data, and have a few
ideas for other places to look for more improvements.
Investigated how it might be possible to set DNS servers per process in
order to run multiple amplet clients on the same Linux host without
putting them in individual containers. It isn't made obvious in libc how
to do this, but it seems to be possible by modifying some internal
resolver structures. If I set these right, then getaddrinfo() etc. will
all work as normal except using the specified name server rather than
whatever is in /etc/resolv.conf. The alternative here seems to be
replacing the name resolution functions with another library or custom code.
Thanks to Richard I now have an STM32W RF Control Kit, which I had a chance to play around with a little bit this weekend. Spent some time looking through its documentation and eventually found Windows drivers for communicating with each component (the USB dongle and the application board) through a virtual COM interface. The boards run a simple "chat" application by default so you can see the RF communication between them by typing into one COM terminal and watching it appear at the other end. I tested flashing another couple of sample applications, in particular one that is mentioned in the documentation that contains a number of commands for testing functionality. (The LED commands didn't seem to actually control the LEDs, but otherwise it seemed to function as described in the docs so I assume I'm still on the right track...) All in all an interesting intro and next week I'll start looking into what it's going to take to get Contiki on to the boards.
I spent a lot of my time this week getting my project environment set up and familiarising myself with Ryu and Open vSwitch. I've started with a very basic topology to work through, and so far I have flows being learnt successfully between a series of KVM hosts and multiple connected virtual switches. Once I'm comfortable enough with the environment, my goal is to work towards implementing a basic virtual network which will allow DHCP leases to be issued to hosts from an out-of-band DHCP server through the Ryu controller. This step should represent the first milestone of my project as I work towards distributing some of the existing functionality of a BRAS out to multiple controllers and switches.
Built new CentOS and Debian amplet packages for testing and deployed to
a test machine to check that both old and new versions of the transfer
format could be saved. After a bit of tweaking to the save functions
this looks to work fine.
Tested the full data path from capture to display, which included fixing
the way aggregation of data streams is performed for matrix tooltips.
Everything works well together, except the magic new aggregation
function fails in the case where entire bins are NULL. Will have to
spend some time next week making this work properly.
Wrote some more unit tests for the amplet client testing address
binding, sending data and scheduling tests. While doing so, found what
appears to be a bug in scheduling tests with period end times that were
shorter than hour/day/week.
Updated NNTSC to include the new 'smoke' and 'smokearray' aggregation functions. Replaced all calls to get_percentile_data in ampy with calls to get_aggregate_data using the new aggregation functions. Fixed a few glitches in amp-web resulting from changes to field names due to the switch-over.
Marked the 513 libtrace assignments. Overall, the quality of submissions was very good, with many students demonstrating a high level of understanding rather than just blindly copying from examples.
Modified NNTSC to handle a rare situation where we can try to insert a stream that already exists -- this can happen if two data-inserting NNTSCs are running on the same host. Now we detect the duplicate and return the stream id of the existing stream so NNTSC can update its own stream map to include the missing stream.
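A minimal sketch of the detect-and-return approach, using Python's built-in sqlite3 purely as a stand-in so the example is self-contained (NNTSC itself talks to PostgreSQL). The `streams` table, its columns and the function names are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical streams table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE streams ("
        "stream_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    return conn


def insert_stream(conn, name):
    """Insert a stream, or return the existing id if it is a duplicate."""
    try:
        with conn:   # commits on success, rolls back on an exception
            cur = conn.execute(
                "INSERT INTO streams (name) VALUES (?)", (name,))
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # Someone else already created this stream: look it up so the
        # caller can add it to its own stream map.
        row = conn.execute(
            "SELECT stream_id FROM streams WHERE name = ?",
            (name,)).fetchone()
        return row[0]
```

Calling `insert_stream` twice with the same name returns the same id both times, so the caller does not need to know which process created the stream.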
Discovered that our new table-heavy database schema was using a lot of memory due to SQLAlchemy trying to maintain a dictionary mapping all of the table names to table objects. This prompted me to finally rip out the last vestiges of SQLAlchemy from NNTSC. This involved replacing all of our table creation and data insertion code with psycopg2 and explicit SQL commands constructed programmatically. Unfortunately, this will delay our database migration by at least another week, but it will also end up simplifying our database code somewhat.