10 Jul 2017

Brought the network/topology management into line with the new multi-process model. The topology can now be modified on the fly, which will generate messages to various peers allowing them to re-run their route selection algorithms to take the changes into account.

Removed the distinction between internal and external nodes, and reworked the classes to allow multiple node types, depending on how we want to manage them. We could speak BGP to some, OpenFlow (or another SDN-style protocol) to others, etc.

Spent some more time looking at improving memory usage. Found what I consider a bug in the python multiprocessing queue implementation - a reference to the last message sent to the queue is kept around after it is no longer required, continuing to use memory. I send infrequent but large messages, which means a lot of memory can be tied up for quite some time for no useful purpose.
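One possible mitigation for this retained-reference behaviour is sketched below: follow each large message with a tiny sentinel so the queue's internal reference points at the sentinel rather than the large object, letting it be garbage collected sooner. The helper names are invented for illustration, not from the actual program.

```python
import multiprocessing

def put_then_displace(queue, message, sentinel=None):
    """Put a large message, then a tiny sentinel so any internal reference
    to the 'last sent' object no longer pins the large message in memory."""
    queue.put(message)
    queue.put(sentinel)  # the small sentinel displaces the retained reference

def get_and_discard_sentinel(queue):
    """Receive a message sent via put_then_displace, dropping the sentinel."""
    message = queue.get()
    queue.get()  # discard the sentinel
    return message
```

This trades one extra tiny message per send for releasing the large message earlier, which only matters when messages are large and infrequent, as described above.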

28 Jun 2017

Libprotoident 2.0.11 has been released.

Firstly, this release updates the existing tools to be compatible with both libflowmanager 3 and parallel libtrace. This means that the tools can now take advantage of any parallelism in the traffic source, e.g. streams on a DAG card or a DPDK-capable NIC.

Secondly, we've added 61 new application protocols to our set of detectable protocols, bringing the total supported number of applications to 407. A further 25 existing protocols have been updated to better match new observed traffic patterns.

Finally, there have been a couple of minor bug fixes as well.

Note that this release will require both libflowmanager 3 and libtrace 4, which means that you will likely have to upgrade these libraries prior to installing libprotoident 2.0.11. If this is problematic for you but you still want the new application protocol rules, you can use the '--with-tools=no' option when running ./configure to prevent the tools (which are the reason for the upgraded dependencies) from being built.

The full list of updated protocols can be found in the libprotoident ChangeLog.

Download libprotoident 2.0.11 here!

12 Jun 2017

Spent most of the week working on the BGP program. Had a bit of a general tidy up and reorganisation of the class hierarchy, and updated unit tests to match the changes.

All the various copies of routing tables are now stored on disk when not actively being modified to try to save on memory usage (and to make recovery easier in the future if BGP connections are interrupted). The Python garbage collector generally doesn't seem keen to return memory to the operating system though, so processes still end up with something of a high water mark for memory usage but this does improve it slightly.
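A minimal sketch of parking an idle routing table on disk and reloading it on demand is below; the function names and table layout are illustrative, not taken from the actual program.

```python
import pickle

def park_table(table, path):
    """Serialise a routing table to disk and drop the in-memory copy."""
    with open(path, "wb") as f:
        pickle.dump(table, f)
    table.clear()  # release the in-memory copy once it is safely on disk

def load_table(path):
    """Reload a previously parked routing table for modification."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

As noted above, clearing the Python objects does not guarantee the interpreter hands memory back to the operating system, so the saving shows up mainly as a lower high-water mark.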

Finished adding a command interface to allow updating filters on the fly, as well as any other operations we want to add (we could inject crafted BGP messages, swap out parts of the decision process, etc.). Peer objects rerun the filters over received routes whenever the filters are updated. Peer objects also now run a BGP decision process to determine the best routes to export, and make sure that their own ASN is not in the path.

12 Jun 2017

Tried to save some more memory in my ipaddress module by calculating netmasks as required rather than storing them. Storing the AS path as an array rather than a list also saves a considerable amount of memory in my route entries.
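The list-versus-array saving can be measured directly with sys.getsizeof: a list holds a pointer to a full Python int object per hop, while an array stores each ASN as a raw unsigned integer. The example path below is made up.

```python
import sys
from array import array

# An AS path as a list of Python ints versus a compact array of unsigned
# 32-bit integers ("I" is 4 bytes on common CPython builds).
as_path = [64496, 64497, 64498, 64499, 64500]
as_list_bytes = sys.getsizeof(as_path) + sum(sys.getsizeof(a) for a in as_path)

as_array = array("I", as_path)
as_array_bytes = sys.getsizeof(as_array)  # one object, raw 4-byte values inside
```

With millions of route entries each carrying a path, a per-entry saving of this size adds up quickly.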

Decided it was easier to send the full current state of routes between peers and VRFs rather than incremental updates. It means the state is always up to date and we don't need to keep per-peer or per-VRF tracking when there are one-to-many relationships and peers might come and go at different times. Passing 1 million routes between processes takes milliseconds, which is plenty fast enough.

Fixed a bug in the equality/hashing functions for route entries that meant they would never match and so all routes were being withdrawn and re-advertised to peers any time there were changes to be made.
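The usual way to avoid this class of bug is to derive __eq__ and __hash__ from the same tuple of fields, so two routes with identical attributes always compare equal and hash identically. The class shape below is an assumed illustration, not the actual route entry code.

```python
class RouteEntry:
    """Toy route entry whose equality and hash are built from one tuple."""
    __slots__ = ("prefix", "nexthop", "aspath")

    def __init__(self, prefix, nexthop, aspath):
        self.prefix = prefix
        self.nexthop = nexthop
        self.aspath = tuple(aspath)  # immutable, so it is safely hashable

    def _key(self):
        return (self.prefix, self.nexthop, self.aspath)

    def __eq__(self, other):
        if not isinstance(other, RouteEntry):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())
```

If the two methods disagree (or hash over a mutable attribute), set and dict lookups silently fail to match, which is exactly how every route ends up looking "changed" and being withdrawn and re-advertised.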

12 Jun 2017

Moved the peer and VRF route management out into a separate process for each individual one, so that they can process filters etc in parallel without blocking other peers/VRFs. They are all self-contained now and operate via message passing - BGP commands or routes come in, which are processed and then sent on as further BGP commands or route lists.
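A toy version of this self-contained, message-passing peer model is sketched below: routes arrive on an input queue, a filter runs, and accepted routes go out on an output queue. The filter, message format, and worker shape are all invented for illustration.

```python
import multiprocessing

def filter_routes(routes, max_path_len):
    """Example filter: accept routes whose AS path is short enough."""
    return [r for r in routes if len(r["aspath"]) <= max_path_len]

def peer_worker(inbox, outbox, max_path_len):
    """Self-contained peer loop: consume route lists, filter, pass on."""
    while True:
        routes = inbox.get()
        if routes is None:  # shutdown sentinel
            break
        outbox.put(filter_routes(routes, max_path_len))
```

Each peer or VRF would run peer_worker in its own multiprocessing.Process, so slow filter runs in one peer never block the others.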

Spent some time tracking down the cause of ports not being reused correctly in the AMP throughput test. When run through the server, with IPv4 and IPv6 available, it was not properly closing the socket for the unused address family once a client connected so the test port would still be in use when it later tried to restart the connection to test in the other direction. It should now make sure that only the address family in use has a socket bound to the test port.
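The shape of that fix can be illustrated with a small helper that closes the listening sockets for whichever address families are not in use, releasing the test port for the reverse-direction connection. The helper is hypothetical, not the actual AMP code.

```python
import socket

def close_unused_listeners(listeners, connected_family):
    """Keep listeners matching the connected client's address family;
    close the rest so the test port is freed for later reuse."""
    remaining = []
    for sock in listeners:
        if sock.family == connected_family:
            remaining.append(sock)
        else:
            sock.close()  # releases the port for the other direction
    return remaining
```

Calling this once a client connects ensures only the address family actually in use keeps a socket bound to the test port.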

12 Jun 2017

Replaced the python ipaddress module in my BGP program with my own very minimal one to reduce memory usage. Replaced some empty sets/lists that may never hold data with "None" by default, as empty data structures are quite memory heavy when you have millions of them. Also updated a couple of heavily used classes to use slots (explicitly declaring the attributes) rather than leaving them open-ended (and using more memory).
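Both savings can be shown in a few lines (class names invented): __slots__ removes the per-instance __dict__, and defaulting rarely-used containers to None avoids allocating an empty set for every object.

```python
import sys

class OpenEnded:
    """Ordinary class: every instance carries a __dict__ plus an empty set."""
    def __init__(self, prefix):
        self.prefix = prefix
        self.communities = set()      # allocated even if never populated

class Slotted:
    """Slotted class: fixed attributes, no __dict__, lazy container."""
    __slots__ = ("prefix", "communities")

    def __init__(self, prefix):
        self.prefix = prefix
        self.communities = None       # create the real set only on first use
```

Per object the difference is small, but multiplied by millions of route entries it becomes substantial.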

Started looking at adding a command interface to allow updating filters and receiving external measurements or metadata about how we should be routing traffic. The current event loop around I/O doesn't really support this (and has other issues about deadlocking with exabgp) so needed to be rewritten. All exabgp reading and writing now happens in the same place, and in a different thread to the command interface and route management.
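A toy illustration of that threading split is below: one thread owns all I/O towards the BGP speaker, and everything else hands it messages through a queue, so no other thread can interleave writes or deadlock on the pipe. The loop shape, writer callback, and message strings are invented.

```python
import queue
import threading

def io_loop(outgoing, write, stop):
    """Single thread that performs all writes towards the BGP speaker."""
    while not stop.is_set():
        try:
            msg = outgoing.get(timeout=0.05)
        except queue.Empty:
            continue  # wake up periodically to check the stop flag
        write(msg)    # the one and only place that writes out
```

The command interface and route management threads only ever call outgoing.put(), never touch the speaker's pipe directly.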

02 Jun 2017

Libflowmanager 3.0.0 has been released today.

The libflowmanager API has been re-written to be thread-safe (and therefore compatible with parallel libtrace), hence the major version number change.

The old libflowmanager API has been removed entirely; there is no backwards compatibility with previous versions of libflowmanager. If you choose to install libflowmanager 3 then you will need to update your existing code to use the new API. This should not be too onerous in most cases, as most of the old global API functions have simply been replaced with method calls to a FlowManager class instance. The README and example programs demonstrate and explain the new API in detail.

Note that much of our other software that relies on libflowmanager, such as the libprotoident tools and lpicollector, has NOT yet been officially released with libflowmanager 3 support. If you are currently using any of this software, you should continue to use libflowmanager 2.0.5 until we are able to test and release new libflowmanager 3 compatible versions.

You can download both libflowmanager 3 and libflowmanager 2.0.5 from our website.

16 May 2017

Tidied up some unusual entry points in ampweb that web crawlers were hitting (to return proper error codes rather than broken templates) and tried to block a few of them with robots.txt. Fixed YAML schedule generation to not include tests where the source is an explicit destination (we previously removed the source from the mesh description, but missed this case). Spent some time trying (unsuccessfully) to fix some edge cases in the graph browser modals where previously set values weren't being set correctly.

Put together new releases for all the ampweb components, got them up on GitHub and deployed them to a test site. Also updated the documentation to make it clearer that all these components work together and aren't independent.

Spent some time profiling the memory usage of my BGP program, after having got the runtime down to a reasonable level. I don't have enough memory to announce a million prefixes to all my peers as well as maintaining internal routing for them all. Found some significant savings by not instantiating my set variables until they were required (an empty set in python is 232 bytes!), but still need to lower the usage.
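The standard library's tracemalloc is one way to do this kind of profiling: it attributes allocations to source lines, which helps find where millions of small objects are costing memory. The workload below is made up for illustration.

```python
import tracemalloc

tracemalloc.start()

# Stand-in workload: lots of small per-route objects, with containers
# deferred to None rather than instantiated up front.
routes = [{"prefix": i, "attrs": None} for i in range(10000)]

current, peak = tracemalloc.get_traced_memory()  # bytes traced now / at peak
snapshot = tracemalloc.take_snapshot()           # per-line allocation stats
tracemalloc.stop()
```

snapshot.statistics("lineno") then ranks source lines by bytes allocated, pointing straight at the structures worth slimming down.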

16 May 2017

Added options to the ampweb mesh configuration to allow setting the individual tests that should be visible in the matrix view, and whether the mesh is a source for these tests or not. Previously we tried to guess and enable these as tests were scheduled, but this led to issues when meshes were created purely for convenient display grouping and didn't actually run any tests, and it was not transparent why things behaved the way they did. This is now an entirely manual process that the user has full control over. Also fixed a bug in the matrix display that meant the throughput matrix was losing configuration options when switching between it and other tests, and updated the URL validity checks to match the new formats so that our URLs are now pushed into browser history.

Spent some more time investigating why the BGP code was so slow. Most of the time appears to be spent copying route entries, so I rewrote the deepcopy function for that class to be much more efficient, and also reduced the number of locations where route entries were copied. Replaced some dictionaries with defaultdict structures that remove the need to check for key presence (in very large data structures) before taking actions. I can now import 1 million routes from a peer in under 60 seconds, including running them through a number of simple filters and copying them into a number of VRFs. Exporting these routes takes more memory than I have available however, which will be a job for next week.
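The defaultdict change removes the explicit key-presence check before every update, which matters when the structure is touched millions of times. A small example (the grouping workload is illustrative):

```python
from collections import defaultdict

# Group announced prefixes by peer without checking key presence first.
routes_by_peer = defaultdict(list)
for peer, prefix in [("peer1", "192.0.2.0/24"),
                     ("peer1", "198.51.100.0/24"),
                     ("peer2", "192.0.2.0/24")]:
    routes_by_peer[peer].append(prefix)  # missing keys are created automatically
```

With a plain dict the loop body would need an "if peer not in routes_by_peer" branch (or setdefault), an extra lookup on every one of millions of iterations.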

16 May 2017

Added interface elements to ampweb enabling the scheduling of a normal or HTTP POST style throughput test, as well as the database/ampy support to make this work. Updated the graph browser to allow selecting and displaying throughput tests of both sorts as well. Spent some time trying to add support for the new tests to the matrix too, but we haven't done this particular sort of data split before and it's not immediately clear what the best approach is.

Started to look into why my BGP code is scaling so poorly at 10,000+ prefixes.