Brendon Jones's blog

13

Feb

2018

Spent most of the short week trying to track down some issues that were preventing RabbitMQ shovels from connecting after an Erlang upgrade. The issue appears to be that Server Name Indication (SNI) is enabled and the SSL upgrade takes place on an already connected socket, so only the peer address is available and not the peer name. I don't appear to be able to set SNI directly in the shovel parameters, but I can set it for the Erlang RabbitMQ client that the shovel uses.

13

Feb

2018

Continued to integrate the AMP Chromium test better into the rest of the AMP framework, specifically into the build system so that the appropriate Chromium libraries can be specified and found.

Spent most of the week in Queenstown at the NZNOG conference. Worked on Faucet briefly at the Facebook Hackathon on Wednesday, with the conference proper the rest of the week.

13

Feb

2018

Started integrating the Chromium test into AMP, which means being able to run it both standalone (providing my own main function) and as part of the AMP client, and reporting data in the correct protobuf formats.

The way Chromium forks and executes itself repeatedly (zygote processes) caused some confusion around why argument parsing was failing: each re-executed process passed through getopt again, this time with arguments the test didn't expect. The test now accepts and ignores the arguments used with zygote processes, letting them pass through to headless Chromium.
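The test itself is not written in Python and uses getopt, but the idea of accepting and ignoring arguments meant for another layer can be sketched with the standard library. The `--url` option and the `parse_test_args` name are made up for illustration, and unlike the real fix this sketch lets every unrecognised argument through rather than a specific list.

```python
import argparse


def parse_test_args(argv):
    """Parse the test's own options and return (options, passthrough)."""
    parser = argparse.ArgumentParser(prog="chromium-test", add_help=False)
    # Hypothetical option that belongs to the test itself.
    parser.add_argument("--url", default=None)
    # parse_known_args() collects anything it does not recognise instead
    # of exiting with an error, so arguments added when the binary
    # re-executes itself are kept and can be handed on untouched.
    options, passthrough = parser.parse_known_args(argv)
    return options, passthrough


options, passthrough = parse_test_args(
    ["--url", "https://example.com/", "--type=zygote"]
)
```

Here `options.url` holds the test's own setting and `passthrough` holds the leftover arguments for the browser.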

Found that the timings available to JavaScript had improved since I last looked, and that I could now fetch the information for the initial page in exactly the same format as the objects in the page, saving a heap of near-duplicate code and getting more accurate information.

13

Feb

2018

Finally linked the Chromium libraries successfully with my own test runner after adding a few more compiler flags to match those used by Chromium. Once it was linked and running, found it would crash on the first callback made when the page completed loading, complaining about missing vector functions.

Had to rebuild my Chromium source as I had been using a debug build, which generated debug versions of some STL containers and caused crashes when other parts of the code expected regular containers. It now links and runs and outputs useful data!

13

Feb

2018

Spent the short week trying again to get a useful set of Chromium libraries that I can link my own AMP tests against. Digging into the Ninja build configuration, I've extracted a list of all the useful libraries that go into building a headless program, as well as the build and link flags that I need to use. Still having issues with the final step where I link with my code, but it's getting much closer to working.

13

Feb

2018

Found and fixed a bug in the BGP prefix code where sorting and comparison of prefixes ignored some of the attributes that were important for determining differences. Removed some unnecessary special cases and code paths when exporting routes from a peer, which makes the code less complicated and metric collection easier. Added more metrics tracking when route import/export last occurred, how long updates are taking, etc.
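A rough sketch of the kind of fix involved: build equality, ordering and hashing from one key that includes every attribute that matters. The `Prefix` class here and its `next_hop` and `as_path` attributes are assumptions for illustration, not the router's actual fields.

```python
import ipaddress
from functools import total_ordering


@total_ordering
class Prefix:
    """Illustrative prefix whose comparisons use all of its attributes."""

    def __init__(self, cidr, next_hop, as_path=()):
        self.network = ipaddress.ip_network(cidr)
        self.next_hop = ipaddress.ip_address(next_hop)
        self.as_path = tuple(as_path)

    def _key(self):
        # One key shared by __eq__, __lt__ and __hash__, so an attribute
        # cannot be left out of one comparison but included in another.
        return (
            self.network.version,
            int(self.network.network_address),
            self.network.prefixlen,
            int(self.next_hop),
            self.as_path,
        )

    def __eq__(self, other):
        if not isinstance(other, Prefix):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Prefix):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())


a = Prefix("10.0.0.0/8", "192.0.2.1", [64500])
b = Prefix("10.0.0.0/8", "192.0.2.2", [64500])
```

With this, `a` and `b` cover the same network but still compare as different, because the next hop is part of the key.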

15

Dec

2017

Finished updating the RouteEntry code to save and load to/from a raw buffer and put it into testing. The time to transmit a million routes between processes dropped from 20+ seconds to less than a second. Memory usage also shrank massively, such that all million routes could be sent at once, which was not possible when using pickle.

Made some other improvements to memory usage by no longer storing copies of the routes where not required (and it's easier to ask a peer to resend them), and by storing routes in simple lists when a more complicated data structure isn't actually required.

The BGP router still uses more memory than I would like, and takes longer to do things than I would like, but it is now much improved. Half of the time taken is now spent waiting on ExaBGP to send me all routes, and there are still plenty of inefficiencies to fix in the way filtering and fixing of routes happens.

15

Dec

2017

Continued to investigate the performance of the BGP router and discovered that a very significant amount of time is spent pickling and unpickling routes to send between processes. Using a newer version of Python allows me to modify how messages sent through multiprocessing queues get serialised, so I experimented with using protocol buffers, JSON, and a few other approaches to see what might work. Everything I tested was still too slow or memory intensive when dealing with a million route entries. Decided that the best approach was to store all the routes in one "bytes" field in a protocol buffer message (rather than having each route as a distinct part of the message) and to write just the relevant parts of the route entry straight into the buffer. Started work on implementing this.

15

Dec

2017

Found and fixed a couple of small bugs in the pickling implementation of the RouteEntry class used in the BGP router. Updated the unit tests to check that prefixes and route entries could be correctly pickled and unpickled. Also found and fixed various small issues that didn't show up in testing, but did when exposed to real BGP implementations and a more diverse set of routes (more tests required!).
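The round-trip check looks roughly like the following. This `RouteEntry` is a cut-down stand-in with made-up fields, and `check_round_trip` is a hypothetical helper; the body of the helper is what would sit inside a unit test method.

```python
import pickle


class RouteEntry:
    """Cut-down stand-in with custom pickling support."""

    __slots__ = ("prefix", "next_hop", "as_path")

    def __init__(self, prefix, next_hop, as_path):
        self.prefix = prefix
        self.next_hop = next_hop
        self.as_path = tuple(as_path)

    def __getstate__(self):
        # Only the fields needed to rebuild the entry are serialised.
        return (self.prefix, self.next_hop, self.as_path)

    def __setstate__(self, state):
        self.prefix, self.next_hop, self.as_path = state

    def __eq__(self, other):
        if not isinstance(other, RouteEntry):
            return NotImplemented
        return self.__getstate__() == other.__getstate__()

    def __hash__(self):
        return hash(self.__getstate__())


def check_round_trip(entry):
    """Return True if the entry survives pickling and unpickling unchanged."""
    restored = pickle.loads(pickle.dumps(entry, pickle.HIGHEST_PROTOCOL))
    return restored == entry and restored is not entry


entry = RouteEntry("203.0.113.0/24", "192.0.2.1", [64500, 64501])
```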

Started work on getting useful performance numbers around how long it takes to process and distribute routes, so merged the testing Prometheus code I had previously written and expanded it to cover more of the interesting parts of the code. Every time routes are touched (importing, exporting, filtering, etc) the time taken is recorded and available to query. So far it looks like most of the time is spent outside of my main functions, moving data around between processes.

15

Dec

2017

Had another look into building an AMP test using headless Chrome to measure web (particularly YouTube) performance. I can get my code building within the Chrome build system, but I really want to create a library that I can link my own code against, and nothing like that gets built. They claim it does, but those libraries are missing most of the symbols I need, so still need to look into this further.

Found and fixed an issue where amplet2-client cert fetching failed after a certain number of certificates had been issued. Turned out to be a simple type issue: comparisons were being made using the wrong type, so the sort order was wrong and an incorrect certificate was returned.
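This class of bug is easy to reproduce. The post doesn't say which types were involved, so the sketch below assumes numeric identifiers being compared as strings, which is one common way to get a failure that only appears once enough items exist; the function names are hypothetical.

```python
def latest_serial_buggy(serials):
    """Compares as strings, so "9" sorts after "10"."""
    return max(serials)


def latest_serial(serials):
    """Compares as integers, which is the intended ordering."""
    return max(serials, key=int)


serials = ["8", "9", "10", "11"]
```

With nine or fewer single-digit entries both functions agree, which is why a bug like this can stay hidden until the tenth item is issued.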

Spent some time writing installation documentation for the AMP server components and adding it to the GitHub wiki.