Shane Alcock's Blog
Spent my week in San Diego attending the BGP hackathon and the AIMS workshop.
The hackathon went really well. I was so intimidating that nobody wanted to join my team, but I still managed to add a lot of useful filtering capabilities to CAIDA's BGPStream software. Will try to write a more detailed blog post on what I did at some point, but it was enough to win myself a prize for being one of the top teams.
The AIMS workshop was also very valuable, as there was definitely some interest in what we have been doing with both AMP and NNTSC. In particular, it seems that AMP might have some value for some big ISPs outside of New Zealand. Looking forward to seeing what comes from the discussions I had with various workshop attendees.
Finished prep for AIMS and hackathon. Gave a practice run of both of my talks on Tuesday and got some useful feedback. Incorporated this into the talks on Wednesday.
Flew out to San Diego on Thursday and had a quiet day to rest and recover on Friday before the hackathon on the weekend.
Continued prepping for the trip to San Diego. Wrote a talk on AMP to present at AIMS, since Brendon won't be able to attend. Managed to finally settle on a project that I'll be working on at the BGP hackathon: adding useful filtering to the BGPStream software.
Met with Harris to talk about the CSC dataset and how he can go about looking for interesting events in the dataset. Wrote some example code to extract a metric from the data (syscalls per second for each major type) and added a module to NNTSC for the new metric. Hopefully Harris will be able to use that to start adding his own metrics.
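As a rough illustration of the kind of metric described above, here is a minimal sketch of counting syscalls per second for each type. The record format (a timestamp in seconds paired with a syscall type name) and the function name are assumptions made for the example; they are not the actual CSC dataset layout or the NNTSC module.

```python
from collections import Counter, defaultdict


def syscalls_per_second(records):
    """Count syscalls of each type within each one-second bucket.

    records: iterable of (timestamp_seconds, syscall_type) tuples.
    This input format is hypothetical, chosen only for illustration.
    """
    buckets = defaultdict(Counter)
    for ts, syscall_type in records:
        # Truncate the timestamp to the second it falls in.
        buckets[int(ts)][syscall_type] += 1
    # Return plain dicts, ordered by second.
    return {second: dict(counts) for second, counts in sorted(buckets.items())}


# Made-up sample records, not real CSC data.
sample = [(100.1, "open"), (100.7, "open"), (100.9, "read"), (101.2, "read")]
rates = syscalls_per_second(sample)
print(rates)
```

Keeping the result keyed by second and then by type means that adding a new metric is mostly a matter of changing what gets counted inside the loop.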
Noticed that there was a lot of variation in my rtstats performance test results. The achievable packet rate for ring: seems to fluctuate from test to test, but remains constant within a test. For example, in one test run I'll get 1 million packets per second consistently for 60 seconds, and in the next I'll get 40,000 packets per second for 60 seconds. Spent a lot of time looking into this further (including an afternoon with Richard S. trying out various things), but we're still unable to account for this inconsistency.
Ran some full experiments with the stats and rtstats workloads (using ring: to capture) to make sure the numbers match up with what Richard was seeing in his earlier tests. So far, we're getting the expected behaviour: adding more threads makes stats perform worse (due to threading overhead outweighing any performance gain), but helps with rtstats. Wrote the section in the paper that describes our evaluation methodology, so now I just need to fill it in with some results!
Wrote my talk on NNTSC for AIMS. It's a 10 minute talk that is meant to provoke some discussion, so it is pretty light on implementation details. At least I hope people will come away from the talk knowing that there are some battles in this space that we've already fought so they won't repeat our mistakes.
Helped Andy get started with NNTSC so he can try implementing some InfluxDB support for storing data. The idea so far is to keep postgres around for doing the things it does well (streams, traceroute data) and use Influx for the rest.
Tracked down a segfault in the Ostinato drone that occurred whenever I tried to halt packet generation on the DPDK interfaces. This took a lot longer than it normally would have, since valgrind doesn't work too well with DPDK and there are about 10 threads active when the problem occurs. It eventually proved to be a simple case of a '<=' being used instead of a '<', but that was enough to corrupt the return pointer for the function that was running at the time, causing the segfault.
Once I fixed that, I was able to write some scripts to orchestrate sending packets at specific rates for a period of time, while having a libtrace program running at the other end of the link trying to capture and process these packets. Once the packet generation is over, the libtrace program is halted. This will form the basis of my experiments to determine how much traffic we can capture and process with parallel libtrace. The experiments will use different capture methods (ring, DAG, DPDK, PF_RING etc.), different packet rates, different numbers of processing threads (from 1 to 16) and different workloads ranging from just counting packets to cryptopan anonymisation.
My initial tests have shown that the numbers of dropped packets are not particularly consistent across captures with otherwise identical parameters, so I'll have to run each experiment multiple times so that I can get some more statistically valid results.
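One simple way to summarise repeated runs of the same experiment is to report the mean and sample standard deviation of the dropped packet counts. The sketch below assumes the drop count from each run has already been collected into a list; the function name and the numbers are invented for illustration and are not real measurements.

```python
import statistics


def summarise_drops(drop_counts):
    """Summarise dropped-packet counts from repeated runs of one experiment."""
    mean = statistics.mean(drop_counts)
    # The sample standard deviation needs at least two runs.
    stdev = statistics.stdev(drop_counts) if len(drop_counts) > 1 else 0.0
    return {
        "runs": len(drop_counts),
        "mean": mean,
        "stdev": stdev,
        "min": min(drop_counts),
        "max": max(drop_counts),
    }


# Made-up drop counts from three hypothetical runs.
summary = summarise_drops([120, 80, 100])
print(summary)
```

A large standard deviation relative to the mean would be a sign that more runs are needed before comparing configurations.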
Also spent a bit of time helping Brendon capture some traces of his ICMP packets to help figure out whether his timing issues are network-based or host-based.
Added a new graph type to the AMP website for showing loss as a percentage over time. This graph is now shown when clicking on a cell in the loss matrix, and can also be accessed through the graph browser. Fixed a complaint regarding the matrix where clicking on a cell in an IPv4-only matrix would take you to a graph showing lines for both IPv4 and IPv6, so you would never get the smokeping-style colouring via the matrix.
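For anyone curious about what a loss-as-a-percentage series involves, here is a minimal sketch of the calculation. It assumes each measurement is a tuple of timestamp, packets sent and packets lost, and that measurements are grouped into fixed-size time bins; the tuple layout, the function name and the 300 second bin size are assumptions for the example, not how the AMP website actually does it.

```python
def loss_percentage(samples, bin_size=300):
    """Aggregate loss measurements into time bins and return percent loss.

    samples: iterable of (timestamp, packets_sent, packets_lost) tuples.
    This layout is hypothetical, chosen only for illustration.
    """
    bins = {}
    for ts, sent, lost in samples:
        # Align the timestamp to the start of its bin.
        start = int(ts) - int(ts) % bin_size
        total_sent, total_lost = bins.get(start, (0, 0))
        bins[start] = (total_sent + sent, total_lost + lost)
    # A bin where nothing was sent has no meaningful loss percentage.
    return [
        (start, 100.0 * lost / sent if sent else None)
        for start, (sent, lost) in sorted(bins.items())
    ]


# Made-up measurements: two fall in the first bin, one in the second.
series = loss_percentage([(0, 10, 1), (60, 10, 0), (300, 20, 5)])
print(series)
```

Summing sent and lost packets per bin before dividing weights each packet equally, rather than averaging the per-test percentages.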
Started messing around with Ostinato scripting on the 10g dev boxes and using DPDK to generate packets at 10G rates. Had a few issues initially because I was using an old version of the DPDK-enabled Ostinato that Richard had lying around; updating to Dan's most recent version seemed to fix that.
Spent a bit of time looking at the data collected during the CSC and how it might be used as ground truth for developing some security event detection techniques.
Finished up the implementation chapter of the libtrace paper. Added a couple of diagrams to augment some of the textual explanations. Got Richard S. to read over what I've got so far and made a few tweaks based on his feedback.
Spent a decent chunk of time looking at Unknown UDP port 80 traffic in libprotoident. Found a clear pattern that was contributing most of the traffic, which I traced back to Tencent. Unfortunately Tencent publishes a lot of applications so that knowledge wasn't conclusive on its own.
My initial suspicion was that it might have been game traffic so I downloaded and played a few popular multiplayer games via the Tencent games client, capturing the network traffic and comparing it against my current unknown traffic. No luck, but then I had the bright idea to look a bit more closely at video call traffic in WeChat (a messaging app). Sure enough, once I was able to successfully create two WeChat accounts and get a video call going between them, I started seeing the traffic I wanted.
Also added rules for Acer Cloud and OpenTracker over UDP.
Wrote a skeleton for a centralised collector of progger data for Harris to start filling in with actual useful code.
Continued writing up the implementation chapter of the libtrace paper. It's turning out to be a pretty long paper, as there are a lot of design decisions that warrant discussion (memory management, combiners, hashers etc.).
Succumbed to my head cold on Thursday, so had a day at home to rest and recover.
Started writing some content for the parallel libtrace paper. Managed to churn out an introduction, a background and a little bit of the implementation section.
Fixed a couple of bugs in netevmon prior to the deployment: crashing when trying to reconnect to a restarted NNTSC and some confusing event descriptions for changepoint events.
Finished setting up a mobile app test environment for JP. I've configured my old iPhone to act as an extra client for 2-way communication apps (messaging etc.). So far the environment has already been helpful, as we've managed to identify one of the major outstanding patterns as being used by the Taobao mobile shopping app.
Finished up the demo for STRATUS forum and helped Harris put together both a video and a live website.
Spent a bit of time trying to fix some unintuitive traceroute events that we were seeing on lamp. The problem was arising when a normally unresponsive hop was responding to traceroute, which was inserting an extra AS transition into our "path".
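To show why a newly responsive hop can change the AS path, here is a small sketch that collapses per-hop AS numbers into a path, skipping unresponsive hops. It only illustrates the problem and is not the netevmon code; the function name is made up, and the AS numbers come from the range reserved for documentation.

```python
def as_path(hop_asns):
    """Collapse per-hop AS numbers into an AS path.

    hop_asns: list with one entry per traceroute hop, where None marks a hop
    that did not respond. Consecutive hops in the same AS are merged.
    """
    path = []
    for asn in hop_asns:
        if asn is None:
            # Unresponsive hops contribute nothing to the path.
            continue
        if not path or path[-1] != asn:
            path.append(asn)
    return path


# The third hop is normally unresponsive...
usual = as_path([64496, 64496, None, 64497])
# ...but when it does respond, its AS shows up as an extra transition.
unusual = as_path([64496, 64496, 64500, 64497])
print(usual, unusual)
```

The two paths differ even though the route itself has not changed, which is the sort of difference that can look like a path change event.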
Rebuilt DPDK and Ostinato on 10g-dev2 after Richard upgraded it to Jessie so that I can resume my parallel libtrace development and testing once he's done with his experiments.
Installed and tested a variety of Android emulators to try and set up an environment where JP and I can more easily capture mobile app traffic. Bluestacks on my iMac ended up being the most useful, as the others I tried either lacked the Google Play Store (so finding and installing the "official" apps would be hard) or needed more computing power than I had available.