Shane Alcock's Blog
Another solid week of state machine improvements. I've been comparing the machines derived by my algorithm against the machines I can derive manually from the raw data. This has revealed quite a few failures on the part of my algorithm; a lot of the problems fell into one of two categories: 1) creating loops in situations when we probably shouldn't have or 2) a failure in the variant recognition code (both in terms of failing to recognise a variant and being too keen to decide two sequences are variants).
In the process of fixing these problems, I also discovered a bug in my original pattern extraction code that was causing it to halt too early: it stopped as soon as it had extracted a pattern of at least 4 tokens, rather than the intended 20. This explains why many of the patterns I was working with were fragments of a whole sequence. Fixing that has greatly improved the quality of the machines I have been deriving, as well as revealing some patterns that I was previously always missing.
Also spent a day tidying up some of the ampy and amp-web code prior to Brendon releasing them on GitHub. Made the old rrd-smokeping collection work again, and removed all of the old LPI and munin collections, which we are not interested in maintaining right now.
Another short week of refinement on the FSM generation code. Fixed a major bug in my pattern-mining code that was causing it to return substrings that overlapped as the most common repeated substring. Also spent a lot of time refining the code that determines whether a sequence is a variant of another; now, a short sequence that is entirely encompassed by another much longer sequence is considered a good match, regardless of how many tokens in the long sequence are unmatched.
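To illustrate the overlap problem, here is a minimal sketch of counting repeated patterns so that overlapping occurrences are not counted twice. This is not the actual pattern-mining code; the function names, the fixed pattern length and the token values are all assumptions made for the example.

```python
from collections import defaultdict


def count_non_overlapping(tokens, length):
    """Count every `length`-token window, skipping any occurrence that
    overlaps the previously counted occurrence of the same pattern."""
    counts = defaultdict(int)
    last_end = {}  # pattern -> index just past its last counted occurrence
    for start in range(len(tokens) - length + 1):
        pattern = tuple(tokens[start:start + length])
        if start >= last_end.get(pattern, 0):
            counts[pattern] += 1
            last_end[pattern] = start + length
    return counts


def most_common_repeat(tokens, length):
    """Return (pattern, count) for the most frequent pattern that occurs at
    least twice without overlapping itself, or None if nothing repeats."""
    counts = count_non_overlapping(tokens, length)
    repeated = {p: c for p, c in counts.items() if c >= 2}
    if not repeated:
        return None
    best = max(repeated, key=repeated.get)
    return best, repeated[best]
```

With the sequence a, a, a and a pattern length of 2, a naive sliding window would report the pattern (a, a) twice, even though the two occurrences share the middle token. The sketch counts it once and so does not treat it as a repeat.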
Put together a poster describing the FSM work, as CROW are interested in displaying it at the CultivateIT event next week. Even if they don't use it there, it'll probably be handy to have available at some point.
Helped Brendon test out some code polishing that he has done to NNTSC before putting it up on GitHub. Went through and removed some outdated code in the repo (specifically the LPI modules) and updated the docs to not refer to our non-working modules so hopefully nobody will try to use them.
Another disrupted week, this time because a malfunctioning vehicle forced me to work from home for much of it.
Returned to polishing and improving my state machine generation code, mostly to deal with some minor inaccuracies when creating loops or converging branches. The machines are starting to look reasonably right, although I still need a good method for working out the best candidates to be start states.
Fixed some AMP matrix issues that cropped up when we rolled the latest code out to one of our deployments. The two main problems were that a) the throughput matrix hadn't been updated to the new API and b) the relative matrix metrics were inconsistent. As part of the process of fixing these, I also found that we've been calculating relative latency incorrectly for quite a while so I've fixed that as well.
NZNOG week. Spent the first two days finalising everything for the AMP tutorial.
The tutorial itself went ahead on Wednesday afternoon and seemed to be fairly successful. No major technical glitches and the participants seemed to get something out of it.
Gave my talk on my latest libprotoident study on Thursday. Happy with the reception I got and had quite a few interesting conversations afterwards as a result.
Helped Brendon get the NZNOG AMP tutorial in a presentable state. Built some VM images that our attendees will be able to use as an AMP server and made sure that the steps provided in our tutorial will result in a functional server. As a result, we've fixed a few little bugs in ampy and amp-web that showed up in situations where you don't already have a lot of pre-existing streams or meshes.
Made sure that our VMs and instructions will work with VMware, VirtualBox and QEMU, as well as on Ubuntu, Windows 10 and macOS.
Installed Ubuntu 16.04 on all of our UP boards, so they are all ready to go next week.
Spent a bit of time testing out some of Brendon's AMP tutorial instructions, making sure that everything so far is sane and no steps are missing. I anticipate there will be a lot more of this next week as the tutorial gets closer to a complete draft.
Continued working on verifying and fixing the auto-generated FSMs. Going over the entire set of generated FSMs from my test dataset threw up a number of bogus-looking machines, so I've been working on investigating and (when necessary) fixing the problems. I've also managed to get self-repeating states working correctly for the most part; there are just one or two edge cases that still need to be detected and handled properly. Re-implemented tagging the original call logs with the FSMs that were matched by subsequences within the call log -- the current implementation is naive in that it assumes any state within a machine could be a start state, which is not going to scale well, so I need to come up with a way to infer potential start states (or at least rule out definite non-start states).
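A rough sketch of what the naive tagging looks like, and why it scales poorly, is shown here. It is an illustration only: the machine representation (a dict mapping state to {token: next state}), the function names, the min_len threshold and the optional start_states argument are all hypothetical and not taken from the real code.

```python
def longest_run(transitions, start_state, tokens, pos):
    """Walk the machine from start_state, consuming tokens from pos onward.
    Returns how many tokens were consumed before no transition applied."""
    state = start_state
    consumed = 0
    while pos + consumed < len(tokens):
        nxt = transitions.get(state, {}).get(tokens[pos + consumed])
        if nxt is None:
            break
        state = nxt
        consumed += 1
    return consumed


def tag_log(machines, tokens, min_len=2, start_states=None):
    """Tag a call log with (position, length, machine name) tuples.

    With start_states=None, every state of every machine is tried as a
    start state at every position (the naive approach). Passing a dict of
    machine name -> set of states restricts the candidates (hypothetical).
    """
    tags = []
    for name, transitions in machines.items():
        if start_states is not None:
            candidates = set(start_states.get(name, ()))
        else:
            # States that only appear as transition targets count too.
            candidates = set(transitions)
            for edges in transitions.values():
                candidates.update(edges.values())
        for pos in range(len(tokens)):
            best = max(
                (longest_run(transitions, s, tokens, pos) for s in candidates),
                default=0,
            )
            if best >= min_len:
                tags.append((pos, best, name))
    return tags


# Made-up machine and call log, purely for illustration.
machines = {
    "file_io": {"S0": {"open": "S1"}, "S1": {"read": "S1", "close": "S2"}}
}
log = ["open", "read", "read", "close", "stat"]
naive_tags = tag_log(machines, log)
restricted_tags = tag_log(machines, log, start_states={"file_io": {"S0"}})
```

The naive call does work proportional to machines x states x log positions, and it tags the same run three times (starting at positions 0, 1 and 2) because mid-machine states are accepted as starts. Restricting the candidates to S0 leaves only the single tag at position 0.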
Re-worked libflowmanager to be usable in a parallel situation. Previously, the flow map was a global variable. Now, you can have multiple flow maps so you can have one per thread and use libtrace's bidirectional hashing to ensure that each flow corresponds to only one thread, and therefore only one flow map.
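The idea behind one flow map per thread can be sketched as follows. This is a single-threaded Python illustration of the concept only. It does not use the libflowmanager or libtrace APIs, and every name in it (flow_key, thread_for, Worker, dispatch) is made up for the example.

```python
import zlib


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Canonical key: order the two endpoints so that both directions of a
    flow produce the same key."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo, hi, proto)


def thread_for(key, num_threads):
    """Deterministically map a canonical flow key to a worker index."""
    return zlib.crc32(repr(key).encode()) % num_threads


class Worker:
    """Stand-in for one processing thread, owning its own flow map."""

    def __init__(self):
        self.flows = {}  # canonical key -> packets seen

    def handle(self, key):
        self.flows[key] = self.flows.get(key, 0) + 1


def dispatch(workers, packet):
    """Send a (src_ip, src_port, dst_ip, dst_port, proto) tuple to the
    worker that owns its flow."""
    key = flow_key(*packet)
    workers[thread_for(key, len(workers))].handle(key)


workers = [Worker() for _ in range(4)]
fwd = ("10.0.0.1", 40000, "192.0.2.7", 80, 6)
rev = ("192.0.2.7", 80, "10.0.0.1", 40000, 6)
dispatch(workers, fwd)
dispatch(workers, rev)
```

Because the hash is computed over a direction-independent key, the forward and reverse packets land on the same worker, so no flow map ever needs to be shared or locked between threads.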
Started experimenting with using parallel libtrace with libprotoident applications. I soon ran into a bug where using the built-in hasher thread to distribute packets could cause a deadlock, so spent most of Friday trying to track this down.
Back to work for two days this week. Caught up on a pile of email, then wrote my talk for NZNOG later this month.
Tested and released a new version of libprotoident.
Started working on adding single node loops to my FSMs for the STRATUS project.
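As a small illustration of what a single node loop means here, the sketch below builds a linear machine from one token sequence and folds each run of identical tokens into one state with an edge back to itself. The representation (integer states, a dict mapping state to {token: next state}) and the function name are assumptions for the example, not the project's actual data structures.

```python
def build_chain_with_self_loops(tokens):
    """Build a linear machine from one token sequence. A run of identical
    tokens becomes a single state with a self-loop on that token."""
    transitions = {}
    state = 0
    prev = None
    for tok in tokens:
        if tok == prev:
            # Same token again: loop back to the current state.
            transitions[state][tok] = state
        else:
            transitions.setdefault(state, {})[tok] = state + 1
            state += 1
            transitions.setdefault(state, {})
            prev = tok
    return transitions


chain = build_chain_with_self_loops(["open", "read", "read", "read", "close"])
```

In the resulting machine, state 2 (reached after the first read) has a read edge pointing to itself and a close edge to state 3, so any number of consecutive reads is accepted without adding more states.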
Libprotoident 2.0.10 has been released.
This release includes rules to match new traffic patterns for many of the protocols that we introduced in the 2.0.9 release. We've also added two new protocols: BACnet and Maxicloud.
This release also no longer treats TCP keepalive packets as payload-bearing.
The full list of updated protocols can be found in the new libprotoident ChangeLog.
Tidied up and documented the FSM extraction code, so that I'll be able to remember how it works when I start working on it again in earnest next year.
Finished the matrix layout / selection changes and merged them back into develop. Hopefully we will get a chance to roll these out early next year once Brendon builds some new packages.
I had to run a test capture for a few days last week to make sure that some changes Richard had made to libtrace had not broken DAG and RT inputs. Ran the resulting traces through libprotoident to see if there are any new protocols worth investigating. Managed to make a few improvements to the rules for existing protocols to catch a few cases that we were missing but otherwise nothing particularly exciting cropped up.
In Wellington for STRATUS forum on Monday. Had a few interesting chats -- definitely a lot of people out there interested in anomaly detection in a variety of contexts.
Continued refining my FSM generation code. Managed to get rid of most of the obviously incorrect transitions in my test cases now. There's still a bit of work to do, both in tidying up the orphaned states that are left over when the code realises they are redundant and in choosing better start states. However, my main focus before the end of the year will be tidying up the code and making sure it is sufficiently documented so I'll be able to pick it up again in the new year.
Fixed a bunch of small problems with amp-web and NNTSC that we've known about for a while. Started working on replacing the matrix selection tabs with dropdowns and combining related "tabs" into a single matrix type, e.g. http duration and http page size are combined into a single "http" matrix with the ability to change the metric using a dropdown.