Project Members

Shane Alcock admin

STRATUS: Awareness and Response to Anomalous Data Activity

This project addresses the problem of recognising potential cloud security breaches and alerting 
affected users as soon as possible. This will enable cloud users and administrators to be better 
informed about events that should concern them in a way that is user-friendly and easy to 
comprehend.  This research will also provide the ability to react to these events, by identifying the source of the 
vulnerabilities and allowing data access to be revoked if necessary. 

The main expected outcome is a system that will allow cloud users and administrators 
to explore the history of activities involving their cloud data and have significant events 
highlighted for them. The system will also enable cloud users to be immediately notified 
about any events that affect them or their data. The system can also be used to provide 
visibility into relevant aspects of the cloud infrastructure as well, which could help cloud 
administrators find and react to potential breaches before they compromise the entire 
user base. 
 
The benefits from this system are that all cloud stakeholders can achieve better 
awareness of what is happening to the cloud and their data within it. Administrators can 
feel confident that important issues will be brought to their attention without having to 
go looking for them. Users can feel secure knowing that there is a monitoring system in 
place that will alert them immediately should a security issue arise.  
  
  

17

Oct

2017

Helped Jayden with polishing up the final version of his Honours report. Hopefully he is happy with the final result!

Started testing the initial prototype of the DAG multicaster on our development boxes. Had a few issues getting dpdk pktgen to do exactly what I wanted (not helped by the terrible documentation!) but eventually managed to happily capture 10Gb of small packets split across 4 DAG streams with no real issues. Next step is to start encapsulating and multicasting some nDAG records.

Went to the STRATUS forum on Friday, flying down to Wellington on Thursday afternoon. Forum seemed to go pretty well; plenty of people that I spoke to thought that our work so far was interesting.

Released a new version of libprotoident.

03

Oct

2017

Spent a decent portion of my week working on my reworked cluster evaluation code for STRATUS. The new version seems to be producing labels that are much more useful, so my ability to evaluate clusters and identify the least conforming members has improved greatly.

Continued to tweak and improve the libprotoident rules. Started working towards a possible 2.0.12 release by updating documentation and running some basic build tests on various operating systems.

28

Aug

2017

Back at work on Wednesday after a couple of weeks away. Spent most of my week catching up on emails and preparing for a STRATUS workshop coming up on Monday.

Did manage to spend a little bit of time looking at unknown traffic on the Waikato capture point again. Added a couple of new protocols: Smite and Fliggy. I've also found what appears to be another IP sharing "tool" similar to IPSharkk -- this one looks like it is installed as malware, so the network user is probably unaware that their machine is being used to proxy other people's traffic. Will try to dig a little more into this next week.

08

May

2017

Added another 5 protocols to libprotoident -- having a slightly more powerful PC for installing and running various candidate applications has helped quite a bit. Updated the rules for several more protocols as well.

Made some more progress on my protocol taxonomy -- I'm up to 'P' for the TCP protocols so I'm probably about 1/4 of the way through now.

Continued re-factoring the FSM generation code. Getting close to done, although I suspect the amount of changes and variable renaming will require a fair bit of testing to make sure I've transferred everything across correctly.

Added the ability to choose between TCP and HTTP throughput data on the AMP matrix. To do this, I had to bring the amp-web/nntsc install on prophet back up to date after a few months of being untouched. As always, there were a few issues with dependencies and versioning which slowed everything down, but eventually Brendon and I got it all working correctly.

01

May

2017

Another disrupted week, this time due to being ill. Spent most of my available time looking over the output of my new multi-process state machine generation algorithm. The extra sequence fragments that become apparent when considering multiple processes managed to reveal a few new situations where my code wasn't quite doing the right thing. I've fixed those and am reasonably happy again with the machines produced for my test dataset.

Moved on to some code re-factoring, as the existing code-base had become a bit of a mess from hacking in fixes to all of the edge cases I had been dealing with. In particular, I'm aiming to separate code that deals with the machine itself, i.e. the states and their transitions, from the code that compares sequences and determines what needs to be added to (or removed from) the machine to accommodate the variation.

19

Apr

2017

Slightly disrupted week, with Easter and cyclones having an impact on productivity. Most of my time ended up being spent hunting down more previously unknown protocols. Just three new protocols this week, along with fixes for three more.

On the STRATUS side, I worked on creating a way to "combine" the suffix trees for each individual process so that we can account for sequences that appear frequently in the whole dataset but never more than once or twice within a given process. The original implementation would not recognise those sequences as frequent, because it considered each process individually. I think I've got this working now -- but I'm yet to look at the results too closely.
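
To illustrate the difference between per-process and combined counting, here is a minimal Python sketch. It is not the STRATUS code: it uses plain fixed-length n-gram counters rather than suffix trees, and the function names, the `logs` layout and the thresholds are all assumptions made for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield every contiguous run of n tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def frequent_per_process(logs, n, min_count):
    """Sequences that reach min_count inside at least one single process."""
    found = set()
    for tokens in logs.values():
        counts = Counter(ngrams(tokens, n))
        found.update(seq for seq, c in counts.items() if c >= min_count)
    return found

def frequent_combined(logs, n, min_count):
    """Sequences that reach min_count once counts are summed over all processes.

    Each process is still tokenised separately, so no sequence is allowed
    to span the boundary between two processes.
    """
    total = Counter()
    for tokens in logs.values():
        total.update(ngrams(tokens, n))
    return {seq for seq, c in total.items() if c >= min_count}

# Hypothetical system call logs, one list per process.
logs = {
    "pid1": ["open", "read", "close", "stat"],
    "pid2": ["mmap", "open", "read", "close"],
    "pid3": ["open", "read", "close", "exit"],
}

print(frequent_per_process(logs, 3, 3))
print(frequent_combined(logs, 3, 3))
```

In this toy data "open,read,close" occurs only once in each process, so the per-process view never reports it, while the combined view does.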

10

Apr

2017

Continued delving into the unknown traffic on the campus network. Had a mix of frustrating days and successful days -- one protocol (N2Ping) took nearly two days to track down but I got there in the end. Another 8 new protocols added to libprotoident this week, so we're starting to get close to 400 supported protocols.

Another week of refinement on the FSM code. Most of the effort has been focused on loop recognition, particularly in terms of making sure we don't ignore candidates that can be used to identify loops.

03

Apr

2017

Have been using my new daily libprotoident email to make some good progress in terms of adding new protocols to libprotoident. Another 8 protocols added this week, with 5 existing protocols improved as well.

Found a few new bugs in my FSM tandem-repeat code after running it against my full test dataset and doing an initial validation of the resulting machines. Finished up a set of slides describing (broadly) what I'm doing overall with the FSM project and how I'm going about it, i.e. suffix trees, pattern extraction, variant detection and machine building.

Started looking into a parallel RT implementation for libtrace / wdcap, with an eye towards removing the combiner bottleneck from wdcap.

27

Mar

2017

Finished implementing tandem repeat detection within my existing pattern extraction code. The initial results look promising, i.e. the code has been able to identify "write,read" as a repeat in the FTP system call log with no obvious false positives. Next job will be to repeat the machine validation and make sure that I have improved the results overall.

Wrote a libprotoident program to perform daily monitoring of unknown payload patterns on the Waikato capture point and send me an email every morning with the 25 "biggest" patterns by payload, as well as a few example flows matching each pattern. Using this data, I've already been able to add a few new patterns to libprotoident and look forward to being able to be more proactive at keeping libprotoident up to date.
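
The ranking step of such a daily report could look roughly like the Python sketch below. It does not use the libprotoident API: the input is assumed to be an iterable of hypothetical `(pattern, flow_id, payload_bytes)` tuples that some earlier stage has already extracted, and "biggest" is assumed to mean total payload bytes.

```python
from collections import defaultdict

def top_patterns(flows, limit=25, examples=3):
    """Rank unknown payload patterns by total payload bytes.

    flows: iterable of (pattern, flow_id, payload_bytes) tuples (hypothetical layout).
    Returns a list of (pattern, total_bytes, example_flow_ids), biggest first.
    """
    totals = defaultdict(int)
    samples = defaultdict(list)
    for pattern, flow_id, payload_bytes in flows:
        totals[pattern] += payload_bytes
        # Keep only the first few flows per pattern as examples for the email.
        if len(samples[pattern]) < examples:
            samples[pattern].append(flow_id)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(p, b, samples[p]) for p, b in ranked[:limit]]

# Made-up flows purely for demonstration.
flows = [("aa", 1, 100), ("bb", 2, 500), ("aa", 3, 450)]
for pattern, total, example_ids in top_patterns(flows):
    print(pattern, total, example_ids)
```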

20

Mar

2017

Finished porting the remaining libprotoident tools to be parallel-compatible. Spent a couple of days looking at unknown payload patterns in some recent Uni traffic -- unfortunately I made little tangible progress on identifying the unknown traffic.

Worked on implementing an algorithm for finding tandem repeats in strings, with the eventual aim of porting it over to work with my system call sequences. The published algorithm consists of three phases, but each of those phases has either involved looking up and implementing several other string processing algorithms (LZ-decomposition, longest common extension) or has required modifications to my existing suffix tree code (extracting a suffix array, bottom-up traversal, storing the longest child suffix in each node). Therefore, I'm about half-way through implementing the algorithm.

Moved libtrace into its own github organization to reflect that libtrace is now going to be more of a community project than a WAND project. I'll still be helping out with maintaining it for now, but now the workload can be shared amongst a group of trusted libtrace users (including people outside of WAND). This will hopefully keep libtrace well looked-after, even as my available time gets more and more restricted.

13

Mar

2017

Went back and finished making libflowmanager work with parallel libtrace. The remaining problem had been that the expiry modules were not thread-safe, so I've rewritten them to be classes so that the expiry lists are local to each module. Testing with lpi_protoident has proven these changes to work (at least when reading from a trace file), so I can continue updating the rest of the libprotoident tools to be parallel-libtrace compatible soon.

Spent the remainder of my week validating some of the FSMs produced by my model generation algorithm. Overall, the results are starting to look fairly good -- most of the machines being generated by my code are close matches to the ground truth machines, and there are very few duplicate or redundant machines. The most obvious outstanding problem is related to "tandem repeats", i.e. sequences of multiple system calls that can be repeated any number of times (such as "read,write,read,write,read,write", where "read,write" simply repeats until the action is over). Started looking into methods where I could detect tandem repeats so that I can try to encode them as a single self-repeating state.
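
As a rough illustration of what a tandem repeat looks like in a call sequence, here is a naive Python sketch. It is a simple left-to-right scan written for this example, not an efficient suffix-tree based algorithm, and the function names and the `max_unit` limit are assumptions.

```python
def find_tandem_repeats(tokens, max_unit=4):
    """Return (start, unit, count) for runs where a unit repeats back-to-back.

    At each position the unit (up to max_unit tokens long) that covers the
    most tokens is chosen; on a tie the shorter unit wins.
    """
    repeats = []
    i, n = 0, len(tokens)
    while i < n:
        best = None
        for unit_len in range(1, max_unit + 1):
            unit = tokens[i:i + unit_len]
            if len(unit) < unit_len:
                break  # ran off the end of the sequence
            count, j = 1, i + unit_len
            while tokens[j:j + unit_len] == unit:
                count += 1
                j += unit_len
            covered = count * unit_len
            if count >= 2 and (best is None or covered > best[2] * len(best[1])):
                best = (i, unit, count)
        if best:
            repeats.append(best)
            i += best[2] * len(best[1])
        else:
            i += 1
    return repeats

def collapse(tokens, max_unit=4):
    """Replace each tandem repeat with a single '(unit)+' marker."""
    out, pos = [], 0
    for start, unit, count in find_tandem_repeats(tokens, max_unit):
        out.extend(tokens[pos:start])
        out.append("(" + ",".join(unit) + ")+")
        pos = start + count * len(unit)
    out.extend(tokens[pos:])
    return out

calls = ["open", "read", "write", "read", "write", "read", "write", "close"]
print(find_tandem_repeats(calls))
print(collapse(calls))
```

The collapsed marker plays the role of the single self-repeating state: however many times "read,write" occurs, the sequence reduces to the same three steps.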

06

Mar

2017

Finished testing my packet ordering fix for libtrace. Managed to come up with a more efficient method of determining an appropriate order value for int: and ring: so hopefully performance shouldn't be impacted too much by this change. Also fixed a couple of other libtrace bugs that I had noticed, particularly the horrid performance of tracertstats on some live formats. Released a new version of libtrace (4.0.1) that includes these fixes as well as a few others that have come in since the first parallel release.

More tweaks to the FSM generation code. I've found some errors in the way I was determining whether one machine was effectively superseded by another, which was causing me to produce extra redundant machines. I've also come up with a new method for creating the match maps when comparing two sequences -- the old method simply focused on picking the longest match and then finding any other matches that cover unmatched territory, which doesn't work so well for some looping sequences which have repetitive sub-sequences within them. My new method tries to find an optimal set of matches that gets the best possible coverage while minimising the amount of overlap, so we avoid matches that only end up covering one token because the rest of the sequence has already been covered by a larger match.
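
The objective described above (maximise coverage, then minimise overlap) can be illustrated with a brute-force Python sketch. This is only a toy: it tries every subset of candidate matches, so it is exponential, and the representation of a match as a half-open `(start, end)` token range is an assumption made for the example rather than the real match map format.

```python
from itertools import combinations

def best_match_set(matches):
    """Pick the subset of matches with the most coverage and least overlap.

    matches: list of (start, end) half-open token ranges (hypothetical format).
    Subsets are scored as (tokens covered, -tokens covered more than once),
    so coverage is compared first and overlap only breaks ties.
    """
    best, best_score = (), (0, 0)
    for size in range(1, len(matches) + 1):
        for subset in combinations(matches, size):
            covered, overlap = set(), 0
            for start, end in subset:
                span = set(range(start, end))
                overlap += len(span & covered)
                covered |= span
            score = (len(covered), -overlap)
            if score > best_score:
                best, best_score = subset, score
    return list(best)

# (5, 10) and (6, 10) both finish the sequence, but (5, 10) overlaps (0, 6)
# by one token, so the cleaner pairing is preferred.
print(best_match_set([(0, 6), (5, 10), (6, 10)]))
```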

27

Feb

2017

Still having some problems with my variant recognition code for the FSM construction. Decided to go back to square one in terms of the set of conditions for variant matching. I've started developing a small dataset of potential variants and tagging them with whether I actually want them to be recognised as variants or not. Using this, I can hopefully look at these as a complete set and try to develop a set of conditions that works well for all scenarios, rather than the previous "whack-a-mole" development strategy where I would focus on the case I'm currently getting wrong, come up with something that fixes that problem and then consequently break several other previously good matches.

Finally managed to track down and fix a nuisance segfault in anomaly_ts that would very occasionally crop up. The biggest challenge was getting the segfault to occur in an environment where I could get a core dump; the problem was obvious once I had a useful dump. Also fixed a handful of interface issues in amp-web and resolved an issue with the event dashboard being slow to load.

Found and resolved a libtrace bug where parallel ring: inputs would appear to produce out-of-order packets, even with the ordered combiner. This was because we were relying on the packet timestamp as an ordering mechanism, but the clock used to timestamp the packets is not strictly monotonic -- using a monotonic clock to determine packet order makes the combiner a lot happier. Packet ordering is now determined per-format as a result, so I'm still testing that ordering works correctly for the other formats.
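
The underlying problem can be reproduced in a few lines of Python. This sketch is not libtrace code: it merges two hypothetical per-thread packet streams with `heapq.merge`, which (like any ordered merge) assumes each input is already sorted by the chosen key. The `wall` and `order` field names and all of the values are made up for the example.

```python
import heapq

def merge_by(streams, field):
    """Merge per-thread packet streams using the given field as the sort key."""
    merged = heapq.merge(*streams, key=lambda pkt: pkt[field])
    return [pkt["name"] for pkt in merged]

# True arrival order is a1, b1, a2, b2. The wall clock stepped backwards
# between a1 and a2, so thread A's stream is not sorted by 'wall'.
thread_a = [
    {"name": "a1", "wall": 10.0, "order": 1},
    {"name": "a2", "wall": 9.5, "order": 3},
]
thread_b = [
    {"name": "b1", "wall": 9.8, "order": 2},
    {"name": "b2", "wall": 10.2, "order": 4},
]

print(merge_by([thread_a, thread_b], "wall"))   # order is wrong
print(merge_by([thread_a, thread_b], "order"))  # order is preserved
```

With a key that can go backwards within a single stream, the merge has no way to recover the true order; a strictly increasing key avoids the problem entirely.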

20

Feb

2017

Another solid week of state machine improvements. I've been comparing the machines derived by my algorithm against the machines I can derive manually from the raw data. This has revealed quite a few failures on the part of my algorithm; a lot of the problems fell into one of two categories: 1) creating loops in situations when we probably shouldn't have or 2) a failure in the variant recognition code (both in terms of failing to recognise a variant and being too keen to decide two sequences are variants).

In the process of fixing these problems, I also discovered a bug in my original pattern extraction code that was causing it to halt too early, i.e. as soon as it has extracted a pattern of at least 4 tokens rather than the intended 20 tokens, which explains why many of the patterns I was working with were fragments of a whole sequence. Fixing that has greatly improved the quality of the machines I have been deriving, as well as revealing some patterns that I was previously always missing.

Also spent a day tidying up some of the ampy and amp-web code prior to Brendon releasing them on github. Made the old rrd-smokeping collection work again, as well as removed all of the old LPI and munin collections which we are not interested in maintaining right now.

13

Feb

2017

Another short week of refinement on the FSM generation code. Fixed a major bug in my pattern-mining code that was causing it to return substrings that overlapped as the most common repeated substring. Also spent a lot of time refining the code that determines whether a sequence is a variant of another; now, a short sequence that is entirely encompassed by another much longer sequence is considered a good match despite the number of tokens in the long sequence that are unmatched.
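
Both ideas in this entry are easy to show on toy data. The Python sketch below is an illustration only, with hypothetical function names: the first function counts occurrences of a pattern without letting them overlap, and the second checks whether a short sequence appears as a contiguous run inside a longer one.

```python
def count_non_overlapping(tokens, pattern):
    """Count occurrences of pattern in tokens, skipping past each match."""
    if not pattern:
        return 0
    count, i, m = 0, 0, len(pattern)
    while i + m <= len(tokens):
        if tokens[i:i + m] == pattern:
            count += 1
            i += m  # jump over the match so occurrences cannot overlap
        else:
            i += 1
    return count

def is_encompassed(short, long):
    """True if short appears as a contiguous run somewhere inside long."""
    m = len(short)
    return any(long[i:i + m] == short for i in range(len(long) - m + 1))

# "aa" starts at three positions in "aaaa", but only two copies fit
# without overlapping.
print(count_non_overlapping(list("aaaa"), list("aa")))
print(is_encompassed(["read", "write"], ["open", "read", "write", "close"]))
```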

Put together a poster describing the FSM work, as CROW are interested in displaying it at the CultivateIT event next week. Even if they don't use it there, it'll probably be handy to have available at some point.

Helped Brendon test out some code polishing that he has done to NNTSC before putting it up on GitHub. Went through and removed some outdated code in the repo (specifically the LPI modules) and updated the docs to not refer to our non-working modules so hopefully nobody will try to use them.

07

Feb

2017

Another disrupted week, this time because a malfunctioning vehicle forced me to work from home for much of it.

Returned to polishing and improving my state machine generation code, mostly to deal with some minor inaccuracies when creating loops or converging branches. The machines are starting to look reasonably right, although I still need a good method for working out the best candidates to be start states.

Fixed some AMP matrix issues that cropped up when we rolled the latest code out to one of our deployments. The two main problems were that a) the throughput matrix hadn't been updated to the new API and b) the relative matrix metrics were inconsistent. As part of the process of fixing these, I also found that we've been calculating relative latency incorrectly for quite a while so I've fixed that as well.

16

Jan

2017

Spent a bit of time testing out some of Brendon's AMP tutorial instructions, making sure that everything so far is sane and no steps are missing. I anticipate there will be a lot more of this next week as the tutorial gets closer to a complete draft.

Continued working on verifying and fixing the auto-generated FSMs. Going over the entire set of generated FSMs from my test dataset threw up a number of bogus looking machines, so I've been working on investigating and (when necessary) fixing the problems. I've also managed to get self-repeating states working correctly for the most part; just one or two edge cases that still need to be detected and handled properly. Re-implemented tagging the original call logs with the FSMs that were matched by subsequences within the call log -- the current implementation is naive in that it assumes any state within a machine could be a start state, which is not going to scale well so I need to come up with a way to infer potential start states (or at least rule out definite non-start states).

Re-worked libflowmanager to be usable in a parallel situation. Previously, the flow map was a global variable. Now, you can have multiple flow maps so you can have one per thread and use libtrace's bidirectional hashing to ensure that each flow corresponds to only one thread, and therefore only one flow map.
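
The idea of pairing one flow map per thread with a direction-independent hash can be sketched in Python as follows. This is neither the libflowmanager nor the libtrace API: the key layout, the CRC-based hash and the `FlowTables` class are all assumptions chosen to keep the example short.

```python
import zlib

def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a key that is identical for both directions of a flow."""
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (lo, hi, proto)

def thread_for(key, nthreads):
    """Map a flow key to a thread index with a deterministic hash."""
    return zlib.crc32(repr(key).encode()) % nthreads

class FlowTables:
    """One independent flow map per thread instead of a single global map."""

    def __init__(self, nthreads):
        self.maps = [{} for _ in range(nthreads)]

    def update(self, src_ip, src_port, dst_ip, dst_port, proto):
        key = flow_key(src_ip, src_port, dst_ip, dst_port, proto)
        tid = thread_for(key, len(self.maps))
        # Only the owning thread's map is touched, so no locking is needed.
        self.maps[tid][key] = self.maps[tid].get(key, 0) + 1
        return tid

tables = FlowTables(4)
fwd = tables.update("10.0.0.1", 40000, "192.0.2.7", 80, 6)
rev = tables.update("192.0.2.7", 80, "10.0.0.1", 40000, 6)
print(fwd == rev)
```

Because both directions of a flow produce the same key, they always land in the same thread's map, which is what makes per-thread flow state safe.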

Started experimenting with using parallel libtrace with libprotoident applications. I soon ran into a bug where using the built-in hasher thread to distribute packets could cause a deadlock, so spent most of Friday trying to track this down.

09

Jan

2017

Back to work for two days this week. Caught up on a pile of email, then wrote my talk for NZNOG later this month.

Tested and released a new version of libprotoident.

Started working on adding single node loops to my FSMs for the STRATUS project.

19

Dec

2016

Tidied up and documented the FSM extraction code, so that I'll be able to remember how it works when I start working on it again in earnest next year.

Finished the matrix layout / selection changes and merged them back into develop. Hopefully we will get a chance to roll these out early next year once Brendon builds some new packages.

I had to run a test capture for a few days last week to make sure that some changes Richard had made to libtrace had not broken DAG and RT inputs. Ran the resulting traces through libprotoident to see if there are any new protocols worth investigating. Managed to make a few improvements to the rules for existing protocols to catch a few cases that we were missing but otherwise nothing particularly exciting cropped up.

12

Dec

2016

In Wellington for STRATUS forum on Monday. Had a few interesting chats -- definitely a lot of people out there interested in anomaly detection in a variety of contexts.

Continued refining my FSM generation code. Managed to get rid of most of the obviously incorrect transitions in my test cases now. There's still a bit of work to do in terms of tidying up some orphaned states that are left over as a result of the code realising they are redundant and trying to choose better start states, but my main focus before the end of the year will be tidying up the code and making sure it is sufficiently documented so I'll be able to pick it up again in the new year.

Fixed a bunch of small problems with amp-web and NNTSC that we've known about for a while. Started working on replacing the matrix selection tabs with dropdowns and combining related "tabs" into a single matrix type, e.g. http duration and http page size are combined into a single "http" matrix with the ability to change the metric using a dropdown.