Shane Alcock's Blog
Only worked three days this week -- on leave for the rest.
Continued developing the event filtering mechanism for the amp-web dashboard. Managed to make all of the filtering options work properly, including AS-based filtering and filtering based on the number of affected endpoints.
Changed event loading to happen in batches, so if the selected time range covers a lot of events we will only load 20 at a time. A new batch is loaded each time the user scrolls to the bottom of the event list. This means that we can now replicate the old infinite-scrolling event list behaviour on the dashboard, so I've removed that old page.
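A minimal sketch of the batching logic, in Python purely for illustration. It assumes the whole (already sorted) event list for the selected time range is available up front and simply slices it; the real dashboard presumably asks the server for each batch instead. The `EventList` class and `load_more` method are hypothetical names.

```python
from typing import Any, List, Sequence

BATCH_SIZE = 20  # matches the "20 at a time" described above


class EventList:
    """Hypothetical model of the dashboard's infinite-scrolling event list.

    `events` is assumed to be the full, already-sorted result for the
    selected time range; only `batch_size` of them are added per scroll.
    """

    def __init__(self, events: Sequence[Any], batch_size: int = BATCH_SIZE):
        self._events = events
        self._batch_size = batch_size
        self.shown: List[Any] = []
        self.load_more()  # the first batch is loaded straight away

    def load_more(self) -> bool:
        """Call when the user hits the bottom of the list.

        Appends the next batch and returns False once nothing is left.
        """
        start = len(self.shown)
        batch = list(self._events[start:start + self._batch_size])
        self.shown.extend(batch)
        return bool(batch)
```

With 45 events, the list starts with 20, grows to 40 and then 45 on the next two scrolls, and further scrolls report that nothing is left.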
Added automatic fetching of new events to the dashboard, so the event list is now self-updating rather than requiring a refresh of the whole page to see any new events.
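The self-updating list boils down to remembering the newest timestamp already displayed and periodically asking for anything at or after it. In this hypothetical sketch, `fetch_since` stands in for whatever request amp-web actually makes to the server; events sharing the boundary timestamp are de-duplicated by event ID so nothing is shown twice.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Event:
    event_id: int
    timestamp: int  # Unix seconds


class LiveEventFeed:
    """Sketch of a self-updating event list (all names are hypothetical).

    `fetch_since(ts)` is assumed to return every event with timestamp >= ts.
    """

    def __init__(self, fetch_since: Callable[[int], List[Event]]):
        self._fetch_since = fetch_since
        self._latest = 0
        self._seen: Set[int] = set()
        self.events: List[Event] = []  # newest first, as displayed

    def poll(self) -> List[Event]:
        """Fetch anything new, prepend it to the list and return it."""
        new = [e for e in self._fetch_since(self._latest)
               if e.event_id not in self._seen]
        for e in new:
            self._seen.add(e.event_id)
            self._latest = max(self._latest, e.timestamp)
        # New events go on top, newest first; older entries are untouched.
        self.events = sorted(new, key=lambda e: e.timestamp, reverse=True) + self.events
        return new
```

Calling `poll()` on a timer in the browser (rather than reloading the page) is what makes the list self-updating.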
Continued working on the event filtering mechanism for amp-web. Added support for an ASN->AS name mapping database, which will be used to manage the list of ASes that can be filtered on and will also be used to label our traceroute graphs (instead of querying whois.cymru.com, which can fail from time to time).
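A rough sketch of how such a mapping might serve both purposes. The table layout and the `AsnNameMap` class are hypothetical; the key point is that a local lookup can fall back to the bare AS number instead of failing the way a remote whois query can.

```python
from typing import Dict, Iterable, List, Tuple


class AsnNameMap:
    """Hypothetical in-memory view of an ASN -> AS name table."""

    def __init__(self, rows: Iterable[Tuple[int, str]]):
        self._names: Dict[int, str] = {asn: name for asn, name in rows}

    def label(self, asn: int) -> str:
        """Graph label for an AS; falls back to the bare number if unknown."""
        name = self._names.get(asn)
        return f"AS{asn} {name}" if name else f"AS{asn}"

    def filter_choices(self) -> List[Tuple[int, str]]:
        """(asn, label) pairs sorted by ASN, e.g. to populate an AS filter menu."""
        return [(asn, self.label(asn)) for asn in sorted(self._names)]
```

For example, with rows for the private-use ASNs 64512 and 64513, `label(65000)` still yields a usable `"AS65000"` even though that AS has no entry.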
Changes to event filters are now posted back to the amp-web server and saved for the next time the user loads the event dashboard.
Started working on actually filtering the events based on the user's selections. I've got filtering working for time period, maximum event groups, event types, sources and targets. One interesting side effect of filtering is that the removal of certain events from event groups can create situations where we have duplicate event groups (because the events that made those groups distinct are no longer on the dashboard). Removing events can also change the start time of an event group and therefore event groups no longer appear in chronological order. As a result, I've had to re-work the event processing to correct for these issues.
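The regrouping problem can be sketched as follows. This assumes, hypothetically, that a group is just a list of events, that two groups count as duplicates when exactly the same set of events survives the filter, and that groups are displayed oldest first; none of these details are taken from the actual amp-web code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Event:
    event_id: int
    timestamp: int
    source: str
    target: str
    event_type: str


@dataclass
class EventGroup:
    start: int
    events: List[Event]


def filter_groups(groups: List[EventGroup],
                  keep: Callable[[Event], bool]) -> List[EventGroup]:
    """Apply an event filter, then repair the resulting groups."""
    unique: Dict[FrozenSet[int], EventGroup] = {}
    for group in groups:
        remaining = [e for e in group.events if keep(e)]
        if not remaining:
            continue  # every event in this group was filtered out
        key = frozenset(e.event_id for e in remaining)
        if key in unique:
            continue  # now indistinguishable from a group we already kept
        # The earliest surviving event may be later than the original start.
        unique[key] = EventGroup(start=min(e.timestamp for e in remaining),
                                 events=remaining)
    # Start times may have moved, so re-sort to restore chronological order.
    return sorted(unique.values(), key=lambda g: g.start)
```

Filtering down to a single event type, for instance, can push a group's start time past its neighbour's and collapse two groups into one, which is exactly the pair of problems described above.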
Marked the 513 libtrace assignments. Some students performed very well and I was glad to see that the investigative task proved to be very doable.
Started working on adding the ability to filter events and event groups on the amp-web dashboard. Most of my effort so far has been in producing a mock-up of the interface, which I showed to Nathan and Chris on Thursday afternoon. On Friday, started replacing some hard-coded filtering settings with a dynamic template that uses user preferences stored in a database.
Fixed a few little netevmon issues that cropped up when trying to restart netevmon on prophet prior to starting work on the dashboard filtering, mostly in relation to ensuring that the 'purge event database' option works sensibly.
Started writing up a short paper on the unexpected traffic analysis I've been doing for the past few weeks. Made decent progress -- I've got a mostly complete draft, just missing a conclusion and an abstract.
Spent a decent chunk of Thursday dealing with the fallout from upgrading InfluxDB to 0.11 on prophet. This broke most of our existing rollup tables, as the data type that we were now inserting (int) was no longer compatible with the data type that we had apparently been inserting previously (float). Compounding matters was InfluxDB's lack of visibility into which data types are associated with any given column. Ended up trashing and re-creating the database (somewhat by accident), which fixed the problem, but that's not an ideal solution if we ever roll this out in production.
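One way to avoid this kind of int/float clash is to make sure every write uses the same type for a given field. In InfluxDB's line protocol a bare number such as `42` or `42.0` is a float, whereas `42i` is an integer, so the sketch below simply casts every field value to float before formatting the point. This is an illustration only: it is not how NNTSC actually writes data, the `rtt`/`stream` names in the example are hypothetical, and it assumes names need no escaping.

```python
from typing import Mapping, Union

Number = Union[int, float]


def to_line_protocol(measurement: str, tags: Mapping[str, str],
                     fields: Mapping[str, Number], timestamp_ns: int) -> str:
    """Format one point, forcing every numeric field to a float.

    Writing ints in one place and floats in another would create a
    field-type conflict; casting everything keeps each field's type stable.
    Assumes measurement, tag and field names need no escaping.
    """
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={float(v)!r}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_str} {field_str} {timestamp_ns}"
```

Here an integer median of 12 is emitted as `median=12.0`, so it lands in the same float-typed field as any fractional values.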
513 assignment was due at 5pm on Friday, so dealt with a few final queries from students. 20 submissions in the end, so a bit of marking to do next week.
Continued making progress with my unidentified mice flows in libprotoident. Added a whole pile of new rules, mostly for various Chinese apps again. Have probably done enough now that I can draw a line under this and start writing the paper itself; there are still a few obvious patterns that I would like to identify, but this has already consumed a lot of time.
Answered a handful of questions from 513 students -- mostly intelligent ones, so I'm reasonably confident about how the class is going overall. Due date is this coming Friday, so we'll know for sure soon enough.
Helped finish off the funding proposal in the first half of the week.
Continued working with libprotoident. This week I gave up on the elephant flows and started looking at the mice flows. Found some interesting stuff, the highlight being a huge number of flows on TCP port 80 that seem to be associated with the Baidu web browser. The behaviour of these flows is particularly odd: connect to the server, send a FIN with seqno N, retransmit the FIN a few times, send a non-FIN packet with 1 byte of payload (0x00) and seqno N-1 (incredibly invalid TCP behaviour!), then the server sends a RST. The end result is more than 150,000 flows over a week on port 80, each with a single outgoing byte of payload.
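To make the odd sequence concrete, here's a rough classifier over a simplified per-flow packet list. The `Packet` record is hypothetical (a real implementation would pull these fields out of the trace with libtrace), and the check only looks for the pattern described above: an outgoing FIN with seqno N, then a non-FIN packet carrying a single 0x00 byte at seqno N-1, followed by a RST from the server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    outgoing: bool  # True if sent by the client
    seq: int
    fin: bool
    rst: bool
    payload: bytes


def looks_like_fin_then_zero_byte(packets: List[Packet]) -> bool:
    """Hypothetical check for the FIN / 1-byte-at-N-1 / RST pattern."""
    fin_seq: Optional[int] = None
    for i, pkt in enumerate(packets):
        if pkt.outgoing and pkt.fin:
            fin_seq = pkt.seq  # retransmitted FINs repeat the same seqno
        elif (pkt.outgoing and fin_seq is not None and not pkt.fin
              and pkt.payload == b"\x00"
              and pkt.seq == (fin_seq - 1) % 2**32):  # handle seqno wrap
            # The server should answer the bogus byte with a RST.
            return any(not p.outgoing and p.rst for p in packets[i + 1:])
    return False
```

Sequence numbers are compared modulo 2^32 so a FIN at seqno 0 still matches a byte sent at 2^32 - 1.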
Added some filters on the Endace probe to see if we can find people doing this traffic on campus, as the Baidu browser is pretty well-known for having a tendency to leak all sorts of private data back to its masters. Found multiple staff PCs that appear to be doing this sort of traffic, so Brad and I will try to prepare a report for ITS next week.
Met with Nathan at Lightwire on Thursday afternoon re: AMP and netevmon. Came away with plenty of ideas and suggestions for improvements we can make, and hopefully we also helped Nathan understand parts of our system better. The good news is that netevmon seems to mostly be picking up valid events, but even so the number and frequency of these events can be overwhelming, so we need better control over what events are shown to the user.
Worked on the next MBIE funding proposal document. Still got a fair way to go so this will probably eat up a lot of next week too.
Continued trying to identify the remaining Unknown applications in the Waikato Sept 15 traces. Only managed to identify one new protocol (Xunlei Accelerated), but this did account for 14 GB of unknown traffic on TCP port 8080, so that has gotten rid of the biggest outstanding quantity of unknown traffic. The rest are looking like they might get the better of me: it's almost all Chinese in origin, and while I can identify the parent company (Tencent, CERNET, Taobao etc), actually figuring out which of the myriad apps these companies own is responsible is mostly just trial and error at this stage.
Continued working away at the Unknown traffic from my libprotoident port study. Added new protocols for Telegram Messenger and Kuguo, as well as improved DNS (especially TCP DNS) and NTP matching. I still have a bit more Unknown traffic to identify before I'd be comfortable putting the results in a paper, but we're getting closer.
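As an example of the kind of check that helps with TCP DNS: RFC 1035 (section 4.2.2) says DNS over TCP prefixes each message with a two-byte, big-endian length field. The sketch below is a hypothetical heuristic, not the actual libprotoident rule; it only looks at the first few payload bytes plus the payload size, and it assumes the first segment carries exactly one complete DNS message, which need not always hold.

```python
import struct

DNS_HEADER_LEN = 12  # fixed size of a DNS message header


def looks_like_tcp_dns(first_bytes: bytes, payload_len: int) -> bool:
    """Hypothetical heuristic: does a TCP payload start with a DNS length prefix?

    If the segment holds one whole message, the 2-byte length prefix should
    equal the payload size minus the prefix itself.
    """
    if len(first_bytes) < 2 or payload_len < 2 + DNS_HEADER_LEN:
        return False
    (msg_len,) = struct.unpack("!H", first_bytes[:2])
    return msg_len == payload_len - 2
```

A 31-byte payload starting with `00 1d` (29) passes, whereas an HTTP request starting with `GET ` does not.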
Gave my 513 lectures this week. Looking forward to seeing how the class get on with my assignment.
Met with Ryan Jones who is doing an Honours project that will use netevmon to try and find events in the CSC data. Gave him access to the code and a few hints to start out, but I imagine I'll have to dedicate some more time to this over the course of the year.
My fixes to Andy's InfluxDB code seem to be resulting in consistent and correct bins being stored in the rollup tables. Threw netevmon at the development system to see if it can cope, which it seems to be doing OK. There's still a bit of a concern around long-term memory usage, but I'll see how that pans out over the next couple of weeks.
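For reference, the property we want from the rollup tables is that every measurement lands in a bin whose boundaries are fixed multiples of the bin width, regardless of the order in which results arrive. A minimal illustration, assuming mean aggregation and integer Unix timestamps (the real rollups may well compute different statistics):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rollup(samples: Iterable[Tuple[int, float]],
           bin_width: int) -> Dict[int, Tuple[int, float]]:
    """Aggregate (timestamp, value) samples into aligned bins.

    Returns {bin_start: (count, mean)}; bin_start is always a multiple of
    bin_width, so the same sample always lands in the same bin.
    """
    values: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        values[ts - ts % bin_width].append(value)
    return {start: (len(v), sum(v) / len(v)) for start, v in sorted(values.items())}
```

Because bin boundaries depend only on the timestamp, feeding the samples in a different order produces identical bins.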
Spent the rest of my week concentrating on finishing up JP's summer study on unexpected traffic on typically open ports. Managed to improve a few existing rules to recognise more traffic, as well as add new rules for QQ video chat and what appears to be a C&C covert channel for some Chinese malware using UDP port 53. Started framing up a paper for IMC based on this study.
Did some final prep work for the libtrace lectures and assignment for 513.
Arrived back in NZ on Monday, back at work on Tuesday. Brought Brendon and Richard N. up to speed on the things I learned at AIMS and the potential collaboration opportunities I discussed with people there. Spent a bit of time writing emails to chase up some of these opportunities.
Deployed Andy's InfluxDB code on prophet. Spent much of the rest of the week playing around with the continuous query system to try and fix some outstanding issues caused by Influx's design decision to never automatically backfill the aggregated series when older or lagged data is received (e.g. when restarting NNTSC after an outage, or when AMP results arrive 40 seconds later than their timestamp due to timeouts). This was a bit trickier than you would think because there's no obvious way to find out when the last automatic continuous query ran (they don't happen exactly on the bin boundary), so I have to guess based on the current time, the time the bin should have ended and the timestamp of the current result.
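The guessing logic might look something like the sketch below. It assumes (hypothetically) that each bin is aggregated once, at some point no more than `cq_delay` seconds after the bin closes; that bound and all the names here are illustrative rather than taken from NNTSC. Results whose bin is still open are left to the continuous query, and anything else is backfilled, since re-running the aggregation for a bin is assumed to be harmless.

```python
from enum import Enum


class CqStatus(Enum):
    PENDING = "pending"      # bin still open: the CQ will see this result
    UNCERTAIN = "uncertain"  # bin closed, CQ may or may not have run yet
    MISSED = "missed"        # CQ has almost certainly run without this result


def guess_cq_status(result_ts: int, now: int, bin_width: int,
                    cq_delay: int) -> CqStatus:
    """Guess whether the automatic CQ has already aggregated result_ts's bin."""
    bin_end = result_ts - result_ts % bin_width + bin_width
    if now < bin_end:
        return CqStatus.PENDING
    if now < bin_end + cq_delay:
        return CqStatus.UNCERTAIN
    return CqStatus.MISSED


def should_backfill(status: CqStatus) -> bool:
    # Assumes recomputing a bin is idempotent, so backfilling when unsure
    # only costs an extra query.
    return status is not CqStatus.PENDING
```

For a result timestamped at 1000 with 60-second bins (so the bin ends at 1020) and an assumed 30-second CQ delay, arriving at 1010 is fine, arriving 40 seconds late at 1040 is uncertain, and arriving at 1060 means the bin has almost certainly been missed.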