Talked to Brad about getting tagged log file data; he modified bearwall to log both packets that were allowed and packets that were blocked. Each event in the log indicates which, and for blocked packets the reason is recorded. So now I have a day's worth of tagged firewall logs (about 11,000 lines).
I then wrote an application to process this data and convert it into ARFF format so that it can be used with WEKA. Now I need to split it into separate sets for testing, parameter tuning and evaluation.
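The conversion step could look roughly like the sketch below. The attribute names and log fields here are assumptions for illustration; the real application presumably extracts different fields from the bearwall logs.

```python
# Sketch: convert tagged firewall log records into ARFF text for WEKA.
# Attribute names and record layout are hypothetical.

def to_arff(records, relation="firewall"):
    """Build an ARFF document from (proto, src_port, dst_port, action) tuples."""
    header = [
        "@RELATION " + relation,
        "@ATTRIBUTE proto {tcp,udp,icmp}",
        "@ATTRIBUTE src_port NUMERIC",
        "@ATTRIBUTE dst_port NUMERIC",
        "@ATTRIBUTE action {allowed,blocked}",
        "@DATA",
    ]
    rows = ["%s,%d,%d,%s" % r for r in records]
    return "\n".join(header + rows) + "\n"

print(to_arff([("tcp", 51234, 80, "allowed"), ("udp", 53, 53, "blocked")]))
```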
Next I will build an application on the WEKA framework to rank this data based on classification accuracy.
Got a little bit of work done this week. I have set up FreeRADIUS 3 in a new Debian 8 VM, bridged the device into the Mininet test environment, and can hand out DHCP leases without too much hassle. Currently there are no VLAN tags on this traffic, as they otherwise break things, so I need to investigate FreeRADIUS sitting on multiple VLANs (a VLAN per hop) handing out DHCP to double-tagged traffic (triple if you count the hop VLAN).
Also added REST API support to the controller; I will build on that once I have the VLAN task sorted.
Spent some more time looking into using embedded Chromium as part of the
HTTP test. I've managed to successfully extract all of the
navigation/resource timing information from the browser after the page
has loaded, which is very useful. Getting access to headers looks like
it will require implementing my own resource handlers and processing
requests manually, but that should be doable. Also, I still haven't
managed to completely decouple the browser from GTK: something is still
trying to initialise it, even though there is no need for it and
nothing is ever drawn to the screen.
Tidied up some more configuration parsing and parts of the main loop in
the amplet client, removing the need for a few more global variables
that were convenient at the time of writing.
Helped Brad configure a new measurement machine to be sent out.
Reconfigured some of the existing machines to swap the management ports
around so we can test them without the reporting traffic interfering.
Tried multiple setups of CoAP servers:
libcoap has been the most successful so far: the provided example gives a functional server with a time resource, and it worked nicely with Copper (Cu), a Firefox extension for browsing CoAP servers.
The listen address can be configured with -A. One thing that wasn't documented was the need to add '&' to the end in order for the command to execute (possibly something within the code).
Tested from a LAN machine (with Firefox + Copper) against the IPv4 address of the RPi on the LAN.
[Screenshot: first view of the CoAP server from inside Copper]
Selecting 'time' and clicking 'observe' produces the following result:
[Screenshot: observing the time feature of the CoAP server (push notifications)]
Running the client example program with "echo 1000 | sudo ./coap-client -m put -T cafe coap://192.168.1.7/time -f -" gave the output "v:1 t:0 tkl:4 c:3 id:18275", which is just the header of the CoAP message (no payload was defined).
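Those five fields map directly onto the 4-byte fixed CoAP header from RFC 7252 (version, type, token length, code, message ID). A quick Python sketch decoding it, with the header bytes reconstructed here purely for illustration:

```python
import struct

def parse_coap_header(data):
    """Decode the 4-byte fixed CoAP header (RFC 7252):
    2-bit version, 2-bit type, 4-bit token length, 8-bit code, 16-bit ID."""
    first, code, mid = struct.unpack("!BBH", data[:4])
    return {
        "v": first >> 6,
        "t": (first >> 4) & 0x3,
        "tkl": first & 0xF,
        "c": code,
        "id": mid,
    }

# Rebuild the header the client printed: v:1 t:0 tkl:4 c:3 id:18275
# (code 3 is PUT, i.e. 0.03; tkl 4 matches the 4-byte token "cafe")
hdr = bytes([(1 << 6) | (0 << 4) | 4, 3, 18275 >> 8, 18275 & 0xFF])
print(parse_coap_header(hdr))
```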
Trying to 'put' to the core or async resources (instead of time) gives a "Method Not Allowed" response, the same as trying those operations in Copper.
Having a running example is useful; next steps will be to configure the example, or write a separate program based on it, with the functionality required for the project's server.
ccoap is a C-based implementation of a CoAP server with examples; however, running anything other than the examples (there are command-line versions) fails to work (no response from the server). It also required editing /etc/services and installing cmake to compile the code.
WebIOPi is an interesting setup: it has functionality for both CoAP and HTTP servers, which could come in handy later in the project. Its main feature is a web interface for viewing the GPIO ports on the board. It is also implemented as a daemon service, which is part of what is wanted for the project server.
I've continued evaluating libraries.
I ran into issues with padding using loxi; it turns out this needs to be done manually, and in the correct order, to ensure fields are not overwritten.
I looked at using OFConnect again. It performs similar functionality to fluidbase, helping to establish an OpenFlow connection, but it does not include functions to build OF messages. Given that I'm happy with fluidbase and that there are a few bugs listed against OFConnect, I've decided to skip it in favour of the Revised OpenFlow Library (ROFL).
ROFL is a C++ library and supports fewer OF versions than loxi, but it does support 1.0 and 1.3. The same flow mod code is used for both, allowing quite nice support across versions.
I've also had a quick play with Floodlight's oftest, a Python library that tests an OpenFlow switch's capabilities and conformance to the OF standards. It's similar to OFLOPS, however it appears to target testing capabilities rather than benchmarking performance; i.e. the included tests typically add a rule, send a packet that will match it, and verify the packet is returned correctly modified. I needed to make some slight modifications to get it working, such as adding some sleeps to work around what I'm assuming is incorrect barrier behaviour (a response before the TCAM update) and removing some specific match fields, as it creates very specific matches that match every field in the packet to be sent.
Brad got the Brocade working. I've done some quick testing: it appears to work but has limited feature support and lacks VLAN support. However, it would still be useful for testing packet-in rates.
I have written a chapter on efficient discovery of load balancer diamond divergence points at the expense of internal diamond nodes. This used a restricted set of flow IDs. There is now also a corresponding section in the discussion chapter.
I also wrote a chapter on improving the efficiency of MDA analysis by changing the way in which the source port flow ID is chosen. The three methods were incrementing, random and sequential bit flipping. Sequential bit flipping did not appear to offer any advantage in discovering the successor set of a load balancer sooner. However it was hard to know what to expect as there are likely to be a number of different hashing algorithms used by different types of routers.
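The three flow ID selection methods compared above could be sketched roughly as follows. Function names, the 16-bit source port space, and the base port are assumptions for illustration, not the chapter's actual implementation.

```python
import random

def incrementing(base=33434, n=16):
    """Consecutive source ports starting from a base."""
    return [base + i for i in range(n)]

def random_ids(n=16, seed=0):
    """Uniformly random source ports (seeded here for reproducibility)."""
    rng = random.Random(seed)
    return [rng.randrange(0, 1 << 16) for _ in range(n)]

def bit_flipping(base=33434, n=16):
    """Sequential bit flipping: flip one bit of the base port at a time,
    stepping through the bit positions."""
    return [base ^ (1 << (i % 16)) for i in range(n)]
```

The intuition behind bit flipping is that each probe differs from the base in exactly one bit, which might exercise a hash function differently than consecutive values; as noted above, whether that helps depends on the (unknown) hashing algorithms routers use.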
Continued fine-tuning the event groupings produced by netevmon. Major changes I made include:
* When displaying groups on the dashboard, merge any event groups that have the exact same members.
* Avoid including events in a group if they don't have a reasonable D-S score, even if there are other similar significant events happening at the same time. This gets rid of a lot of pointless (and probably unrelated) events from each group and also ensures groups expire promptly. This change introduced a few flow-on effects: the insignificant events still need to be members of the group (in case they do eventually become significant) but shouldn't affect any of the group statistics, particularly the group start time.
* Allow events that occur within one expiry period before the first event in a group to be included in that group -- threaded netevmon doesn't export events in a strict chronological order any more, so we need to be careful not to throw away out-of-order events.
* Have a fallback strategy if there is no AS path information available for a given (source, destination) pair (e.g. no traceroute test is scheduled, or the traceroute test has failed for some reason). Instead, we create two groups: one for the source and one for the target.
* Polished up the styling of the dashboard and event group list and fixed a few UI issues that Brendon had suggested after looking at it on Friday.
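The first change above (merging groups with identical members when displaying the dashboard) could look roughly like this. The group structure is a guess at what netevmon produces, purely for illustration.

```python
# Sketch: collapse event groups that have exactly the same member events,
# keeping the earliest start time among the duplicates.

def merge_identical_groups(groups):
    """groups: list of dicts with 'members' (event id list) and 'start'."""
    merged = {}
    for g in groups:
        key = frozenset(g["members"])
        if key in merged:
            # Same membership regardless of order: keep the earliest start.
            merged[key]["start"] = min(merged[key]["start"], g["start"])
        else:
            merged[key] = dict(g)
    return list(merged.values())
```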
Switched from using NetFlow to sFlow, since it turns out to be the most convenient way to get the information I need with the equipment available. Brad gave me some info on the switch that is exporting traffic to me, so I can differentiate between incoming and outgoing flows. I am having to manually check the interfaces in my parser program to get direction information. This isn't the most flexible way of doing it, but the versions of NetFlow and sFlow that are available do not support direction information and only handle incoming packets, not outgoing ones. I plan on adding a configuration file naming the interface on which packets are sent to the Internet for the network where my application is installed. My parser program will use this to ascertain which flows are outgoing.
Currently I maintain two databases: one for ingress flows and one for egress. This is a result of having to coordinate the individual flows that sFlow exports for each ingress interface. To reduce the number of entries in my databases, I will only collect information from a couple of incoming interfaces on the local network. I will listen to all incoming packets on the uplink interface to the Internet.
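The direction-classification idea described above could be sketched like this. The interface index, record layout, and configuration value are assumptions for illustration.

```python
# Sketch: classify sFlow samples as incoming or outgoing using a configured
# uplink interface index. Since the available sFlow version only samples
# packets on ingress, a packet sampled on the uplink interface must be
# traffic arriving from the Internet; anything else is heading out.

UPLINK_IFINDEX = 4  # would come from the planned configuration file

def flow_direction(input_ifindex):
    """Return 'incoming' for packets sampled on the uplink, else 'outgoing'."""
    return "incoming" if input_ifindex == UPLINK_IFINDEX else "outgoing"
```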
Shane said altering lpicollector to export MAC addresses would be too much trouble, so I skipped that idea. I still plan on using it later to get application-layer information and associate it with the flow exports produced by sFlow.
This week I will do as much of the application as possible with the data I have.
Found a paper where event logs are clustered using word-vector pairs; the approach compares each pair to every other pair in the supplied logs, allowing it to cluster lines with similar parts. There is also a toolkit associated with the paper that lets you specify the input files and the support threshold for forming a pair, then outputs the clusters where the support is reached; the outlier clusters can also be output. This will need further investigation to see if it is a viable solution.
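The word-pair support idea can be illustrated with a much-simplified sketch: count how often word pairs co-occur across log lines and keep only pairs meeting a support threshold. This is not the toolkit's actual algorithm, just the core counting step.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(lines, support=2):
    """Return word pairs that co-occur in at least `support` log lines."""
    counts = Counter()
    for line in lines:
        words = sorted(set(line.split()))  # dedupe and order for stable pairs
        counts.update(combinations(words, 2))
    return {pair for pair, c in counts.items() if c >= support}
```

Pairs below the threshold would correspond to the outlier clusters the toolkit can optionally output.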
Had a meeting with Antti Puurula about possible approaches, where we discussed outputting a ranking into lists of safe and unsafe events. We discussed how this could be evaluated with a Mean Average Precision measurement, and then a few algorithms that could be used for scoring events: clustering if the feature space could be separated, supervised learning if all the data was tagged, or his recommendation of a supervised approach where the user manually updates the lists of safe/unsafe events and the classifier updates iteratively.
We then discussed how to integrate non-language features like timestamps, by having another algorithm such as naive Bayes handle the continuous features. This way we could identify events happening within a certain time period of one another, to tie together events between files.
Short week as I was off sick on Monday and Tuesday.
Spent some time looking into using a headless web testing environment
as an alternative to the current HTTP test. This would give us access
to objects that the current test doesn't fetch (due to them being
generated programmatically or obfuscated). Not
all of the headless testing software appears to give full access to the
events that I'm interested in, while some are written such that they
will be awkward to integrate into an AMP test. Currently looking at
embedded Chromium as most likely to be useful.
Started refactoring some of the configuration parsing code in amplet to
remove some unnecessary globals, and to clear out some cruft from the
main loop that didn't really need to be there.