Weekly Report - 22/5/15




Finished the word count program, which outputs comma-separated values: word, occurrences, frequency. This should give a better understanding of the documents being worked with. Next, I think it will be useful to collect summary counts, such as how many words occur only once and how many occur more than 10 times.
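
A minimal sketch of how the counting, CSV output and summary counts could look, assuming Python. The function names (word_stats, to_csv, count_summary) and the tokenising rule are illustrative assumptions, not taken from the actual program.

```python
import csv
import io
import re
from collections import Counter


def word_stats(text):
    """Return (word, occurrences, frequency) rows, most common first."""
    # Assumed tokenising rule: lower-case runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return [(word, n, n / total) for word, n in counts.most_common()]


def to_csv(rows):
    """Format the rows as comma-separated text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["word", "occurrences", "frequency"])
    for word, n, freq in rows:
        writer.writerow([word, n, f"{freq:.4f}"])
    return out.getvalue()


def count_summary(rows, threshold=10):
    """Count the words that occur exactly once and more than `threshold` times."""
    once = sum(1 for _, n, _ in rows if n == 1)
    above = sum(1 for _, n, _ in rows if n > threshold)
    return {"once": once, "above_threshold": above}
```

Calling to_csv(word_stats(text)) on the contents of a log file would give the three-column output, and count_summary on the same rows gives the counts planned for next.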

On Friday 22/5 I had a meeting with Bob Durrant from the Statistics department to get his opinion on possible approaches. I now have a better idea of what to do next: ignore Topic Modeling for now and look at Clustering, starting with only the bearwall firewall logs, then look into comparing multiple types of log files later.

I will start with a simple version of K-Means and add functionality as I go, e.g. seeing whether there is more meaning in an IP address when its network and host portions are separated, among many other possibilities. I will also need to look into the languages and libraries I have found for this type of work to decide which to use.
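
A rough sketch of what the simple starting point might look like, assuming Python with only the standard library. The /24 prefix length, the function names and the choice of features are assumptions for illustration; nothing here is decided yet.

```python
import ipaddress
import random


def split_ip(addr, prefix_len=24):
    """Split an IPv4 address into (network, host) integers.

    prefix_len is an assumption; the logs do not say where the split falls.
    """
    ip = int(ipaddress.IPv4Address(addr))
    host_bits = 32 - prefix_len
    return ip >> host_bits, ip & ((1 << host_bits) - 1)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain K-Means on a list of numeric tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

Each log line could be turned into a tuple such as split_ip(source_address) plus other numeric fields and passed to kmeans. The result depends on the random starting centroids, and features on very different scales (such as the network and host integers) would probably need rescaling first.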

I have also been working on my presentation, which I will give on Wednesday 27/5.