Christopher Lorier's blog
I set up an SNMP poller to check that the packet counts at the two ends of a link are consistent, and it looks fairly promising. I'm currently setting up an OpenFlow system for some more extensive testing.
I've also been looking at the latest versions of RouteFlow to get an idea of what has changed since I last worked on it, and investigating what could be done to add multiple table support and the other features I need for what I am doing.
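A minimal sketch of the consistency check, assuming the poller already returns raw interface counters (for example IF-MIB `ifOutUcastPkts` on the sending end and `ifInUcastPkts` on the receiving end; the post doesn't say which objects are polled). The SNMP fetching itself is left out, and the 0.1% tolerance is a hypothetical value meant to absorb the fact that the two ends are never polled at exactly the same instant.

```python
COUNTER32_MODULUS = 2 ** 32  # Counter32 objects wrap back to zero after 2^32 - 1


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Packets counted between two samples, tolerating a single counter wrap."""
    return (curr - prev) % modulus


def link_is_consistent(tx_prev: int, tx_curr: int,
                       rx_prev: int, rx_curr: int,
                       tolerance: float = 0.001,
                       modulus: int = COUNTER32_MODULUS) -> bool:
    """Compare packets sent out of one end of a link with packets received at the other.

    `tolerance` (hypothetical default) is the fraction of mismatch treated as
    polling-skew noise rather than loss.
    """
    sent = counter_delta(tx_prev, tx_curr, modulus)
    received = counter_delta(rx_prev, rx_curr, modulus)
    if sent == 0:
        return received == 0
    return abs(sent - received) / sent <= tolerance
```

For the 64-bit `ifHCOutUcastPkts`/`ifHCInUcastPkts` counters, pass `modulus=2**64` instead.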
I've been writing the proposal and thinking about how things will scale.
To poll both ends of a path, I can add an extra flow on each switch specifying the ingress switch, so that when a packet leaves the fabric it is counted together with the other packets that entered the fabric at that same switch. This will require multiple tables and stacked MPLS labels (three layers of MPLS), though it could probably be made to work with two.
This way I can poll both ends of a path, but in between I am aggregating paths, since the alternative is a number of path flows on each switch that is quadratic in the number of switches in the fabric. This will complicate accurately locating problems just by polling counters.
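For reference, this is how a stack of MPLS labels is laid out on the wire (RFC 3032): each 32-bit entry holds a 20-bit label, a 3-bit traffic class, a bottom-of-stack bit and an 8-bit TTL. The sketch below packs and unpacks such a stack. The post doesn't say how the labels would be assigned, so any roles given to them (such as one label naming the ingress switch) are hypothetical.

```python
import struct
from typing import List, Sequence, Tuple


def encode_label_stack(labels: Sequence[int], tc: int = 0, ttl: int = 64) -> bytes:
    """Pack MPLS label stack entries, outermost first (RFC 3032 layout)."""
    if not labels:
        raise ValueError("a label stack needs at least one label")
    if not 0 <= tc < 8 or not 0 <= ttl < 256:
        raise ValueError("tc must fit in 3 bits and ttl in 8 bits")
    out = bytearray()
    for i, label in enumerate(labels):
        if not 0 <= label < 2 ** 20:
            raise ValueError(f"label {label} does not fit in 20 bits")
        bottom = 1 if i == len(labels) - 1 else 0  # S bit set only on the last entry
        entry = (label << 12) | (tc << 9) | (bottom << 8) | ttl
        out += struct.pack("!I", entry)  # network byte order
    return bytes(out)


def decode_label_stack(data: bytes) -> List[Tuple[int, int, int, int]]:
    """Unpack entries as (label, tc, bottom_of_stack, ttl) tuples."""
    return [
        (entry >> 12, (entry >> 9) & 0x7, (entry >> 8) & 0x1, entry & 0xFF)
        for (entry,) in struct.iter_unpack("!I", data)
    ]
```

Label values 0 to 15 are reserved by RFC 3032, so any per-switch labels would have to start at 16.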
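To make the scaling concrete, here is a small counting sketch, assuming each path is installed as one flow on every switch it crosses. Keying flows on the (ingress, egress) pair keeps per-path counters but allows up to n(n-1) flows on a switch in an n-switch fabric; keying only on the egress switch caps it at n flows, one per egress, at the cost of merging the counters of every path heading to the same egress.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Switch = Hashable
Paths = Dict[Tuple[Switch, Switch], List[Switch]]


def flow_table_sizes(paths: Paths, aggregate: bool) -> Dict[Switch, int]:
    """Count the flows each switch needs to carry the given paths.

    paths maps (ingress, egress) to the list of switches the path visits.
    With aggregate=False every path gets its own flow on each switch it visits;
    with aggregate=True paths towards the same egress share one flow.
    """
    keys: Dict[Switch, Set[object]] = defaultdict(set)
    for (ingress, egress), hops in paths.items():
        key = egress if aggregate else (ingress, egress)
        for switch in hops:
            keys[switch].add(key)
    return {switch: len(flow_keys) for switch, flow_keys in keys.items()}


# Every ordered pair of switches on a three-switch line a - b - c.
line_paths: Paths = {
    ("a", "b"): ["a", "b"], ("a", "c"): ["a", "b", "c"],
    ("b", "a"): ["b", "a"], ("b", "c"): ["b", "c"],
    ("c", "a"): ["c", "b", "a"], ("c", "b"): ["c", "b"],
}
```

Even here the middle switch b needs six per-path flows against three aggregated ones, and the gap widens as n(n-1) against n.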
Since doing my presentation I have done a bit more reading and have just started writing the proposal.
So the idea is to build fault detection into the distributed router used for Cardigan, by looking at the packet counts hitting various flows and ports and by injecting packets to determine when there is a problem. I have to work out how to make it react quickly without overreporting or overwhelming the controller.
Ideally this would use OpenFlow groups, which have failover mechanisms; however, as far as I am aware, nobody has implemented these yet.
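One common way to avoid overreporting is hysteresis: only raise a fault after several bad samples in a row, only clear it after several good ones, and only send the controller the state changes. A minimal sketch, where a "sample" is whatever per-link check is used (for example a counter comparison or an injected probe) and the default thresholds of three and two are hypothetical values, not ones from the post.

```python
from typing import Optional


class FaultDetector:
    """Debounced health state for one link or path.

    A single noisy sample never flips the state; only `raise_after`
    consecutive bad samples (or `clear_after` consecutive good ones) do.
    """

    def __init__(self, raise_after: int = 3, clear_after: int = 2) -> None:
        self.raise_after = raise_after
        self.clear_after = clear_after
        self.faulty = False
        self._streak = 0  # consecutive samples disagreeing with the current state

    def update(self, sample_ok: bool) -> Optional[bool]:
        """Feed one sample; return the new state if it changed, otherwise None."""
        # A good sample disagrees with a faulty state; a bad one disagrees with a healthy state.
        disagrees = sample_ok == self.faulty
        if not disagrees:
            self._streak = 0
            return None
        self._streak += 1
        threshold = self.clear_after if self.faulty else self.raise_after
        if self._streak < threshold:
            return None
        self.faulty = not self.faulty
        self._streak = 0
        return self.faulty  # only transitions are reported to the controller
```

Raising the thresholds trades detection speed for fewer false alarms, which is the tension between reacting quickly and not overreporting.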
So this week I have worked on my conference presentation and read a lot about fault detection, both in and outside of SDN.
The SDN work mostly agrees that to be fast you need fast-failover groups, which aren't in Open vSwitch yet.
For non-SDN approaches I read about EOAM, BFD and RFC 6374, as well as some other things that weren't much use.
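In OpenFlow (1.1 and later), a fast-failover group holds an ordered list of buckets, each watching a port or group for liveness, and the switch forwards using the first live bucket without asking the controller, which is what makes it fast. Below is a simplified model of that selection rule; here buckets watch only ports and carry a single output action, whereas real buckets can watch a group and carry arbitrary action lists.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Bucket:
    watch_port: int   # the bucket counts as live only while this port is up
    output_port: int  # simplified: the bucket's only action is to output here


def select_live_bucket(buckets: Iterable[Bucket], live_ports: Set[int]) -> Optional[Bucket]:
    """Fast-failover rule: use the first bucket whose watched port is live."""
    for bucket in buckets:
        if bucket.watch_port in live_ports:
            return bucket
    return None  # no live bucket: the switch drops the packet


# Primary path out of port 1, backup out of port 2.
group = [Bucket(watch_port=1, output_port=1), Bucket(watch_port=2, output_port=2)]
```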
Handed in on Monday. Here's a copy, if for whatever reason someone decides they actually want to have a look at it.
I think it is safe to assume my blog entries over the next couple of weeks will consist of me just writing up.
An extremely drafty first draft will be ready today.
So I have everything working now, and have started writing the report. I should have a first draft of the first chapter finished today.
Some of the code is not as pretty as I would like it to be; I completely broke the interface between my load balancing module and rfserver.
Having boldly claimed last week to have finished debugging the path learning, when I tried to get it to put the paths onto switches I found that it wasn't actually deleting paths properly; the way I was testing it only made it look as though it was.
So I ended up completely overhauling how I store the paths, and am just working out the last few kinks in that now. Hopefully that means that when it comes time to actually use them, the whole process will be a lot easier. Hopefully.
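The post doesn't describe the new storage, so the following is only a hypothetical sketch of one structure that makes path deletion easy to check: alongside the paths themselves, keep an index from each link to the paths that use it, so that taking a link down removes exactly the affected paths and leaves no stale entries behind. `PathStore` and its methods are illustrative names, not the author's code.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Switch = Hashable
Key = Tuple[Switch, Switch]  # (source switch, destination switch)


def _link(a: Switch, b: Switch) -> FrozenSet[Switch]:
    """Links are treated as undirected, so a-b and b-a are the same key."""
    return frozenset((a, b))


class PathStore:
    def __init__(self) -> None:
        self.paths: Dict[Key, List[Switch]] = {}
        self.by_link: Dict[FrozenSet[Switch], Set[Key]] = defaultdict(set)

    def add(self, src: Switch, dst: Switch, hops: Sequence[Switch]) -> None:
        self.remove(src, dst)  # replacing a path must not leave old index entries
        stored = list(hops)
        self.paths[(src, dst)] = stored
        for a, b in zip(stored, stored[1:]):
            self.by_link[_link(a, b)].add((src, dst))

    def remove(self, src: Switch, dst: Switch) -> None:
        hops = self.paths.pop((src, dst), None)
        if hops is None:
            return
        for a, b in zip(hops, hops[1:]):
            link = _link(a, b)
            users = self.by_link.get(link)
            if users is not None:
                users.discard((src, dst))
                if not users:
                    del self.by_link[link]  # drop empty index entries

    def link_down(self, a: Switch, b: Switch) -> Set[Key]:
        """Delete every stored path that crossed link a-b; return their keys."""
        affected = self.by_link.pop(_link(a, b), set())
        for src, dst in affected:
            self.remove(src, dst)
        return affected
```

Because every path is reachable from the links it uses, a test can assert that both `paths` and `by_link` are empty of the failed link afterwards, rather than trusting what the code reports.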
Finished debugging the path learning, so I now correctly learn paths between switches using Dijkstra's algorithm. I don't do anything with them yet, though.
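A minimal sketch of Dijkstra's algorithm over the switch graph, returning the full hop list from one source switch to every reachable switch. The adjacency-dict representation and the link costs are assumptions; the post doesn't say which metric is used (with every cost set to 1 this reduces to fewest hops).

```python
import heapq
import itertools
from typing import Dict, Hashable, List

Graph = Dict[Hashable, Dict[Hashable, int]]  # switch -> {neighbour: non-negative link cost}


def shortest_paths(graph: Graph, source: Hashable) -> Dict[Hashable, List[Hashable]]:
    """Dijkstra from `source`; returns destination -> list of switches on the path."""
    dist = {source: 0}
    prev: Dict[Hashable, Hashable] = {}
    tie = itertools.count()  # tiebreaker so the heap never compares switch IDs
    heap = [(0, next(tie), source)]
    done = set()
    while heap:
        d, _, node = heapq.heappop(heap)
        if node in done:
            continue  # stale entry left behind when a shorter distance was found
        done.add(node)
        for neighbour, cost in graph.get(node, {}).items():
            candidate = d + cost
            if neighbour not in dist or candidate < dist[neighbour]:
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, next(tie), neighbour))

    paths = {}
    for dest in done:
        hops = [dest]
        while hops[-1] != source:
            hops.append(prev[hops[-1]])  # walk predecessors back to the source
        paths[dest] = hops[::-1]
    return paths


# A partial mesh: the direct a-c link is expensive, so a reaches c via b.
fabric: Graph = {
    "a": {"b": 1, "c": 5},
    "b": {"a": 1, "c": 1},
    "c": {"a": 5, "b": 1, "d": 1},
    "d": {"c": 1},
}
```

Running it once per switch gives the paths between every pair of switches; link costs must be non-negative, as Dijkstra's algorithm requires.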
I started testing the path learning, which took a bit of effort: changing the test network to a partial (non-full) mesh, and adding scripts to bring switches down and back up again when I need to. So most of the week was spent debugging this.
I also did my in-class presentation, which went fine.