Christopher Lorier's blog
So over the break I was trying to fix ovs, but after finally talking to the guy who wrote the ovs mpls branches this week, I am now giving up on that.
So instead I have the polling working with vlan tags and unique flows for each pair of nodes. It is currently just printing out values for packets sent and received, but it is counting them correctly and not losing any packets.
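As a rough illustration of what the poller's numbers can be turned into, here is a minimal sketch that compares sent and received counters for each pair of nodes. The names `path_loss`, `sent` and `received` are hypothetical and the counter values are made up; this is not the actual poller code.

```python
from typing import Dict, Tuple

# Hypothetical key type: (ingress node, egress node)
Pair = Tuple[str, str]


def path_loss(sent: Dict[Pair, int], received: Dict[Pair, int]) -> Dict[Pair, int]:
    """Return the number of packets missing on each path.

    A pair with no entry in `received` is treated as having received nothing.
    """
    return {pair: count - received.get(pair, 0) for pair, count in sent.items()}


# Made-up counter values, purely for illustration.
sent = {("s1", "s2"): 100, ("s2", "s1"): 40}
received = {("s1", "s2"): 100, ("s2", "s1"): 40}

for pair, lost in path_loss(sent, received).items():
    print(pair, "sent", sent[pair], "received", received[pair], "lost", lost)
```

In practice the two counters are read at slightly different times, so packets still in flight would show up as a small, temporary difference.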
So then I started reading a few papers on passive monitoring techniques, focusing on how the authors tested them. They've actually been fairly interesting, and a couple use techniques very similar to mine.
I tracked down the ovs bug. I have got it doing what I wanted it to, but it is currently failing tests in the ovs test suite to do with bfd and lacp, for whatever reason. These take quite a long time to run, but I will double check that the same failures don't also affect the branch of ovs I am using without my changes. Then I will have a go at running it with routeflow. Hopefully I can get all this sorted, then start the new year with my routeflow path polling all set up and ready to do some tests on.
Haven't been super productive lately. I am still digging away at openvswitch, as well as reading things relating to what I am going to do if I actually discover packet loss (that is, packet loss not caused by problems in experimental branches of openvswitch).
I spent the week dealing with the disappearing packets. My initial attempts to recreate the problem in a simpler setup kept resulting in kernel panics. To try to expedite the process of diagnosing the kernel panics we set up a virtual machine. This didn't particularly help with diagnosis, however, as it fixed the problem immediately.
The mystery of the disappearing packets seems to be related to recirculating packets. That is, when you add or remove an mpls label the packet is recirculated to update the header information.
When I push an mpls label and send the packet to another table, if the flow on that table attempts to push another label the packet will be dropped instead. The flow count for the second flow is incremented but its actions don't seem to occur.
This only seems to happen with pushing labels. Other actions, like updating the mpls label fields (ttl or label), don't seem to cause this.
There are some other bizarre outcomes I am coming across. Popping mpls labels seems to be popping the innermost label: I end up with packets arriving at the hosts with one mpls tag, without the bottom of stack bit set and with the wrong label.
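For reference, here is a small sketch of how an MPLS label stack should behave, using the standard 32-bit label stack entry layout (20-bit label, 3-bit traffic class, 1-bit bottom of stack flag, 8-bit TTL). The helper names (`MplsEntry`, `push`, `pop`) and the label values are made up for illustration and have nothing to do with the ovs code.

```python
import struct
from typing import List, NamedTuple


class MplsEntry(NamedTuple):
    label: int  # 20 bits
    tc: int     # 3 bits (traffic class)
    bos: bool   # bottom of stack flag
    ttl: int    # 8 bits


def pack_entry(entry: MplsEntry) -> bytes:
    """Encode one label stack entry as 4 bytes in network byte order."""
    word = (entry.label << 12) | (entry.tc << 9) | (int(entry.bos) << 8) | entry.ttl
    return struct.pack("!I", word)


def unpack_entry(data: bytes) -> MplsEntry:
    """Decode the first 4 bytes of `data` as a label stack entry."""
    (word,) = struct.unpack("!I", data[:4])
    return MplsEntry(word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 1), word & 0xFF)


def push(stack: List[MplsEntry], label: int, ttl: int = 64) -> List[MplsEntry]:
    """Push a new outermost label; only the first label pushed is bottom of stack."""
    return [MplsEntry(label, 0, not stack, ttl)] + stack


def pop(stack: List[MplsEntry]) -> List[MplsEntry]:
    """Remove the outermost label (index 0), leaving the inner labels untouched."""
    return stack[1:]


stack = push(push([], 100), 200)  # 200 is outermost, 100 is bottom of stack
print(pop(stack))                 # only label 100 remains, with bos still set
```

Popping from the wrong end of this stack would leave label 200 with the bottom of stack bit clear, which matches the symptom described above.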
Also in some cases the flow counters are not getting updated, which could be a big problem for me.
All of this is also occurring in what appears, to me at least, to be a delightfully non-deterministic fashion.
I emailed the guy who maintains the branch I am using, but haven't heard back yet. Hopefully he can shed some light on things.
But, in general, things are not going as well as they might be just at the moment.
Had a cold most of last week, so I got basically nothing done. I am still working on tracking down the problem of the disappearing packets in the recirculation branch.
I changed rfserver over to a proper MPLS approach, but am having trouble with the ovs recirculation branch. My packets are hitting one table then disappearing before they reach the next.
Hit a little hitch with implementing polling packet counts. The poller works fine, aside from breaking forwarding within the fabric.
The problem is that I'm only able to push one layer of label and I need to edit the label twice. I could push metadata instead, and then push the complete label in a single action, but that would scale quadratically in the number of switches. I also thought about using the vlan pcp field as a bonus tag, but that seemed a little bit messy. Plus that's only 3 bits, so it limits me to 8 switches.
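To put rough numbers on those two limits, here is a quick sketch. It assumes one flow per ordered (source, destination) pair of switches, which is my reading of the quadratic growth rather than something taken from the actual implementation; `flows_for_pairs` is a made-up name.

```python
def flows_for_pairs(num_switches: int) -> int:
    """Number of ordered (source, destination) pairs among the switches."""
    return num_switches * (num_switches - 1)


PCP_BITS = 3  # width of the 802.1Q priority code point field
max_pcp_values = 2 ** PCP_BITS  # distinct values the pcp field can carry

for n in (4, 8, 16, 32):
    print(n, "switches ->", flows_for_pairs(n), "pairs")
print("pcp can distinguish", max_pcp_values, "switches")
```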
So instead I'm gonna use a development branch of ovs with MPLS support. It's doing it the proper way, but for the time being it is pretty much completely unsupported by anything else.
So my rebase last Monday turned out to be more catastrophic than I had thought. Trying to move on and unearthing the hidden consequences took up most of last week. But I have now moved on to working on polling paths. I have built the poller itself and am currently adding the messages it needs to routeflow. After that I just have to plug it in to rfserver and I can take it for a spin.
I have completed all the work to get the arbitrary_topology working tidily. I tried to rebase and tidy it all up this morning but it wasn't a great success. It has mostly been compressed into one giant commit now. I will look at rebasing that nicely so I can put it online, but my motivation to continue with that has taken a bit of a hit.
The next thing is to work out how I can implement the path end polling with the current state of ovs, i.e. no layered MPLS.
I think it should be achievable with masked VLAN tags, which will probably turn out to be a much simpler approach, if a little hacky.
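A minimal sketch of the masked VLAN idea, assuming the 12-bit VLAN ID is split into two 6-bit fields, one for the source switch and one for the destination. The 6/6 split and all of the names here are hypothetical, chosen only to show how a masked match lets a flow look at one field and ignore the other.

```python
SRC_BITS = 6  # hypothetical split of the 12-bit VLAN ID
DST_BITS = 6

DST_MASK = (1 << DST_BITS) - 1
SRC_MASK = ((1 << SRC_BITS) - 1) << DST_BITS


def encode_vid(src: int, dst: int) -> int:
    """Pack source and destination switch ids into one VLAN ID."""
    if not (0 <= src < (1 << SRC_BITS) and 0 <= dst < (1 << DST_BITS)):
        raise ValueError("switch id does not fit in its field")
    # Note: VLAN IDs 0 and 4095 are reserved by 802.1Q, so a real scheme
    # would need to avoid the (0, 0) and (63, 63) combinations.
    return (src << DST_BITS) | dst


def masked_match(vid: int, value: int, mask: int) -> bool:
    """True if `vid` equals `value` on the bits selected by `mask`."""
    return (vid & mask) == (value & mask)


vid = encode_vid(3, 5)
# Forwarding only cares about the destination field...
print(masked_match(vid, encode_vid(0, 5), DST_MASK))
# ...while a counting flow at the far end can match on the source field.
print(masked_match(vid, encode_vid(3, 0), SRC_MASK))
```

The appeal is that forwarding flows stay linear in the number of switches, since each one matches only the destination bits, while the per-pair counting flows still see the full tag.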
So, I am completely revising RFServer at the moment, getting rid of most of the remnants of mongodb to hopefully have something a lot simpler to deal with.
This is mostly because I noticed about 4 thousand or so bugs with the code in my 591 to deal with network partitions. Hopefully, this will help reduce that number a little.