Troubleshooting anti-spoofing messages


2012/10/27

Is this network congestion or some other problem? Can you see what is confusing me? Join me in a network troubleshooting problem.


I'm going to start by describing the root problem and then we'll move into the details and observations.

A customer has a site to site VPN between two Kerio Control firewalls, one at the main office and one at a warehouse. The performance of that VPN has never been great, and the lack of performance was always attributed to having an asymmetric Internet connection at the main office. We'll call that connection at the main office "Cable" and say that it is nominally 100 Mb down and 15 Mb up. The connection at the warehouse is also nominally 100/15.

They were able to get a 100/100 Fiber connection at the office. Call that "Fiber". We expected that would help the VPN problems. It did not - in fact, performance became worse when we tried using the Fiber for the site to site VPN.
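Why we expected Fiber to help can be put as simple arithmetic: in each direction, a tunnel can go no faster than the slower of the sender's upload and the receiver's download. Here is a rough sketch using only the nominal figures above; the function names are mine, and it ignores VPN overhead entirely.

```python
def path_ceiling_mb(sender_up_mb, receiver_down_mb):
    """Upper bound for one direction: the slower of the two ends wins."""
    return min(sender_up_mb, receiver_down_mb)


def vpn_ceilings(office, warehouse):
    """Each argument is a (down, up) pair of nominal speeds in Mb."""
    office_down, office_up = office
    warehouse_down, warehouse_up = warehouse
    return {
        "office_to_warehouse": path_ceiling_mb(office_up, warehouse_down),
        "warehouse_to_office": path_ceiling_mb(warehouse_up, office_down),
    }


# Nominal (down, up) figures from the article.
CABLE = (100, 15)
FIBER = (100, 100)
WAREHOUSE = (100, 15)

print(vpn_ceilings(CABLE, WAREHOUSE))
# {'office_to_warehouse': 15, 'warehouse_to_office': 15}
print(vpn_ceilings(FIBER, WAREHOUSE))
# {'office_to_warehouse': 100, 'warehouse_to_office': 15}
```

On paper, then, Fiber should lift the office-to-warehouse ceiling from 15 Mb to 100 Mb, while the other direction stays limited by the warehouse's 15 Mb upload.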

Locking down the VPN

As the normal configuration attempts load balancing between Cable and Fiber, we thought perhaps that the VPN was still sending some traffic through Cable. If so, that certainly shouldn't have caused worse performance, but just the same we locked the VPN traffic to Fiber using this Traffic Rule:

Source: Firewall
Destination: Warehouse
Service: Kerio VPN
Action: Permit
Translation: Source NAT with Fiber connection
 

To my surprise, that broke the VPN completely. The speed was so bad we had to immediately abandon that rule. But why? What was happening here?

Speed tests

Speed tests from within the LAN using Fiber showed us better than 90 Mb down and never better than 60 Mb up. While disappointing, even 60 Mb still should have improved the VPN performance.

The laptop test

While attempting to debug that, a test was made where a laptop was connected directly to the Fiber. The speed tests jumped to slightly over 100 Mb down and 80-85 Mb up. Obviously the network or firewall was adding significant overhead.

Move VPN back to Cable

Because the VPN was unusable over Fiber, we moved it back to Cable. There it was slow, but still useful.

Packet drops

Attempting to debug that, we turned on debugging at both ends for "packets dropped for some reason".

Immediately the debug was filled with anti-spoofing messages.

[27/Oct/2012 00:41:20] {pktdrop} packet dropped:
Anti-spoofing (from Cable, proto:UDP, len:334, 10.10.10.232:2699
-> 10.10.10.255:5002, udplen:306)
[27/Oct/2012 00:41:20] {pktdrop} packet dropped:
Anti-spoofing (from Fiber, proto:UDP, len:96, 10.10.10.169:137 ->
10.10.10.255:137, udplen:68)
[27/Oct/2012 00:41:20] {pktdrop} packet dropped:
Anti-spoofing (from LAN, proto:UDP, len:334, 172.16.1.50:2698 ->
172.16.1.255:5002, udplen:306)
 

Anti-spoofing comes into play when a packet is seen at an interface it shouldn't be at. Note that first packet? That's a LAN packet appearing on the Cable interface. The second is a LAN packet on the Fiber interface.

The third one belongs to another VPN to another office. How do LAN packets appear to be coming into a WAN interface?
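To make the idea concrete, here is a minimal sketch of that kind of check. It is not how Kerio Control implements anti-spoofing. The table of expected interfaces is my assumption: 10.10.10.0/24 as the LAN is inferred from the log above, and 172.16.1.0/24 is assumed to be the network behind the other VPN.

```python
import ipaddress

# Assumed mapping: which interface each source network should arrive on.
EXPECTED_INTERFACE = {
    ipaddress.ip_network("10.10.10.0/24"): "LAN",
    ipaddress.ip_network("172.16.1.0/24"): "VPN",
}


def is_spoofed(arrival_interface, source_ip):
    """Return True if a known source network shows up on the wrong interface."""
    src = ipaddress.ip_address(source_ip)
    for network, expected in EXPECTED_INTERFACE.items():
        if src in network:
            return arrival_interface != expected
    # Sources outside the known networks are not judged in this sketch.
    return False


# The three packets from the log excerpt above.
print(is_spoofed("Cable", "10.10.10.232"))  # True
print(is_spoofed("Fiber", "10.10.10.169"))  # True
print(is_spoofed("LAN", "172.16.1.50"))     # True
# The same LAN host arriving where it belongs.
print(is_spoofed("LAN", "10.10.10.232"))    # False
```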

Topology

They get there because of topology. Everything in this network comes through two interconnected 48 port switches. There are no VLANs - the Cable is plugged into one port, the Fiber is in another and the LAN machines are in the rest.

Perhaps worse, some LAN machines are dual homed and have Cable public IPs on another NIC, and those cards are also plugged into the very same switch.

But a switch is supposed to switch

Here's where my knowledge gets shaky. A switch is supposed to "switch" packets. That is, it is supposed to learn that one MAC address is at port 6 and another is at port 7, so if it gets a non-broadcast packet that is supposed to go to that first computer, it puts it on port 6 and only port 6. The other ports don't see it - that's part of what distinguishes a switch from a hub.
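The learning behavior described above can be sketched in a few lines. This is a toy model of a generic learning switch, not of the actual switches at this site; the class name, the port numbering and the shortened MAC addresses are made up for illustration. The part worth noticing is that a frame for a MAC address the switch has not learned yet goes out every other port, exactly like a broadcast.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of MAC learning; ignores aging, VLANs and table limits."""

    def __init__(self, port_count):
        self.ports = list(range(1, port_count + 1))
        self.mac_table = {}  # MAC address -> port it was last seen on

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Learn the sender's port and return the ports the frame goes out on."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination lives on the port the frame came from: nothing to do.
            return []
        return [out_port]


switch = LearningSwitch(8)
print(switch.handle_frame(6, "aa:aa", BROADCAST))  # [1, 2, 3, 4, 5, 7, 8]
print(switch.handle_frame(7, "bb:bb", "aa:aa"))    # [6]
print(switch.handle_frame(6, "aa:aa", "bb:bb"))    # [7]
print(switch.handle_frame(6, "aa:aa", "cc:cc"))    # [1, 2, 3, 4, 5, 7, 8]
```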

That's plainly not happening here. Most (but not all) of the anti-spoofing messages are about UDP traffic and some of those are broadcasts, but thousands and thousands of non-broadcast packets are being seen at the wrong interfaces.
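One way to put numbers on "most" and "some" is to tally the drops by interface, protocol and destination type. The sketch below assumes each entry sits on a single line in the real log (the excerpt earlier is wrapped), that every network is a /24 so a destination ending in .255 is a broadcast, and that entries follow the shape of the three samples. Entries that look different, such as protocols without ports, are simply skipped.

```python
import re
from collections import Counter

# Pattern modeled on the three sample entries; other shapes will not match.
PKTDROP = re.compile(
    r"Anti-spoofing \(from (?P<iface>[^,]+), proto:(?P<proto>\w+), len:\d+, "
    r"(?P<src>[\d.]+):\d+ -> (?P<dst>[\d.]+):(?P<dport>\d+)"
)


def tally(lines):
    """Count anti-spoofing drops by (interface, protocol, destination kind)."""
    counts = Counter()
    for line in lines:
        match = PKTDROP.search(line)
        if match is None:
            continue
        # Assumption: /24 networks, so a final octet of 255 means broadcast.
        kind = "broadcast" if match["dst"].endswith(".255") else "unicast"
        counts[(match["iface"], match["proto"], kind)] += 1
    return counts


# The sample entries from above, joined onto one line each.
sample = [
    "[27/Oct/2012 00:41:20] {pktdrop} packet dropped: Anti-spoofing "
    "(from Cable, proto:UDP, len:334, 10.10.10.232:2699 -> 10.10.10.255:5002, udplen:306)",
    "[27/Oct/2012 00:41:20] {pktdrop} packet dropped: Anti-spoofing "
    "(from Fiber, proto:UDP, len:96, 10.10.10.169:137 -> 10.10.10.255:137, udplen:68)",
    "[27/Oct/2012 00:41:20] {pktdrop} packet dropped: Anti-spoofing "
    "(from LAN, proto:UDP, len:334, 172.16.1.50:2698 -> 172.16.1.255:5002, udplen:306)",
]

for key, count in sorted(tally(sample).items()):
    print(key, count)
```

Run against the full log, the same tally would show how much of the traffic at each interface is really non-broadcast.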

Is this flooding slowing us down?

I'm assuming that the switch must be flooding these as though they were all broadcast packets. If it were actually dropping all these packets, this wouldn't work at all. So does this activity slow us down?

It definitely makes debugging hard: there is so much noise from this stuff that we could easily miss more significant messages.

A VLAN

It turns out that the switches are capable of creating VLANs. Therefore we created a VLAN for the Fiber. That eliminated messages in the log about Fiber and it also marginally improved the Speed test. The VPN was moved back to Fiber and it was again too slow (4 Mb) to be of any use.

We also tried taking that interface on the firewall directly to the Fiber, bypassing the VLAN. No help.

My questions

I'm confused right now and have several questions I can't wrap my head around.

  • Why are the switches flooding traffic?
  • Why is it mostly or maybe all UDP flooding? Why not TCP?
  • Are the anti-spoofing messages just noise or do they cause congestion?
  • Does having dual homed machines on the switches contribute to flooding?
  • Why is bypassing the switches (laptop test) so much faster?

And another thing

In addition to all that stuff, there's another oddity here. We had trouble creating the VPN on the Fiber.

With site to site VPNs, you make one side Passive and one side Active. That wasn't working, but making both sides Active did work. That makes no sense to me at all unless some rule I didn't notice is interfering. If a rule IS causing interference, maybe that's the source of the poor performance, but I don't see anything and I have had Kerio look these over also.

So this is where we are right now. Very, very confused. We intend to VLAN the Cable side also, but the dual homed machines make that a little more complicated - one will have to be physically relocated before that can happen.

Any thoughts are welcome.

Solved!

Hurricane Sandy interrupted us, but we eventually got back to this.

Having exhausted everything else, I felt it HAD to be the Windows box, so I convinced the customer to take the time to put up Control in the Linux version.

Upload speeds immediately climbed to 73 Mb/s - still a bit shy of the 90 Mb/s download, but a big, big jump. He used an old piece of hardware for this; something newer might do better. At least it is now fast enough to use!

I don't think it's a Linux vs. Windows issue. I think it's a NIC problem on the Windows box that Windows just isn't seeing, but the happy thing is that it is fixed!


10 comments




© Anthony Lawrence







Sat Oct 27 17:51:23 2012: 11402   Dominic



Everything plugs directly into your switches? You're not using the firewall as a WAN/LAN gateway? Are you 100% public IPs? Or are you mixing and matching public and private IPs on the switch?

Switches typically work at the layer 2 level. So yes, while they do limit the traffic in a sense (non broadcast traffic at least), they do so on a layer 2 level, i.e. utilizing the MAC (hardware) address not the IP address. They however do nothing to prevent one device from trying to talk with another even between incompatible IP classes.

If you've got layer 3 switches, you could set one up to do routing, assigning separate WAN/LAN IPs and such. Otherwise, you have no border gateway to prevent your ISP/Internet from seeing your LAN traffic. But it's typically easier to use a firewall as the gateway.

Maybe I'm misunderstanding the setup?

-apologies for typos. Quickly typed on a mobile.






Sat Oct 27 17:55:19 2012: 11403   TonyLawrence



No, you are NOT misunderstanding. I wish you were.

Yes, I should have said MAC address, not IP - I'll correct that, but yeah, it's all glommed together and there were no VLANs until we added one last night. I'd like to do the rest, but as I said, I can't because a machine needs to be moved.







Mon Oct 29 07:39:53 2012: 11406   TonyLawrence



And I totally forgot to mention that there's another VPN that works fine. Moreover, Internet speed tests at either end of the "bad" VPN are not problematical. None of it makes sense!

As a big storm is bearing down on both me and the customer right now (we're both in New England), he may not have any VPNs or anything else in a few hours and I may not have Internet access, so if I don't respond to any questions or comments that's probably why. This site should stay up as it's hosted in California.



Mon Oct 29 13:01:45 2012: 11407   TonyLawrence



I saved the debug log and then did a global delete of every line that matched 255:

That helps to un-clutter a lot (don't know why I didn't think of that earlier). There was a lot of NETBIOS traffic, so I filtered out all that. That left me with nothing but some IPS drops and a few filter rules that were legit - nothing that indicates a darn thing wrong with the VPN!
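For anyone wanting to repeat that cleanup, here is a rough sketch. The "255:" match is the one described above; treating ports 137-139 as the NetBIOS traffic is my assumption about what to filter, as is the idea that each log entry is a single line.

```python
import re

# NetBIOS name, datagram and session services use ports 137, 138 and 139.
# Anchoring on an IP address avoids matching fields such as "len:137".
NETBIOS = re.compile(r"\d+\.\d+\.\d+\.\d+:(137|138|139)\b")


def strip_noise(lines):
    """Drop broadcast and NetBIOS entries, keep everything else."""
    kept = []
    for line in lines:
        if "255:" in line:
            # Address ending in 255 followed by a port, e.g. 10.10.10.255:5002
            continue
        if NETBIOS.search(line):
            continue
        kept.append(line)
    return kept
```

Usage would be along the lines of `strip_noise(open("debug.log"))`, where `debug.log` is a hypothetical name for the saved log.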



Tue Oct 30 12:59:17 2012: 11408   TonyLawrence



And then this (link) helped cut through the log noise even more.



Tue Oct 30 13:18:35 2012: 11409   anonymous



Apologies for the delay - hunkered down for the hurricane. At this point I'd be looking at hardware. Maybe there is something physically wrong with the switch port or cabling?



Tue Oct 30 13:23:14 2012: 11410   TonyLawrence



If there were bad hardware, how could it only affect VPN traffic? Although the upload speed on the Fiber is slightly reduced, it's still 60 Mbit and the VPN performance was more like 4 Mbit.



Thu Nov 8 14:39:43 2012: 11419   NickBarron



Has there been any progress on this issue?

Are you able to throw in two cheap boxes and put together an IPsec VPN to see if it is something related to Kerio VPN?



Thu Nov 8 23:16:11 2012: 11424   TonyLawrence



We could, but not easily. And the other Kerio VPNs work fine.



Sun Nov 18 18:40:36 2012: 11435   TonyLawrence



I felt it HAD to be the Windows box, so I convinced the customer to take the time to put up Control in the Linux version.

Upload speeds immediately climbed to 73 Mb/s - still a bit shy of the 90 Mb/s download, but a big, big jump. He used an old piece of hardware for this; something newer might do better.

I don't think it's Linux vs. Windows. I think it's a NIC problem on the Windows box that Windows just isn't seeing.
