This article was written for Kerio Control version 8.2.1. Concepts expressed here may or may not apply to earlier or later versions.
I find that Control Content and Web Filter are more confusing for my customers than any other part of Control. After doing some testing, I think I understand why and can offer some advice that might make it easier.
BE AWARE OF BROWSER CACHING WHEN TESTING! You can easily mistakenly think that your rule did not work if your browser has cached the page you are testing.
First, let's look at a simple deny rule. You just want to deny access to foobar.com:
That rule will affect "foobar.com", "www.foobar.com", and "whatever.foobar.com/whatever/hello.html". You do NOT need to use wildcards (*.foobar.com/*). You CAN use wildcards, for example if you wanted to restrict a rule to pages below the top level of the domain with "foobar.com/*/*". That would allow "www.foobar.com/whatever.html" but not "www.foobar.com/inside/whatever.html".
That would be an extremely unusual case, though, so the general case is that you do NOT need wildcards in HTTP URL rules.
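As a mental model (my approximation of the observed behavior, NOT Kerio's actual matching code), a plain rule behaves as if it had wildcards on both ends, while a rule containing explicit wildcards is used as written, with subdomains still matching on the left. A Python sketch using shell-style globbing:

```python
from fnmatch import fnmatchcase

def http_rule_matches(url, rule):
    """Approximate Kerio Control's HTTP URL rule matching.

    This is my guess from testing, not Kerio's algorithm.
    """
    if "*" in rule or "?" in rule:
        # Explicit wildcards: used as written, but anything may
        # precede the rule text (subdomains still match).
        pattern = "*" + rule
    else:
        # A bare rule like "foobar.com" matches the host, any
        # subdomain, and any path beneath it.
        pattern = "*" + rule + "*"
    return fnmatchcase(url, pattern)

# The examples from the text:
print(http_rule_matches("www.foobar.com/whatever.html", "foobar.com"))             # True
print(http_rule_matches("whatever.foobar.com/whatever/hello.html", "foobar.com"))  # True
print(http_rule_matches("www.foobar.com/whatever.html", "foobar.com/*/*"))         # False (allowed)
print(http_rule_matches("www.foobar.com/inside/whatever.html", "foobar.com/*/*"))  # True (denied)
```

The point of the sketch is only that the bare rule already covers subdomains and paths, which is why wildcards are normally unnecessary.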
You can also elect to use Perl regexes. That would be even more unusual.
That regex, if used in a Deny rule, would stop you from going to anything in aplawrence.com/Linux, aplawrence.com/Unix (but not Unixart), as well as anything under MacOSX on this site. Obviously, regular expressions are also for very unusual cases.
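The regex itself appeared in a screenshot that isn't reproduced here, but a pattern along these lines (my reconstruction; the exact expression may differ) would produce the behavior described:

```python
import re

# Hypothetical reconstruction of the Deny-rule regex described above:
# match Linux, Unix, or MacOSX as a whole path segment, so that
# "Unixart" does not match.
pattern = re.compile(r"aplawrence\.com/(Linux|Unix|MacOSX)(/|$)")

print(bool(pattern.search("aplawrence.com/Unix/basics.html")))     # True  (denied)
print(bool(pattern.search("aplawrence.com/Unixart/article.html"))) # False (allowed)
print(bool(pattern.search("aplawrence.com/MacOSX/tips.html")))     # True  (denied)
```

The `(/|$)` is what keeps "Unix" from also catching "Unixart" — exactly the kind of subtlety that makes regex rules a last resort.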
Hostname Across All Protocols
That leaves us with "Hostname across all protocols". That would block URLs like "ftp://ftp.somesite.com" (which would NOT be blocked by an HTTP rule), but it does not block command line ftp access.
The Control manual gives an example of using that for Facebook and explains:
Kerio Control sends DNS query and ensures that all IP addresses used by Facebook will be identified.
I think what they mean is that if you use an IP address, Control will do a reverse lookup to see if it comes back with a hostname that matches a rule. I have not been able to test that, though.
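If that guess is right, the check would amount to something like this reverse lookup (a plain Python sketch of the idea, not anything Kerio actually exposes):

```python
import socket

def hostname_for_ip(ip):
    """Reverse-resolve an IP address; None if there is no PTR record."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        return None

def blocked_by_hostname_rule(ip, blocked_domain):
    # Hypothetical check: deny if the reverse lookup yields the
    # blocked domain itself or a host within it.
    name = hostname_for_ip(ip)
    return name is not None and (
        name == blocked_domain or name.endswith("." + blocked_domain)
    )
```

Note that large sites complicate this: many of their IPs reverse-resolve to CDN hostnames rather than the obvious domain, which may be part of why this behavior is hard to verify.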
Kerio Web Filter
The confusion here comes when a page is being blocked and needs to be whitelisted. There are two ways to do this. One is to add it to the Kerio Web Filter Whitelist:
If you do that, you MUST use wildcards: *.xyz.com/*
However, you can instead add an ALLOW rule above the KWF rule and simply use xyz.com. The smart way to do that would be to have a "Whitelist" URL Group and add sites to that.
This can obviously cause confusion - I'd suggest using one way and sticking to it.
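The difference between the two formats can be seen side by side. Again, this is my approximation of the two matchers, assuming the KWF whitelist takes an entry literally (so wildcards must be explicit) while URL rules get implicit wildcards:

```python
from fnmatch import fnmatchcase

def url_rule_matches(url, rule):
    # URL rule with a bare hostname: implicitly matches
    # subdomains and any path.
    return fnmatchcase(url, "*" + rule + "*")

def kwf_whitelist_matches(url, entry):
    # KWF whitelist entry: used exactly as written, so the
    # wildcards have to be spelled out.
    return fnmatchcase(url, entry)

print(url_rule_matches("www.xyz.com/page.html", "xyz.com"))          # True
print(kwf_whitelist_matches("www.xyz.com/page.html", "xyz.com"))     # False -- entry too narrow
print(kwf_whitelist_matches("www.xyz.com/page.html", "*.xyz.com/*")) # True
```

That asymmetry is the whole source of the confusion: the same string whitelists a site in one place and silently fails in the other.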
Kerio Support offered this:
The option Hostname across all protocols is more difficult than the reverse proxy. For SSL it reads the certificate to get hostnames. So this option is a combination of previous SSL filtering option and reverse DNS lookup for the FTP filtering (you cannot filter FTP based on URL, as final request is IP based, that's why the reverse DNS lookup is made as well).
New WebFilter Categories are also improved. One part of it is the DNS based service known as the Kerio Web Filter. The new part is the IDS detection of specific protocols, which can be used for further filtering. It means that for certain services you do not have to identify the port, but there is general inspection which can identify the communication type automatically and apply appropriate filtering module.
This is a lot of modules put together, hence the confusion in settings, and it is a reason why each part accepts a slightly different format. Some rules can be defined by regexp, while some cannot (e.g. FTP, and HTTPS filtering, where only the hostname part is present). Most of this confusion comes from the fact that you mix HTTP filtering (where you need to filter URLs) and FTP (which is in fact IP based filtering, where regexp has no effect).