Girish Venkatachalam is a UNIX hacker with more than a decade of
networking and crypto programming experience.
His hobbies include yoga, cycling and cooking, and he runs his own
business. Details here:
The genesis of the SpamCheetah spam buster
How did I end up doing this?
First off, I should thank Tony for letting me write about something close
to my heart. This is also something that I deeply believe in: spam is
a problem that all of us should get together and fight.
Many of us feel strongly about dirty spammers and we never want spam in
our inbox. Neither do we want to lose important mail at the altar of
E-mails have been around for quite some time. From the early days of the
Internet electronic messaging has existed. But the world has come a
long way since then and businesses realized the low cost and
accessibility to a large audience that e-mail gives.
E-mail protocols and their verification mechanisms grew up in the early
culture of the Internet. At that time everyone trusted the computers and
people they were connected to, and nobody knew that the Internet would
become as widespread as it is today, so people were lax about security.
Consequently there is little security built into either the Internet or
its e-mail protocols.
E-mail has become not only a communication tool but also a very
effective business tool. Businesses rely on e-mail for everything, and
among the things that directly affect their bottom line, e-mail
efficiency ranks very high.
When that is the case, companies hate any reduction in employee
productivity due to unwanted mail entering their inboxes. In fact, it is
hard to predict or measure the productivity loss involved in receiving
p0rn mails or messages about Nigerian widows or bumper lotteries.
All said and done, we all know the kind of problem that spam is causing.
And my motivation to develop a product for spam control does not have
much to do with what I wrote above. I knew it had potential but I wanted
a product to start my business. I wanted something that I thought would
sell. That is all. It just happened to be this. My interests are wide
and I would have got into anything where I could add value.
Anyway now I have spent a good part of the last two years developing
this product and even before that I started learning the various spam
control techniques. After evaluating a lot of the theory I felt that
something was quite wrong about statistical filtering and content
scanning. Though Zdziarski wrote eloquently about how well his dspam
filter worked I was not quite impressed.
There were many different mechanisms, such as distributed checksum
computation (the idea behind Vipul's Razor) and advanced Bayesian
filtering. Paul Graham's articles and much other literature on the
Internet gave me a sound understanding of the ways in which people
attacked this problem. I was confused for a long time about what
greylisting meant and how it was implemented in OpenBSD.
I was way off the mark in the beginning, and I sent a mail to the mailing
list asking some embarrassingly basic questions. But I got very
detailed answers from one of the developers, Nick Holland. A user also
testified to how effective OpenBSD greylisting was in his network. Later
he also posted a neat readme telling me and everyone else on the mailing
list how to get it working.
I could instinctively feel that there was something novel in their
approach. Something different and interesting in the way they approached
the spam problem. I sat there thinking.
I could only think and do nothing more, since I had neither the money nor
the resources to run my own mail server. This was until my client, a
company in Chennai that relied completely on mail for its back office
operations for big banks, asked me to develop a spam filter.
It happened this way. They had purchased a Windows-based product for a
hefty sum of four and a half lakh Indian rupees. And like most
Windows-based products, it came with a lot of strings attached and all
kinds of silly restrictions. They got fed up with this content scanning
filter
since their mails were getting delayed and some key mails were getting
lost (false positives). If you want to know which product I am talking
about, mail me and I will tell you! But it is not important since all
products have defects and we cannot achieve anything by nitpicking.
So now I had a golden opportunity to try out my ideas in their network.
I developed the initial version of SpamCheetah sitting in their server
room. I spent countless hours slogging in that heavily air conditioned
room. India being a hot country, and Chennai being hot most of the time,
this was a nice situation.
It did not take me long to develop a product. It was called 'anjal' in
those days, as 'anjal' means 'letter' in Tamil. Interestingly, the
development
of this product was not hard. What hit me hard on my face was its
The technical details
I could develop this product without much ado, but how to deploy and
test? This technology had the restriction that it could not be tested in
a test setup. It had to go live. So I had to wait for the opportunity to
test my product in their production network.
Now they are a major customer of Citibank and we all know how major
corporations work. They demanded certification for my product. They were
scared of audit and security had to be proven by certification.
All the silly firewalls out there in the market (once again I shall not
name them) come with a big bunch of certifications. But where could I
go? How does one obtain certification for open source software?
Everyone knows that OpenBSD is the most secure OS. Everyone knows that
OpenBSD has the last word in security and I am also well aware of its
innards and the various stack protection mechanisms, its audit process,
its development culture and so on. I am a cryptographer myself, so I do
know something about security.
Bob Beck, the principal author of OpenBSD spamd, sent me a certificate
from a Japanese OpenBSD developer. But it was not sufficient. I was
stuck. But they told me that if I could deploy my product somehow inside
the DMZ and behind the firewall then I did not need a certificate.
Thankfully I had advanced networking skills. I knew something about port
forwarding and routing topologies. In spite of my knowledge and
experience, I had to validate my claims with netcat and several computers
on my home network. A sniffer like tcpdump also helped. I learnt that
by using IP aliasing one can simulate several networks with just 3
computers. I wrote about this for the local Linux Users' Group, and I
started becoming good at networking techniques as well.
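The kind of reachability check that netcat gives you when validating a
port forward can be sketched in a few lines of Python. This is only an
illustration, not the procedure actually used on that network: the
function name port_open and the two-second timeout are assumptions made
for this example.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Roughly what `nc -z host port` reports: it only proves that
    something accepted the connection, not that it speaks SMTP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False
```

Running such a probe from a machine outside the firewall against the
forwarded SMTP port, while watching the inside interface with tcpdump,
shows whether the forward and the return route both work.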
Finally I got it installed, and very quickly they realized the power and
value of this technology. It has been running successfully on their
premises for close to 18 months now and they get ZERO spam. They used to
get close to 50 spam messages a day per mailbox, but now they get none.
And they process all the banking transactions by e-mail and they get
close to 10,000 mails in a day on average. They were happy and I got
paid a good sum of money. By that I mean something with which I could
Why product development is not hard with open source tools and
But money was not the thing. Having a product that is used in production
by a large enterprise is very satisfying in itself. I have developed
several products when working for companies, but there you never get the
joy of end-to-end development and real deployment.
In the open source world, by contrast, you get ample opportunities for
real-life exposure for your programs. That is what attracted me to open
source years ago. Moreover, my charity mindset and the technical
excellence found in OpenBSD attracted me further.
I will not bore you with other details but suffice it to say that even
after having a proven product I still needed to develop the web
interface. Now I am a hacker and web programming is not something that I
knew. But I had to learn. I knew that in the long run no matter what
product I developed it had to have a web interface.
Once again my open source underpinnings and focus on simplicity
attracted me to jQuery. At that time jQuery had not yet been adopted by
Microsoft and it was somewhat nascent. But I was quick to realize its
potential.
Realizing the potential of a technology is key to success. You also have
to be quick at realizing the potential of people; only then can you
employ good people and grow. The insight you get from learning various
technologies will help you in a big way.
Never pass up an opportunity to use the right tool for the job.
I used OpenBSD spamd for my product. Greylisting has been around for a
long time, but OpenBSD did it right and combined it effectively with
tarpitting (teergrubing), blacklisting and database updates. It does not
use MySQL to store the 3-tuple involved in greylisting (the connecting
IP address, the envelope sender and the envelope recipient); spamd keeps
its own database. OpenBSD also integrated spamd with its firewall, pf,
giving a complete solution that is well integrated with the OS.
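The greylisting decision itself is simple enough to sketch in Python.
This is a minimal illustration of the idea, not spamd's or SpamCheetah's
code: the Greylist class, the check method and the "tempfail"/"pass"
return values are invented for this example, and the three timing
constants are assumptions modeled on spamd's documented defaults.

```python
import time
from dataclasses import dataclass, field

# Assumed timings, modeled on spamd's documented defaults.
PASSTIME = 25 * 60             # a retry must come at least this long after first contact
GREYEXP = 4 * 60 * 60          # an unretried grey entry is forgotten after this
WHITEEXP = 36 * 24 * 60 * 60   # a whitelisted IP stays trusted this long


@dataclass
class Greylist:
    grey: dict = field(default_factory=dict)    # (ip, sender, rcpt) -> first seen
    white: dict = field(default_factory=dict)   # ip -> whitelist expiry

    def check(self, ip, sender, rcpt, now=None):
        now = time.time() if now is None else now
        # Hosts that already proved they retry go straight through.
        if self.white.get(ip, 0) > now:
            self.white[ip] = now + WHITEEXP
            return "pass"
        key = (ip, sender, rcpt)
        first = self.grey.get(key)
        if first is None or now - first > GREYEXP:
            # New (or expired) tuple: remember it, ask the sender to retry.
            self.grey[key] = now
            return "tempfail"
        if now - first < PASSTIME:
            # Retried too quickly; a real MTA will try again later.
            return "tempfail"
        # Same tuple retried after the waiting period: whitelist the host.
        del self.grey[key]
        self.white[ip] = now + WHITEEXP
        return "pass"
```

A legitimate mail server queues the message and retries, so it gets
through on a later attempt. Most spam software fires once and moves on,
so it never gets past the first "tempfail".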
All this makes a big difference. No software exists in isolation. No
software is an island. It has to work with everything else to form a
complete whole. Even in my product, SpamCheetah does only spam control.
It does not do e-mail. This is the old UNIX philosophy of do one job,
but do it well.
This gives you the ability to run any mail server behind SpamCheetah.
You are free to run MS Exchange, sendmail, qmail, postfix or whatever.
All you have to do is set up SpamCheetah properly in your network. That
requires slightly advanced networking skills. I have given clear
instructions here, but
you need some experience in networking to follow them.
You can see a short video of OpenBSD spamd in action here. The tarpit
receives mail at 1 character per second and annoys spammers! If more and
more people use OpenBSD greylisting with stock OpenBSD or with
SpamCheetah or whatever then the spammers will go out of business!
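The one-character-per-second behaviour of a tarpit can be sketched as
follows. This is an illustration of the technique only, not spamd's
implementation: the stutter function and its parameters are hypothetical
names chosen for this example.

```python
import time


def stutter(write, reply: bytes, delay: float = 1.0, sleep=time.sleep) -> float:
    """Send reply one byte at a time, pausing `delay` seconds before each.

    `write` is any callable that accepts bytes (for example a socket's
    sendall method). Returns the total number of seconds spent pausing.
    """
    waited = 0.0
    for i in range(len(reply)):
        sleep(delay)
        waited += delay
        write(reply[i:i + 1])
    return waited
```

At one byte per second even a short greeting ties up the sender's
connection for many seconds, and a full SMTP dialogue for minutes, while
the tarpit itself does almost no work.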
Why? OpenBSD spamd eats up spammer resources. I am really happy to be
able to serve society by taking OpenBSD's great technology to a wider
audience. How successful SpamCheetah will prove to be, only time will
tell. Hopefully you enjoyed reading this experience of mine. I wish you success
in product development. Please mail me for any
References and further reading
- SpamCheetah technology backgrounder
- Understanding the network level behavior of spammers
- SpamCheetah's OpenBSD
- OpenBSD papers
- spamd - greylisting and beyond
© 2009-11-07 Girish Venkatachalam