In the mid-1990s, I took Microsoft Certification exams. One of my electives was Exchange Server. I was certainly no stranger to email at that time: having spent the previous decade working with Unix servers, I had installed and troubleshot many a Sendmail and Qmail system. I had not had much exposure to Exchange, however, so I signed up for a technical class before attempting the exam. I expected my strong background in other email systems to make both the class and the exam fairly easy, and I was right.
I do remember being horrified by the complexity of Exchange. Panel after panel of arcane settings meant memorizing a lot of details that I knew I would never run into with my customer base. Exchange really is designed for very large companies with far-flung offices and complex needs. I deal with much smaller businesses that have, at most, a small branch office or two. For my customers, Exchange would have been gross overkill.
Yet smaller companies did deploy Exchange. I remember a passing conversation with a fellow geek who had been out of work for a few months. We didn't know each other well, but we crossed paths socially now and then, and he told me that he had finally found a new job as a "Junior Exchange Administrator", joining a team of four dedicated administrators. I assumed he must have found work at a very large company if they had "Junior" Exchange Administrators, but no, the company wasn't all that large. Keep in mind that this was still the mid-1990s, before email was ubiquitous, so the raw volume of email was still small. Why did they need so many people?
In the Unix world I was accustomed to, the general system administrator usually handled mail as part of their duties. There may have been some business somewhere that needed dedicated people for this function, but I'd never been exposed to a company that large, and if I had, I'm sure I would have recognized its name. I couldn't imagine why a small Connecticut-based company I had never heard of would need four people with such specific titles.
I still can't understand it, by the way, but such jobs do seem to exist: a Google search for "Junior Exchange Administrator" turns up many millions of results. In fairness, quite a few of those are really looking for System Administrators and mention Exchange only in passing, but a large number do seem to be looking for that specific role. Contrast that with a search for "Junior Sendmail Administrator": far fewer results and, more importantly, none of them seem to use "Junior" in conjunction with Sendmail at all. That doesn't help me understand why these people are needed, but it might be some evidence that they really are.
Oh, well. My intent here is to talk about Kerio and Exchange, so let's look at that.
Traditional email systems like Sendmail and Qmail were only responsible for storing mail temporarily. Their job was to accept mail and transfer it to wherever it belonged. For local users, that usually meant appending it to a Unix "mbox" file (a single flat file per mailbox, with each new message added to the end as it arrived). However, as users demanded more flexibility (such as folders stored on the server), alternative storage formats like Maildir, along with separate programs that worked alongside existing mail servers, filled the gap and provided the desired features.
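To make the storage difference concrete, here is a minimal sketch using Python's standard `mailbox` module, which can write both formats. The file names and sample messages are invented for illustration. Storing the same messages both ways shows the basic contrast: mbox appends everything to one flat file, while Maildir writes each message as its own file (new deliveries land in the `new/` subdirectory).

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(n: int) -> EmailMessage:
    """Build a trivial sample message (hypothetical content)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "user@example.com"
    msg["Subject"] = f"Test message {n}"
    msg.set_content(f"Body of message {n}\n")
    return msg


def store_both(count: int, base: str) -> tuple[str, str]:
    """Store `count` messages in an mbox file and in a Maildir under `base`."""
    mbox_path = os.path.join(base, "inbox.mbox")
    maildir_path = os.path.join(base, "Maildir")

    box = mailbox.mbox(mbox_path)                    # one flat file
    md = mailbox.Maildir(maildir_path, create=True)  # creates tmp/, new/, cur/
    box.lock()
    try:
        for n in range(count):
            box.add(make_message(n))  # appended to the end of the single file
            md.add(make_message(n))   # written as its own file under new/
        box.flush()
    finally:
        box.unlock()
        box.close()
    return mbox_path, maildir_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        mbox_path, maildir_path = store_both(5, base)
        print("messages in the one mbox file:", len(mailbox.mbox(mbox_path)))
        print("files in Maildir/new:", len(os.listdir(os.path.join(maildir_path, "new"))))
```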
Of course, more unified approaches were soon developed, but whether you have one unified system or a collection of add-ons, something has to be done about storing the mail messages and providing access to them.
The problem with email is that there is a lot of it. I'm hardly an unusual user, and I sometimes see more than 10,000 email messages a month. Storing all of that mail and keeping it accessible to me is not trivial.
There are three basic approaches. The first is simply to store the emails as separate files in filesystem directories, where each directory represents an email folder. This is the approach Kerio uses: all messages that belong in my INBOX are found in one directory (specifically, _STORE_/mail/DOMAIN/USER/INBOX/#msgs). My sent mail is in another directory named "Sent Items", and so on. Various indexes are built on top of those message files.
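As an illustration of the one-file-per-message, one-directory-per-folder idea, here is a small Python sketch. The directory shape borrows the path quoted above, but the `NNNNNNNN.eml` file naming and the helper functions are hypothetical; this is not Kerio's code or its actual on-disk naming.

```python
import os
import tempfile


def folder_dir(store: str, domain: str, user: str, folder: str) -> str:
    """Directory holding one folder's messages, loosely modeled on the
    STORE/mail/DOMAIN/USER/FOLDER/#msgs layout (illustrative only)."""
    return os.path.join(store, "mail", domain, user, folder, "#msgs")


def deliver(store: str, domain: str, user: str, folder: str, raw: bytes) -> str:
    """Write one message as its own file (hypothetical NNNNNNNN.eml naming)."""
    path = folder_dir(store, domain, user, folder)
    os.makedirs(path, exist_ok=True)
    # Counting entries reads the whole directory: cheap for small folders,
    # increasingly slow as a flat directory grows very large.
    seq = len(os.listdir(path)) + 1
    name = os.path.join(path, f"{seq:08d}.eml")
    with open(name, "xb") as f:  # "x" refuses to overwrite an existing file
        f.write(raw)
    return name


def list_folder(store: str, domain: str, user: str, folder: str) -> list[str]:
    """Return the message file names in a folder, or [] if it doesn't exist."""
    path = folder_dir(store, domain, user, folder)
    return sorted(os.listdir(path)) if os.path.isdir(path) else []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        deliver(store, "example.com", "tony", "INBOX", b"Subject: hello\n\nHi.\n")
        deliver(store, "example.com", "tony", "Sent Items", b"Subject: re: hello\n\nHi back.\n")
        print(list_folder(store, "example.com", "tony", "INBOX"))
```

The sequence numbering is deliberately naive: two simultaneous deliveries could pick the same number (the `"x"` open mode makes the second one fail rather than overwrite). Real servers generate unique names instead; Maildir, for example, builds file names from the time, process ID, and hostname.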
The problem with that approach is that very large directories stress the underlying filesystem. Although filesystem designers have begun building database filesystems, these are not common and have had consistency issues that make users reluctant to adopt them. Add-ons like the Linux ext3/ext4 dir_index feature, which uses hashed b-trees to speed up lookups in large directories, can crash: more complexity always seems to mean more places for things to go wrong. There may come a day when a native filesystem can reliably and efficiently handle very large directories, but that day doesn't seem to be here yet. The reliable option (simple, flat directory files) causes noticeable slowness as new files are added and when looking up existing entries. The more efficient structures can fail at the filesystem level, causing difficulties for every application.
A slight modification is to store individual emails as files, but to scatter them across multiple subdirectories (a variation on the one-file-per-message idea that Maildir popularized). Your "INBOX" might actually be ten or more separate directories, each containing a small number of individual messages. External indexes or databases help the programs reading the mail find the messages they want. This approach adds complexity to access, but it does reduce the strain on the underlying filesystem and the slowness that comes with it.
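A minimal sketch of scattering one logical folder across several physical subdirectories, assuming the subdirectory is chosen by hashing a message key such as its Message-ID. The bucket count, the SHA-256-based bucketing, and the in-memory `index` dictionary standing in for an external index are all illustrative choices, not how any particular server does it.

```python
import hashlib
import os
import tempfile


def key_digest(key: str) -> str:
    """Stable hex digest of a message key (e.g. a Message-ID)."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def bucket_for(key: str, buckets: int = 16) -> str:
    """Map a message key to one of `buckets` subdirectory names ("00".."0f" for 16)."""
    return f"{int(key_digest(key)[:8], 16) % buckets:02x}"


class ScatteredFolder:
    """One logical folder spread over several physical subdirectories.
    `index` plays the role of the external index: key -> file path."""

    def __init__(self, root: str, buckets: int = 16):
        self.root = root
        self.buckets = buckets
        self.index: dict[str, str] = {}

    def store(self, key: str, raw: bytes) -> str:
        sub = os.path.join(self.root, bucket_for(key, self.buckets))
        os.makedirs(sub, exist_ok=True)
        # Name the file after the digest so odd characters in the key are harmless.
        path = os.path.join(sub, key_digest(key) + ".eml")
        with open(path, "wb") as f:
            f.write(raw)
        self.index[key] = path
        return path

    def fetch(self, key: str) -> bytes:
        with open(self.index[key], "rb") as f:
            return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        inbox = ScatteredFolder(root)
        for i in range(100):
            inbox.store(f"<msg{i}@example.com>", f"body {i}".encode())
        for sub in sorted(os.listdir(root)):
            print(sub, len(os.listdir(os.path.join(root, sub))))
```

Because the bucket is derived from the key, each subdirectory ends up with roughly a sixteenth of the folder, and the index (or simply recomputing the hash) tells a reader where to look without scanning every directory.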
Finally, systems like Exchange use a database file that stores all messages. This avoids the underlying filesystem issues; the only limitation is the maximum file size, which on most modern filesystems is far larger than a mail store is likely to grow. The design could, of course, be split into one database per user, but Exchange doesn't do that. The disadvantage is again complexity, which risks crashes. Third-party tools also become harder to write: rather than accessing messages at the filesystem level, any such tool has to make requests of the database server itself.
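To show the shape of the single-database design, here is a sketch using SQLite from Python's standard library. This is not how Exchange stores mail internally (Exchange uses its own database engine); it only illustrates the idea that every message, for every user and folder, lives in one file, and that any tool wanting a message has to ask the database engine for it rather than open a file. The table layout is hypothetical.

```python
import sqlite3


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) one database file holding every user's mail."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id      INTEGER PRIMARY KEY,
               mailbox TEXT NOT NULL,
               folder  TEXT NOT NULL,
               raw     BLOB NOT NULL)"""
    )
    # The database's own index replaces directory lookups.
    conn.execute("CREATE INDEX IF NOT EXISTS by_folder ON messages (mailbox, folder)")
    return conn


def deliver(conn: sqlite3.Connection, mailbox: str, folder: str, raw: bytes) -> int:
    """Insert one message and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (mailbox, folder, raw) VALUES (?, ?, ?)",
            (mailbox, folder, raw),
        )
    return cur.lastrowid


def folder_contents(conn: sqlite3.Connection, mailbox: str, folder: str) -> list[bytes]:
    """Every read goes through SQL; there are no per-message files to open."""
    rows = conn.execute(
        "SELECT raw FROM messages WHERE mailbox = ? AND folder = ? ORDER BY id",
        (mailbox, folder),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_store(":memory:")
    deliver(conn, "tony", "INBOX", b"Subject: hello\n\nHi.\n")
    print(len(folder_contents(conn, "tony", "INBOX")))
    conn.close()
```

A backup, search, or compliance tool can't just read message files here; it needs database access and knowledge of the schema, which is the third-party-tool difficulty described above.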
It seems that we really can't avoid problems. The simple individual-file approach has performance issues if users let their mailboxes grow unchecked, while pure database approaches make external access more difficult and risk data loss if the database itself malfunctions.
I prefer the file approach. External indexes and databases can always be rebuilt from scratch (Kerio actually runs a regular sweep that looks for such problems and rebuilds them automatically). Having access to individual messages also makes it easy to search mail with ordinary operating system tools, whether for compliance or to add functionality. This is one of the many reasons I began selling and recommending Kerio Connect Mailserver.
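Because each message is an ordinary file, an index can be regenerated just by walking the directories, and a grep-style search needs nothing but file access. The sketch below assumes the one-file-per-message layout described above; the index format (path mapped to From and Subject) is hypothetical, and this is not how Kerio's own rebuild sweep works.

```python
import os
import tempfile
from email import policy
from email.parser import BytesHeaderParser


def rebuild_index(root: str) -> dict[str, dict[str, str]]:
    """Rebuild a simple header index from scratch by reading every message
    file under root; nothing is taken from any previous (possibly corrupt) index."""
    parser = BytesHeaderParser(policy=policy.default)
    index: dict[str, dict[str, str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                headers = parser.parse(f)  # parses headers only, skips the body
            index[path] = {
                "from": str(headers.get("From", "")),
                "subject": str(headers.get("Subject", "")),
            }
    return index


def search(root: str, needle: bytes) -> list[str]:
    """Grep-style search: return sorted paths of message files containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if needle in f.read():
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        inbox = os.path.join(root, "INBOX")
        os.makedirs(inbox)
        with open(os.path.join(inbox, "1.eml"), "wb") as f:
            f.write(b"From: a@example.com\nSubject: invoice\n\nPlease pay.\n")
        print(rebuild_index(root))
        print(search(root, b"pay"))
```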
I'm not the only person who feels that way; the comments on "Exchange TCO comparison" make interesting reading. It's not that one large database is "wrong"; it's just that it has its own issues. Ideally, a robust and efficient filesystem would give us the best of both worlds, but we don't have that yet.
More information comparing Kerio and Exchange: "Kerio Connect - The Exchange Alternative".
Got something to add? Send me email.
© 2012-03-19 Anthony Lawrence