htdig (site indexing)

© September 2005 Tony Lawrence

htdig is indexing software similar in concept to Swish-e (I use Swish-e here).

SCO users first saw htdig in the 5.0.7 release, where it made a nuisance of itself by indexing the documentation, which can take an amazingly long time (over five hours on an old Celeron 433MHz IDE system). It also refuses to run if you have used DHCP to obtain an IP address. I don't know if that's a SCO-specific problem or general stupidity in htdig itself.

Htdig isn't usually installed out of the box with Linux, but it should be an easy build. Apparently it has also been built on Mac OS X, BSD, IRIX, HP-UX, Solaris and SunOS, so it can't be too demanding of specific OS features. That's assuming Unix, of course, which is another somewhat dubious advantage for Swish-e, which can run on Windows. Not that anyone reading this page is likely to care, of course.


There's little doubt that htdig is more powerful than Swish-e and can handle larger data sets. Even at this site (something around 12,000 pages, give or take), Swish-e is starting to gasp a bit. Still, I think Swish-e is easier and more flexible, and I expect that its ability to handle larger volumes will grow - hopefully before my site gets too large for it.

One of the best pages I found for htdig resources is https://www.searchtools.com/tools/htdig.html

