Referencing: http://textism.com/wordcleaner/

Boy, do I hate it when somebody sends me an HTML document created by Microsoft Word. It is bloated junk, and a major pain to publish in a format that is useful here. But that's the Microsoft Way: eschew simplicity, embrace complexity, excelsior!

I usually refuse such offerings unless the content is so good that I can't bring myself to turn it down. If I do take it, I fuss and fume and annoy my wife by complaining about "Microsoft idiots". Was I referring to the author or the corporation? Maybe both.

I haven't test-driven this tool, and who knows if it will still be there when I need it, but if you publish web pages and your contributors are wont to submit in this abominable manner, maybe you can use this.

---December 4, 2004

Next time, tell them to simply download OpenOffice.org, open their .doc file in it, and then use that program's facilities to convert it to HTML.

OO.org produces much nicer code. It doesn't do a perfect job, but it does it better than MS Office does!


---January 2, 2005
The XML output is much cleaner to work with: the excess markup is kept in cleanly separated namespaces, so all you need is a SAX parser to strip those namespaces out.
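
For anyone who wants to try the SAX idea, here is a rough sketch in Python using the standard xml.sax module. The choice of Python, the names NamespaceStripper and strip_namespaces, and the sample input are all made up for illustration; the comment above does not say which tool or language to use. The sketch assumes the input is well-formed XML, drops the tags and attributes that belong to any namespace you did not ask to keep, and leaves the text inside them alone.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import escape, quoteattr


class NamespaceStripper(ContentHandler):
    """Hypothetical handler: re-emit only un-namespaced (or explicitly kept) markup."""

    def __init__(self, keep_uris=()):
        super().__init__()
        # None is the "no namespace" case reported by the parser.
        self.keep = {None, *keep_uris}
        self.parts = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri not in self.keep:
            return  # drop the foreign tag itself; its text still passes through
        kept_attrs = "".join(
            f" {a_local}={quoteattr(value)}"
            for (a_uri, a_local), value in attrs.items()
            if a_uri in self.keep
        )
        self.parts.append(f"<{local}{kept_attrs}>")

    def endElementNS(self, name, qname):
        uri, local = name
        if uri in self.keep:
            self.parts.append(f"</{local}>")

    def characters(self, content):
        self.parts.append(escape(content))


def strip_namespaces(xml_text, keep_uris=()):
    """Return xml_text with foreign-namespace tags and attributes removed."""
    handler = NamespaceStripper(keep_uris)
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report (uri, localname) pairs
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return "".join(handler.parts)


if __name__ == "__main__":
    # Invented sample in the style of Word output; not taken from a real file.
    sample = (
        '<html xmlns:o="urn:schemas-microsoft-com:office:office">'
        '<p class="MsoNormal" o:gfxdata="x">Hello<o:p> there</o:p></p>'
        "</html>"
    )
    print(strip_namespaces(sample))
```

Keeping the text of a dropped element is a judgment call. If you would rather discard the whole subtree, you would need to track nesting depth in the handler. This also only works on input that parses as XML, which Word's plain HTML export generally does not.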

2 cents.

