APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Word HTML Cleanup

Referencing: http://textism.com/wordcleaner/

Boy, do I hate it when somebody sends me an HTML document created by Microsoft Word. It is bloated junk, and a major pain to publish in a format that is useful here. But that's the Microsoft Way: eschew simplicity, embrace complexity, excelsior!

I usually just refuse such offerings unless the content is so good that I just can't bring myself to turn it down. If I do take it, I fuss and fume and annoy my wife by complaining about "Microsoft idiots". Was I referring to the author or the corporation? Maybe both.

I haven't test driven this tool, and who knows if it will still be there when I need it, but if you publish web pages and your contributors are wont to submit in this abominable manner, maybe you can use this.

© Tony Lawrence

---December 4, 2004

Next time, tell them to simply download OpenOffice.org, open their .doc file in it, and then use that program's facilities to convert to HTML.

OO.org produces much nicer code. It doesn't do a perfect job, but it does it better than MS Office does!


---January 2, 2005
The XML output is much cleaner: you would just need a SAX parser to strip out the excess namespaces, which are cleanly structured.

2 cents.
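
For anyone curious what that SAX approach might look like, here is a rough sketch in Python using the standard xml.sax module. It is only an illustration under some assumptions: the input must already be well-formed XML, and the names NamespaceStripper and strip_foreign_namespaces, as well as the choice to keep only un-namespaced and XHTML-namespace elements, are hypothetical and not anything the commenter specified. Elements in any other namespace are dropped along with their contents, and namespaced attributes are dropped too.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import escape, quoteattr

# Assumption: only un-namespaced and XHTML elements are worth keeping.
XHTML_NS = "http://www.w3.org/1999/xhtml"


class NamespaceStripper(ContentHandler):
    """Re-emit kept elements as plain tags; drop foreign-namespace ones."""

    def __init__(self, out, keep=(None, XHTML_NS)):
        super().__init__()
        self.out = out
        self.keep = keep
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if self.skip_depth or uri not in self.keep:
            # Dropped element (or a child of one): track nesting only.
            self.skip_depth += 1
            return
        self.out.write("<" + local)
        for (attr_uri, attr_local), value in attrs.items():
            if attr_uri is None:  # keep only un-namespaced attributes
                self.out.write(" %s=%s" % (attr_local, quoteattr(value)))
        self.out.write(">")

    def endElementNS(self, name, qname):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.write("</%s>" % name[1])

    def characters(self, content):
        if not self.skip_depth:
            self.out.write(escape(content))


def strip_foreign_namespaces(xml_text):
    """Return xml_text with foreign-namespace elements and attributes removed."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report (uri, localname) pairs
    out = io.StringIO()
    parser.setContentHandler(NamespaceStripper(out))
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:o="urn:schemas-microsoft-com:office:office">'
        '<body><p class="MsoNormal">Hello<o:p>junk</o:p></p></body></html>'
    )
    print(strip_foreign_namespaces(sample))
```

Note that this sketch writes tags without any namespace declarations at all, so the result is bare markup rather than strict XHTML. If you wanted to keep the text inside the dropped elements, you would skip only the tags instead of counting depth.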
