XML stands for eXtensible Markup Language. It was created to offer improvements over HTML, but its use is not limited to the web, and although it can be and is used for word processing documents, it isn't limited to that either. In fact, general data exchange is one of its most common uses. What the heck does a markup language have to do with data exchange? No wonder XML is confusing!
Well, don't let it throw you. It's just text files; it's stuff you can read and could even create by hand if you had to. There's no magic here, nothing that ordinary human beings can't grasp. Of course, that doesn't necessarily mean that you can understand what's going on just from looking at a particular XML file. You can certainly read it, and reading it may give you clues as to its purpose and use, but a stand-alone file is just like any lone message: you may need more information.
For example, if you overhear me say to my wife,
"Deb called. They'll be here about 10:00"
you can certainly understand every word, and can even guess a few things: "Deb" is probably female, she's coming from somewhere else and is not alone, and will be "here" at 10:00. You don't know how many people are with her, you can't be 100% sure she really is a "her", and you know neither what day I am talking about nor whether I mean AM or PM. My wife, however, knows all of that (and more), because she has other information she can put together with what I've told her. Same thing with any old XML file: you can read it, and you may be able to glean quite a bit from it, but there may be lots of things left unsaid.
That's it, really. That's all XML does: it describes data. HTML markup does the same thing: a "<BR>" tag in this document tells your browser to drop down a line before it does any more; it's describing how to present the data. HTML could do everything XML does, except there's no HTML tag to describe "the number of 101-key keyboards in the Springfield warehouse", and there is in XML. Or can be, of course.
Here's the funny thing about XML: of course there's no tag for those keyboards. But there are no tags for anything at all. Nothing. I don't mean there are no tags; any XML document has plenty of tags. What I mean is that none of them are predefined. Unlike "<BR>", which is defined by HTML, XML defines nothing. That's your job. Not only is it your job, but you have complete freedom to make up any old tags you like and have 'em mean anything you like. Really. You do have to follow some rules about how tags get used. For example, while HTML lets some tags go without a closing tag to indicate the extent of their scope, every XML element has to be closed, either with a matching closing tag or with the self-closing "<tag/>" form. But the rules aren't hard, and you truly can do whatever you want.
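To make that concrete, here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree parser. The tag names (inventory, warehouse, keyboards), the attributes and the count of 42 are all invented for illustration; nothing in XML defines them. It shows a made-up vocabulary parsing without complaint, and the same text being rejected once a closing tag goes missing.

```python
import xml.etree.ElementTree as ET

# Made-up tags: XML predefines none of these names (all hypothetical).
GOOD = """<inventory>
  <warehouse city="Springfield">
    <keyboards keys="101">42</keyboards>
  </warehouse>
</inventory>"""

# Same data, but the <keyboards> element is never closed.
BAD = GOOD.replace("</keyboards>", "")


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def keyboard_count(text):
    """Read the invented <keyboards> element for Springfield as an int."""
    root = ET.fromstring(text)
    node = root.find("./warehouse[@city='Springfield']/keyboards")
    return int(node.text)


print(is_well_formed(GOOD))   # True
print(is_well_formed(BAD))    # False
print(keyboard_count(GOOD))   # 42
```

Notice that the parser only enforces the rules about how tags are written. What "keyboards" means is still entirely up to whoever reads the file.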
That might not be real useful, though. Throw a bunch of tags you made up together with some data and what the heck is anyone else supposed to do with them? Try loading your "document" into a word processor like Star Office that uses XML files: how would it know what to do with your "<foo>AHA!</foo>" ? It wouldn't. That doesn't mean that YOU couldn't have some program that knew exactly what to do with that, but if XML is supposed to be for general use, we need some standards. Don't worry, we've got them. Oh, have we got standards. Lots of 'em. Definitions on top of definitions. Enough to keep you tied up for weeks and weeks just reading this stuff. In fact, you might never get around to the document itself. Just kidding. It's not that bad.
Probably the most important standard for most of us is namespaces. People who are developing XML web browsers might argue that stylesheets are every bit as important, but conceptually they really are similar and we should look at the big picture: namespaces and stylesheets both help define what tags mean. Notice I said "help"? That's because, well, because a namespace really doesn't define anything in any absolute sense. All it does is give a common set of names that two or more people can agree on. The authors of the namespace say that such and such tag is to be used for a certain purpose, but you could ignore that. That wouldn't make sense, would it? That's why we have standards, so we can agree on things. It is perhaps time for an example. We're going to look at the XML file that defines the RSS feed for this site.
<?xml version="1.0" encoding="iso-8859-1"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://aplawrence.com/rss.rdf">
    <title>Site News for A.P.Lawrence Unix, Linux and Mac OS X Resources</title>
    <description> Resources and information for Unix/Linux, Mac OS X, and other computer related topics. Thousands of articles, reviews, consultants listings, skills tests, opinion, how-to's, suggestions and more for Unix, Linux and Mac OS X, networking, web site maintenance and more.. </description>
    <admin:errorReportsTo rdf:resource="mailto:[email protected]"/>
    <admin:generatorAgent rdf:resource="http://aplawrence.com/Unix/simplerssfeed.html"/>
    <sy:updatePeriod>daily</sy:updatePeriod>
    <sy:updateFrequency>2</sy:updateFrequency>
    <sy:updateBase>2003-09-14T00:00:20+00:00</sy:updateBase>
    <dc:language>en</dc:language>
    <dc:publisher>A.P. Lawrence</dc:publisher>
    <dc:rights>Copyright A.P. Lawrence</dc:rights>
    <dc:creator>A.P. Lawrence (mailto:[email protected])</dc:creator>
    <dc:date>2003-09-14T00:00:20+00:00</dc:date>
    <link>http://aplawrence.com</link>
    <lastBuildDate>Tue, 14 Oct 2003 00:00:20 GMT</lastBuildDate>
    <image rdf:resource="http://aplawrence.com/image21.gif"></image>
    <items>
      <rdf:Seq>
        <li rdf:resource="http://aplawrence.com/Words/2003_10_14.html" />
        <li rdf:resource="http://aplawrence.com/Blog/B589.html" />
        <li rdf:resource="http://aplawrence.com/Blog/B588.html" />
        <li rdf:resource="http://aplawrence.com/Blog/B587.html" />
        <li rdf:resource="http://aplawrence.com/Blog/B586.html" />
        <li rdf:resource="http://aplawrence.com/Blog/B585.html" />
        <li rdf:resource="http://aplawrence.com/Unixart/newhtmlfeatures.html" />
        <li rdf:resource="http://aplawrence.com/Opinion/joemckearnings.html" />
        <li rdf:resource="http://aplawrence.com/Blog/B584.html" />
        <li rdf:resource="http://aplawrence.com/Opinion/whattowrite.html" />
        <li rdf:resource="http://aplawrence.com/Basics/basicthreads.html" />
        <li rdf:resource="http://aplawrence.com/Unix/perlforkexec.html" />
        <li rdf:resource="http://aplawrence.com/Blog/B575.html" />
        <li rdf:resource="http://aplawrence.com/Opinion/stwinwin.html" />
        <li rdf:resource="http://aplawrence.com/Blog/B574.html" />
      </rdf:Seq>
    </items>
  </channel>
  <image rdf:about="http://aplawrence.com/image21.gif">
    <title>A.P.Lawrence Logo</title>
    <url>http://aplawrence.com/image21.gif</url>
    <link>http://aplawrence.com</link>
  </image>
  <item rdf:about="http://aplawrence.com/Words/2003_10_14.html">
    <title>Words: CLI: Tech Words of the Day </title>
    <description></description>
    <link>http://aplawrence.com/Words/2003_10_14.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Blog/B589.html">
    <title>Blog: Unicode </title>
    <description>Arrogance and laziness</description>
    <link>http://aplawrence.com/Blog/B589.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Blog/B588.html">
    <title>Blog: Analogies (SCO Lawsuit) </title>
    <description>Secret recipes..</description>
    <link>http://aplawrence.com/Blog/B588.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Blog/B587.html">
    <title>Blog: Virus Insurance? </title>
    <description>Little coverage</description>
    <link>http://aplawrence.com/Blog/B587.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Blog/B586.html">
    <title>Blog: Poor Microsoft </title>
    <description>They get no respect</description>
    <link>http://aplawrence.com/Blog/B586.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Blog/B585.html">
    <title>Blog: Robber Barons </title>
    <description>IP wants to be free!</description>
    <link>http://aplawrence.com/Blog/B585.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Unixart/newhtmlfeatures.html">
    <title>Articles: New HTML features </title>
    <description>Things I didn't know about HTML</description>
    <link>http://aplawrence.com/Unixart/newhtmlfeatures.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Opinion/joemckearnings.html">
    <title>Opinion: Windows, Unix, and Linux- Which Earns the Most by Joe McKendrick</title>
    <description>Salary comparisons</description>
    <link>http://aplawrence.com/Opinion/joemckearnings.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Blog/B584.html">
    <title>Blog: Lying or Incompetent? (SCO Lawsuit) </title>
    <description>It doesn't really matter</description>
    <link>http://aplawrence.com/Blog/B584.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Opinion/whattowrite.html">
    <title>Opinion: What to write about to be published at aplawrence.com </title>
    <description>Write about what you didn't know yesterday</description>
    <link>http://aplawrence.com/Opinion/whattowrite.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Basics/basicthreads.html">
    <title>Basics: Understanding Threads </title>
    <description>Are threads better?</description>
    <link>http://aplawrence.com/Basics/basicthreads.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Unix/perlforkexec.html">
    <title>Programming: Fork and exec with Perl </title>
    <description>Fork it over</description>
    <link>http://aplawrence.com/Unix/perlforkexec.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Blog/B575.html">
    <title>Blog: SCO Road Show </title>
    <description>Big news, I thought</description>
    <link>http://aplawrence.com/Blog/B575.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Opinion/stwinwin.html">
    <title>Opinion: Win-Win Negotiations in Program Management by Sanjay Tailor</title>
    <description>Negotiation Skills</description>
    <link>http://aplawrence.com/Opinion/stwinwin.html</link>
  </item>
  <item rdf:about="http://aplawrence.com/Blog/B574.html">
    <title>Blog: Unix is NOT just as expensive </title>
    <description>It's cheaper</description>
    <link>http://aplawrence.com/Blog/B574.html</link>
  </item>
</rdf:RDF>
In the "rdf:RDF" tag (which extends over a few lines) we have a bunch of "xmlns" definitions. Later on, you see
<sy:updateBase>2003-09-14T00:00:20+00:00</sy:updateBase>
"sy:updateBase" means we're using updateBase from the "sy" namespace (http://purl.org/rss/1.0/modules/syndication/). With luck, an RSS reader that uses this file will know that I'm asking it not to check for new files too often. I say "with luck", because nothing forces that - it's all about cooperation. If you click on that link, you'll see that's just an ordinary web page that describes what this stuff is supposed to mean. Other namespaces will be XML pages themselves, often referencing still more namespaces. That's where the X in eXtensible comes from: namespaces can build on other namespaces.
I have simplified things quite a bit, and completely ignored some concepts you'll need if you are really going to be developing XML applications, but the basic ideas are here. You can see that, unlike the conversation fragment of what I said to my wife, an XML document may allow you to drill back and obtain a very full understanding of its purpose and intended function.
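Going back to the "sy" prefix for a moment, here is a rough sketch of how a program might resolve it, again in Python with the standard library's xml.etree.ElementTree. The FEED string is a trimmed copy of the feed above (only the syndication hints are kept), and the function name update_hints and the "rss" key in the NS table are my own inventions for the sketch. The point it tries to make is that the prefix is only shorthand: what identifies the element is the namespace URL it maps to.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed above: just the channel's syndication hints.
FEED = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns="http://purl.org/rss/1.0/">
  <channel>
    <sy:updatePeriod>daily</sy:updatePeriod>
    <sy:updateFrequency>2</sy:updateFrequency>
    <sy:updateBase>2003-09-14T00:00:20+00:00</sy:updateBase>
  </channel>
</rdf:RDF>"""

# The reader's own prefix table. The keys are arbitrary (hypothetical
# choices for this sketch); only the URLs have to match the document.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "sy": "http://purl.org/rss/1.0/modules/syndication/",
}


def update_hints(feed_text):
    """Pull the three syndication hints out of the channel element."""
    channel = ET.fromstring(feed_text).find("rss:channel", NS)
    return {
        "period": channel.findtext("sy:updatePeriod", namespaces=NS),
        "frequency": int(channel.findtext("sy:updateFrequency", namespaces=NS)),
        "base": channel.findtext("sy:updateBase", namespaces=NS),
    }


# Same document, but the author chose the prefix "s" instead of "sy".
RENAMED = FEED.replace("xmlns:sy=", "xmlns:s=").replace("sy:", "s:")

print(update_hints(FEED))
print(update_hints(RENAMED) == update_hints(FEED))  # True
```

Whether a real RSS reader honors those hints is, as I said, a matter of cooperation. All the parser does is tell it which vocabulary each tag belongs to.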
But there are no rules, are there? What says that you can have tag X inside an A tag but not in a B tag? Ah, that gets more complicated: that's a DTD, a Document Type Definition, and is more than I want to get into here. For the moment, let's say that a DTD specifies how XML tags can be put together to make a specific kind of document. The DTD is not itself an XML document; it is written in its own very special syntax.
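Python's standard library parser checks that a document is well formed but does not validate it against a DTD, so what follows is only a hand-rolled stand-in for the kind of nesting rule a DTD states, not real DTD validation. The element names A, B and X come from the question above; the wrapping "doc" element, the ALLOWED_CHILDREN table and the function name nesting_errors are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical content rules: which child tags each parent may contain.
# In DTD syntax, roughly the same rules would be declared as:
#   <!ELEMENT doc (A | B)*>
#   <!ELEMENT A (X*)>
#   <!ELEMENT B EMPTY>
#   <!ELEMENT X (#PCDATA)>
ALLOWED_CHILDREN = {
    "doc": {"A", "B"},
    "A": {"X"},
    "B": set(),
    "X": set(),
}


def nesting_errors(xml_text):
    """Return (parent, child) tag pairs that break the rules above."""
    root = ET.fromstring(xml_text)
    errors = []
    for parent in root.iter():          # every element, in document order
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:            # direct children only
            if child.tag not in allowed:
                errors.append((parent.tag, child.tag))
    return errors


# X inside A: allowed by the rules.
print(nesting_errors("<doc><A><X>ok</X></A><B/></doc>"))   # []
# X inside B: well-formed XML, but against the rules.
print(nesting_errors("<doc><A/><B><X>no</X></B></doc>"))   # [('B', 'X')]
```

Both documents parse fine. The second one only fails once somebody writes down which tags belong where, and that is the job a DTD does.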
If you want to learn more about XML, there are plenty of books and web resources to choose from. Because of the wide variety of usage (XHTML, document creation, data exchange), you will need to focus in on the specific area you want, but as you can see from this brief introduction, there's nothing really mysterious about any of it. Just another buzzword, just another way to describe data. The beauty of it is its flexibility, and the fact that it can be extended easily.
© 2009-11-06 Tony Lawrence