APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Website Infrastructure - Building your web site

If you are not using a Content Management System, you will be doing all that work yourself. It's your decision where your pages will live, whether they will be static or dynamic or something in between, and whether you'll be using Perl or PHP or something else entirely.

Expect to make mistakes.

There's an adage in programming that says "Build one to throw away - because you'll have to anyway". In other words, start out expecting to bumble, and be prepared to throw out everything and start over, because you are unlikely to be happy with your first attempt. This applies to your web site just as much, and perhaps even more.

When I first started writing web pages at aplawrence.com, they were always static pages: straight HTML. If there was any CGI at all, it was only to handle forms. No doubt many beginning webmasters start out that way, and if the site never grows beyond a handful of pages, that's probably fine. As the site grows in complexity, however, static pages become burdensome to maintain. When you have a dozen pages, restructuring them to add a new feature or redesign their look is not difficult, but once you have hundreds or even thousands of pages (I have over 5,000 at aplawrence.com), even minor changes are nothing you'd want to do by hand.

Because of that, you'll probably want to design some sort of dynamic generation - in other words, you'll build your own CMS software.

One aside before we dive in: there is a technical difference between dynamic pages and generated pages that the rest of this article will mostly ignore. A dynamic page has embedded code (JavaScript, Server Side Includes, or whatever), or may be produced entirely by a CGI script each time it is requested. A generated page may sit on the server as plain static HTML, but that HTML could well have been created by a script or other program (and nothing prevents such a page from also including dynamic content). Which scheme you use depends on your needs and preferences; I use all of these methods myself, and you probably will too.
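To make the "generated page" idea concrete, here is a minimal Python sketch. My own tooling is Perl; the template, the function names and the paths are hypothetical. A script fills a template once, and the result sits on the server as ordinary static HTML:

```python
from html import escape
from string import Template

# Hypothetical page template; a real generator would likely keep this in a file.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
$body
</body>
</html>
""")


def render_page(title: str, body_html: str) -> str:
    """Fill the template; the title is HTML-escaped, the body is trusted HTML."""
    return PAGE.substitute(title=escape(title), body=body_html)


def write_page(path: str, title: str, body_html: str) -> None:
    """Write the result to disk, where the web server serves it as static HTML."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_page(title, body_html))
```

Calling write_page("example.html", "Building your web site", "<p>Expect to make mistakes.</p>") produces a file that needs no code running at request time. Rerunning the generator over every article is how a site-wide layout change gets applied.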

If you have hundreds or thousands of pages, you have no choice. Most of the process has to be automatic or you'll be tied up with boring repetitive tasks or more likely just never make the changes you know need to be made.

For example, let's look at the various things that need to happen when I add a page to my main site:

First, it needs to appear in one or more index pages. There's the main page, which shows the fifteen most recent articles, and the overall sitemap has to know about the new page. The article may also be categorized into multiple subject areas, so it has to appear in each of those indexes. The RSS file has to be updated, as does Google's gmap file, and Google, Technorati and other tracking sites have to be told that there are new pages.
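Much of that list can be driven from the same article records. The Python sketch below assumes a hypothetical Article record (url, title, publication date). It picks the fifteen newest articles for the main page and builds a minimal RSS 2.0 feed from them. It does not cover the sitemap or notifying tracking sites.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # Hypothetical record; the real data file's fields are not specified here.
    url: str
    title: str
    published: date


def most_recent(articles, n=15):
    """Return the n newest articles, newest first (the main page shows fifteen)."""
    return sorted(articles, key=lambda a: a.published, reverse=True)[:n]


def build_rss(articles, site_title, site_url, n=15):
    """Return a minimal RSS 2.0 document listing the n newest articles."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Recent articles from {site_title}"
    for a in most_recent(articles, n):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = a.title
        ET.SubElement(item, "link").text = a.url
    return ET.tostring(rss, encoding="unicode")
```

Because the main page and the feed both come from most_recent(), adding one article updates both at once.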

Then there is the article itself. Advertising will appear in it, but what kind? Is it suitable for more than one ad? Should it use only banners, or never use banners? Quite possibly it is related to earlier articles; a reader should be able to see a list of related posts, and each of those related posts needs to include this new post in its own list of related reading.

I also have a search engine at that site. For small sites, you can get away with just index pages or a Google site search, but as you get bigger you probably need a customized search. I use Swish-e, which I've customized as well, so new posts need to be merged into its index.

Each article displays site index navigation links at the top. Of course, I change those from time to time, and again I don't want to have to update every article.

Similarly, each article contains a copyright notice that I may want to adjust, and style sheet information that determines font sizes and so on. I wouldn't want to edit every single article just because I decided that the color of my <H2> tags should change!

All of this is handled by Perl scripts. The style sheet classes and the copyright notice are produced with simple SSI (Server Side Includes): each file contains a line like this one, for example:


<!--#include virtual="/cgi-bin/copyright.pl" -->

As you may already know, on a server that supports SSI, such a line causes the displayed web page to include the text generated by the script it calls. The "copyright.pl" script generates the boilerplate copyright, and a similar line generates the style sheets.
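The script behind such an include is an ordinary CGI program. copyright.pl itself isn't shown here, so what follows is a hypothetical Python stand-in: OWNER and FIRST_YEAR are placeholders, and the wording of the notice is invented. The important detail is that a script called through include virtual must print a Content-Type header and a blank line before its output.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for an SSI-included copyright script."""
import sys
from datetime import date

OWNER = "Your Name"  # placeholder, not the real copyright holder
FIRST_YEAR = 1997    # placeholder: first year of publication


def copyright_html(owner: str, first_year: int, this_year: int) -> str:
    """Return the boilerplate notice with an up-to-date year range."""
    years = str(first_year) if first_year >= this_year else f"{first_year}-{this_year}"
    return f'<p class="copyright">&copy; {years} {owner}. All rights reserved.</p>'


def main() -> None:
    # SSI splices in only the body, but the CGI header block must come first.
    sys.stdout.write("Content-Type: text/html\n\n")
    sys.stdout.write(copyright_html(OWNER, FIRST_YEAR, date.today().year) + "\n")


if __name__ == "__main__":
    main()
```

Because the year range is computed on every request, the notice on every page stays current without touching a single file. A style sheet script could work the same way, printing a <style> block instead.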

Everything else happens because a single line describing the article is added to a data file that a number of other programs read. That line records information about the post: who wrote it, when, who holds the copyright and on what terms, what kind of advertising should appear in it, and so on. The embedded SSI scripts read that file and adjust their behavior accordingly.
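The actual format of that data file isn't described here. Purely as an assumption, suppose each article gets one pipe-delimited line: filename, author, date, copyright terms, ad style, and a comma-separated list of categories. A Python sketch of reading such a file might look like this:

```python
from dataclasses import dataclass

# Hypothetical layout, one article per line:
# filename|author|YYYY-MM-DD|copyright terms|ad style|category,category
FIELDS = ("filename", "author", "date", "copyright", "ads", "categories")


@dataclass
class Entry:
    filename: str
    author: str
    date: str
    copyright: str
    ads: str
    categories: list[str]


def parse_line(line: str) -> Entry:
    """Split one data-file line into an Entry; fields must not contain '|'."""
    parts = [p.strip() for p in line.rstrip("\n").split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
    rec = dict(zip(FIELDS, parts))
    rec["categories"] = [c.strip() for c in rec["categories"].split(",") if c.strip()]
    return Entry(**rec)


def load(lines) -> dict[str, Entry]:
    """Read all entries keyed by filename, skipping blank lines and # comments."""
    entries = {}
    for line in lines:
        if line.strip() and not line.lstrip().startswith("#"):
            entry = parse_line(line)
            entries[entry.filename] = entry
    return entries
```

load() accepts any iterable of lines, so an open file works directly. A script included in a page could then look up that page's entry and, for example, decide which kind of ad to print.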

By the way, you'll see plenty of warnings on the web against using CGI and SSI: "too slow", "will bog you down", that sort of thing. Yes, if you became extremely popular, you might run into problems here. I haven't reached that point yet: the aplawrence.com site gets about 6,000 visitors a day and hums along without trouble. Keep in mind also that hardware keeps getting faster; what would have dragged a machine to its knees five years ago barely makes modern hardware breathe heavily, and that trend continues. Someday I might have to move to a database-driven system, but honestly, I doubt it. I expect that faster hardware will let me keep using my "inefficient" CGI for a long, long time.

Every index page also gets its information from this file by way of a Perl script, and decides whether or not each article belongs in its listing. In the same way, the "related" listings use this file to determine which articles are related.
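As a sketch under assumed conditions, suppose the records are plain dictionaries with hypothetical filename, date and categories keys. An index page then only has to filter by category, and a "related" list can be computed by counting shared categories. This is one simple heuristic, not necessarily how my Perl scripts decide relatedness:

```python
def index_for(entries, category):
    """Entries belonging to one subject index, newest first (ISO dates sort as text)."""
    members = (e for e in entries if category in e["categories"])
    return sorted(members, key=lambda e: e["date"], reverse=True)


def related(entries, target, limit=5):
    """Filenames of other entries sharing a category: most shared first, then newest."""
    mine = set(target["categories"])
    scored = []
    for e in entries:
        if e["filename"] == target["filename"]:
            continue  # an article is not related to itself
        shared = len(mine & set(e["categories"]))
        if shared:
            scored.append((shared, e["date"], e["filename"]))
    scored.sort(reverse=True)
    return [name for _, _, name in scored[:limit]]
```

Because the related list is computed from the data file rather than stored in each article, adding one line for a new post makes it show up in the related lists of older posts automatically. That is exactly the requirement described earlier.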

This structure lets me control that site fairly easily, but I tinker with it constantly and am never entirely happy. One thing I wish I had planned for earlier is separating content from structure. I have a lot in place toward that goal, but ideally I'd like to have more.

If you are just starting out, you need to think about this sort of thing. The web will keep changing, and you'll need to present your content in different ways. You may need to switch from HTML to pure XML at some point; if you design with that possibility in mind now, it will be easier to react when that change comes.
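One way to keep that door open is to store an article's content as data and treat HTML as just one rendering of it. The minimal Python sketch below uses a hypothetical record (title, author, paragraphs) and invented XML element names. It produces both HTML and XML from the same record:

```python
import xml.etree.ElementTree as ET
from html import escape


def to_html(article: dict) -> str:
    """Render the content record as an HTML fragment."""
    paras = "".join(f"<p>{escape(p)}</p>" for p in article["paragraphs"])
    return f"<h1>{escape(article['title'])}</h1>{paras}"


def to_xml(article: dict) -> str:
    """Render the same record as XML; ElementTree handles the escaping."""
    root = ET.Element("article", author=article["author"])
    ET.SubElement(root, "title").text = article["title"]
    for p in article["paragraphs"]:
        ET.SubElement(root, "para").text = p
    return ET.tostring(root, encoding="unicode")
```

Switching output formats then means writing another renderer, not editing thousands of pages.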



Got something to add? Send me email.






© Tony Lawrence


