I thought it might be fun to have a little Twitter update in the sidebar, so I downloaded the Twitter JavaScript widget and popped it in. It works, I'll give it that. But it broke my W3C page validation.
Arrgh. This is so common with third-party scripts: nobody seems to care much about valid HTML, and of course if they did, they'd sometimes have to supply multiple versions (XHTML and HTML), so there's really not much hope.
But it annoys me, so I went looking at the Twitter API to see how difficult it would be to write my own. There's always trepidation when I do that: I do not want to download libraries, new Perl modules or anything I need to compile. Basically I hope that the API is simple and well defined. If it isn't, I have to weigh my options: do I install all their special junk and thereby make it more difficult for myself should I need or want to move to a different platform, or do I just do without it? Usually I just say the heck with it.
However, Twitter turned out to be easy enough. You get your data with a plain HTTP GET, so you can use "curl" from the command line, Perl's LWP, or anything else you like. The data is available in several different formats (see the Twitter API documentation). For my needs I chose RSS, so I ran "curl -u pcunix:$pass http://twitter.com/statuses/user_timeline.rss" (with my actual password in place of $pass). I then picked out what little I wanted with a Perl script.
Curl?? Why on earth would I do that? Why wouldn't I build the GET into the widget? Well, think about this: if I run "curl" every 15 minutes, that's 96 fetches a day (four an hour for 24 hours). If I built it into the widget, every page view would do its own fetch, which means many thousands of fetches per day: a waste of my resources and theirs. Of course I could make a more complicated widget and cache the results somewhere, but why bother? I'll just run curl and let the file be my cache.
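If you did go the "more complicated widget" route, the heart of it is a freshness test on the cached copy. You fetch only when the file is missing or older than the polling interval. Here's a minimal Perl sketch of that test. It assumes the 15-minute (900-second) interval above. `cache_is_stale` is a hypothetical helper name, not part of any Twitter API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $file is missing or was last modified
# more than $max_age seconds ago. Defaults to 900 seconds, the
# 15-minute interval discussed above.
sub cache_is_stale {
    my ($file, $max_age) = @_;
    $max_age = 900 unless defined $max_age;
    my @st = stat $file;
    return 1 unless @st;          # no cached copy yet, so fetch
    my $age = time - $st[9];      # $st[9] is the modification time
    return $age > $max_age;
}
```

A page script would call something like `cache_is_stale('/Users/apl/Desktop/twitter')` and run the fetch only when it returns true. A scheduled job that runs curl every 15 minutes gets the same effect and keeps that logic out of the page entirely.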
I'll carry that reasoning a little farther. Why should the web page process the file? Again, that would happen thousands and thousands of times in the web page, when I only really need to do it once each time I fetch the file. So I strip out what I don't want and reformat the rest so it's ready to load into the web page for display:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
# these are just for time conversions
my %mons = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
            Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);
my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
#
# open the curled file
#
open(my $in, '<', '/Users/apl/Desktop/twitter') or die $!;
my @stuff = <$in>;
close $in;
#
# and create a new output file
#
# It's POSSIBLE that a web page might read this before I finish
# writing it. My write is small, so it's not going to
# get a partial read, but it may get nothing. That's ok, but
# if I really cared about that I would put locking around
# it here and in the web page code.
#
open(my $out, '>', '/Users/apl/Desktop/twitter.fixed') or die $!;
print $out "<ul>\n";
my $reading    = 0;    # how many <item>s we have seen so far
my $in_desc    = 0;    # true while inside a (possibly multi-line) description
my $saved_line = '';   # description lines collected so far
my $desc       = '';   # the finished <li> line for the current item
foreach (@stuff) {
    # skip until we get to the real twitters
    $reading++ if /<item>/;
    next if not $reading;
    # description lines can be more than one line
    $in_desc = 1 if /<description/;   # that's the beginning of a description
    if (/<\/description>/) {
        # now in end of description, merge in any saved lines
        s/^/$saved_line /;
        s/description>/li>/g;
        $desc       = $_;
        $in_desc    = 0;
        $saved_line = '';
    }
    # save if we are still reading a description
    $saved_line .= $_ if $in_desc;
    if (/<pubDate>/) {
        #
        # We want to merge the date into our output line.
        # pubDate comes after description.. better code
        # would make sure that is still true and act accordingly..
        #
        # pubDate looks like this
        #<pubDate>Tue, 18 Dec 2007 21:01:29 +0000</pubDate>
        # so pull out day, month, year and time:
        #18 Dec 2007 21:01:29
        #
        my ($day, $mon, $year, $hh, $mm, $ss) =
            /(\d+) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d)/ or next;
        #
        # they store as GMT, so convert to epoch seconds with timegm()
        # (timelocal() would treat the GMT fields as local time)
        #
        my $seconds = timegm($ss, $mm, $hh, $day, $mons{$mon}, $year - 1900);
        #
        # and then write it back out in our timezone; localtime()
        # applies the local offset, daylight saving included
        #
        my @mytime = localtime($seconds);
        my $time = sprintf("%s %d, %.2d:%.2d", $months[$mytime[4]], $mytime[3],
                           $mytime[2], $mytime[1]);
        #
        # tuck it into our description
        #
        $desc =~ s/<li>/<li>$time<br \/>/;
        print $out "$desc\n";
        #
        # shouldn't need this unless end of description somehow went missing
        # but it doesn't hurt..
        #
        $in_desc    = 0;
        $saved_line = '';
    }
    # three items are enough: stop when the fourth <item> starts
    last if $reading > 3;
}
print $out "</ul>\n";
close $out;
```
That's it. As noted in the code, there are things that could be done better. A change in the order in which Twitter produces the RSS would mess up the dates, and if I cared about the small chance of reading during a write, I could put locking around it all. But this is a relatively insignificant little "extra"; if it breaks, it's not at all critical.
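One way to remove the ordering dependency is to read the whole file, split it into `<item>` blocks, and search each block separately for its `<description>` and `<pubDate>`. The sketch below does that with regular expressions. That is fine for Twitter's simple, predictable RSS, but it is not a general XML parser. It converts each date with `timegm()` because Twitter's timestamps are GMT. `parse_items`, `pubdate_to_epoch` and `format_li` are hypothetical names for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month name -> 0-based month number, as timegm() expects.
my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Convert an RSS pubDate such as "Tue, 18 Dec 2007 21:01:29 +0000"
# to epoch seconds, applying the numeric UTC offset.
# Returns undef if the string doesn't look like that.
sub pubdate_to_epoch {
    my ($date) = @_;
    my ($d, $mon, $y, $H, $M, $S, $sign, $oh, $om) = $date =~
        /(\d{1,2}) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d) ([+-])(\d\d)(\d\d)/
        or return undef;
    return undef unless exists $MON{$mon};
    my $epoch  = timegm($S, $M, $H, $d, $MON{$mon}, $y);
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return $epoch - $offset;    # local time = UTC + offset
}

# Pull up to $max items out of the RSS text. Each <item> block is
# searched on its own for <description> and <pubDate>, so the order
# of those two elements inside an item doesn't matter.
sub parse_items {
    my ($xml, $max) = @_;
    my @items;
    while ($xml =~ m{<item>(.*?)</item>}gs) {
        my $item = $1;
        my ($desc) = $item =~ m{<description>(.*?)</description>}s;
        my ($date) = $item =~ m{<pubDate>(.*?)</pubDate>}s;
        next unless defined $desc && defined $date;
        my $epoch = pubdate_to_epoch($date);
        next unless defined $epoch;
        push @items, { desc => $desc, epoch => $epoch };
        last if @items >= $max;
    }
    return @items;
}

# Format one item as the same kind of <li> line the script above
# produces, with the date in this machine's local time zone.
sub format_li {
    my ($item) = @_;
    my @t = localtime($item->{epoch});
    return sprintf '<li>%s %d, %02d:%02d<br />%s</li>',
        $MONTHS[$t[4]], $t[3], $t[2], $t[1], $item->{desc};
}
```

To use it in place of the line-by-line loop, slurp the curled file into a string, call `parse_items($xml, 3)`, and print `format_li($_)` for each item between `<ul>` and `</ul>`.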
In the web page itself, I can just "include" it or read it in as part of a bigger script. It's a "ul" list, ready to display. Simple as that.
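Suppose the chance of the page reading a half-written file ever did matter. An alternative to locking is to write the list to a temporary file in the same directory and then `rename()` it over the real one. This is a different technique from the locking mentioned above. On POSIX filesystems, a rename within one filesystem replaces the target atomically. The page therefore sees either the old list or the new one, never an empty file. Here is a minimal sketch, with `write_atomically` as a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace $path with $content without ever exposing a partly written
# file: write a temporary file next to it, then rename() it into place.
# rename() within one POSIX filesystem swaps the name atomically, so a
# reader opening $path gets either the old contents or the new ones.
sub write_atomically {
    my ($path, $content) = @_;
    my $tmp = "$path.tmp.$$";    # same directory, so same filesystem
    open(my $out, '>', $tmp) or die "can't open $tmp: $!";
    print {$out} $content    or die "can't write $tmp: $!";
    close($out)              or die "can't close $tmp: $!";
    unless (rename($tmp, $path)) {
        my $err = $!;            # save before unlink can change $!
        unlink $tmp;
        die "can't rename $tmp to $path: $err";
    }
    return 1;
}
```

In the script, you would build the `<ul>` list in a string and call `write_atomically('/Users/apl/Desktop/twitter.fixed', $html)` instead of printing straight to the output file.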