Writing a Twitter getter Widget


2007/12/19

I thought it might be fun to have a little Twitter update in the sidebar, so I downloaded the Twitter Javascript Widget and popped it in. It works - I'll give it that. But it broke my W3C page validation.

Arrgh. This is so common with third-party scripts - nobody seems to care much about valid HTML, and of course if they did they'd sometimes have to supply multiple versions (XML and HTML), so there's really not much hope.

But it annoys me, so I went looking at the Twitter API to see how difficult it would be to write my own. There's always trepidation when I do that: I do not want to download libraries, new Perl modules or anything I need to compile. Basically I hope that the API is simple and well defined. If it isn't, I have to weigh my options: do I install all their special junk and thereby make it more difficult for myself should I need or want to move to a different platform, or do I just do without it? Usually I just say the heck with it.

However, Twitter turned out to be easy enough. Basically, you can get your data with an HTTP GET. You could use "curl" from the command line, or Perl LWP, or anything else you like. You can get the file in several different formats (see Twitter API Documentation). For my needs, I chose RSS, so I did "curl -u pcunix:$pass http://twitter.com/statuses/user_timeline.rss" (using my actual password for $pass) and saved the output to a file. I then picked out what little I wanted with a Perl script.
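For the Perl route rather than curl, a minimal sketch of the same GET might look like this. It uses HTTP::Tiny (bundled with Perl since 5.14) in place of LWP only so that nothing extra has to be installed. The sub names build_request and fetch_timeline are invented for illustration, and the URL is simply the one quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: returns the URL and headers for a Basic-auth GET.
sub build_request {
    my ($user, $pass) = @_;
    my $url = 'http://twitter.com/statuses/user_timeline.rss';
    # The second argument ('') stops encode_base64 from appending a newline.
    my %headers = (Authorization => 'Basic ' . encode_base64("$user:$pass", ''));
    return ($url, \%headers);
}

# Hypothetical helper: fetches the feed and returns the body, or dies.
sub fetch_timeline {
    my ($user, $pass) = @_;
    my ($url, $headers) = build_request($user, $pass);
    my $res = HTTP::Tiny->new(timeout => 10)->get($url, { headers => $headers });
    die "GET $url failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}
```

Something like print fetch_timeline('pcunix', $ENV{TWITTER_PASS}); with the output redirected to a file would then stand in for the curl line (TWITTER_PASS is an assumed environment variable, not anything Twitter defines).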

Curl?? Why on earth would I do that? Why wouldn't I build the GET into the Widget? Well, think about this: if I do a "curl" every 15 minutes, I'll do 96 of them a day. If I build it into the Widget, I'd be doing many thousands of fetches per day - that's a waste of my resources and theirs. Of course I could make a more complicated Widget and cache the results somewhere, but why bother: I'll just do a curl and have the file as my cache.
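The "file as cache" idea can also be expressed as a freshness check, for anyone who would rather have a script decide when to fetch than rely purely on a schedule. This is only a sketch of that variation: the sub name cache_is_stale is made up, and the 900-second limit in the usage note is just the 15 minutes mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $file is missing or older than $max_age seconds.
sub cache_is_stale {
    my ($file, $max_age) = @_;
    my $mtime = (stat $file)[9];       # modification time in epoch seconds
    return 1 unless defined $mtime;    # no file yet, so it has to be fetched
    return (time() - $mtime) > $max_age ? 1 : 0;
}
```

A caller would fetch only when cache_is_stale('/Users/apl/Desktop/twitter', 900) is true, which still works out to at most 24 * 60 / 15 = 96 fetches a day no matter how often the page is viewed.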

I'll carry that reasoning a little further. Why should the web page process the file? Again, that would be done thousands and thousands of times in the web page when I only really need to do it once, when I get the file. So at that point I strip out what I don't want and reformat the rest so it's ready to load into the web page for display:



#!/usr/bin/perl
use Time::Local;
# these are just for time conversions
%mons=(Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
       Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);
@months=qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
#
# open the curled file
#
open(I,"/Users/apl/Desktop/twitter") or die $!;
@stuff=<I>;
close I;
#
# and create a new output file
#
# It's POSSIBLE that a web page might read this before I finish 
# writing it.  My write is small, so it's not going to 
# get a partial read, but it may get nothing.  That's ok, but 
# if I really cared about that I would put locking around 
# it here and in the web page code.
#
open(I,">/Users/apl/Desktop/twitter.fixed") or die $!;
print I "<ul>\n";
$reading=0;
$in_desc=0;
foreach(@stuff) {
  # skip until we get to the real twitters
  $reading++ if /<item>/;
  next if not $reading;
  # description lines can be more than one line
  $in_desc=1 if /<description/ ; # that's the beginning of a description
  if (/<.description>/) {
     # now in end of description, merge in any saved lines
     s/^/$saved_line /;
     s/description>/li>/g;
     $desc=$_; 
     $in_desc="";
     $saved_line="";
  }
  # save if we are still reading a description
  $saved_line .= $_ if $in_desc;
  if (/<pubDate>/) {
    # We want to merge the date into our output line.
    # pubDate comes after description; better code would
    # make sure that is still true and act accordingly.
    #
    # pubDate looks like this:
    # <pubDate>Tue, 18 Dec 2007 21:01:29 +0000</pubDate>
    s/.*, //;
    # now it looks like this:
    # 18 Dec 2007 21:01:29 +0000</pubDate>
    s/ .0000.*//;
    chomp;
    # and finally this:
    # 18 Dec 2007 21:01:29
    #
    # They store the time as GMT, so convert it to epoch
    # seconds with timegm(), the GMT counterpart of timelocal()
    @date=split / /,$_;
    @time=split /:/,$date[3]; # 21:01:29
    $mon=$mons{$date[1]};  # Dec
    $seconds=timegm($time[2],$time[1],$time[0],$date[0],$mon,$date[2]);
    # and then write it back out in our timezone;
    # localtime() applies this machine's own offset, including DST
    @mytime=localtime($seconds);
    $time=sprintf("%s %d, %.2d:%.2d",$months[$mytime[4]], $mytime[3],
                  $mytime[2],$mytime[1]);
    # tuck it into our description
    $desc=~ s/<li>/<li>$time<br \/>/;
    print I "$desc\n";
    # shouldn't need this unless end of description somehow went missing
    # but it doesn't hurt..
    $in_desc="";
    $saved_line="";
  }
  # stop once the fourth item starts; three twitters are enough
  last if $reading > 3;
}
print I "</ul>\n";
close I;


That's it. As noted in the code, there are things that could be done better. A change in the order in which Twitter produces the RSS elements would mess up the dates, and if I cared about the small chance of reading during a write, I could put locking around it all. But this is a relatively insignificant little "extra"; if it breaks it's not at all critical.
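Short of real locking, one cheap way to close that read-during-write window on a Unix-like system is to write the new list to a temporary file and rename it over the old one, since a rename within one filesystem replaces the target in a single step. The sketch below assumes that approach. The sub name write_atomically and the "twitterXXXXXX" template are invented for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: writes $content to $path so that a reader sees either
# the old file or the complete new one, never an empty or partial file.
sub write_atomically {
    my ($path, $content) = @_;
    # The temporary file has to live in the same directory (same filesystem)
    # as the target for rename() to swap it in as one step.
    my ($fh, $tmp) = tempfile('twitterXXXXXX', DIR => dirname($path));
    print {$fh} $content or die "write $tmp: $!";
    close $fh or die "close $tmp: $!";
    # File::Temp creates files readable by the owner only; open it up so
    # the web server can still read the result.
    chmod 0644, $tmp or die "chmod $tmp: $!";
    rename $tmp, $path or die "rename $tmp -> $path: $!";
    return;
}
```

With this in place the script would build the whole "ul" list in a string and hand it to write_atomically('/Users/apl/Desktop/twitter.fixed', $list) instead of printing line by line.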

In the web page itself, I can just "include" it or read it in as part of a bigger script. It's a "ul" list, ready to display. Simple as that.
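For the "read it in as part of a bigger script" case, a minimal sketch in Perl might be the following. The sub name twitter_fragment is made up, and returning an empty string when the file is missing or empty is an assumption about what the page should do, chosen so the sidebar simply shows nothing rather than an error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the prepared fragment, or an empty string if
# the file is missing or empty, so the page still renders without the list.
sub twitter_fragment {
    my ($path) = @_;
    open(my $in, '<', $path) or return '';
    local $/;                   # slurp the whole file in one read
    my $html = <$in>;
    close $in;
    return defined $html ? $html : '';
}
```

The page script would then just print twitter_fragment('/Users/apl/Desktop/twitter.fixed'); wherever the sidebar goes.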


