2005/05/23 LWP (Library for WWW in Perl)

If you want to automatically process web pages to extract data, you have a number of tools available. You can bring a web page down to your computer using "curl" or "wget":



curl http://aplawrence.com > mysite




If you don't really want the HTML, use "lynx --dump http://whatever.com > /yourstorage/whatever.txt" to get a text representation of the page. Check the man page for options you might want, like "--nolist", and also see lynx alternatives.

You can also easily be selective and pull only the data you want from a page with simple Perl scripts.



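#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my $url = 'http://aplawrence.com';
# get() returns undef if the page could not be fetched
my $content = get $url;
die "Couldn't fetch $url\n" unless defined $content;
print $content;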


And then of course you'd process the $content as desired. It's only a little more complex if you are dealing with forms; see /Words/2005_03_05.html for a small example of that.
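As a rough illustration of what "process the $content" might look like, here is a minimal sketch that pulls the page title and the link targets out of a chunk of HTML. The helper names extract_title and extract_links are made up for this example, and the sample HTML is a stand-in so the script runs without a network connection; in practice $content would come from get $url. Regular expressions are only good enough for simple, predictable pages, so treat this as a starting point rather than a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a fetched page (assumption for this sketch); normally
# this would be the $content returned by LWP::Simple's get().
my $content = <<'HTML';
<html><head><title>Example Page</title></head>
<body>
<a href="/Books/webc.html">Book review</a>
<a href="/Words/2005_03_05.html">Forms example</a>
</body></html>
HTML

# extract_title: hypothetical helper; returns the <title> text, or undef.
sub extract_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# extract_links: hypothetical helper; returns every double-quoted href
# value found in an <a> tag. Single-quoted or unquoted hrefs are missed.
sub extract_links {
    my ($html) = @_;
    my @links = $html =~ m{<a\s[^>]*?href\s*=\s*"([^"]*)"}gi;
    return @links;
}

print "Title: ", extract_title($content) // '(none)', "\n";
print "Link: $_\n" for extract_links($content);
```

For pages with messier markup, a parsing module from CPAN is likely a safer choice than hand-written patterns.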

A book that covers LWP is reviewed at /Books/webc.html.



