Blogrolling with Perl, PHP, and mySQL

Weblogs.com tracks blogs that have been changed. Its home page lists the blogs that have changed recently and the time each one changed. The same information is also available as an XML file at http://www.weblogs.com/changes.xml. Adam Curry set up a macro that displays updated blogs the same way. Since I don't use the same software he does, I decided to see if I could replicate it using the tools I use: Perl, PHP, and mySQL.

In setting this up I kept things fairly simple. There is a single table in mySQL called weblogs:


CREATE TABLE `weblogs` (
  `name` varchar(50) NOT NULL default '',
  `url` varchar(200) NOT NULL default '',
  `updated` int(11) NOT NULL default '0'
)

name is the name of the site, url is its address, and updated is a Unix timestamp of when the blog was last updated. If updated is 0, the blog has never been seen on weblogs.com.

There is a Perl script called 'updatedchanges.pl' that runs every hour via cron. It grabs the changes.xml file, uses XML::Simple to pull the information from it into variables, and then updates the weblogs table for any url that I have in the database. At the end it touches a file, setting the accessed time to the time the script ran. Here is the script:


#!/usr/bin/perl
use LWP::Simple;
use XML::Simple;
use Data::Dumper;
use Time::ParseDate;
use DBI;

$driver = "mysql";
$database = "yourdb";
$host = "yourhost";
$user = "dbuser";
$password = "dbpassword";
$dsn = "DBI:$driver:database=$database;host=$host";
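$dbh = DBI->connect($dsn, $user, $password) or die "Could not connect: $DBI::errstr\n";
# LWP::Simple::get returns undef on failure and does not set $!
my $xml = LWP::Simple::get("http://www.weblogs.com/changes.xml") or die "Could not fetch changes.xml\n";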

my $hashref = XMLin($xml);

#print Dumper($hashref);

#%weblog = $hashref->{weblog};
$updated = parsedate($hashref->{updated});
# print "updated on: $updated\n";
for my $thing (keys %{$hashref->{weblog}}) {
    $site = $thing;
    $url = $hashref->{weblog}->{$thing}->{url};
    $when = $hashref->{weblog}->{$thing}->{when};
    $realwhen = $updated-$when;
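    # only record blogs that changed within the last hour
    if ($when < 3600) {
	# placeholders keep quotes in a url from breaking the query
	$dbh->do("UPDATE weblogs SET updated=? WHERE url=?", undef, $realwhen, $url);
    }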
}
system("touch /path/to/changes");
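
To make the loop above easier to follow, here is a minimal sketch of what XMLin hands back for a file shaped like changes.xml, and how the when offset (seconds before the feed's updated time) turns into an absolute timestamp. The sample XML, the blog names and urls, and the hard-coded epoch value are all made up for illustration; the real script gets that value from parsedate. It relies on XML::Simple's default behaviour of folding repeated elements into a hash keyed on their name attribute, which, as far as I understand it, only happens when there is more than one weblog entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Invented sample in the shape of changes.xml (not real data).
my $xml = <<'XML';
<weblogUpdates version="1" updated="Mon, 10 Mar 2003 08:00:00 GMT" count="2">
  <weblog name="Example Blog One" url="http://one.example.com/" when="120"/>
  <weblog name="Example Blog Two" url="http://two.example.com/" when="4000"/>
</weblogUpdates>
XML

# With default options the weblog elements end up as a hash keyed by name.
my $feed = XMLin($xml);

# Assumed stand-in for parsedate($feed->{updated}); hard-coded to avoid extra modules.
my $updated = 1047283200;

# Map each url to the time it changed, skipping changes an hour old or more.
my %changed;
for my $name (sort keys %{ $feed->{weblog} }) {
    my $entry = $feed->{weblog}{$name};
    next if $entry->{when} >= 3600;
    $changed{ $entry->{url} } = $updated - $entry->{when};
}

print "$_ => $changed{$_}\n" for sort keys %changed;
```

Only the first sample blog makes it into %changed, since the second one changed more than an hour before the feed was generated.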

The final step was to write some PHP for my home page that lists the blogs and makes bold any that have been updated in the last hour. I created a file called weblogs.inc.php, included from my home page, that reads as follows:


<?

/* Assumes db connection exists */
$logs = mysql_query("SELECT name,url,updated FROM weblogs ORDER BY name");
while ($r = mysql_fetch_row($logs)) {
    if ($r[2] > (time() - 3600)) {
        $name = "<B>$r[0]</B>";
    } else {
        $name = "$r[0]";
    }
    print "&nbsp;- <A HREF=\"$r[1]\" TARGET=\"_blank\">$name</A><BR>\n";
}
?>
<BR>
(Items in bold updated within the last hour using <A HREF="http://www.weblogs.com/changes.xml">changes.xml</A> from <? print date("m/d/y @B",fileatime("/path/to/changes")); ?>)<BR>

As you can see, this is fairly simplistic and doesn't include an easy way to add new blogs to the database. I plan on writing an interface to do this eventually.

I'd also like to add that I'm a relative novice when it comes to Perl. If anyone can recommend ways to clean up my code, I'd love the input.
