will england :: seti at home : perl stats script

So, how did I make those cool stats?

Rather simple, if you know a bit of Perl. Below is the source code with (hopefully) enough comments. Follow the link above for a plain-text version. In a nutshell, we make an HTTP connection to the Seti@home site, grab the page and stuff it in a variable. Then we strip out the HTML tags, replace everything non-numeric with spaces, compress the spaces, and are left with a list that looks like this:

 118 78 1003 10 30 7 12 51 40 4 1322874 83645 1105 93 594

Those fields match up to the various bits of data we want, like how many units have been processed, etc. Then, it's just a matter of splitting that into discrete variables, and forming the output.
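As a rough illustration of the strip-and-split step described above, here is a minimal sketch run against a made-up HTML fragment rather than the real Seti@home page. The sample markup and the numbers_from_html name are assumptions for the example, and it uses a simpler tag regex than the script does.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce an HTML fragment to a list of the
# numbers it contains, in page order.
sub numbers_from_html {
    my ($html) = @_;
    $html =~ s/<[^>]*>/ /g;    # drop anything that looks like a tag
    $html =~ s/\D+/ /g;        # every run of non-digits becomes one space
    $html =~ s/^ | $//g;       # trim the leading and trailing space
    return split / /, $html;
}

# Made-up sample; the real page layout is not reproduced here.
my $sample = '<tr><td>Results received</td><td>118</td></tr>'
           . '<tr><td>Total CPU time</td><td>78 hr 10 min</td></tr>';

my @fields = numbers_from_html($sample);
print "@fields\n";    # prints "118 78 10"
```

Because the fields are identified only by position, any change to the page layout shifts every number, so it pays to check the list by eye whenever the output looks odd.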

I loop through each of the 3 people whose IDs I have for our team, get the data, and write it to separate files, which are read in on the previous page by PHP.
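The per-user loop can be sketched roughly as follows. This is only an illustration: fetch_stats is a hypothetical stand-in for the HTTP fetch, the addresses are made up, and the files go to a temporary directory instead of the web directory the real script uses.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for the HTTP fetch; returns undef for a
# user whose page could not be retrieved.
sub fetch_stats {
    my ($user) = @_;
    my %fake = ('alice@example.com' => '12 units',
                'bob@example.com'   => '7 units');
    return $fake{$user};
}

# Write one .dat file per user, skipping users with no data so an
# older file is left untouched.
sub write_user_files {
    my ($dir, @users) = @_;
    my @written;
    for my $user (@users) {
        my $stats = fetch_stats($user);
        next unless defined $stats && length $stats;
        # File name is the first 8 characters of the address.
        my $path = File::Spec->catfile($dir, substr($user, 0, 8) . '.dat');
        open my $fh, '>', $path or die "Could not open $path: $!";
        print {$fh} "$stats\n";
        close $fh or die "Could not close $path: $!";
        push @written, $path;
    }
    return @written;
}

my $dir   = tempdir(CLEANUP => 1);
my @files = write_user_files($dir, 'alice@example.com',
                             'nobody@example.com', 'bob@example.com');
print scalar(@files), " files written\n";    # prints "2 files written"
```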

Anyway, enough talk - if you have questions, just e-mail me

Remember - some of the lines below are wrapped for presentation on the web. For accurate source code, use the snagseti.txt program above.

use strict;
use LWP::Simple;

# snagseti.pl
# Version 1.2
# Author: Will England
# will@mylanders.com
# October 18, 1999
# Simple program to grab the users units done for
# a group.  Requires Perl 5 and the LWP::Simple 
# package be installed.  
# USAGE: I run as a cron job every 60 minutes and 
# have the data stuffed into a text file, which is
# then read into my webpage with a PHP include()
# statement.  
# You could also have the whole web page generated 
# by this script, by just adding it to the print
# section.
# Revisions:
# 10-18-1999: Original Script (1.0)
# 10-19-1999: Improved comments, changed file path. (1.1)
# 11-06-1999: Moved the strip tags regex to a subroutine
# 11-07-1999: Added total group time
# 11-18-1999: Added sanity checking - if seti@home site is down, 
#             will not overwrite previous files.

# declarations
# These are all the numbers on the page.
my $junk;			#Strip first space, other unneeded fields
my $content;		#The whole page
my $group_content;	#The whole grouppage
#my $unitSent;		#Units Sent to you
my $unitComplete;	#Units you have completed
my $hrsComplete;	#Hours total
my $minComplete;	#minutes total
my $secComplete;	#seconds
my $tenthsComplete;	#1/10th seconds
my $avgHours;		#average hours per w/u
my $avgMin;			#and minutes
my $avgSec;			#and seconds
my $avgTenths;		#and again 1/10th
my $totalUsers;		#total registered users
my $place;			#Your ranking
my $pct;			#your percentile
my $pctDec;			#and the decimal part of the %
my @userID;			#The array of users for your group
my $user;			#the specific ID you are processing
my $out;			#output file -- defined later.
my $pctYear;		#percent of a year

# All the variables for the group thing.
# Same naming convention as above.
my ($mem_count,$t_hrs1,$t_min1,

# User Variables
# Just fill in these bits, set up the cron job, and 
# let run!
# User IDs are in quotes, separated by commas, and have
# the @ sign escaped with a backslash
# like this:("will\@mylanders.com", "nmorton\@mylanders.com") 
@userID = ("will\@mylanders.com", "nmorton\@mylanders.com", "seti\@pmgi.com"); 

# First stage is to grab the groups page and 
# calculate the total processing time.  We have
# to do this in a separate step, cause one of our
# defunct group members doesn't have an e-mail addy
# registered.

# get the member total time
# first, get the web page

$group_content = get("http://setiathome.ssl.berkeley.edu/cgi-bin/cgi?cmd=team_lookup&name=Team+Fox+Lake");

#Did anything come from the web page?
if (defined $group_content && $group_content ne "")  {
	#print ("Group Content Exists and is: $group_content\n");

	#now strip it
	$group_content = strip_tags($group_content);

	#split it
	# lines wrapped for web -- get the text file above for real source code!
	$junk,$junk) = split (/ /, $group_content);

	#calculate the total hours, minutes, seconds.
	my $T_Tenths;
	my $hold_sec = 0;
	$T_Tenths = $t_tenths1 + $t_tenths2 + $t_tenths3 + $t_tenths4;
	if ($T_Tenths >= 10) {
		use integer;
		$hold_sec = $T_Tenths / 10;
		$T_Tenths = $T_Tenths - ($hold_sec * 10);
	}

	my $T_Sec;
	my $hold_minutes = 0;
	$T_Sec = $t_sec1 + $t_sec2 + $t_sec3 + $t_sec4 + $hold_sec;
	if ($T_Sec >= 60) {
		use integer;
		$hold_minutes = $T_Sec / 60;
		$T_Sec = $T_Sec - ($hold_minutes * 60);
	}

	my $T_Min;
	my $hold_hour = 0;
	$T_Min = $t_min1 + $t_min2 + $t_min3 + $t_min4 + $hold_minutes;
	if ($T_Min >= 60) {
		use integer;
		$hold_hour = $T_Min / 60;
		$T_Min = $T_Min - ($hold_hour * 60);
	}
	my $T_Hour;
	$T_Hour = $t_hrs1 + $t_hrs2 + $t_hrs3 + $t_hrs4 + $hold_hour;
	$pctYear = (($T_Min/60) + $T_Hour) / 8760;	#8760 hours in a 365-day year

	#stuff to a text file for later use.
	open (OUTFILE, ">/usr/home/will/public_html/seti/group_time.dat") or die "Could not open group_time.dat: $!";
	print OUTFILE (" Over $T_Hour:$T_Min:$T_Sec.$T_Tenths processed!");
	printf OUTFILE (" (%.3f years, even!) ", $pctYear);
	close (OUTFILE);

} #end is there data section

#Now, start hitting each user.
#Loop thru each userid given and kick out the results.

foreach $user(@userID) {

#get the user web page

	$content = get("http://setiathome.ssl.berkeley.edu/cgi-bin/cgi?name=$user&cmd=user_stats");

	#See if anything came across
	if (defined $content && $content ne "") {
		$content = strip_tags($content);
		#print ("Content Exists and is: $content\n");

		#split the resulting numbers into discrete variables.  
		($junk, $unitComplete, $hrsComplete, $minComplete, $secComplete,
		$tenthsComplete, $avgHours, $avgMin, $avgSec, $avgTenths, $totalUsers,
		$place, $junk, $pct, $pctDec) = split(/ /, $content);

		# We define the file name as the first 8 characters of
		# the user's e-mail address.

		$out = substr($user,0,8);

		# The outfile is where the files are dumped -- you will have to change
		# this bit. We are appending the extension .dat to the first 8 characters
		# and putting it in the appropriate directory.  You'll need to have the
		# full path if it's run by cron, otherwise the files just get dumped
		# in $HOME.  

		open (OUTFILE, ">/usr/home/will/public_html/seti/$out.dat") or die "Could not open $out: $!";

		# Here be the print bit.  You can update or change it any way you like.
		# As you can see, I'm dumping it to a file for each user.
		# It could also be e-mailed to the user, dumped to a single
		# file, written out as a web page, etc.  The possibilities are
		# limitless.
		print OUTFILE ("
<table width=60%>
<th>Units Done</th><th>Time</th><th>
For an average of</th><th>Placing:</th><th>
<td align=right>
$unitComplete </td>
<td align=right>
<td align=right>
<td align=right>$place</td>
<td align=right>$totalUsers</td>
<td align=right>$pct.$pctDec%</td>
</table>
");

		# End printy bit.  
		# and close the file
		close (OUTFILE);
	} #end the data check

} # and end the loop for each user.

# 11-6-1999 moved this funky regex to a subroutine.
# see below for comments.

sub strip_tags {

# simple (?) regex to strip HTML tags.  Works for this 
# app, may break for pages with embedded comments, etc.
# (courtesy of Tom Christiansen)
	my $val = $_[0];
	$val =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

# replace all the newlines with spaces

	$val =~ s/\n/ /g;

# replace everything that isn't a digit with a space

	$val =~ s/\D/ /g;

# replace one or more spaces with one space

	$val =~ s/ +/ /g;

	return $val;
}
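The group total in the script is built by carrying tenths into seconds, seconds into minutes, and minutes into hours. An alternative that may be easier to check, offered only as a sketch, is to convert every time to tenths of a second, add them up, and split the sum back apart. The sum_times name and the sample times are assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: add up [hours, minutes, seconds, tenths] rows
# and return the normalized total as (hours, minutes, seconds, tenths).
sub sum_times {
    my @rows = @_;
    my $tenths = 0;
    for my $row (@rows) {
        my ($h, $m, $s, $t) = @$row;
        $tenths += (($h * 60 + $m) * 60 + $s) * 10 + $t;
    }
    my $t    = $tenths % 10;
    my $secs = int($tenths / 10);
    my $s    = $secs % 60;
    my $mins = int($secs / 60);
    my $m    = $mins % 60;
    my $h    = int($mins / 60);
    return ($h, $m, $s, $t);
}

# Made-up member times, not real team data.
my @total = sum_times([1, 59, 59, 9], [0, 0, 0, 1], [2, 30, 0, 5]);
printf "%d:%02d:%02d.%d\n", @total;    # prints "4:30:00.5"
```

Working in a single unit avoids the separate carry variables, at the cost of one multiplication per member.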

Disclaimer: Anything I have to say is mine, dammit! Neither my employers, my clients, nor anyone else can take credit (or be blamed) for it.

Author: Will England (will@mylanders.com) Complaints? /dev/null

This page is a Y to K complaint.

Created October, 1999      ::      Updated Wednesday, September 05 2018 @ 11:52pm