Introduction to Perl programming
Dave Regan
regan@peak.org
Perl is a programming language which excels at handling textual
information. It has been used with good success for systems
administration tasks on Unix systems, World Wide Web CGI programming,
and customizing the output from other programs.
This is an introductory class which covers the basics of
Perl programming and uses example scripts as guides.
I assume that you have some experience with another programming
language so that we can discuss parallels between the languages and
build upon your current knowledge.
There is a 3-hour class, an assignment to work on, and a follow-up
3-hour lab a week later to go over the assignments and other rough
spots which the students have found.
These course notes are available on the World Wide Web at http://www.peak.org/~regan/perl/.
The on-line version of these notes has the advantage of linking
into the FAQs and other on-line references which are too bulky
or would get out of date too quickly to print out.
A fair amount of documentation is included with each of the Perl distributions.
On Unix systems, you can say man perl to get an idea of general
topics, and then use perldoc to get detailed information about
sections of the language and information on particular functions.
Perl was developed by Larry Wall as a programming language to take the
place of shell and AWK scripts on a Unix system. Because many Unix systems
are on the Internet, there is a lot of information about Perl available
on the Internet, including the language itself, all for free.
There are also a number of books written about Perl, including one of
which Larry Wall is a co-author.
Finally, there are the Perl newsgroups:
- comp.lang.perl.misc: The Perl language in general.
- comp.lang.perl.announce: Announcements about Perl. (Moderated)
Use your favorite news reading program to access these groups for
a discussion about Perl, and as a place to ask and answer questions.
To really learn how to program in Perl (or any other language), you must
actually do some programming. I have found for myself that it is often
easiest to examine other people's programs, and to find one close to the
task at hand and modify it. This class is built around this idea.
There will be many parts of the language which I won't attempt to
cover, as well as different sorts of applications of Perl which I won't
cover. However, by the end of the class you should be able to read
a simple Perl program, understand it, and modify it to meet similar
applications.
Example Programs
The examples I use here are derived from Perl scripts that I actually use.
We will start with simple scripts and work through scripts which use a variety
of Perl's features. This will give us a chance to discuss those
features, and how they compare to other programming languages you may know.
In my examples, I use italics to indicate commentary which isn't part
of the original program, but helps explain various pieces of Perl.
The Programming Perl book also has code fragments that
are well worth looking at to get an idea of what is possible.
Before getting into example programs, I want to talk a little
about regular expressions. The idea of regular expressions
has been used in Unix systems since very early on. They allow for
sophisticated pattern matching and substitution, and are used
in ed/ex/vi, awk, sed, and grep, as well as Perl. Perl has made
the rules even more interesting. With Perl, you can have
if statements which depend upon a pattern matching a
string, and you can alter strings based upon the patterns.
You are well advised to look in a Perl book for the full
details. But this provides a set of examples for the
regular expressions commonly used in simple Perl programs.
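As a minimal sketch of the two uses just mentioned (an if statement
which depends upon a pattern matching a string, and a string altered
based upon a pattern), consider the fragment below. The sample text
and the variable names are made up for illustration; they are not
taken from any of the class programs.

```perl
use strict;
use warnings;

# A made-up line of text, loosely in the style of a mail header.
my $line = "From: regan\@peak.org (Dave Regan)";

# An if statement that depends on a pattern matching the string.
# The parentheses capture the first word after "From:" into $1.
my $sender = "";
if ($line =~ /^From:\s+(\S+)/) {
    $sender = $1;
}

# Alter a string based on a pattern: collapse each run of
# whitespace into a single space.
my $text = "too    many     spaces";
(my $squeezed = $text) =~ s/\s+/ /g;

print "sender:   $sender\n";
print "squeezed: $squeezed\n";
```

The =~ operator ties a string to a pattern; m// (or a bare //) tests
for a match, and s/// performs a substitution.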
gettime.pl is a simple program which gets the current time
from some machine which is connected via the network. Obviously,
the real solution to this problem is xntp (the network time
protocol), but all I needed was a quick hack to keep my clock set
reasonably close to "true" time. I call this routine from the
cron facility once a day to keep time correct.
add.mem is another simple program.
This is a "throw away" program that I did while trying to figure out
where the memory on my computer was going. It wasn't particularly
effective for that, but does give an idea of how simple it is to do
certain operations.
pager is a program which is used to notify a user via a pager
that new e-mail has arrived.
(It does seem as if electronics does get out of hand sometimes.)
This sample program shows how to use the timestamps on files
to make decisions, and a simple use of subroutines.
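The general idea of deciding something from file timestamps can be
sketched as follows. This is not the pager program itself; the file
names and the subroutine names (mtime, is_newer) are assumptions made
up for this illustration.

```perl
use strict;
use warnings;

# Return the modification time of a file in seconds since the
# epoch, or 0 if the file does not exist.
sub mtime {
    my ($file) = @_;
    my @info = stat($file);
    return @info ? $info[9] : 0;    # element 9 of stat() is the mtime
}

# True if $file was modified more recently than $stamp.
sub is_newer {
    my ($file, $stamp) = @_;
    return mtime($file) > mtime($stamp);
}

# Hypothetical file names, used only for this illustration.
my $mailbox = "/var/spool/mail/someuser";
my $stamp   = "/tmp/last-page";

if (is_newer($mailbox, $stamp)) {
    print "new mail since the last page\n";
} else {
    print "nothing new\n";
}
```

Perl also has file test operators such as -M (the age of a file in
days) which can be used for the same kind of comparison.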
net.health is a program
which periodically does a ping of the various computers
in our company, and maintains a display of which computers are
up, and which are down. This example uses subroutines in a
slightly fancier way, and also uses a package of external
library code (the termcap library here). There are a number
of packages available for doing the low level work of network
programs, CGI scripts, and any number of other such tasks.
This program also makes use of associative arrays.
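To give a feel for associative arrays, here is a small sketch which
keeps an up/down state for each host. It is not taken from net.health:
the host names, the states, and the subroutine names are invented, and
no actual ping is done.

```perl
use strict;
use warnings;

# An associative array (hash) keyed by host name. A real monitor
# would fill in the states from the result of a ping.
my %status = (
    "alpha" => "up",
    "beta"  => "down",
    "gamma" => "up",
);

# Record a new observation for one host.
sub set_status {
    my ($host, $state) = @_;
    $status{$host} = $state;
}

# Return the sorted names of all hosts in the given state.
sub hosts_in_state {
    my ($state) = @_;
    my @found = grep { $status{$_} eq $state } keys %status;
    return sort @found;
}

set_status("beta",  "up");
set_status("gamma", "down");

print "up:   ", join(" ", hosts_in_state("up")),   "\n";
print "down: ", join(" ", hosts_in_state("down")), "\n";
```

The keys of an associative array are strings rather than numbers, so
a host name can be used directly to look up its state.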
news-description is a program which opens a TCP/IP network socket
to an NNTP
server to get the descriptions for a set of newsgroups.
It provides a simple example of network programming.
Assignments
If at all possible, you should find a computer which has Perl on it and work
through some simple exercises. This will let you know if you understand
the material and can apply it to your needs.
Perl is available at Peak, and can be obtained from the net for the Mac,
Windows, and Unix.
See http://language.perl.com/info/software.html for more information
on obtaining the software.
Don't worry about doing all of the exercises; simply do the ones
which interest you. Alternatively, work on something else using Perl.
During the lab session, we will go through people's work and see what
problems cropped up.
Of course, in order to have your programs available during
the lab session, you will need to have them on a computer you can get
to from the computer lab. If that is not possible, you can mail them
to me at regan@peak.org, and I'll ensure that the programs are available
for the lab session.
- Programs in use.
This program is quite similar to add.mem described above.
If you type w at the kira prompt you will see what programs
different people are running at that instant.
Write a Perl program to go through and pick out the program name
(ignore arguments to the program) and tally how many people are
using each of the different programs. Sort the output by frequency
of use.
Assuming that the output of w looks like:
11:01pm up 8 days, 15:14, 6 users, load average: 0.07, 0.10, 0.09, 2/106
User tty login@ idle JCPU PCPU what
regan ttyS3 10:27pm 33 3 -dip (reganh.ao.com) (dip)
fred ttyp1 Sun2pm 7days 1 (bash)
go ttyq3 10:28pm 31 1 -bash (bash)
phays ttyq4 10:29pm 3 1 vi index.html
toby ttyq5 10:29pm 7 w
anne ttyq6 10:29pm 31 1 1 -bash (bash)
you would generate output which looked like:
bash 3
dip 1
w 1
vi 1
To see some hints, as well as one way that this can be done, see
programs.html.
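The tallying and sorting part of this exercise can be sketched on its
own, leaving the job of picking the program name out of each line to
you. The subroutine names below are made up for this sketch, and
breaking ties alphabetically is a choice made here, not a requirement
of the exercise.

```perl
use strict;
use warnings;

# Count how many times each name occurs in a list.
sub tally {
    my @names = @_;
    my %count;
    $count{$_}++ for @names;
    return %count;
}

# Order the names by descending count, then alphabetically.
sub by_frequency {
    my %count = @_;
    return sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
}

# Program names as they might be picked out of the sample lines above.
my @programs = qw(dip bash bash vi w bash);

my %count = tally(@programs);
for my $name (by_frequency(%count)) {
    print "$name $count{$name}\n";
}
```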
- Trim News headers.
When you save news messages with your favorite newsreader,
it will put the various headers at the start of the file
such as Subject:, Lines:, From:.
If you look at one of the files, you will probably also see
a bunch of header lines which are of no interest to you.
The header lines are stored at the start of the file,
followed by a blank line, followed by the body of the
message.
Write a short Perl program which removes all header
lines except those few that you actually find useful.
To see some hints, as well as one way that this can be done, see
trim.news.html.
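One possible shape for such a program is sketched below. It works on
a list of lines so that it is easy to try out. The set of headers to
keep and the sample message are assumptions made up for this sketch.

```perl
use strict;
use warnings;

# Headers to keep; this particular set is only an example.
my %keep = map { lc($_) => 1 } qw(Subject From Date);

# Given the lines of a saved message, return the lines with the
# unwanted headers removed. The body is passed through untouched.
sub trim_headers {
    my @lines = @_;
    my @out;
    my $in_header = 1;
    my $keeping   = 0;
    for my $line (@lines) {
        if (!$in_header) {
            push @out, $line;                   # body line
        } elsif ($line =~ /^\s*$/) {
            $in_header = 0;                     # blank line ends the headers
            push @out, $line;
        } elsif ($line =~ /^([\w-]+):/) {
            $keeping = $keep{lc($1)} ? 1 : 0;   # start of a new header
            push @out, $line if $keeping;
        } elsif ($keeping) {
            push @out, $line;                   # continuation of a kept header
        }
    }
    return @out;
}

my @message = (
    "Path: news.example.com!not-for-mail\n",
    "From: someone\@example.com\n",
    "Subject: A question about Perl\n",
    "Lines: 1\n",
    "\n",
    "Subject: this line is in the body, so it stays.\n",
);
print trim_headers(@message);
```

The $in_header flag is what keeps a body line which happens to look
like a header from being thrown away.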
- Upper case to mixed case conversion.
A number of reports published by government offices
are printed entirely in upper case.
This doesn't look particularly pleasing.
Write a Perl program which takes a file in this format and
converts it to mixed case.
It is fair to make certain simplifying assumptions:
- A period ends a sentence, and the next letter should be
in upper case.
- Don't worry about proper names.
- Don't worry about acronyms which are typically written in upper case.
To see some hints, as well as one way that this can be done, see
mixed.case.html.
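Under the simplifying assumptions listed above, the conversion can be
sketched in a couple of lines. The subroutine name mixed_case is made
up for this sketch, and the sample text is invented.

```perl
use strict;
use warnings;

# Lower-case everything, then capitalize the first letter of the
# text and the first letter which follows each period.
sub mixed_case {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/(^\s*|\.\s+)([a-z])/$1\u$2/g;
    return $text;
}

print mixed_case("THE REPORT IS DUE. IT COVERS THREE YEARS.\nTHE END."), "\n";
```

The \u in the replacement upper-cases only the next character, which
here is the letter captured in $2.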
- Hotlist builder.
When I read news, I save messages that are of interest to me
either because of their content, or sometimes because there is
a URL (World Wide Web reference) that looks interesting.
Build a Perl script which goes through all of the files in
the News directory of a home directory, and extracts things
which look like URLs.
Next, take the URL, and get the title string from the URL
in question, and take the output and build a file suitable
to use as a potential hotlist for your web browser.
- This is a non-trivial assignment.
- There are some standard Unix programs which can make this
easier.
- The find command can find all of the files given a starting directory.
- You can use grep to find URLs: e.g. grep -i http:// file
can pull all of the http URLs out.
- Perl has commands to remove the URL from the rest of the line,
as does sed. A Perl command of
$line =~ s#.*(http://\S+).*#$1#i;
should come close to removing the uninteresting text from around
a URL (assuming only one URL per line).
- You can use sort -u to sort the list and toss out
duplicates once you get to a bare URL.
- Rather than talking straight to the HTTP server which serves
the URL, use lynx -source $url to get the file. Then
dig through the source looking for a title string.
To see some hints, as well as one way that this can be done, see
hotlist.html.
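Two pieces of this assignment, pulling URLs out of text and digging a
title out of HTML source, can be sketched in Perl alone. The
subroutine names and the sample text are made up for this sketch, and
the patterns are deliberately loose: a URL is taken to run up to the
next whitespace, angle bracket, or double quote, so trailing
punctuation will be picked up as well.

```perl
use strict;
use warnings;

# Return the unique http URLs found in a list of text lines, sorted.
sub extract_urls {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        while ($line =~ m{(http://[^\s<>"]+)}gi) {
            $seen{$1} = 1;
        }
    }
    return sort keys %seen;
}

# Return the title from a page of HTML source, or "" if none is found.
sub page_title {
    my ($html) = @_;
    if ($html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is) {
        return $1;
    }
    return "";
}

my @urls = extract_urls(
    "See http://www.peak.org/~regan/perl/ for the notes.\n",
    "Also <http://www.peak.org/~regan/perl/> and http://www.perl.com/\n",
);
print "$_\n" for @urls;
print page_title("<html><head><TITLE>Perl Notes</TITLE></head></html>"), "\n";
```

Using an associative array for the URLs takes the place of sort -u,
and the output of lynx -source for each URL could be handed to
page_title to get the text for the hotlist entry.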
If you have any questions about this course, feel free to drop me a message
at regan@peak.org and I'll get back to you.