

Introduction to Perl programming

Dave Regan
regan@peak.org

Perl is a programming language which excels at handling textual information. It has been used with good success for systems administration tasks on Unix systems, World Wide Web CGI programming, as well as customizing the output from other programs.

This class is an introductory class which will cover the basics of Perl programming and use example scripts as guides. I assume that you have some experience with some other programming language so that we can discuss parallels between the languages and build upon your current knowledge.

There is a 3 hour class, an assignment to work on, and a follow-up 3 hour lab a week later to go over the assignments and other rough spots which the students have found.

These course notes are available on the World Wide Web at http://www.peak.org/~regan/perl/. The on-line version of these notes has the advantage of linking into the FAQs and other on-line references which are too bulky or would get out of date too quickly to print out.


A fair amount of documentation is included with each of the Perl distributions. On Unix systems, you can say man perl to get an idea of general topics, and then use perldoc to get detailed information about sections of the language and information on particular functions.

Perl was developed by Larry Wall as a programming language to take the place of shell and AWK scripts on a Unix system. Because many Unix systems are on the Internet, there is lots of information available on the Internet about Perl, including the language itself, all for free. There are also a number of books written about Perl, including one of which Larry Wall is a co-author.

Finally, there are the Perl newsgroups.

	comp.lang.perl.misc	The Perl language in general.
	comp.lang.perl.announce	Announcements about Perl. (Moderated)
Use your favorite news reading program to access these groups for a discussion about Perl, and as a place to ask and answer questions.

To really learn how to program in Perl (or any other language), you must actually do some programming. I have found for myself that it is often easiest to examine other people's programs, and to find one close to the task at hand and modify it. This class is built around this idea. There will be many parts of the language which I won't attempt to cover, as well as different sorts of applications of Perl which I won't cover. However, by the end of the class you should be able to read a simple Perl program, understand it, and modify it to meet similar applications.


Example Programs

The examples I use here are derived from Perl scripts that I actually use. We will start with simple scripts and work through scripts which use a variety of Perl's features. This will give us a chance to discuss those features, and how they compare to other programming languages you may know.

In my examples, I use italics to indicate commentary which isn't part of the original program, but helps explain various pieces of Perl.

The Programming Perl book also has code fragments that are well worth looking at to get an idea of what is possible.

Before getting into example programs, I want to talk a little about regular expressions. The idea of regular expressions has been used in Unix systems since very early on. They allow for sophisticated pattern matching and substitution, and are used in ed/ex/vi, awk, sed, grep, as well as Perl. Perl has made the rules even more interesting. With Perl, you can have if statements which depend upon a pattern matching a string, and you can alter strings based upon the patterns. You are well advised to look in a Perl book for the full details. But this provides a set of examples for the regular expressions commonly used in simple Perl programs.
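
As a small illustration of those two uses, the sketch below tests a string against a pattern in an if statement and then alters a string with substitutions. The sample strings and the subroutine names (parse_time, squeeze_spaces) are made up for this example and are not taken from the course scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the hour, minute, and am/pm marker found in a string such as
# "11:01pm", or nothing if the string does not look like a time.
sub parse_time {
    my ($text) = @_;
    if ($text =~ /^(\d{1,2}):(\d\d)(am|pm)$/i) {
        return ($1, $2, lc $3);
    }
    return;
}

# Collapse runs of whitespace into single spaces and trim both ends.
sub squeeze_spaces {
    my ($text) = @_;
    $text =~ s/\s+/ /g;     # every run of whitespace becomes one space
    $text =~ s/^ | $//g;    # remove a leading or trailing space
    return $text;
}

my ($hour, $minute, $half) = parse_time("11:01pm");
print "hour=$hour minute=$minute $half\n" if defined $hour;
print squeeze_spaces("  regan    ttyS3   10:27pm  "), "\n";
```

The parentheses in the first pattern capture the pieces of the match into $1, $2, and $3, which is the same mechanism used by substitutions that keep part of the original string.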

gettime.pl is a simple program which gets the current time off of some machine which is connected via the network. Obviously, the real solution to this problem is xntp (the network time protocol), but all I needed was a quick hack to keep my clock set reasonably close to "true" time. I call this routine from the cron facility once a day to keep time correct.

add.mem is another simple program. This is a "throw away" program that I did while trying to figure out where the memory on my computer was going. It wasn't particularly effective for that, but does give an idea of how simple it is to do certain operations.

pager is a program which is used to notify a user that he has new e-mail via a pager. (It does seem as if electronics does get out of hand sometimes.) This sample program shows how to use the timestamps on files to make decisions, and a simple use of subroutines.
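
The timestamp test at the heart of such a program can be sketched as follows. This is not the pager script itself: the subroutine name has_new_mail, the file paths, and the idea of a separate marker file recording the last notification are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if $mailbox was modified more recently than $marker.
# A missing mailbox means no mail; a missing marker means no
# notification has been sent yet, so any mailbox counts as new.
sub has_new_mail {
    my ($mailbox, $marker) = @_;
    return 0 unless -e $mailbox;
    return 1 unless -e $marker;
    my $mail_mtime   = (stat $mailbox)[9];    # element 9 is the mtime
    my $marker_mtime = (stat $marker)[9];
    return $mail_mtime > $marker_mtime ? 1 : 0;
}

# Hypothetical paths; adjust for your own system.
my $user    = $ENV{USER} || 'nobody';
my $mailbox = "/var/spool/mail/$user";
my $marker  = "/tmp/pager.last.$user";

if (has_new_mail($mailbox, $marker)) {
    print "New mail for $user\n";
}
```

A real notifier would also update the marker file after sending the page, so that the same mail is not reported twice.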

net.health is a program which periodically does a ping of the various computers in our company, and maintains a display of which computers are up, and which are down. This example uses subroutines in a slightly more fancy way, and also uses a package of external library code (the termcap library here). There are a number of packages available for doing the low level work of network programs, CGI scripts, and any number of other such tasks. This program also makes use of associative arrays.
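
An associative array maps a string key to a value, which makes it a natural way to remember the state of each host. The sketch below is not taken from net.health: the host names and subroutine names are invented, and the ping results are hard-coded rather than obtained from the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Latest known state of each host, keyed by host name.
my %status;

# Store 'up' or 'down' for a host, replacing any earlier entry.
sub record {
    my ($host, $is_up) = @_;
    $status{$host} = $is_up ? 'up' : 'down';
}

# Return the sorted names of all hosts currently in the given state.
sub hosts_in_state {
    my ($state) = @_;
    my @found = sort grep { $status{$_} eq $state } keys %status;
    return @found;
}

# Stand-in results; a real program would ping each host instead.
record('alpha', 1);
record('beta',  0);
record('gamma', 1);
record('beta',  1);    # a later check overwrites the earlier entry

foreach my $host (sort keys %status) {
    printf "%-10s %s\n", $host, $status{$host};
}
print "down: ", join(' ', hosts_in_state('down')), "\n";
```

Because the key is the host name, looking up or updating one machine does not require searching a list.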

news-description is a program which opens a TCP/IP network socket to an NNTP server to get the descriptions for a set of newsgroups. It provides a simple example of network programming.


Assignments

If at all possible, you should find a computer which has Perl on it and work through some simple exercises. This will let you know if you understand the material and can apply it to your needs.

Perl is available at Peak, and can be obtained from the net for the Mac, Windows, and Unix. See http://language.perl.com/info/software.html for more information on obtaining the software.

Don't worry about doing all of the exercises; simply do the ones which interest you. Alternatively, work on something else using Perl. During the lab session, we will go through people's work and see what problems cropped up.

Of course, in order to have them available during the lab session, you will need to have them on a computer you can get to from the computer lab. If that is not possible, you can mail them to me at regan@peak.org, and I'll ensure that the programs are available for the lab session.

  • Programs in use.
    This program is quite similar to add.mem described above. If you type w at the kira prompt you will see what programs different people are running at that instant.

    Write a Perl program to go through and pick out the program name (ignore arguments to the program) and tally how many people are using each of the different programs. Sort the output by frequency of use. Assuming that the output of w looks like:

     11:01pm  up 8 days, 15:14,  6 users,  load average: 0.07, 0.10, 0.09, 2/106
     User     tty       login@  idle   JCPU   PCPU  what
     regan    ttyS3    10:27pm    33      3         -dip (reganh.ao.com) (dip)
     fred     ttyp1     Sun2pm 7days      1          (bash)
     go       ttyq3    10:28pm    31      1         -bash (bash)
     phays    ttyq4    10:29pm            3      1  vi index.html
     toby     ttyq5    10:29pm            7         w
     anne     ttyq6    10:29pm    31      1      1  -bash (bash)
    
    you would generate output which looked like:
    			bash	3
    			dip	1
    			w	1
    			vi	1
    
    To see some hints, as well as one way that this can be done, see programs.html.

  • Trim News headers.
    When you save news messages with your favorite newsreader, it will put the various headers at the start of the file such as Subject:, Lines:, From:. If you look at one of the files, you will probably also see a bunch of header lines which have no interest to you. The header lines are stored at the start of the file, followed by a blank line, followed by the body of the message. Write a short Perl program which removes all header lines except those few that you actually find useful.

    To see some hints, as well as one way that this can be done, see trim.news.html.

  • Upper case to mixed case conversion.
    A number of reports published by government offices are published in all upper case. This doesn't look particularly pleasing. Write a Perl program which takes a file in this format and converts it to mixed case. It is fair to make certain simplifying assumptions:
    • A period ends a sentence, and the next letter should be in upper case.
    • Don't worry about proper names.
    • Don't worry about acronyms which are typically written in upper case.

    To see some hints, as well as one way that this can be done, see mixed.case.html.

  • Hotlist builder.
    When I read news, I save messages that have an interest to me either because of their content, or sometimes because there is a URL (World Wide Web reference) that looks interesting. Build a Perl script which goes through all of the files in the News directory of a home directory, and extracts things which look like URLs.

    Next, take the URL, and get the title string from the URL in question, and take the output and build a file suitable to use as a potential hotlist for your web browser.

    • This is a non-trivial assignment.
    • There are some standard Unix programs which can make this easier.
    • The find command can find all of the files given a starting directory.
    • You can use grep to find URLs: e.g. grep -i http:// file can pull all of the http URLs out.
    • Perl has commands to remove the URL from the rest of the line, as does sed. A Perl command of $line =~ s#.*(http://\S+).*#$1#i; should come close to removing the uninteresting text from around a URL (assuming only one URL per line).
    • You can use sort -u to sort the list and toss out duplicates once you get to a bare URL.
    • Rather than talking straight to the HTTP server which serves the URL, use lynx -source $url to get the file. Then dig through the source looking for a title string.

    To see some hints, as well as one way that this can be done, see hotlist.html.
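
The extraction and de-duplication hints for the hotlist assignment can also be carried out entirely inside Perl. The sketch below is only one possible approach and is not the solution in hotlist.html: the subroutine names are invented, the pattern assumes a URL ends at the first whitespace or closing delimiter, and the sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every http URL found in the given lines, sorted, with
# duplicates removed (the same job as grep piped into sort -u).
sub extract_urls {
    my (@lines) = @_;
    my %seen;
    foreach my $line (@lines) {
        while ($line =~ m#(http://[^\s<>")]+)#gi) {
            my $url = $1;
            $url =~ s/[.,;]+$//;    # drop trailing sentence punctuation
            $seen{$url} = 1;
        }
    }
    my @urls = sort keys %seen;
    return @urls;
}

# Return the text between <title> and </title>, or nothing if absent.
sub extract_title {
    my ($html) = @_;
    return $1 if $html =~ m#<title>\s*(.*?)\s*</title>#is;
    return;
}

my @urls = extract_urls(
    "See http://www.peak.org/~regan/perl/ for the notes.\n",
    "Also try <http://www.peak.org/~regan/perl/> again.\n",
);
print "$_\n" for @urls;
```

The page source passed to extract_title would come from a command such as lynx -source, as suggested in the hints; fetching it is left out here so the sketch runs without network access.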

If you have any questions about this course, feel free to drop me a message at regan@peak.org and I'll get back to you.



Last modified 27 May 2006
Dave Regan
http://www.peak.org/~regan/