1. Introduction
Perl was invented by Larry Wall. He called it the Practical Extraction and Report Language (he also calls it the Pathologically Eclectic Rubbish Lister). What started as an exercise in unifying the many tools used to script a system administrator's routine tasks evolved into a powerful scripting language with a large following.
In all fairness, Perl (always written as Perl and not as PERL) is now treated as a general-purpose programming language, though its early beginnings as a melting pot of multiple computing paradigms still make it possible to write undecipherable programs! We will try to get an introduction to Perl and its prowess as a text manipulation language without trying to write cryptic programs.
According to Larry Wall, the parents of Perl are computer science, linguistics, common sense and art.
So, Perl is a computer language that helps to implement some common sense with help from the principles of computer science in an artistic way using common linguistic constructs.
1.1 Short History
As mentioned before, Perl was born as a tool to aid system administrators. In 1986, Larry Wall was asked to build a bi-coastal CM system in a very short time. He did, and then his manager asked him to produce reports from the system. Awk in those days could not handle multiple input files, and hence the new language was born.
Primarily, the new language was aimed at getting things done quickly with data from and to multiple files.
1.2 Evolution
From a quick hack by one system administrator, Perl has grown into a full-fledged language. It is being developed and enhanced continuously by hundreds of programmers around the world. One big step in earning recognition was the addition of a regular expression engine. The regular expression capabilities of Perl are now so well known (especially since version 5.0) that other languages such as Python have adopted them as Perl 5 style regexes.
The growth of the Internet also complemented Perl. The initial attempt at providing dynamic content was through CGI (even now CGI is used extensively), and Perl's remarkable text handling features made it a quick fit. CGI programming is now synonymous with Perl programming.
CPAN, the Comprehensive Perl Archive Network, was set up to share Perl code. Perl supports modules, and chances are that for 99% of programming requirements there is already a tested module in CPAN (for the remaining 1%, write modules and contribute to CPAN!). Using modules masks the complexities of adhering to pre-defined standards and frees you to concentrate on your tasks; there is no point in re-inventing the wheel. There are now modules that handle graphics, CGI and much more.
You can also embed Perl code in your C/C++ programs. A very popular embedded Perl architecture is mod_perl for the Apache web server.
JPL (Java Perl Lingo) is a project to get Java and Perl working together.
1.3 Relevance
Data manipulation
Perl can handle strings, dates, binary data, database connectivity, streams, sockets and much more. This ability to manipulate multiple data types helps immensely in data conversion (and by the way, it is much faster than PL/SQL!). Perl also has provision for lists (or arrays) and for hashes (associative arrays). Perl also supports references, which are similar to pointers in C. Lists, hashes and references together make it possible to define and manipulate powerful custom data types.
Glue language
Perl does not differentiate between files and pipes. So, it makes it very easy to use Perl as a glue language. Suppose you have a sed script, the output of which is to be given to a Perl script. You can do this the UNIX way,
sedscript | perlscript
or the Perl way, inside perlscript:
open(FH,"sedscript|") or die "could not open sedscript\n";
...
This really helps when people want to migrate from traditional UNIX tools like Awk, sed and grep. You can use these tools straightaway instead of worrying about how to do the same thing entirely in Perl. In this respect, Perl is just like the shell. However, we must consider other features of Perl, which the shell simply cannot provide easily.
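A minimal sketch of reading another program's output through a pipe, assuming a Unix-like system where echo is available. Here echo stands in for the sedscript of the text, and read_from_command is a hypothetical helper name used only for this illustration.

```perl
use strict;
use warnings;

# Run an external command and collect its output lines through a pipe.
# 'echo' stands in for sedscript (an assumption for illustration).
sub read_from_command {
    my ($command) = @_;
    # A trailing | tells open to read from the command's output.
    open(FH, "$command |") or die "could not start $command: $!\n";
    my @lines = <FH>;      # list context: read every line at once
    close(FH);
    chomp @lines;          # drop the trailing newlines
    return @lines;
}

my @out = read_from_command('echo one two');
print "got: $_\n" foreach @out;
```

From here on the lines can be processed exactly as if they had been read from an ordinary file.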
CGI
CGI.pm. Period. Almost all CGI programs written today are using the CGI.pm module from CPAN. Even before this was written, people used to use Perl extensively for CGI programming. CGI.pm made the process streamlined and easy, even for beginners. The graphics library GD is used extensively in producing dynamic web charts.
Quick coding
The ease with which Perl can be employed to write programs quickly cannot be overstressed. A disturbing fact about this is that such quick code can tend to be dirty and quickly get out of hand if you keep extending it! Most of the time, you must control your urges to over-extend short programs! But, as a prototyping tool, or as a fast reporting/text-processing tool, Perl is immensely helpful.
Two very good tools worth mentioning in this context are s2p and a2p, which come with the Perl distribution. s2p converts a sed script to a Perl script and a2p converts an Awk script. These two help a lot in extending sed and awk scripts.
Portability
Most Perl code will run without any change on Unix, Windows or Macintosh. Typical changes you might have to make involve file path specifications and the use of low-level OS-specific functions.
1.4 Installation
Just go to http://www.perl.com and download the source or pre-compiled binaries. Installation typically involves extracting the binary and then changing your PATH variable to reflect where the Perl executable resides. Even when you want to compile Perl from scratch, it is a simple job.
1.5 Similarities to common languages/tools
Perl has a remarkable resemblance to the syntax of C, AWK, SED and SHELL.
C
99% of Perl code looks like C code, so it is very easy for C programmers to switch to Perl. And believe me, the code-as-you-go philosophy of Perl really makes C programmers happy, especially for small programs. Many functions from the standard C libraries are available in Perl with little or no change.
AWK & SED
The string processing strategy of Perl is very similar to that of Awk and sed, making it easy to migrate.
Shell
Again, the commenting scheme, variable naming scheme etc of Perl look similar to that of Shell. Many shell utilities like grep, tr etc are available as functions within Perl.
2. Tutorial
The majority of this tutorial section was written by Nik Silver, at the School of Computer Studies, University of Leeds, UK. Assuming a working knowledge of any programming language, we will now see what Perl programs look like.
Modify the program from the previous exercise so that each line of the file is read in one by one and is output with a line number at the beginning. You should get something like:
It is also possible to include more alternatives in a conditional statement:
The open function opens a file for input (i.e. for reading). The first parameter is the filehandle which allows Perl to refer to the file in future. The second parameter is an expression denoting the filename. If the filename was given in quotes then it is taken literally without shell expansion. So the expression '~/notes/todolist' will not be interpreted successfully. If you want to force shell expansion then use angled brackets: that is, use <~/notes/todolist> instead.
The close function tells Perl to finish with that file.
There are a few useful points to add to this discussion on file-handling. First, the open statement can also specify a file for output and for appending as well as for input. To do this, prefix the filename with a > for output and a >> for appending:
Second, if you want to print something to a file you've already opened for output then you can use the print statement with an extra parameter. To print a string to the file with the INFO filehandle use
Third, you can use the following to open the standard input (usually the keyboard) and standard output (usually the screen) respectively:
expression reads in the file entirely in one go. This is because the reading takes place in the context of an array variable. If @lines is replaced by the scalar $lines then only the next one line would be read in. In either case each line is stored complete with its newline character at the end.
Modify the above program so that the entire file is printed with a # symbol at the beginning of each line. You should only have to add one line and modify another. Use the $" variable. Unexpected things can happen with files, so you may find it helpful to use the -w option.
A regular expression is contained in slashes, and matching occurs with the =~ operator. The following expression is true if the string the appears in variable $sentence.
We could use a conditional as
In an RE there are plenty of special characters, and it is these that both give them their power and make them appear very complicated. It's best to build up your use of REs slowly; their creation can be something of an art form.
Here are some special RE characters and their meaning
There are even more options. Square brackets are used to match any one of the characters inside them. Inside square brackets a - indicates "between" and a ^ at the beginning means "not":
A vertical bar | represents an "or" and parentheses (...) can be used to group things together:
Here are some more special characters:
Clearly characters like $, |, [, ), \, / and so on are peculiar cases in regular expressions. If you want to match one of those then you have to precede it with a backslash. So:
As was mentioned earlier, it's probably best to build up your use of regular expressions slowly. Here are a few examples. Remember that to use them for matching they should be put in /.../ slashes
Previously your program counted non-empty lines. Alter it so that instead of counting non-empty lines it counts only lines with
In the above case the parameters are acceptable but ignored. When the subroutine is called any parameters are passed as a list in the special @_ list array variable. This variable has absolutely nothing to do with the $_ scalar variable. The following subroutine merely prints out the list that it was called with. It is followed by a couple of examples of its use.
Result of a subroutine is always the last thing evaluated. This subroutine returns the maximum of two input parameters. An example of its use follows.
The @_ variable is local to the current subroutine, and so of course are $_[0], $_[1], $_[2], and so on. Other variables can be made local too, and this is useful if we want to start altering the input parameters. The following subroutine tests to see if one string is inside another, spaces not withstanding. An example follows.
Also add another routine to generate an Excel CSV file report, in addition to the normal report.
Perl implements a class using a package, but the presence of a package doesn't imply the presence of a class. A package is just a namespace. A class is a package that provides subroutines that can be used as methods. A method is just a subroutine that expects, as its first argument, either the name of a package (for ``static'' methods), or a reference to something (for ``virtual'' methods).
A module is a file that (by convention) provides a class of the same name (sans the .pm), plus an import method in that class that can be called to fetch exported symbols. This module may implement some of its methods by loading dynamic C or C++ objects, but that should be totally transparent to the user of the module. Likewise, the module might set up an AUTOLOAD function to slurp in subroutine definitions on demand, but this is also transparent. Only the .pm file is required to exist.
Modules do have an apparent disadvantage: speed. Since modules tend to be generic in nature, the code contained within tends to be large. And most of the time, you might be using just 10% of the features modules provide. In such cases, and if a second or two in the overall run-time makes a difference, you probably want to code manually.
Even in such situations, the easiest way is to download the module, see what you don't require and delete it. An easier and more elegant solution is to selectively import functions from the module.
CPAN provides guidelines on writing modules. So, if you think you have some code that nobody else has written (rare chance!) and can be modularized, do so by all means and submit to CPAN.
The DBI requires one or more driver modules to talk to databases. Oracle, Access and ODBC drivers might be of interest to us.
Please note that DBI only standardizes the database interaction process. If you use the Oracle driver and write SQL specific to Oracle, don't expect to port your project smoothly to Informix just by changing the driver!
Python is also a cleaner language in that it does not generally allow you to be adventurous with data types. Consequently, Python code is much easier to maintain, as it gets bigger. Python is designed to be extensible. So, enforcing standards and extending the capabilities of the language are easier. With the advent of Java and increased OOP awareness, Python is very popular these days. Python has modules for GUI programming in Unix and Windows environments and that is one area where it is catching up.
It is a matter of personal preference to choose between Python and Perl. Generally, people who prefer C to C++ opt for Perl and C++ lovers go for Python. VB and ASP programmers also can migrate to Python smoothly, which might not be the case with Perl.
Personally, for all the elegance of Python, I do not program in that because of one reason. In Python, the blocks are specified by indentation and not by any construct like {} or BEGIN-END. This can really be irritating if your FTP client or editor decides to translate the file! An easy way out is to add {} as comments (like #{ ... #} !)
A good feature of Python is that the interpreter can be used interactively - like shell. This enables one to test out each line of code thoroughly. A similar application called perlshell is also available in Perl.
A 100% pure Java implementation of Python called JPython is the latest news on Python. This enables one to create Java byte-code out of Python code.
Tcl has been around longer than Perl or Python and coupled with its GUI toolkit, Tk (hence the name Tcl/Tk) is very popular among Unix crowd.
Another recent addition to Tcl that attracts interest is Expect. Expect is a tool for automation, specifically for those tasks which need interaction. A simple example of an Expect script could be logging in to a remote machine, running scripts there and logging out. Expect easily makes the life of system administrators much more pleasant!
Tcl/Tk plugins for browsers are also available. These can run Tclets inside your browser; many Tclets are already available to produce stunning GUIs. Tclets are possible alternatives to Java applets.
Visit http://www.php.net for more information.
PHP's popularity has been increasing almost exponentially. The only negative aspect was that mod_perl scripts execute faster. Now in version 4.0, PHP is driven by a completely rewritten engine called Zend and the performance gap is narrowing down a lot. Even the older versions are sufficiently fast to handle a medium complexity site of dynamic content.
My personal opinion is that, if you want to create interactive/dynamic web site without learning the complexities of CGI, PHP is the way to go - especially if you don't want to spend thousands to purchase ColdFusion or ASP. PHP can also be installed as a separate interpreter program like Perl or Awk. Combined with the ease with which you can write powerful programs in PHP and its easy integration with databases (Oracle/Sybase/Informix/ODBC/PostgreSQL/MySQL...), graphics libraries, network functions, e-mail etc, it is a viable and better choice to even Pro*C !
2.1 First Step
Ever since Kernighan and Ritchie came out with the C programming language, people have started learning almost any programming language with the obligatory "Hello World" program. Let us do the same!
Hello World!
Here is the basic perl program that we'll use to get started.
#! /usr/local/bin/perl
#
# prints a greeting.
#
print 'Hello world.'; # Print a message
Comments
A common Perl pitfall is to write cryptic code. Perl does provide for comments, albeit not very flexibly. Perl treats anything from a hash # to the end of the line as a comment. Block comments are not possible, so if you want a block of comments you must ensure that each line starts with #.
Statements
Everything other than comments is a Perl statement, which must end with a semicolon, like the last line above. Unlike C, you need not put a wrapping character \ for long statements. A Perl statement always ends with a semicolon.
2.2 Running Perl
Type in the example program using a text editor, and save it. The first line of the program is a typical shell construct, which will make the shell start the interpreter and feed the remaining lines of the file as input to the interpreter.
After you've entered and saved the program make sure the file is executable by using the command
chmod u+x progname
at the UNIX prompt, where progname is the filename of the program. Now, to run the program, just type any of the following at the prompt.
perl progname
./progname
progname
If something goes wrong then you may get error messages, or you may get nothing. You can always run the program with warnings using the command
perl -w progname
at the prompt. This will display warnings and other (hopefully) helpful messages before it tries to execute the program. To run the program with a debugger use the command
perl -d progname
When the file is executed Perl first compiles it and then executes that compiled version. Unlike many other interpreted languages, Perl compiles scripts first, which helps you catch most errors before the program actually starts executing. In this context, the -w switch is very helpful. It will warn you about unused variables, suspicious statements etc.
2.3 Scalars
Perl supports 3 basic types of variables, viz., scalars, lists and hashes. We will explore each of these a little more.
The most basic kind of variable in Perl is the scalar variable. Scalar variables hold both strings and numbers, and are remarkable in that strings and numbers are completely interchangeable. For example, the statement
$age = 27;
sets the scalar variable $age to 27, but you can also assign a string to exactly the same variable:
$age = 'Twenty Seven';
Perl also accepts numbers as strings, like this:
$priority = '9';
$default = '0009';
and can still cope with arithmetic and other operations quite happily. However, please note that the following code is a bit too much to ask for!
$age = 'Twenty Seven';
$age = $age + 10;
For the curious, the above code will set $age to 10. Think why.
In general variable names consist of numbers, letters and underscores, but they should not start with a number, and the variable $_ is special, as we'll see later. Also, Perl is case sensitive, so $a and $A are different.
Operations and Assignment
Perl uses all the usual C arithmetic operators:
$a = 1 + 2; # Add 1 and 2 and store in $a
$a = 3 - 4; # Subtract 4 from 3 and store in $a
$a = 5 * 6; # Multiply 5 and 6
$a = 7 / 8; # Divide 7 by 8 to give 0.875
$a = 9 ** 10; # Nine to the power of 10
$a = 5 % 2; # Remainder of 5 divided by 2
++$a; # Increment $a and then return it
$a++; # Return $a and then increment it
--$a; # Decrement $a and then return it
$a--; # Return $a and then decrement it
and for strings Perl has the following among others:
$a = $b . $c; # Concatenate $b and $c
$a = $b x $c; # $b repeated $c times
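A small sketch of the string operators together with numeric strings, reusing the $priority and $default values from the scalar examples above. The other variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

my $first  = 'ab';
my $second = 'cd';

my $joined   = $first . $second;   # concatenation gives 'abcd'
my $repeated = $first x 3;         # repetition gives 'ababab'

my $priority = '9';
my $default  = '0009';
# Both strings look like numbers, so + treats them as 9 and 9.
my $sum = $priority + $default;

print "$joined $repeated $sum\n";  # prints "abcd ababab 18"
```

The operator decides how a scalar is used: . and x treat their operands as strings, while + treats them as numbers.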
To assign values Perl includes
$a = $b; # Assign $b to $a
$a += $b; # Add $b to $a
$a -= $b; # Subtract $b from $a
$a .= $b; # Append $b onto $a
Note that when Perl assigns a value with $a = $b it makes a copy of $b and then assigns that to $a. Therefore the next time you change $b it will not alter $a.
Other operators can be found on the perlop manual page. Type man perlop at the prompt.
Interpolation
The following code prints apples and pears using concatenation:
$a = 'apples';
$b = 'pears';
print $a.' and '.$b;
It would be nicer to include only one string in the final print statement, but the line
print '$a and $b';
prints literally $a and $b which isn't very helpful. Instead we can use the double quotes in place of the single quotes:
print "$a and $b";
The double quotes force interpolation of any codes, including interpreting variables. This is much nicer than our original statement. Other codes that are interpolated include special characters such as newline and tab. The code \n is a newline and \t is a tab.
Exercise
This exercise is to rewrite the Hello world program so that (a) the string is assigned to a variable and (b) this variable is then printed with a newline character. Use the double quotes and don't use the concatenation operator.
2.4 Lists (Arrays)
A slightly more interesting kind of variable is the list variable which is an array of scalars (i.e. numbers and strings). From now on, we will use the terms list and array interchangeably.
Array variables have the same format as scalar variables except that they are prefixed by an @ symbol. The statement
@food = ("apples", "pears", "eels");
@music = ("whistle", "flute");
assigns a three element list to the array variable @food and a two element list to the array variable @music.
The array is accessed by using indices starting from 0, and square brackets are used to specify the index. The expression
$food[2]
returns eels. Notice that the @ has changed to a $ because eels is a scalar.
Array assignments
As in all of Perl, the same expression in a different context can produce a different result. The first assignment below explodes the @music variable so that it is equivalent to the second assignment.
@moremusic = ("organ", @music, "harp");
@moremusic = ("organ", "whistle", "flute", "harp");
This should suggest a way of adding elements to an array. A neater way of adding elements is to use the statement
push(@food, "eggs");
which pushes eggs onto the end of the array @food. To push two or more items onto the array use one of the following forms:
push(@food, "eggs", "lard");
push(@food, ("eggs", "lard"));
push(@food, @morefood);
The push function returns the length of the new list. Note that $#food is not the length but the index of the last element, which is one less than the length.
To remove the last item from a list and return it use the pop function. From our original list the pop function returns eels and @food now has two elements:
$grub = pop(@food); # Now $grub = "eels"
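A short sketch that puts push, pop and $#food side by side, starting from the same three-element @food. The scalar names $length and $last are chosen only for this illustration.

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");

my $length = push(@food, "eggs", "lard");  # push returns the new length
my $last   = $#food;                       # index of the last element
my $grub   = pop(@food);                   # removes and returns the last item

print "length=$length last=$last grub=$grub left=", scalar(@food), "\n";
# prints "length=5 last=4 grub=lard left=4"
```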
It is also possible to assign an array to a scalar variable. As usual context is important. The line
$f = @food;
assigns the length of @food, but
$f = "@food";
turns the list into a string with a space between each element. This space can be replaced by any other string by changing the value of the special $" variable. This variable is just one of Perl's many special variables, most of which have odd names. When you get overloaded with oddity, use the English module, which lets you refer to these variables by friendlier names (friendlier to English speakers, that is).
Arrays can also be used to make multiple assignments to scalar variables:
($a, $b) = ($c, $d); # Same as $a=$c; $b=$d;
($a, $b) = @food; # $a and $b are the first two
# items of @food.
($a, @somefood) = @food; # $a is the first item of @food
# @somefood is a list of the
# others.
(@somefood, $a) = @food; # @somefood is @food and
# $a is undefined.
The last assignment occurs because arrays are greedy, and @somefood will swallow up as much of @food as it can. Therefore that form is best avoided.
Finally, you may want to find the index of the last element of a list. To do this for the @food array use the expression
$#food
Displaying arrays
Since context is important, it shouldn't be too surprising that the following all produce different results:
print @food; # By itself
print "@food"; # Embedded in double quotes
print @food.""; # In a scalar context
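The three forms can be compared by capturing each result in a scalar instead of printing it directly. This is a sketch under default settings; join('', @food) is used as a stand-in for what a bare print @food emits, and the variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");

my $plain  = join('', @food);   # like print @food: elements run together
my $quoted = "@food";           # in double quotes: joined by a space
my $count  = @food . "";        # scalar context: the number of elements

my $custom;
{
    local $" = ', ';            # change the separator inside this block only
    $custom = "@food";
}

print "$plain\n$quoted\n$count\n$custom\n";
```

Run as written this prints applespearseels, apples pears eels, 3 and apples, pears, eels on four separate lines.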
2.5 Hashes (Associative Arrays)
Ordinary list arrays allow us to access their elements by number. The first element of array @food is $food[0]. The second element is $food[1], and so on. But Perl also allows us to create arrays which are accessed by string. These are called associative arrays or hashes.
To define an associative array we use the usual parenthesis notation, but the array itself is prefixed by a % sign. Suppose we want to create an array of people and their ages. It would look like this:
%ages = ("Michael Caine", 39,
"Dirty Den", 34,
"Angie", 27,
"Willy", "21 in dog years",
"The Queen Mother", 108);
Now we can find the age of people with the following expressions
$ages{"Michael Caine"}; # Returns 39
$ages{"Dirty Den"}; # Returns 34
$ages{"Angie"}; # Returns 27
$ages{"Willy"}; # Returns "21 in dog years"
$ages{"The Queen Mother"}; # Returns 108
Notice that, as with list arrays, the % sign has changed to a $ to access an individual element because that element is a scalar. Unlike list arrays the index (in this case the person's name) is enclosed in curly braces, the idea being that associative arrays are fancier than list arrays.
An associative array can be converted back into a list array just by assigning it to a list array variable. A list array can be converted into an associative array by assigning it to an associative array variable. Ideally the list array will have an even number of elements:
@info = %ages; # @info is a list array. It
# now has 10 elements
$info[5]; # Returns 27 only if the pairs come back
# in the order entered, which is not guaranteed
%moreages = @info; # %moreages is an associative
# array. It is the same as %ages
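A minimal sketch of the round trip between a hash and a list, using a two-entry subset of %ages. It checks only properties that do not depend on the order in which the pairs come back.

```perl
use strict;
use warnings;

my %ages = ("Angie", 27, "Willy", "21 in dog years");

my @info     = %ages;   # flatten to key, value, key, value
my %moreages = @info;   # rebuild a hash from the flat list

my $elements = @info;   # two keys plus two values
print "elements=$elements Angie=$moreages{'Angie'}\n";
# prints "elements=4 Angie=27"
```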
Operators
Associative arrays do not have any order to their elements (they are just like hash tables) but it is possible to access all the elements in turn using the keys function and the values function:
foreach $person (keys %ages)
{
print "I know the age of $person\n";
}
foreach $age (values %ages)
{
print "Somebody is $age\n";
}
When keys is called it returns a list of the keys (indices) of the associative array. When values is called it returns a list of the values of the array. These functions return their lists in the same order, but this order has nothing to do with the order in which the elements have been entered.
When keys and values are called in a scalar context they return the number of key/value pairs in the associative array.
There is also a function each which returns a two element list of a key and its value. Every time each is called it returns another key/value pair:
while (($person, $age) = each(%ages))
{
print "$person is $age\n";
}
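Because the order of keys is unrelated to the order of entry, a common approach when repeatable output matters is to sort the keys first. The sketch below assumes a three-entry subset of %ages and also shows keys in scalar context; @report and $pairs are names chosen only for this illustration.

```perl
use strict;
use warnings;

my %ages = ("Michael Caine", 39, "Dirty Den", 34, "Angie", 27);

my $pairs = keys %ages;   # scalar context: number of key/value pairs

# Sorting the keys gives the same order on every run.
my @report;
foreach my $person (sort keys %ages) {
    push(@report, "$person is $ages{$person}");
}

print "$pairs people\n";
print "$_\n" foreach @report;
```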
Environment variables
When you run a perl program, or any script in UNIX, there will be certain environment variables set. These will be things like USER which contains your username and DISPLAY which specifies which screen your graphics will go to. When you run a perl CGI script on the World Wide Web there are environment variables which hold other useful information. All these variables and their values are stored in the associative %ENV array in which the keys are the variable names. Try the following in a perl program:
print "You are called $ENV{'USER'} and you are ";
print "using display $ENV{'DISPLAY'}\n";
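Not every environment sets USER or DISPLAY, so a lookup with a fallback can be useful. The following is a sketch only; env_or_default is a hypothetical helper name, not part of Perl.

```perl
use strict;
use warnings;

# Look up an environment variable, falling back to a default when unset.
# env_or_default is a hypothetical helper used only for this sketch.
sub env_or_default {
    my ($name, $default) = @_;
    return defined $ENV{$name} ? $ENV{$name} : $default;
}

print "You are called ", env_or_default('USER', 'unknown'), "\n";
print "Display: ", env_or_default('DISPLAY', 'not set'), "\n";
```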
2.6 Control Structures
More interesting possibilities arise when we introduce control structures and looping. Perl supports lots of different kinds of control structures which tend to be like those in C, but are very similar to Pascal, too. Here we discuss a few of them.
foreach
To go through each line of an array or other list-like structure (such as lines in a file) Perl uses the foreach structure. This has the form
foreach $morsel (@food) # Visit each item in turn
# and call it $morsel
{
print "$morsel\n"; # Print the item
print "Yum yum\n"; # That was nice
}
The actions to be performed each time are enclosed in a block of curly braces. The first time through the block $morsel is assigned the value of the first item in the array @food. Next time it is assigned the value of the second item, and so on until the end. If @food is empty to start with then the block of statements is never executed.
Testing
The next few structures rely on a test being true or false. In Perl any non-zero number and non-empty string is counted as true. The number zero, zero by itself in a string, and the empty string are counted as false. Here are some tests on numbers and strings.
$a == $b # Is $a numerically equal to $b?
# Beware: Don't use the = operator.
$a != $b # Is $a numerically unequal to $b?
$a eq $b # Is $a string-equal to $b?
$a ne $b # Is $a string-unequal to $b?
You can also use logical and, or and not:
($a && $b) # Are $a and $b both true?
($a || $b) # Is either $a or $b true?
!($a) # is $a false?
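The truth rules above have a few corners worth trying out: only the exact string "0" and the empty string are false, so strings such as "00" and "0.0" count as true. The sketch below assumes a hypothetical helper, is_true, purely to make the results visible.

```perl
use strict;
use warnings;

# is_true is a hypothetical helper that reports how Perl treats a value
# when it is used as a test.
sub is_true {
    my ($v) = @_;
    return $v ? 1 : 0;
}

for my $v (0, 1, "", "0", "00", "0.0", "fred") {
    printf "%-8s %s\n", "'$v'", is_true($v) ? "true" : "false";
}

# == compares numbers, eq and ne compare strings.
print "numerically equal\n" if "1.0" == 1;     # both are the number 1
print "not string-equal\n"  if "1.0" ne "1";   # the characters differ
```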
for
Perl has a for structure that mimics that of C. It has the form
for (initialise; test; inc)
{
first_action;
second_action;
etc
}
First of all the statement initialise is executed. Then while test is true the block of actions is executed. After each time the block is executed inc takes place. Here is an example for loop to print out the numbers 0 to 9.
for ($i = 0; $i < 10; ++$i) # Start with $i = 0
# Do it while $i < 10
# Increment $i before repeating
{
print "$i\n";
}
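When the loop simply counts through consecutive integers, the same result can be had with foreach (which may also be spelled for) and the range operator. A small sketch; the array name @numbers is only for illustration.

```perl
use strict;
use warnings;

# Collect the numbers 0 to 9 using the range operator instead of a
# C-style counter.
my @numbers;
for my $i (0 .. 9) {
    push @numbers, $i;
}
print "@numbers\n";
```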
while and until
Here is a program that reads some input from the keyboard and won't continue until it is the correct password
#!/usr/local/bin/perl
print "Password? "; # Ask for input
$a = <STDIN>; # Get input
chop $a; # Remove the newline at end
while ($a ne "fred") # While input is wrong...
{
print "sorry. Again? "; # Ask again
$a = <STDIN>; # Get input again
chop $a; # Chop off newline again
}
The curly-braced block of code is executed while the input does not equal the password. The while structure should be fairly clear, but this is the opportunity to notice several things. First, we can read from the standard input (the keyboard) without opening the file first. Second, when the password is entered $a is given that value including the newline character at the end. The chop function removes the last character of a string which in this case is the newline.
To test the opposite thing we can use the until statement in just the same way. This executes the block repeatedly until the expression is true, not while it is true.
Another useful technique is putting the while or until check at the end of the statement block rather than at the beginning. This will require the presence of the do operator to mark the beginning of the block and the test at the end. If we forgo the sorry. Again message in the above password program then it could be written like this.
#!/usr/local/bin/perl
do
{
print "Password? "; # Ask for input
$a = <STDIN>; # Get input
chop $a; # Chop off newline
}
while ($a ne "fred"); # Redo while wrong input
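The until form mentioned above has no example of its own, so here is a sketch. To keep it runnable without a keyboard, the guesses come from a list rather than from standard input; count_attempts is a hypothetical helper invented for this illustration.

```perl
use strict;
use warnings;

# count_attempts is a hypothetical helper: it takes candidate passwords
# from a list (standing in for keyboard input) and returns how many
# were tried before the right one turned up or the list ran out.
sub count_attempts {
    my ($password, @candidates) = @_;
    my $tries = 0;
    my $guess = '';
    until ($guess eq $password or !@candidates) {
        $guess = shift @candidates;
        $tries++;
    }
    return $tries;
}

print count_attempts("fred", "barney", "wilma", "fred"), "\n";
```

The loop body runs until the test becomes true, which is the mirror image of while.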
Exercise
Modify the program from the previous exercise so that each line of the file is read in one by one and is output with a line number at the beginning. You should get something like:
1 root:oYpYXm/qRO6N2:0:0:Super-User:/:/bin/csh
2 sysadm:*:0:0:System V Administration:/usr/admin:/bin/sh
3 diag:*:0:996:Hardware Diagnostics:/usr/diags:/bin/csh
etc
You may find it useful to use the structure
while ($line = <INFO>)
{
...
}
When you have done this see if you can alter it so that line numbers are printed as 001, 002, ..., 009, 010, 011, 012, etc. To do this you should only need to change one line by inserting an extra four characters. Perl's clever like that.
if-else
Of course Perl also allows if/then/else statements. These are of the following form:
if ($a)
{
print "The string is not empty\n";
}
else
{
print "The string is empty\n";
}
For this, remember that an empty string is considered to be false. It will also give an "empty" result if $a is the string 0.
It is also possible to include more alternatives in a conditional statement:
if (!$a) # The ! is the not operator
{
print "The string is empty\n";
}
elsif (length($a) == 1) # If above fails, try this
{
print "The string has one character\n";
}
elsif (length($a) == 2) # If that fails, try this
{
print "The string has two characters\n";
}
else # Now, everything has failed
{
print "The string has lots of characters\n";
}
In this, it is important to notice that the elsif statement really does have an "e" missing.
Sometimes, it is more readable to use unless instead of if (!...). The switch-case statement familiar to C programmers is not available in Perl. You can simulate it in other ways. See the manual pages.
Exercise
From the previous exercise you should have a program which prints out the password file with line numbers. Change it so that it works with the text file. Now alter the program so that line numbers aren't printed or counted with blank lines, but every line is still printed, including the blank ones. Remember that when a line of the file is read in it will still include its newline character at the end.
2.7 File operations
Here is the basic perl program which does the same as the UNIX cat command on a certain file.
#!/usr/local/bin/perl
#
# Program to open the password file, read it in,
# print it, and close it again.
$file = '/etc/passwd'; # Name the file
open(INFO, $file); # Open the file
@lines = <INFO>; # Read it into an array
close(INFO); # Close the file
print @lines; # Print the array
The open function opens a file for input (i.e. for reading). The first parameter is the filehandle which allows Perl to refer to the file in future. The second parameter is an expression denoting the filename. If the filename was given in quotes then it is taken literally without shell expansion. So the expression '~/notes/todolist' will not be interpreted successfully. If you want to force shell expansion then use angled brackets: that is, use <~/notes/todolist> instead.
The close function tells Perl to finish with that file.
There are a few useful points to add to this discussion on file-handling. First, the open statement can also specify a file for output and for appending as well as for input. To do this, prefix the filename with a > for output and a >> for appending:
open(INFO, $file); # Open for input
open(INFO, ">$file"); # Open for output
open(INFO, ">>$file"); # Open for appending
open(INFO, "<$file"); # Also open for input
Second, if you want to print something to a file you've already opened for output then you can use the print statement with an extra parameter. To print a string to the file with the INFO filehandle use
print INFO "This line goes to the file.\n";
Third, you can use the following to open the standard input (usually the keyboard) and standard output (usually the screen) respectively:
open(INFO, '-'); # Open standard input
open(INFO, '>-'); # Open standard output
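None of the open calls above checks whether the file could actually be opened. The sketch below shows one way to add that check. It uses the three-argument form of open and lexical filehandles, which are features of later Perl 5 releases rather than the style used in this tutorial, and it works on a scratch file created with the standard File::Temp module so that no real file is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the sketch does not touch real data.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close($tmp_fh);

# Open for output; die with the system error message on failure.
open(my $out, '>', $file) or die "Cannot write $file: $!\n";
print $out "first line\n";
close($out);

# Open for appending.
open($out, '>>', $file) or die "Cannot append to $file: $!\n";
print $out "second line\n";
close($out);

# Open for input and read everything into an array.
open(my $in, '<', $file) or die "Cannot read $file: $!\n";
my @lines = <$in>;
close($in);

print @lines;
```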
In the above program the information is read from a file. The file is the INFO file and to read from it Perl uses angled brackets. So the statement
@lines = <INFO>;
reads the file denoted by the filehandle into the array @lines. Note that the <INFO> expression reads in the whole file in one go, because it is evaluated in a list context.
Exercise
Modify the above program so that the entire file is printed with a # symbol at the beginning of each line. You should only have to add one line and modify another. Use the $" variable. Unexpected things can happen with files, so you may find it helpful to use the -w option.
Extending pipes
You can very easily substitute reading from a pipe for reading from a file. The following example reads the output of the ps command.
open(PS,"ps -aef|") or die "Cannot open ps \n";
while(<PS>){
print;
}
}
close(PS);
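The options accepted by ps differ between systems, so here is a variation that should run anywhere Perl itself runs. It is a sketch only: the child command is simply another copy of the Perl interpreter (found through the special variable $^X), and the list form of the piped open is a later Perl 5 feature not otherwise used in this tutorial.

```perl
use strict;
use warnings;

# Run another copy of the Perl interpreter ($^X) as the child command,
# so the sketch does not depend on which ps options a system supports.
open(my $pipe, '-|', $^X, '-e', 'print "one\ntwo\n"')
    or die "Cannot start child process: $!\n";
my @output = <$pipe>;
close($pipe);

chomp @output;
print "child said: $_\n" for @output;
```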
2.8 String Processing
One of the most useful features of Perl (if not the most useful feature) is its powerful string manipulation facilities. At the heart of this is the regular expression (RE) which is shared by many other UNIX utilities.
Regular expressions
A regular expression is contained in slashes, and matching occurs with the =~ operator. The following expression is true if the string the appears in variable $sentence.
$sentence =~ /the/
The RE is case sensitive, so if
$sentence = "The quick brown fox";
then the above match will be false. The operator !~ is used for spotting a non-match. In the above example
$sentence !~ /the/
is true because the string the does not appear in $sentence.
The $_ special variable
We could use a conditional as
if ($sentence =~ /under/)
{
print "We're talking about rugby\n";
}
which would print out a message if we had either of the following
$sentence = "Up and under";
$sentence = "Best winkles in Sunderland";
But it's often much easier if we assign the sentence to the special variable $_ which is of course a scalar. If we do this then we can avoid using the match and non-match operators and the above can be written simply as
if (/under/)
{
print "We're talking about rugby\n";
}
The $_ variable is the default for many Perl operations and tends to be used very heavily.
More on REs
In an RE there are plenty of special characters, and it is these that both give them their power and make them appear very complicated. It's best to build up your use of REs slowly; their creation can be something of an art form.
Here are some special RE characters and their meaning
. # Any single character except a newline
^ # The beginning of the line or string
$ # The end of the line or string
* # Zero or more of the last character
+ # One or more of the last character
? # Zero or one of the last character
and here are some example matches. Remember that they should be enclosed in /.../ slashes to be used.
t.e # t followed by anything followed by e
# This will match the
# tre
# tle
# but not te
# tale
^f # f at the beginning of a line
^ftp # ftp at the beginning of a line
e$ # e at the end of a line
tle$ # tle at the end of a line
und* # un followed by zero or more d characters
# This will match un
# und
# undd
# unddd (etc)
.* # Any string without a newline. This is because
# the . matches anything except a newline and
# the * means zero or more of these.
^$ # A line with nothing in it.
There are even more options. Square brackets are used to match any one of the characters inside them. Inside square brackets a - indicates "between" and a ^ at the beginning means "not":
[qjk] # Either q or j or k
[^qjk] # Neither q nor j nor k
[a-z] # Anything from a to z inclusive
[^a-z] # Any character except a lower case letter
[a-zA-Z] # Any letter
[a-z]+ # Any non-zero sequence of lower case letters
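A quick way to get a feel for these patterns is to try them against a handful of words. The sketch below does that with grep; matches is a hypothetical helper and the word list is made up for the purpose.

```perl
use strict;
use warnings;

# matches is a hypothetical helper: return the words that match a pattern.
sub matches {
    my ($pattern, @words) = @_;
    return grep { /$pattern/ } @words;
}

my @words = qw(the tre tle te tale un undd ftp sftp);

print join(" ", matches('t.e',    @words)), "\n";   # the tre tle
print join(" ", matches('^ftp',   @words)), "\n";   # ftp
print join(" ", matches('e$',     @words)), "\n";   # the tre tle te tale
print join(" ", matches('^und*$', @words)), "\n";   # un undd
print join(" ", matches('[qjk]',  @words)), "\n";   # no word matches
```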
At this point you can probably skip to the end and do at least most of the exercise. The rest is mostly just for reference.
A vertical bar | represents an "or" and parentheses (...) can be used to group things together:
jelly|cream # Either jelly or cream
(eg|le)gs # Either eggs or legs
(da)+ # Either da or dada or dadada or...
Here are some more special characters:
\n # A newline
\t # A tab
\w # Any alphanumeric (word) character.
# The same as [a-zA-Z0-9_]
\W # Any non-word character.
# The same as [^a-zA-Z0-9_]
\d # Any digit. The same as [0-9]
\D # Any non-digit. The same as [^0-9]
\s # Any whitespace character: space,
# tab, newline, etc
\S # Any non-whitespace character
\b # A word boundary, outside [] only
\B # No word boundary
Clearly characters like $, |, [, ), \, / and so on are peculiar cases in regular expressions. If you want to match one of those then you have to precede it with a backslash. So:
\| # Vertical bar
\[ # An open square bracket
\) # A closing parenthesis
\* # An asterisk
\^ # A caret symbol
\/ # A slash
\\ # A backslash
and so on.
Some example REs
As was mentioned earlier, it's probably best to build up your use of regular expressions slowly. Here are a few examples. Remember that to use them for matching they should be put in /.../ slashes.
[01] # Either "0" or "1"
\/0 # A division by zero: "/0"
\/ 0 # A division by zero with a space: "/ 0"
\/\s0 # A division by zero with a whitespace:
# "/ 0" where the space may be a tab etc.
\/ *0 # A division by zero with possibly some
# spaces: "/0" or "/ 0" or "/  0" etc.
\/\s*0 # A division by zero with possibly some
# whitespace.
\/\s*0\.0* # As the previous one, but with decimal
# point and maybe some 0s after it. Accepts
# "/0." and "/0.0" and "/0.00" etc and
# "/ 0." and "/ 0.0" and "/ 0.00" etc.
# Check for valid currency value
^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$
# Check for valid email address
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$
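To see such a pattern at work, it can be wrapped in a small subroutine and fed a few strings. The sketch below does this for the currency pattern; is_currency is a hypothetical name and the test values are made up.

```perl
use strict;
use warnings;

# is_currency is a hypothetical helper built around the currency
# pattern shown above: digits with optional thousands commas and an
# optional one or two digit decimal part.
sub is_currency {
    my ($text) = @_;
    return $text =~ /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/ ? 1 : 0;
}

for my $value ("1234.50", "1,234.50", "12,34", "1.234") {
    printf "%-10s %s\n", $value, is_currency($value) ? "valid" : "invalid";
}
```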
Exercise
Previously your program counted non-empty lines. Alter it so that instead of counting non-empty lines it counts only lines with
- the letter x
- the string the
- the string the which may or may not have a capital t
- the word the with or without a capital. Use \b to detect word boundaries.
Substitution & Translation
Just like the sed and tr utilities in Unix, you have s/// and tr/// in Perl. The former is for substitution and the latter is for translation.
$bar =~ s/this/that/g; # change this to that in $bar
$path =~ s|/usr/bin|/usr/local/bin|;
s/\bgreen\b/mauve/g; # don't change wintergreen
s/Login: $foo/Login: $bar/; # run-time pattern
$count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count
$program =~ s {
/\* # Match the opening delimiter.
.*? # Match a minimal number of characters.
\*/ # Match the closing delimiter.
} []gsx; # Delete (most) C comments.
s/^\s*(.*?)\s*$/$1/; # trim white space in $_, expensively
for ($variable) { # trim white space in $variable, cheap
s/^\s+//;
s/\s+$//;
}
s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields
#Note the use of $ instead of \ in the last example. Unlike sed,
#we use the \ form in only the left hand side.
#Anywhere else it's $.
$myname = "BABU";
$myname =~ tr/A-Z/a-z/; # yields babu (tr takes character ranges, no [] needed)
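Both operators return a count as well as changing the string, which is easy to overlook. A short sketch with a made-up sentence:

```perl
use strict;
use warnings;

my $text = "Mister Smith met Mister Jones";

# s///g returns the number of substitutions it made.
my $count = ($text =~ s/Mister\b/Mr./g);
print "$count changes: $text\n";

# tr/// returns the number of characters it matched; with an empty
# replacement list the string is left unchanged, so this just counts.
my $vowels = ($text =~ tr/aeiou//);
print "$vowels lower case vowels\n";

# Lowercase a copy with tr; the built-in lc function does the same job.
(my $lower = $text) =~ tr/A-Z/a-z/;
print "$lower\n";
```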
Splitting
Perl provides a split function to split strings, based on REs. The syntax is
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
If EXPR is omitted, $_ is used. If PATTERN is also omitted, split splits on whitespace, after skipping any leading whitespace. LIMIT sets the maximum number of fields returned, so it can be used to split a string only partially. Some examples are given below:
# process the password file
open(PASSWD, '/etc/passwd');
while (<PASSWD>) {
($login, $passwd, $uid, $gid,
$gcos, $home, $shell) = split(/:/);
# note that $shell still has a new line.
# use chop or chomp to remove the newline
#...
($login, $passwd, $remainder) = split(/:/, $_, 3);
# here we use LIMIT to set the number of fields
}
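The effect of LIMIT is easiest to see on a single record held in a string. The record below is made up for illustration; it follows the passwd layout used above.

```perl
use strict;
use warnings;

# A made-up passwd-style record, including its trailing newline.
my $record = "guest:*:998:998:Guest Account:/usr/people/guest:/bin/csh\n";
chomp $record;                                        # remove the newline first

my @fields = split /:/, $record;                      # all seven fields
my ($login, $passwd, $rest) = split /:/, $record, 3;  # stop after three

print scalar(@fields), " fields, shell is $fields[6]\n";
print "login=$login rest=$rest\n";
```

With a LIMIT of 3 the third variable receives everything after the second colon, colons included.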
We also have join which is the opposite of split. For fixed length strings, we have the unpack and pack functions.
2.9 Subroutines
Like any good programming language Perl allows the user to define their own functions, called subroutines. They may be placed anywhere in your program but it's probably best to put them all at the beginning or all at the end. A subroutine has the form
sub mysubroutine
{
print "Not a very interesting routine\n";
print "This does the same thing every time\n";
}
regardless of any parameters that we may want to pass to it. All of the following will work to call this subroutine. Notice that a subroutine is called with an & character in front of the name:
&mysubroutine; # Call the subroutine
&mysubroutine($_); # Call it with a parameter
&mysubroutine(1+2, $_); # Call it with two parameters
Parameters
In the above case the parameters are acceptable but ignored. When the subroutine is called any parameters are passed as a list in the special @_ list array variable. This variable has absolutely nothing to do with the $_ scalar variable. The following subroutine merely prints out the list that it was called with. It is followed by a couple of examples of its use.
sub printargs
{
print "@_\n";
}
&printargs("perly", "king"); # Example prints "perly king"
&printargs("frog", "and", "toad"); # Prints "frog and toad"
Just like any other list array the individual elements of @_ can be accessed with the square bracket notation:
sub printfirsttwo
{
print "Your first argument was $_[0]\n";
print "and $_[1] was your second\n";
}
Again it should be stressed that the indexed scalars $_[0] and $_[1] and so on have nothing to do with the scalar $_ which can also be used without fear of a clash.
Returning values
The result of a subroutine is always the last thing evaluated. This subroutine returns the maximum of two input parameters. An example of its use follows.
sub maximum
{
if ($_[0] > $_[1])
{
$_[0];
}
else
{
$_[1];
}
}
$biggest = &maximum(37, 24); # Now $biggest is 37
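Relying on the last thing evaluated works, but an explicit return statement often reads more clearly. As a sketch, here is a variation that accepts any number of arguments; max_of is a hypothetical name chosen so as not to clash with the subroutine above.

```perl
use strict;
use warnings;

# max_of is a hypothetical generalisation of the two-argument version:
# it returns the largest of any number of numeric arguments.
sub max_of {
    my $best = shift;            # start with the first argument
    for my $n (@_) {             # compare against the remaining ones
        $best = $n if $n > $best;
    }
    return $best;
}

print max_of(37, 24, 99, 5), "\n";
```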
The &printfirsttwo subroutine above also returns a value, in this case 1. This is because the last thing that subroutine did was a print statement and the result of a successful print statement is always 1.
Local variables
The @_ variable is local to the current subroutine, and so of course are $_[0], $_[1], $_[2], and so on. Other variables can be made local too, and this is useful if we want to start altering the input parameters. The following subroutine tests to see if one string is inside another, spaces notwithstanding. An example follows.
sub inside
{
local($a, $b); # Make local variables
($a, $b) = ($_[0], $_[1]); # Assign values
$a =~ s/ //g; # Strip spaces from
$b =~ s/ //g; # local variables
($a =~ /$b/ || $b =~ /$a/); # Is $b inside $a
# or $a inside $b?
}
&inside("lemon", "dole money"); # true
In fact, it can even be tidied up by replacing the first two lines with
local($a, $b) = ($_[0], $_[1]);
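In Perl 5 the my operator is generally preferred to local for this job, because it creates variables that are visible only inside the enclosing block. The sketch below rewrites the subroutine that way under a hypothetical name, inside_my. It also wraps the interpolated strings in \Q...\E so that any RE special characters in them are matched literally, which is a change from the version above.

```perl
use strict;
use warnings;

# inside_my is a hypothetical rewrite of the subroutine above using
# lexical (my) variables instead of local ones.
sub inside_my {
    my ($x, $y) = @_;      # lexical copies of the two arguments
    $x =~ s/ //g;          # strip spaces from the copies only
    $y =~ s/ //g;
    # \Q...\E makes any special characters match literally.
    return ($x =~ /\Q$y\E/ || $y =~ /\Q$x\E/) ? 1 : 0;
}

print inside_my("lemon", "dole money"), "\n";
```

Because the work is done on copies, the caller's strings are left untouched.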
2.10 More information
Only a very brief overview of Perl is covered in this tutorial. The easiest way to learn Perl is to look at existing code. The Perl manual pages and FAQs are really superb and will help you a lot. Until you are sure of what you are doing, run Perl with the -w switch!
3. Useful Examples
3.1 Processing colon delimited Excel files
This example takes in a text file as parameter and creates a set of SQL statements which will create Oracle tables.
#! /usr/local/bin/perl
# this is the normal perl formatting mechanism
# we don't have any standard headers, so we make
# STDOUT_TOP as blank.
format STDOUT_TOP =
.
# now we format and print the column types nicely
# the @<<<... is the picture and for each picture line, we
# must have the next line made of the variables.
# during printing, picture is substituted by the actual
# values in the corresponding variables
# @< will left align, @> for right align, @| for center-align
format STDOUT =
@<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<
$column_name, $data_type, $nullable
.
# you can have multiple table definitions in one input file
# so keep a count
$tableno=$colno=0;
# similarly, we will store the comments in hashes
%colcmnts=%tblcmnts=();
LINE: while(<>){
chomp; # remove the new line character
next LINE if (/^$/); # ignore null lines
next LINE if (/^\s*#/); # ignore comment lines
($column_name, $data_type, $precision, $nullable, $comment) = split /:/;
$column_name =~ tr/A-Z/a-z/; # lowercase column name
if ($data_type eq ""){ #no data-type ? then this is a table
$table_name = $column_name;
$colno=0;
print ");\n\n" unless ($tableno==0);
print "CREATE TABLE $table_name (\n";
$tableno++;
#store table comment if there is a comment
$tblcmnts{$table_name} = $comment unless ($comment eq "");
next LINE;
}
#column comment
$colcmnts{"$table_name.$column_name"} = $comment unless ($comment eq "");
$colno++;
#if precision is specified, we need to put it inside parentheses
$data_type .="($precision)" unless ($precision eq "");
if ($colno==1) {
$column_name = " $column_name";
} else {
$column_name = ",$column_name";
}
$data_type =~ tr/a-z/A-Z/; # uppercase datatype
$nullable =~ tr/a-z/A-Z/; # upper case "not null"
# print to the format defined before
write ;
}
print ");\n\n" ;
# print table comments
foreach my $key (sort keys(%tblcmnts)) {
$comment = "'$tblcmnts{$key}'";
print "COMMENT ON TABLE $key IS $comment;\n";
}
print "\n\n" ;
# print column comments
foreach my $key (sort keys(%colcmnts)) {
$comment = "'$colcmnts{$key}'";
print "COMMENT ON COLUMN $key IS $comment;\n";
}
print "\n\n" ;
__END__
# from here is the sample input file
# columns are
# column(table):data type:precision:nullable:comment
# for table name, data type is null
BARS_BATCH::::Batch Header Table
batch_number:number:5:not null:The batch number(sequential)
deposit_date:date::not null:Deposit date of the batch
payments:number:3:not null:Number of payments in the batch
payment_amount:number:11,2:not null:Dollar amount of the batch
pieces:number:3:not null:Number of cheques in the batch
payment_method:varchar2:2:not null:Method of payment(Cheque,Cash...)
clerk:varchar2:10:not null:Who entered the batch
origin:varchar2:2:not null:Where the batch originated (Field, HO...)
dirty:varchar2:1::Is the batch marked as dirty(1,0)
actual_payments:number:3::Number of payments actually entered
actual_amount:number:11,2::Amount actually entered
creadt:date::not null:Date batch was created
modidt:date:::Date batch was last modified
sts:varchar2:1:not null:Status of the batch
errcode:varchar2:16::Errors in the batch
BARS_GIFTS::::Batch Detail Table
batch_number:number:5:not null:The batch number(sequential)
doc_number:number:2:not null:The gift doc number(sequential within batch)
page:number:2:not null:Page number
account_id:number:8:not null:Active/new account id for the member
source:varchar2:14:not null:Active source
fund:varchar2:16::Active fund
gift_type:varchar2:2::Active gift type
credit_account:varchar2:4:not null:Active credit account (check the size!)
handle_flag:varchar2:1::Special handling flag
payment_amount:number:11,2:not null:Gift amount
total_payment_amount:number:11,2:not null:Cheque amount
creadt:date::not null:Date gift was created
modidt:date:::Date gift was last modified
errcode:varchar2:16::Errors in the gift
BARS_ACCOUNTS::::New members
account_id:number:8:not null:New account id for the member
title:varchar2:8::Title for the new member
first_name:varchar2:20::First name
middle_name:varchar2:20::Middle name
last_name:varchar2:40:not null:Last name(In TA, can be null)
suffix:varchar2:8::Suffix
phone_number:varchar2:15::Phone number
street_number:varchar2:8::Street number
street_name:varchar2:30::Street name
apt_no:varchar2:8::Apartment number
zipcode:varchar2:5::Zipcode
zipcode_ext:varchar2:5::Zipcode extension
city:varchar2:30::City
state:varchar2:2::State
freeline:varchar2:50::Free comments
extraline:varchar2:50::Free comments 2
BARS_CODES::::Default codes for LOVs
code_type:varchar2:2:not null:Code type
code:varchar2:20:not null:Code
codelb:varchar2:40:not null:Description
3.2 Processing fixed format text files
Following is an extract from one of our SQL*Loader control files. Note that the ocr_gift_date and ocr_deposit_date columns are read from the same positions and that there is a filler in positions 22 to 26.
INTO TABLE ACQUIRED_DATA
WHEN record_type = 'T'
(
record_type POSITION(001:001) CHAR
"DECODE (:record_type, 'T', 'BT', 'X', 'FT', 'D', 'D', 'O')",
ocr_batch_number POSITION(002:010) CHAR,
ocr_gift_date POSITION(011:021) CHAR,
ocr_deposit_date POSITION(011:021) CHAR,
target_payment_num POSITION(027:029) INTEGER EXTERNAL,
target_payment_amt POSITION(030:040) DECIMAL EXTERNAL
)
Here is our Perl code to read all the T records, split the record into corresponding variables and then print the batch number, payment number and the amount.
#! /usr/local/bin/perl -w
# read standard input
LINE : while(<>) {
#ignore records other than batch headers
next LINE unless /^T/;
# remove the new line character
chomp;
# split the record!
($rec_type, $ocr_batch_number, $ocr_gift_date,
$filler, $target_payment_num,
$target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);
#convert the number fields from scalar string to scalar number!
$target_payment_num += 0;
$target_payment_amount += 0;
#voila! print it
print "$ocr_batch_number, $target_payment_num, $target_payment_amount \n";
}
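Without the real upload file, the unpack template can still be tried out by building a record with pack, which is the inverse operation. Everything in the record below is made up for illustration; only the template is taken from the program above.

```perl
use strict;
use warnings;

# Build a made-up 40-character T record with pack, then take it apart
# again with the same template. All field values are hypothetical.
my $template = "A1 A9 A11 A5 A3 A11";
my $record   = pack($template,
                    "T", "000012345", "01-JAN-2000", "", "012", "00001234.50");

my ($rec_type, $batch, $gift_date, $filler, $num, $amount)
    = unpack($template, $record);

# Adding zero turns the numeric strings into numbers.
$num    += 0;
$amount += 0;

print length($record), " characters\n";
print "$batch, $num, $amount\n";
```

The A format pads with spaces when packing and strips trailing spaces when unpacking, so the empty filler occupies its five columns in the record but comes back as an empty string.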
3.3 Report Generation and Formatting
This is the same as the previous code except that we now format the output nicely (well, it is debatable whether this is nice!). We have also added a subroutine commify that will format numbers by adding commas.
#! /usr/local/bin/perl
# use the POSIX module to access only the function
# strftime - to format a date nicely
use POSIX qw(strftime);
# and use it right-away to format current time
# as MM/DD/YY HH:MI
$today=strftime("%m/%d/%y %H:%M",localtime());
# this subroutine adds commas to a number
#
sub commify {
my $input = shift;
$input = reverse $input;
$input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
return scalar reverse $input; # scalar forces a string reversal in any calling context
}
#define the page header
## $% is the page number
format STDOUT_TOP=
THE NATURE CONSERVANCY
@<<<<<<<<<<<<< Upload File Batch Report Page : @>>>
$today,$%
------------------------------------------------------------------
Batch Number Gift Date Payments Amount
------------------------------------------------------------------
.
# define the page
format STDOUT=
@<<<<<<<<<<<<<<<< @<<<<<<<<<< @>>>>>> @>>>>>>>>>>>>>>>>>>>>>
$ocr_batch_number,$ocr_gift_date,$target_payment_num,$target_payment_amount
.
# $= is the lines per page . Normal printers have this as 59
$= = 59;
# initialize the variables that hold report totals
$sum_num = $sum_amount = 0;
# read standard input
LINE : while(<>) {
#ignore records other than batch headers
next LINE unless /^T/;
# remove the new line character
chomp;
# split the record!
($rec_type, $ocr_batch_number, $ocr_gift_date,
$filler, $target_payment_num,
$target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);
#convert the number fields from scalar string to scalar number!
$target_payment_num += 0;
$target_payment_amount += 0;
# add to the totals
$sum_num += $target_payment_num;
$sum_amount += $target_payment_amount;
#add commas to the number
$target_payment_num = &commify($target_payment_num);
# dollar amount should have 2 decimal places
$target_payment_amount = "\$".&commify(sprintf("%.2f",$target_payment_amount));
#voila! print it
#print "$ocr_batch_number, $target_payment_num, $target_payment_amount \n";
write;
}
##
# print a line before printing totals
#
$ocr_batch_number = "---------";
$ocr_gift_date = "-----------";
$target_payment_num = "------";
$target_payment_amount = "----------------------";
write;
##
# print totals
#
$ocr_batch_number = "TOTAL";
$ocr_gift_date = "";
$target_payment_num = &commify($sum_num);
$target_payment_amount = "\$".&commify(sprintf("%.2f",$sum_amount));
write;
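The commify subroutine is worth trying on its own. The sketch below repeats it with one precaution: the final reverse is wrapped in scalar, so that the string is reversed even when the subroutine is called in a list context such as the argument list of print. The sample numbers are made up.

```perl
use strict;
use warnings;

# Same technique as the commify subroutine above: reverse the string,
# insert a comma after every third digit of the integer part, and
# reverse it back.
sub commify {
    my $input = shift;
    $input = reverse $input;
    $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
    return scalar reverse $input;
}

for my $n (999, 1000, "1234567.89") {
    print commify($n), "\n";
}
print "\$", commify(sprintf("%.2f", 1234.5)), "\n";
```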
3.4 DBM Databases
DBM is a standard UNIX database which stores data as key-value pairs. In this example, we will read in the batch records and check each one against a DBM database. If the batch already exists there, we print an error; if not, we insert the batch and its values. This code helps in checking for duplicate uploading of batches.
#! /usr/local/bin/perl
# set the name of your DBM file
$DBM_FILE = "batches.db";
# this will create two files, one for data and one for index
dbmopen %HASH, $DBM_FILE, 0666
or die "Can't open $DBM_FILE: $!\n";
# read standard input
LINE : while(<>) {
#ignore records other than batch headers
next LINE unless /^T/;
# remove the new line character
chomp;
# split the record!
($rec_type, $ocr_batch_number, $ocr_gift_date,
$filler, $target_payment_num,
$target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);
#convert the number fields from scalar string to scalar number!
$target_payment_num += 0;
$target_payment_amount += 0;
# key is the batch number
# value is batch date + payments + amount
# all joined by :
($Key,$Value) = ($ocr_batch_number,"$ocr_gift_date:$target_payment_num:$target_payment_amount");
##
# check whether this batch is already loaded
if ( defined($HASH{$Key}) ) {
# if so, print an error
($b_date,$b_payments,$b_amount)=split(/:/,$HASH{$Key});
print "Error: The batch $Key ($b_payments for \$$b_amount) is already uploaded\n";
} else {
# else, add to the batch database
$HASH{$Key} = $Value;
}
}
dbmclose %HASH;
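The duplicate check itself does not depend on DBM; dbmopen only makes the hash persistent between runs. The following sketch exercises the same logic with an ordinary in-memory hash. The name register_batch and the batch values are hypothetical, and the records are held in a small list of array references instead of being read from a file.

```perl
use strict;
use warnings;

my %seen;   # an ordinary hash standing in for the DBM-backed %HASH

# register_batch is a hypothetical helper: it stores a new batch and
# returns 1, or returns 0 and changes nothing for a duplicate.
sub register_batch {
    my ($key, $value) = @_;
    return 0 if exists $seen{$key};
    $seen{$key} = $value;
    return 1;
}

# Made-up batch numbers and values, for illustration only.
my @batches = (
    ["000000001", "01-JAN-2000:3:150"],
    ["000000002", "02-JAN-2000:1:20"],
    ["000000001", "05-JAN-2000:9:999"],   # same key as the first
);

for my $batch (@batches) {
    my ($key, $value) = @$batch;
    if (register_batch($key, $value)) {
        print "stored $key\n";
    } else {
        print "Error: batch $key is already uploaded\n";
    }
}
```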
3.5 Exercise
Using the examples above, write a program to read all batch records from an input file, verify against a DBM database and print a formatted report. Duplicate batches should also be indicated in the report. Try to split the tasks (verifying against the database, reporting etc) into individual subroutines.
Also add another routine to generate an Excel CSV file report, in addition to the normal report.
4. Modules
(The following section is borrowed directly from Tim Bunce's modules file, available at your nearest CPAN site.)
Perl implements a class using a package, but the presence of a package doesn't imply the presence of a class. A package is just a namespace. A class is a package that provides subroutines that can be used as methods. A method is just a subroutine that expects, as its first argument, either the name of a package (for ``static'' methods), or a reference to something (for ``virtual'' methods).
A module is a file that (by convention) provides a class of the same name (sans the .pm), plus an import method in that class that can be called to fetch exported symbols. This module may implement some of its methods by loading dynamic C or C++ objects, but that should be totally transparent to the user of the module. Likewise, the module might set up an AUTOLOAD function to slurp in subroutine definitions on demand, but this is also transparent. Only the .pm file is required to exist.
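To make the terms concrete, here is a minimal sketch of a class written directly in a script rather than in its own .pm file. Counter and its methods are hypothetical names invented for the illustration.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class name; a package is just a namespace

# A "static" method: its first argument is the package name.
sub new {
    my ($class, $start) = @_;
    my $self = { count => $start || 0 };
    return bless $self, $class;    # tie the data to the class
}

# "Virtual" methods: their first argument is a reference to the object.
sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

sub value {
    my ($self) = @_;
    return $self->{count};
}

package main;

my $c = Counter->new(5);
$c->increment;
$c->increment;
print $c->value, "\n";
```

Moving the Counter package into a file named Counter.pm (ending with a true value such as 1;) would turn it into a module that other programs could load with use.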
4.1 Where to get them?
CPAN, the Comprehensive Perl Archive Network, is the one-stop shop for Perl archives, modules, scripts and even documentation. The URL is http://www.cpan.org.
4.2 Modules vs. coding
The advantage of modules is that you don't need to know how the software works, only how to use it. Modules can drastically reduce development time. Most modules have good documentation and are well tested. They also tend to follow standards thoroughly. As a programmer, you might not be interested in reading hundreds of pages of standards documentation just to code something quickly!
Modules do have an apparent disadvantage: speed. Since modules tend to be generic in nature, the code they contain tends to be large, and most of the time you might be using just 10% of the features a module provides. In such cases, if a second or two in the overall run-time makes a difference, you probably want to code manually.
Even in such situations, one way out is to download the module, see what you don't require and delete it. An easier and more elegant solution is to selectively import functions from the module.
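Selective import is done by giving use a list of the names wanted. The sketch below asks the standard POSIX module for just two functions. How much this helps the run-time depends on the module; what it always does is keep unwanted names out of your program.

```perl
use strict;
use warnings;

# Import only the two functions we need instead of the module's
# whole default export list.
use POSIX qw(floor ceil);

my $price = 12.75;   # made-up value for illustration
print floor($price), " ", ceil($price), "\n";
```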
CPAN provides guidelines on writing modules. So, if you think you have some code that nobody else has written (rare chance!) and can be modularized, do so by all means and submit to CPAN.
4.3 Well known modules
CGI - web programming
CGI.pm is the most used module for CGI scripts. These days, page design and content management are mostly done independently and it is rare to find a good CGI programmer with a good aesthetic sense! CGI.pm makes the programmer's life easy by sparing them much of the bother of HTML tags (and also avoiding heated discussions with the page designer!).
DBI - common database interface
To quote Tim Bunce, the architect and author of DBI:
DBI is a database access Application Programming Interface (API)
for the Perl Language. The DBI API Specification defines a set
of functions, variables and conventions that provide a consistent
database interface independent of the actual database being used.
In simple language, the DBI interface allows users to access multiple database types transparently. So, if you are connecting to an Oracle, Informix, mSQL, Sybase or whatever database, you don't need to know the underlying mechanics of the 3GL layer. The API defined by DBI will work on all these database types.
A similar benefit is gained by the ability to connect to two different databases from different vendors within one Perl script. For example, you may want to read data from an Oracle database and insert it into an Informix database, all within one program. The DBI layer allows you to do this simply and powerfully.
The DBI requires one or more driver modules to talk to databases. Oracle, Access and ODBC drivers might be of interest to us.
Please note that DBI only standardizes the database interaction process. If you use the Oracle driver and write SQL specific to Oracle, don't expect to port your project smoothly to Informix just by changing the driver!
GUI - interfaces to GUI toolkits
Perl also provides modules for various GUI toolkits on Unix, such as Gtk, Tk and XForms. A beta version for Win32 GUI is also available.
5. Other choices
5.1 Python - OOP
Developed by Guido van Rossum, a programmer from the Netherlands, Python (freely available from http://www.python.org) is an increasingly popular scripting language. Like Perl, it is available for all major platforms. Unlike Perl, Python was designed as an object-oriented language from the start. Perl's object support came later and is built on packages and references, as described in the Modules section.
Python is also a cleaner language in that it does not generally allow you to be adventurous with data types. Consequently, Python code is much easier to maintain as it gets bigger. Python is designed to be extensible, so enforcing standards and extending the capabilities of the language are easier. With the advent of Java and increased OOP awareness, Python is very popular these days. Python has modules for GUI programming in Unix and Windows environments and that is one area where it is catching up.
Choosing between Python and Perl is a matter of personal preference. Generally, people who prefer C to C++ opt for Perl, and C++ lovers go for Python. VB and ASP programmers can also migrate to Python smoothly, which might not be the case with Perl.
Personally, for all the elegance of Python, I do not program in it for one reason. In Python, blocks are specified by indentation and not by any construct like {} or BEGIN-END. This can be really irritating if your FTP client or editor decides to translate the file and changes its whitespace! An easy way out is to add {} as comments (like #{ ... #} !)
A good feature of Python is that the interpreter can be used interactively, like a shell. This lets you test each line of code thoroughly. A similar application called perlshell is also available for Perl.
A 100% pure Java implementation of Python called JPython is the latest news on Python. It lets you create Java byte-code out of Python code.
5.2 TCL - GUI, Expect, Commercial support
Tcl, the Tool Command Language, was invented by Dr. John Ousterhout at the University of California, Berkeley, and he later continued its development at Sun Labs. He then formed Scriptics, which now continues development of Tcl and provides commercial support.
Tcl has been around almost as long as Perl and longer than Python and, coupled with its GUI toolkit, Tk (hence the name Tcl/Tk), is very popular among the Unix crowd.
Another addition to Tcl that attracts interest is Expect. Expect is a tool for automation, specifically for tasks that need interaction. A simple example of an Expect script could be logging in to a remote machine, running scripts there and logging out. Expect makes the life of system administrators much more pleasant!
Tcl/Tk plugins for browsers are also available. These can run Tclets inside your browser, and many Tclets are already available to produce stunning GUIs. Tclets are a possible alternative to Java applets.
5.3 PHP - Web Scripting
PHP (PHP: Hypertext Preprocessor) was originally written by Rasmus Lerdorf in the mid-1990s. Its simplicity and elegance soon made it a very viable alternative to ASP and Java servlets for server-side scripting for the web. Syntactically, PHP is quite similar to Perl. It also supports object-oriented programming and is very easy to learn.
Visit http://www.php.net for more information.
PHP's popularity has been increasing almost exponentially. The only negative aspect was that mod_perl scripts execute faster. Now, in version 4.0, PHP is driven by a completely rewritten engine called Zend, and the performance gap is narrowing considerably. Even the older versions are fast enough to handle a site of medium complexity with dynamic content.
My personal opinion is that if you want to create an interactive/dynamic web site without learning the complexities of CGI, PHP is the way to go, especially if you don't want to spend thousands to purchase ColdFusion or ASP. PHP can also be installed as a separate interpreter program like Perl or Awk. Given the ease with which you can write powerful programs in PHP and its easy integration with databases (Oracle/Sybase/Informix/ODBC/PostgreSQL/MySQL...), graphics libraries, network functions, e-mail etc., it is a viable and even better alternative to Pro*C!
6. References
There is a lot of material about Perl available on the Internet. Of course, the first stop is the Perl homepage. You can also subscribe to The Perl Journal, which is slightly advanced.
6.1 Books
- Programming Perl by Larry Wall is the standard book on Perl. Also known as the camel book, it is not easy reading. However, if you are considering serious Perl programming, it is a must. Published by O'Reilly and Associates, it is available for around $30-$40.
- Learning Perl by Randal L. Schwartz is a very good book to get you started. It is much easier to read than the camel book. Again from O'Reilly and Associates, it is available for around $20-$30.
- Perl Cookbook by Tom Christiansen is a very good book to turn to for quick solutions. If you already have some idea of Perl, this is the book to buy for very well explained real-life examples. This book illustrates the Perl motto "There Is More Than One Way To Do It" to the core! From O'Reilly and Associates, it is available for around $30-$40.
6.2 WWW
6.3 Scripts' Archives
- http://www.cpan.org
- http://www.freecode.com
- http://www.weberdev.com
- http://www.scriptsearch.com
- http://www.developer.com