Chapter 6

Reading from and Writing to Files



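So far, you've learned to read input from the standard input file, which stores data that is entered from the keyboard. You've also learned how to write to the standard output file, which sends data to your screen. In today's lesson, you'll learn how to open and close files, how to read from and write to them, how to redirect standard input and standard output, and how to test the status of a file with the file-test operators.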

Opening a File

Before you can read from or write to a file, you must first open the file. This operation asks the operating system to give your program access to the file. (Opening a file does not, by itself, prevent other programs from changing it while you are working with it.) To open a file, call the library function open.

The syntax for the open library function is


open (filevar, filename);

When you call open, you must supply two arguments: a file variable and the name of the file you want to open.

The File Variable

The first argument passed to open is the name that the Perl interpreter uses to refer to the file. This name is also known as the file variable (or the file handle).

A file-variable name can be any sequence of letters, digits, and underscores, as long as the first character is a letter or an underscore.

The following are legal file-variable names:


filename

MY_NAME

NAME2

A_REALLY_LONG_FILE_VARIABLE_NAME

_ANOTHERNAME

The following are not legal file-variable names:


1NAME

A.FILE.NAME

if

if is not a valid file-variable name because it has another meaning: as you've seen, it indicates the start of an if statement. Words such as if that have special meanings in Perl are known as reserved words and cannot be used as names.

Tip
It's a good idea to use all uppercase letters for your file-variable names. This makes it easier to distinguish file-variable names from other variable names and from reserved words.

The Filename

The second item passed to open is the name of the file you want to open. For example, if you are running Perl on a UNIX file system, and your current working directory contains a file named file1 that you would like to open, you can open it as follows:


open(FILE1, "file1");

This statement tells Perl that you want to open the file file1 and associate it with the file variable FILE1.

If you want to open a file in a different directory, you can specify the complete pathname, as follows:


open(FILE1, "/u/jqpublic/file1");

This opens the file /u/jqpublic/file1 and associates it with the file variable FILE1.

NOTE
If you are running Perl on a file system other than UNIX, use the filename and directory syntax that is appropriate for your system. The Perl interpreter running on that system will be able to figure out where your file is located.

The File Mode

When you open a file, you must decide how you want to access the file. There are three different file-access modes (or, simply, file modes) available in Perl:

read mode: Enables the program to read the existing contents of the file but does not enable it to write into the file
write mode: Destroys the current contents of the file and overwrites them with the output supplied by the program
append mode: Appends output supplied by the program to the existing contents of the file

By default, open assumes that a file is to be opened in read mode. To specify write mode, put a > character in front of the filename that you pass to open, as follows:


open (OUTFILE, ">/u/jqpublic/outfile");

This opens the file /u/jqpublic/outfile for writing and associates it with the file variable OUTFILE.

To specify append mode, put two > characters in front of the filename, as follows:


open (APPENDFILE, ">>/u/jqpublic/appendfile");

This opens the file /u/jqpublic/appendfile in append mode and associates it with the file variable APPENDFILE.

NOTE
Here are a few things to remember when opening files:
  • When you open a file for writing, any existing contents are destroyed.
  • With the three modes described here, you cannot read from and write to the same file at the same time.
  • When you open a file in append mode, the existing contents are not destroyed, but you cannot read the file while writing to it.
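
The three modes can be compared in a short sketch. The file name modes_demo.txt and the temporary directory are assumptions made for illustration, and the File::Temp module, the close function, and the use strict and use warnings lines are not part of what this lesson has covered so far.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file in a temporary directory (removed on exit).
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/modes_demo.txt";

# Write mode: creates the file, or destroys its existing contents.
open(OUT, ">$name") || die ("cannot open $name for writing\n");
print OUT ("first\n");
close(OUT);

# Append mode: keeps the existing contents and adds to the end.
open(OUT, ">>$name") || die ("cannot open $name for appending\n");
print OUT ("second\n");
close(OUT);

# Read mode (the default): the file now holds both lines.
open(IN, $name) || die ("cannot open $name for reading\n");
my @lines = <IN>;
close(IN);
print (@lines);
```

If the second open used > instead of >>, the file would end up containing only the line second.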

Checking Whether the Open Succeeded

Before you use a file opened by the open function, you should first check whether the open function actually is giving you access to the file. The open function enables you to do this by returning a value indicating whether the file-opening operation succeeded: a nonzero value if the file was opened, and a false value if it was not.

Because the values returned by open correspond to the values for true and false in conditional expressions, you can use open in if and unless statements. The following is an example:


if (open(MYFILE, "/u/jqpublic/myfile")) {

        # here's what to do if the file opened

}

The code inside the if statement is executed only if the file has been successfully opened. This ensures that your programs read or write only to files that you can access.

NOTE
If open returns false, you can find out what went wrong by using the file-test operators, which you'll learn about later today.
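
In addition to the file-test operators, Perl records the operating system's reason for a failed call in the special variable $!. That variable is not covered in this lesson; the following is only a minimal sketch of where it fits. The helper name try_open and the path passed to it are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "" if the file can be opened for
# reading, or the system's error text if it cannot.
sub try_open {
    my ($name) = @_;
    if (open(MYFILE, $name)) {
        close(MYFILE);
        return "";
    }
    return "$!";    # the wording depends on your system
}

# The path below is made up and is assumed not to exist.
my $reason = try_open("/no/such/dir/myfile");
if ($reason ne "") {
    print STDERR ("open failed: $reason\n");
}
```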

Reading from a File

Once you have opened a file and determined that the file is available for use, you can read information from it.

To read from a file, enclose the file variable associated with the file in angle brackets (< and >), as follows:


$line = <MYFILE>;

This statement reads a line of input from the file specified by the file variable MYFILE and stores the line of input in the scalar variable $line.

Listing 6.1 is a simple program that reads input from a file and writes it to the standard output file.


Listing 6.1. A program that reads lines from a file and prints them.

1:  #!/usr/local/bin/perl

2:  

3:  if (open(MYFILE, "file1")) {

4:          $line = <MYFILE>;

5:          while ($line ne "") {

6:                  print ($line);

7:                  $line = <MYFILE>;

8:          }

9:  }



$ program6_1

Here is a line of input.

Here is another line of input.

Here is the last line of input.

$

Line 3 opens the file file1 in read mode, which means that the file is to be made available for reading. file1 is assumed to be in the current working directory. The file variable MYFILE is associated with the file file1.

If the call to open returns a nonzero value, the conditional expression


open(MYFILE, "file1")

is assumed to be true, and the code inside the if statement is executed.

Lines 4-8 print the contents of file1. The sample output shown here assumes that file1 contains the following three lines:


Here is a line of input.

Here is another line of input.

Here is the last line of input.

Line 4 reads the first line of input from the file specified by the file variable MYFILE, which is file1. This line of input is stored in the scalar variable $line.

Line 5 tests whether the end of the file specified by MYFILE has been reached. If there are no more lines left in MYFILE, the read returns an undefined value, which compares equal to the empty string, so the loop ends.

Line 6 prints the text stored in $line, which is the line of input read from MYFILE.

Line 7 reads the next line of MYFILE, preparing for the loop to start again.
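
The read-test-print pattern of lines 4-8 is often written more compactly by placing the read in the while condition. The following is a sketch of that alternative form, not the listing's own code; the helper name print_file is hypothetical. When a read from a file variable is assigned inside a while condition, Perl tests whether a line was actually read, so the loop ends at the end of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: prints every line of the named file and
# returns the number of lines read, or -1 if the open fails.
sub print_file {
    my ($name) = @_;
    my $count = 0;
    if (open(MYFILE, $name)) {
        # The read happens in the condition; the loop stops at end of file.
        while (my $line = <MYFILE>) {
            print ($line);
            $count++;
        }
        close(MYFILE);
        return $count;
    }
    return -1;
}

# Usage, assuming file1 is the sample file from Listing 6.1:
#   print_file("file1");
```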

File Variables and the Standard Input File

Now that you have seen how Perl programs read input from files in read mode, take another look at a statement that reads a line of input from the standard input file.


$line = <STDIN>;

Here's what is actually happening: The Perl program is referencing the file variable STDIN, which represents the standard input file. The < and > on either side of STDIN tell the Perl interpreter to read a line of input from the standard input file, just as the < and > on either side of MYFILE in


$line = <MYFILE>;

tell the Perl interpreter to read a line of input from MYFILE.

STDIN is a file variable that behaves like any other file variable representing a file in read mode. The only difference is that STDIN does not need to be opened by the open function because the Perl interpreter does that for you.

Terminating a Program Using die

In Listing 6.1, you saw that the return value from open can be tested to see whether the program actually has access to the file. The code that operates on the opened file is contained in an if statement.

If you are writing a large program, you might not want to put all of the code that affects a file inside an if statement, because the distance between the beginning of the if statement and the closing brace (}) could get very large. For example:


if (open(MYFILE, "file1")) {

        # this could be many pages of statements!

}

Besides, after a while, you'll probably get tired of typing the spaces or tabs you use to indent the code inside the if statement. Perl provides a way around this using the library function die.

The syntax for the die library function is


die (message);

When the Perl interpreter executes the die function, the program prints the message passed to die and then terminates immediately.

For example, the statement


die ("Stop this now!\n");

prints the following on your screen and terminates the program:


Stop this now!

Listing 6.2 shows how you can use die to smoothly test whether a file has been opened correctly.


Listing 6.2. A program that uses die when testing for a successful file open operation.

1:  #!/usr/local/bin/perl

2:  

3:  unless (open(MYFILE, "file1")) {

4:          die ("cannot open input file file1\n");

5:  }

6:  

7:  # if the program gets this far, the file was

8:  # opened successfully

9:  $line = <MYFILE>;

10: while ($line ne "") {

11:         print ($line);

12:         $line = <MYFILE>;

13: }



$ program6_2

Here is a line of input.

Here is another line of input.

Here is the last line of input.

$

This program behaves the same way as the one in Listing 6.1, except that it prints out an error message when it can't open the file.

Line 3 opens the file and tests whether the file opened successfully. Because this is an unless statement, the code inside the braces ({ and }) is executed unless the file opened successfully.

Line 4 is the call to die that is executed if the file does not open successfully. This statement prints the following message on the screen and exits:


cannot open input file file1

Because line 4 terminates program execution when the file cannot be opened, the program can make it past line 5 only if the file has been opened successfully.

The loop in lines 9-13 is identical to the loop you saw in Listing 6.1. The only difference is that this loop is no longer inside an if statement.

NOTE
Here is another way to write lines 3-5:
open (MYFILE, "file1") || die ("Could not open file");
Recall that the logical OR operator only evaluates the expression on its right if the expression on its left is false. This means that die is called only if open returns false (if the open operation fails).

Printing Error Information Using die

If you like, you can have die print the name of the Perl program and the line number of the statement containing the call to die. To do this, leave off the trailing newline character in the character string, as follows:


die ("Missing input file");

If the Perl program containing this statement is called myprog, and this statement is line 14 of myprog, this call to die prints the following and exits:


Missing input file at myprog line 14.

Compare this with


die ("Missing input file\n");

which simply prints the following before exiting:


Missing input file

Specifying the program name and line number is useful when you need to pinpoint which program, and which statement in it, produced the message.
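
One way to compare the two forms without ending the program is to wrap each call in an eval block, which traps the message in the special variable $@. Both eval and $@ go beyond this lesson, so treat the following as a sketch under that assumption.

```perl
use strict;
use warnings;

# eval traps the die; the message it would have printed lands in $@.
eval { die ("Missing input file\n"); };
my $with_newline = $@;

eval { die ("Missing input file"); };
my $without_newline = $@;

print ($with_newline);       # just the message
print ($without_newline);    # the message, then " at <program> line <number>."
```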

Reading into Array Variables

Perl enables you to read an entire file into a single array variable. To do this, assign the file variable, enclosed in angle brackets, to the array variable, as follows:


@array = <MYFILE>;

This reads the entire file represented by MYFILE into the array variable @array. Each line of the file becomes an element of the list that is stored in @array.

Listing 6.3 is a simple program that reads an entire file into an array.


Listing 6.3. A program that reads an entire input file into an array.

1:  #!/usr/local/bin/perl

2:  

3:  unless (open(MYFILE, "file1")) {

4:          die ("cannot open input file file1\n");

5:  }

6:  @input = <MYFILE>;

7:  print (@input);



$ program6_3

Here is a line of input.

Here is another line of input.

Here is the last line of input.

$

Lines 3-5 open the file, test whether the file has been opened successfully, and terminate the program if the file cannot be opened.

Line 6 reads the entire contents of the file represented by MYFILE into the array variable @input. @input now contains a list consisting of the following three elements:


("Here is a line of input.\n",

 "Here is another line of input.\n",

 "Here is the last line of input.\n")

Note that a newline character is included as the last character of each line.

Line 7 uses the print function to print the entire file.
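
Because each line becomes one element, ordinary array operations apply to the contents of the file once it has been read. The following is a minimal sketch; the helper name read_lines is hypothetical, and the close function it calls is described later in this lesson.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of a file as a list,
# or calls die if the file cannot be opened.
sub read_lines {
    my ($name) = @_;
    unless (open(MYFILE, $name)) {
        die ("cannot open input file $name\n");
    }
    my @input = <MYFILE>;    # one element per line, newlines included
    close(MYFILE);
    return @input;
}

# Usage, assuming file1 is the sample file from this lesson:
#   my @input = read_lines("file1");
#   print ("file1 has ", scalar(@input), " lines\n");
#   print ("last line: $input[$#input]");
```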

Writing to a File

After you have opened a file in write or append mode, you can write to the file you have opened by specifying the file variable with the print function. For example, if you have opened a file for writing using the statement


open(OUTFILE, ">outfile");

the following statement:


print OUTFILE ("Here is an output line.\n");

writes the following line to the file specified by OUTFILE, which is the file called outfile:


Here is an output line.

Listing 6.4 is a simple program that reads from one file and writes to another.


Listing 6.4. A program that opens two files and copies one into another.

1:  #!/usr/local/bin/perl

2:  

3:  unless (open(INFILE, "file1")) {

4:          die ("cannot open input file file1\n");

5:  }

6:  unless (open(OUTFILE, ">outfile")) {

7:          die ("cannot open output file outfile\n");

8:  }

9:  $line = <INFILE>;

10: while ($line ne "") {

11:         print OUTFILE ($line);

12:         $line = <INFILE>;

13: }


This program writes nothing to the screen because all output is directed to the file called outfile.

Lines 3-5 open file1 for reading. If the file cannot be opened, line 4 is executed, which prints the following message on the screen and terminates the program:


cannot open input file file1

Lines 6-8 open outfile for writing; the > in >outfile indicates that the file is to be opened in write mode. If outfile cannot be opened, line 7 prints the message


cannot open output file outfile

on the screen and terminates the program.

The only other line in the program that you have not seen in other listings in this lesson is line 11, which writes the contents of the scalar variable $line to the file specified by OUTFILE.

Once this program has completed, the contents of file1 are copied into outfile.


Here is a line of input.

Here is another line of input.

Here is the last line of input.

Make sure that files you open in write mode contain nothing valuable. When the open function opens a file in write mode, any existing contents are destroyed.
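
The copy logic of Listing 6.4 can also be packaged as a subroutine that reports how many lines it copied. This is a sketch rather than the listing itself: the name copy_file is hypothetical, and it uses subroutines and the close function, which this lesson has not yet introduced.

```perl
use strict;
use warnings;

# Hypothetical helper: copies $from to $to line by line and returns
# the number of lines copied. Any existing contents of $to are destroyed.
sub copy_file {
    my ($from, $to) = @_;
    open(INFILE, $from)   || die ("cannot open input file $from\n");
    open(OUTFILE, ">$to") || die ("cannot open output file $to\n");
    my $count = 0;
    while (my $line = <INFILE>) {
        print OUTFILE ($line);
        $count++;
    }
    close(INFILE);
    close(OUTFILE);
    return $count;
}

# Usage, with the file names from Listing 6.4:
#   copy_file("file1", "outfile");
```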

The Standard Output File Variable

If you want, your program can reference the standard output file by referring to the file variable associated with the output file. This file variable is named STDOUT.

By default, the print statement sends output to the standard output file, which means that it sends the output to the file associated with STDOUT. As a consequence, the following statements are equivalent:


print ("Here is a line of output.\n");

print STDOUT ("Here is a line of output.\n");

NOTE
You do not need to open STDOUT because Perl automatically opens it for you.

Merging Two Files into One

In Perl, you can open as many files as you like, provided you define a different file variable for each one. (Actually, there is an upper limit on the number of files you can open, but it's fairly large and also system-dependent.) For an example of a program that has multiple files open at one time, take a look at Listing 6.5. This program merges two files by creating an output file consisting of one line from the first file, one line from the second file, another line from the first file, and so on. For example, if an input file named merge1 contains the lines


a1

a2

a3

and another file, merge2, contains the lines


b1

b2

b3

then the resulting output file consists of


a1

b1

a2

b2

a3

b3


Listing 6.5. A program that merges two files.

1:  #!/usr/local/bin/perl

2:  

3:  open (INFILE1, "merge1") ||

4:          die ("Cannot open input file merge1\n");

5:  open (INFILE2, "merge2") ||

6:          die ("Cannot open input file merge2\n");

7:  $line1 = <INFILE1>;

8:  $line2 = <INFILE2>;

9:  while ($line1 ne "" || $line2 ne "") {

10:         if ($line1 ne "") {

11:                 print ($line1);

12:                 $line1 = <INFILE1>;

13:         }

14:         if ($line2 ne "") {

15:                 print ($line2);

16:                 $line2 = <INFILE2>;

17:         }

18: }



$ program6_5

a1

b1

a2

b2

a3

b3

$

Lines 3 and 4 show another way to write a statement that either opens a file or calls die if the open fails. Recall that the || operator first evaluates its left operand; if the left operand evaluates to true (a nonzero value), the right operand is not evaluated because the result of the expression is true.

Because of this, the right operand, the call to die, is evaluated only when the left operand is false, which happens only when the call to open fails and the file merge1 cannot be opened.

Lines 5 and 6 repeat the preceding process for the file merge2. Again, either the file is opened successfully or the program aborts by calling die.

The program then loops repeatedly, reading a line of input from each file each time. The loop terminates only when both files have been exhausted. If one file has been exhausted but the other has not, the program just copies the remaining lines from the other file to the standard output file.
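
The same interleaving can be sketched as a subroutine that returns the merged lines instead of printing them, which makes the unequal-length case easy to check. The name merge_files is hypothetical, and the sketch tests for the end of each file with defined, a function this lesson does not cover, rather than comparing with the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of two files, interleaved.
sub merge_files {
    my ($name1, $name2) = @_;
    open(INFILE1, $name1) || die ("Cannot open input file $name1\n");
    open(INFILE2, $name2) || die ("Cannot open input file $name2\n");
    my @merged = ();
    my $line1 = <INFILE1>;
    my $line2 = <INFILE2>;
    # Keep going until both files are exhausted.
    while (defined($line1) || defined($line2)) {
        if (defined($line1)) {
            push (@merged, $line1);
            $line1 = <INFILE1>;
        }
        if (defined($line2)) {
            push (@merged, $line2);
            $line2 = <INFILE2>;
        }
    }
    close(INFILE1);
    close(INFILE2);
    return @merged;
}

# Usage, with the file names from Listing 6.5:
#   print (merge_files("merge1", "merge2"));
```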

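Note that the output from this program is printed on the screen. If you decide that you want to send this output to a file, you can do one of two things: open an output file in the program and pass its file variable to print, as in Listing 6.4, or redirect the standard output file when you run the program.

For a discussion of the second method, see the following section.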

Redirecting Standard Input and Standard Output

When you run programs on UNIX, you can redirect input and output using < and >, respectively, as follows:


myprog <input >output

Here, when you run the program called myprog, the input for the program is taken from the file specified by input instead of from the keyboard, and the output for the program is sent to the file specified by output instead of to the screen.

When you run a Perl program and redirect input using <, the standard input file variable STDIN now represents the file specified with <. For example, consider the following simple program:


#!/usr/local/bin/perl

$line = <STDIN>;

print ($line);

Suppose this program is named myperlprog and is called with the command


myperlprog <file1

In this case, the statement


$line = <STDIN>;

reads a line of input from file1 because the file variable STDIN represents file1.

Similarly, specifying > on the command line redirects the standard output file from the screen to the specified file. For example, consider this command:


myperlprog <file1 >outfile

It redirects output from the standard output file to the file called outfile. Now, the following statement writes a line of data to outfile:


print ($line);

The Standard Error File

Besides the standard input file and the standard output file, Perl also defines a third built-in file variable, STDERR, which represents the standard error file. By default, text sent to this file is written to the screen. This enables the program to send messages to the screen even when the standard output file has been redirected to write to a file. As with STDIN and STDOUT, you do not need to open STDERR because it automatically is opened for you.

Listing 6.6 provides a simple example of the use of STDERR. The output shown in the input-output example assumes that the standard input file and standard output file have been redirected to files using < and >, as in


myprog <infile >outfile

Therefore, the only output you see is what is written to STDERR.


Listing 6.6. A program that writes to the standard error file.

1:  #!/usr/local/bin/perl

2:  

3:  open(MYFILE, "file1") ||

4:          die ("Unable to open input file file1\n");

5:  print STDERR ("File file1 opened successfully.\n");

6:  $line = <MYFILE>;

7:  while ($line ne "") {

8:          chop ($line);

9:          print ("\U$line\E\n");

10:         $line = <MYFILE>;

11: }



$ program6_6

File file1 opened successfully.

$

This program converts the contents of a file into uppercase and sends the converted contents to the standard output file.

Line 3 tries to open file1. If the file cannot be opened, line 4 is executed. This calls die, which prints the following message and terminates:


Unable to open input file file1

NOTE
The function die sends its messages to the standard error file, not the standard output file. This means that when a program terminates, the message printed by die still appears on your screen when you have redirected standard output to a file (unless you have also redirected the standard error file).

If the file is opened successfully, line 5 writes a message to the standard error file, which indicates that the file has been opened. As you can see, the standard error file is not reserved solely for errors. You can write anything you want to STDERR at any time.

Lines 6-11 read one line of file1 at a time and write it out in uppercase (using the escape characters \U and \E, which you learned about on Day 3, "Understanding Scalar Values").
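
A common division of labor is to send the converted data to STDOUT and a status report to STDERR, so that the report stays visible when the data is redirected. The following sketch assumes a hypothetical helper named upcase_file; it also uses chomp, which removes only a trailing newline, where Listing 6.6 uses chop.

```perl
use strict;
use warnings;

# Hypothetical helper: writes the uppercase form of each line of
# $name to standard output and a one-line summary to standard error.
# Returns the number of lines converted.
sub upcase_file {
    my ($name) = @_;
    open(MYFILE, $name) || die ("Unable to open input file $name\n");
    my $count = 0;
    while (my $line = <MYFILE>) {
        chomp ($line);
        print STDOUT ("\U$line\E\n");
        $count++;
    }
    close(MYFILE);
    print STDERR ("$count lines converted.\n");
    return $count;
}

# Usage, assuming file1 is the sample file from this lesson:
#   upcase_file("file1");
```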

Closing a File

When you are finished reading from or writing to a file, you can tell the Perl interpreter that you are finished by calling the library function close.

The syntax for the close library function is


close (filevar);

close requires one argument: the file variable representing the file you want to close. Once you have closed the file, you cannot read from it or write to it without invoking open again.

Note that you do not have to call close when you are finished with a file: Perl automatically closes the file when the program terminates or when you open another file using a previously defined file variable. For example, consider the following statements:


open (MYFILE, ">file1");

print MYFILE ("Here is a line of output.\n");

open (MYFILE, ">file2");

print MYFILE ("Here is another line of output.\n");

Here, when file2 is opened for writing, file1 automatically is closed. The file variable MYFILE is now associated with file2. This means that the second print statement sends the following to file2:


Here is another line of output.

DO use the <> operator, which is an easy way to read input from several files in succession. See the section titled "Reading from a Sequence of Files," later in this lesson, for more information on the <> operator.
DON'T use the same file variable to represent multiple files unless it is absolutely necessary. It is too easy to lose track of which file variable belongs to which file, especially if your program is large or has many nested conditional statements.
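
Closing a file explicitly is most useful when the program needs to read back what it has just written. The following is a sketch of that sequence under a few assumptions: the scratch directory comes from the File::Temp module, which is not covered in this lesson, and the return value of close is tested in the same way as the return value of open.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # hypothetical scratch directory
my $name = "$dir/file1";

open(OUTFILE, ">$name") || die ("cannot open $name for writing\n");
print OUTFILE ("Here is a line of output.\n");
# Closing makes sure the output has actually reached the file.
close(OUTFILE) || die ("cannot close $name\n");

# A separate file variable is used for reading the same file back.
open(INFILE, $name) || die ("cannot open $name for reading\n");
my $line = <INFILE>;
close(INFILE);
print ($line);
```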

Determining the Status of a File

Many of the example programs in today's lesson call open and test the returned result to see whether the file has been opened successfully. If open fails, it might be useful to find out exactly why the file could not be opened. To do this, use one of the file-test operators.

Listing 6.7 provides an example of the use of a file-test operator. This program is a slight modification of Listing 6.6, which is an uppercase conversion program.


Listing 6.7. A program that checks whether an unopened file actually exists.

1:  #!/usr/local/bin/perl

2:  

3:  unless (open(MYFILE, "file1")) {

4:          if (-e "file1") {

5:                 die ("File file1 exists, but cannot be opened.\n");

6:          } else {

7:                 die ("File file1 does not exist.\n");

8:          }

9:  }

10: $line = <MYFILE>;

11: while ($line ne "") {

12:         chop ($line);

13:         print ("\U$line\E\n");

14:         $line = <MYFILE>;

15: }



$ program6_7

File file1 does not exist.

$

Line 3 attempts to open the file file1 for reading. If file1 cannot be opened, the program executes the if statement starting in line 4.

Line 4 is an example of a file-test operator. This file-test operator, -e, tests whether its operand, a file, actually exists. If the file file1 exists, the expression -e "file1" returns true, the message File file1 exists, but cannot be opened. is displayed, and the program exits. If file1 does not exist, -e "file1" is false, and the library function die prints the following message before exiting:


File file1 does not exist.

File-Test Operator Syntax

All file-test operators have the same syntax as the -e operator used in Listing 6.7.

The syntax for the file-test operators is


-x expr

Here, x is an alphabetic character and expr is any expression. The value of expr is assumed to be a string that contains the name of the file to be tested.

Because the operand for a file-test operator can be any expression, you can use scalar variables and string operators in the expression if you like. For example:


$var = "file1";

if (-e $var) {

        print STDERR ("File file1 exists.\n");

}

if (-e $var . "a") {

        print STDERR ("File file1a exists.\n");

}

In the first use of -e, the contents of $var, file1, are assumed to be the name of a file, and this file is tested for existence. In the second case, a is appended to the contents of $var, producing the string file1a. The -e operator then tests whether a file named file1a exists.

NOTE
The Perl interpreter does not get confused by the expression
-e $var . "a"
because the . operator has higher precedence than the -e operator. This means that the string concatenation is performed first.
The file-test operators have higher precedence than the comparison operators but lower precedence than the shift operators. To see a complete list of the Perl operators and their precedences, refer to Day 4, "More Operators."

The string can be a complete path name, if you like. The following is an example:


if (-e "/u/jqpublic/file1") {

        print ("The file exists.\n");

}

This if statement tests for the existence of the file /u/jqpublic/file1.

Available File-Test Operators

Table 6.1 provides a complete list of the file-test operators available in Perl. In this table, name is a placeholder for the name of the operand being tested.

Table 6.1. The file-test operators.

Operator  Description
-b        Is name a block device?
-c        Is name a character device?
-d        Is name a directory?
-e        Does name exist?
-f        Is name an ordinary file?
-g        Does name have its setgid bit set?
-k        Does name have its "sticky bit" set?
-l        Is name a symbolic link?
-o        Is name owned by the user?
-p        Is name a named pipe?
-r        Is name a readable file?
-s        Is name a non-empty file?
-t        Does name represent a terminal?
-u        Does name have its setuid bit set?
-w        Is name a writable file?
-x        Is name an executable file?
-z        Is name an empty file?
-A        How long since name accessed?
-B        Is name a binary file?
-C        How long since name's inode changed?
-M        How long since name modified?
-O        Is name owned by the "real user" only?*
-R        Is name readable by the "real user" only?*
-S        Is name a socket?
-T        Is name a text file?
-W        Is name writable by the "real user" only?*
-X        Is name executable by the "real user" only?*

* In this case, the "real user" is the userid specified at login, as opposed to the effective user ID, which is the userid under which you currently are working. (On some systems, a command such as /user/local/etc/suid enables you to change your effective user ID.)
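
Several of these operators can be combined to classify a name. The following is a minimal sketch; the helper name describe and the wording of the strings it returns are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a short description of $name
# built from a few of the operators in Table 6.1.
sub describe {
    my ($name) = @_;
    if (!(-e $name)) {
        return "does not exist";
    } elsif (-d $name) {
        return "directory";
    } elsif (!(-f $name)) {
        return "not an ordinary file";
    } elsif (-z $name) {
        return "empty file";
    }
    return "non-empty file";
}

# "." names the current working directory.
print (describe("."), "\n");    # prints: directory
```

The order of the tests matters: -e comes first because every other test is false for a name that does not exist, and -z is applied only after -f has confirmed that the name is an ordinary file.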

The following sections describe some of the more common file-test operators and show you how they can be useful. (You'll also learn about more of these operators on Day 12, "Working with the File System.")

More on the -e Operator

When a Perl program opens a file for writing, it destroys anything that already exists in the file. This might not be what you want. Therefore, you might want to make sure that your program opens a file only if the file does not already exist.

You can use the -e file-test operator to test whether or not to open a file for writing. Listing 6.8 is an example of a program that does this.


Listing 6.8. A program that tests whether a file exists before opening it for writing.

1:  #!/usr/local/bin/perl

2:  

3:  unless (open(INFILE, "infile")) {

4:          die ("Input file infile cannot be opened.\n");

5:  }

6:  if (-e "outfile") {

7:          die ("Output file outfile already exists.\n");

8:  }

9:  unless (open(OUTFILE, ">outfile")) {

10:         die ("Output file outfile cannot be opened.\n");

11: }

12: $line = <INFILE>;

13: while ($line ne "") {

14:         chop ($line);

15:         print OUTFILE ("\U$line\E\n");

16:         $line = <INFILE>;

17: }



$ program6_8

Output file outfile already exists.

$

This program is the uppercase conversion program again; most of it should be familiar to you.

The only difference is lines 6-8, which use the -e file-test operator to check whether the output file outfile exists. If outfile exists, the program aborts, which ensures that the existing contents of outfile are not lost.

If outfile does not exist, the following expression is false:


-e "outfile"

and the program knows that it is safe to open outfile because it does not already exist.
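
The test-then-open step in lines 6-11 can be sketched as a subroutine that refuses to touch an existing file. The name open_new_output is hypothetical. Note that another program could create the file in the interval between the -e test and the call to open; this sketch does not guard against that.

```perl
use strict;
use warnings;

# Hypothetical helper: opens $name for writing on OUTFILE only if no
# file of that name exists yet. Returns 1 if the file was opened and
# 0 if it already existed; calls die if the open itself fails.
sub open_new_output {
    my ($name) = @_;
    if (-e $name) {
        return 0;    # leave the existing contents alone
    }
    unless (open(OUTFILE, ">$name")) {
        die ("Output file $name cannot be opened.\n");
    }
    return 1;
}

# Usage, with the file name from Listing 6.8:
#   open_new_output("outfile") ||
#           die ("Output file outfile already exists.\n");
```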

Using File-Test Operators in Expressions

If you don't need to know exactly why your program is failing, you can combine all of the tests in Listing 6.8 into a single statement, as follows:


open(INFILE, "infile") && !(-e "outfile") &&

     open(OUTFILE, ">outfile") || die("Cannot open files\n");

Can you see how this works? Here's what is happening: The && operator, logical AND, is true only if both of its operands are true. In this case, the two && operators indicate that the subexpression up to, but not including, the || is true only if all three of the following are true:


open(INFILE, "infile")

!(-e "outfile")

open(OUTFILE, ">outfile")

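All three are true only when the following conditions are met: the input file infile can be opened, the output file outfile does not already exist, and the output file outfile can be opened.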

If any of these subexpressions is false, the entire expression up to the || is false. This means that the subexpression after the || (the call to die) is executed, and the program aborts.

Note that each of the three subexpressions associated with the && operators is evaluated in turn. This means that the subexpression


!(-e "outfile")

is evaluated only if


open(INFILE, "infile")

is true, and that the subexpression


open(OUTFILE, ">outfile")

is evaluated only if


!(-e "outfile")

is true. This is exactly the same logic that Listing 6.8 uses.

If any of the subexpressions is false, the Perl interpreter doesn't evaluate the rest of them because it knows that the final result of


open(INFILE, "infile") && !(-e "outfile") &&

     open(OUTFILE, ">outfile")

is going to be false. Instead, it goes on to evaluate the subexpression to the right of the ||, which is the call to die.

This program logic is somewhat complicated, and you shouldn't use it unless you feel really comfortable with it. The if statements in Listing 6.8 do the same thing and are easier to understand; however, it's useful to know how complicated statements such as the following one work because many Perl programmers like to write code that works in this way:


open(INFILE, "infile") && !(-e "outfile") &&

     open(OUTFILE, ">outfile") || die("Cannot open files\n");

In the next few days, you'll see several more examples of code that exploits how expressions work in Perl. "Perl hackers" (experienced Perl programmers) often enjoy compressing multiple statements into shorter ones, and they delight in complexity. Be warned.
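
The order of evaluation in such a chain can be observed by replacing the three tests with a stand-in that records each call. In this sketch, the subroutine step and its labels are hypothetical, and a push onto the list takes the place of the call to die so that the program keeps running.

```perl
use strict;
use warnings;

my @called = ();

# Hypothetical stand-in for the three tests: records that it ran
# and returns the true or false value it was given.
sub step {
    my ($label, $result) = @_;
    push (@called, $label);
    return $result;
}

# The second step is false, so the third is never evaluated and
# the expression to the right of || runs instead.
step("open infile", 1) && step("outfile absent", 0) &&
     step("open outfile", 1) || push (@called, "die");

print ("@called\n");    # prints: open infile outfile absent die
```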

Testing for Read Permission: The -r Operator

Before you can open a file for reading, you must have permission to read the file. The -r file-test operator tests whether you have permission to read a file.

Listing 6.9 checks whether the person running the program has permission to access a particular file.


Listing 6.9. A program that tests for read permission on a file.

1:  #!/usr/local/bin/perl

2:  

3:  unless (open(MYFILE, "file1")) {

4:          if (!(-e "file1")) {

5:                  die ("File file1 does not exist.\n");

6:          } elsif (!(-r "file1")) {

7:                  die ("You are not allowed to read file1.\n");

8:          } else {

9:                  die ("File1 cannot be opened\n");

10:         }

11: }



$ program6_9

You are not allowed to read file1.

$

Line 3 of this program tries to open file1. If the call to open fails, the program tries to find out why.

First, line 4 tests whether the file actually exists. If the file exists, the Perl interpreter executes line 6, which tests whether the file has the proper read permission. If it does not, die is called; it then prints the following message and exits:


You are not allowed to read file1.

NOTE
You do not need to use the -e file-test operator before using the -r file-test operator. If the file does not exist, -r returns false because you can't read a file that isn't there.
The only reason to use both -e and -r is to enable your program to determine exactly what is wrong.
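The following short sketch illustrates the point of the note. It assumes that no file named no_such_file exists in your current directory (the name is made up for this example).

```perl
#!/usr/local/bin/perl

# Assumption: no file named no_such_file exists in the current directory.
$missing = "no_such_file";

# -r is false for a missing file, so no separate -e test is required.
if (-r $missing) {
        $answer = "readable";
} else {
        $answer = "not readable";
}
print ("$missing is $answer.\n");
```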

Checking for Other Permissions

You can use file-test operators to test for other permissions as well. To check whether you have write permission on a file, use the -w file-test operator.


if (-w "file1") {

        print STDERR ("I can write to file1.\n");

} else {

        print STDERR ("I can't write to file1.\n");

}

The -x file-test operator checks whether you have execute permission on the file (in other words, whether the system thinks this is an executable program, and whether you have permission to run it if it is), as illustrated here:


if (-x "file1") {

        print STDERR ("I can run file1.\n");

} else {

        print STDERR ("I can't run file1.\n");

}

NOTE
If you are the system administrator (for example, you are running as user ID root) and have permission to access any file, the -r and -w file-test operators always return true if the file exists. Also, the -x test operator always returns true if the file is an executable program.

Checking for Empty Files

The -z file-test operator tests whether a file is empty. This provides a more refined test for whether or not to open a file for writing: if the file exists but is empty, no information is lost if you overwrite the existing file.

Listing 6.10 shows how to use -z.


Listing 6.10. A program that tests whether the file is empty before opening it for writing.

1:  #!/usr/local/bin/perl

2:  

3:  if (-e "outfile") {

4:          if (!(-w "outfile")) {

5:                 die ("Missing write permission for outfile.\n");

6:          }

7:          if (!(-z "outfile")) {

8:                  die ("File outfile is non-empty.\n");

9:          }

10: }

11: # at this point, the file is either empty or doesn't exist,

12: # and we have permission to write to it if it exists



$ program6_10

File outfile is non-empty.

$

Line 3 checks whether the file outfile exists using -e. If it exists, it can only be opened if the program has permission to write to the file; line 4 checks for this using -w.

Line 7 uses -z to test whether the file is empty. If it is not, line 8 calls die to terminate program execution.

The opposite of -z is the -s file-test operator, which returns a nonzero value if the file is not empty.


$size = -s "outfile";

if ($size == 0) {

        print ("The file is empty.\n");

} else {

        print ("The file is $size bytes long.\n");

}

The -s file-test operator actually returns the size of the file in bytes. It can still be used in conditional expressions, though, because any nonzero value (indicating that the file is not empty) is treated as true.
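As a rough illustration, the sketch below uses -s directly as a condition. The file name sizetest.tmp is a hypothetical scratch file that the example creates and then deletes; it is not part of the listings in this lesson.

```perl
#!/usr/local/bin/perl

# sizetest.tmp is a hypothetical scratch file used only for this example.
$file = "sizetest.tmp";
open (OUT, ">$file") || die ("Cannot create $file\n");
print OUT ("hello\n");
close (OUT);

# Any nonzero size counts as true, so -s can be the whole condition.
if (-s $file) {
        $size = -s $file;
        print ("$file is not empty: $size bytes.\n");
} else {
        print ("$file is empty.\n");
}

unlink ($file);     # remove the scratch file
```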

Listing 6.11 uses -s to report the size of a file whose name is read from the standard input file.


Listing 6.11. A program that prints the size of a file in bytes.

1:  #!/usr/local/bin/perl

2:  

3:  print ("Enter the name of the file:\n");

4:  $filename = <STDIN>;

5:  chop ($filename);

6:  if (!(-e $filename)) {

7:          print ("File $filename does not exist.\n");

8:  } else {

9:          $size = -s $filename;

10:         print ("File $filename contains $size bytes.\n");

11: }



$ program6_11

Enter the name of the file:

file1

File file1 contains 128 bytes.

$

Lines 3-5 obtain the name of the file and remove the trailing newline character.

Line 6 tests whether the file exists. If the file doesn't exist, the program indicates this.

Line 9 stores the size of the file in the scalar variable $size. The size is measured in bytes (one byte is equivalent to one character in a character string).

Line 10 prints out the number of bytes in the file.

Using File-Test Operators with File Variables

You can use file-test operators on file variables as well as character strings. In the following example the file-test operator -z tests the file represented by the file variable MYFILE:


if (-z MYFILE) {

        print ("This file is empty!\n");

}

As before, this file-test operator returns true if the file is empty and false if it is not.

Remember that file variables can be used only after you open the file. If you need to test a particular condition before opening the file (such as whether the file is non-empty), test it using the name of the file.
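Here is a small sketch of the two styles side by side, assuming a hypothetical scratch file named handletest.tmp that the example creates empty and deletes afterward. The test by name happens before the file is opened for reading; the test on the file variable happens after.

```perl
#!/usr/local/bin/perl

# handletest.tmp is a hypothetical scratch file; create it empty.
$file = "handletest.tmp";
open (NEWFILE, ">$file") || die ("Cannot create $file\n");
close (NEWFILE);

# Before opening for reading, test by name.
$empty_by_name = (-z $file) ? 1 : 0;

# After opening, the file variable can be tested directly.
open (MYFILE, $file) || die ("Cannot open $file\n");
$empty_by_handle = (-z MYFILE) ? 1 : 0;
if ($empty_by_handle) {
        print ("This file is empty!\n");
}
close (MYFILE);

unlink ($file);     # remove the scratch file
```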

Reading from a Sequence of Files

Many UNIX utility programs are invoked using the following command syntax:


programname file1 file2 file3 ...

A program that uses this command syntax operates on all of the files specified on the command line in order, starting with file1. When file1 has been processed, the program then proceeds on to file2, and so on until all of the files have been exhausted.

In Perl, it's easy to write programs that process an arbitrary number of files because there is a special operator, the <> operator, that does all of the file-handling work for you.

To understand how the <> operator works, recall what happens when you put < and > around a file variable:


$list = <MYFILE>;

This statement reads a line of input from the file represented by the file variable MYFILE and stores it in the scalar variable $list. Similarly, the statement


$list = <>;

reads a line of input and stores it in the scalar variable $list; however, the file from which it reads is named on the command line. Suppose, for example, that a program containing the statement


$list = <>;

is named myprog and is invoked with the command


$ myprog file1 file2 file3

In this case, the first occurrence of the <> operator reads the first line of input from file1. Successive occurrences of <> read more lines from file1. When file1 is exhausted, <> reads the first line from file2, and so on. When the last file, file3, is exhausted, <> returns a false value (in Perl 5 this is the undefined value, which behaves like an empty string), which indicates that all the input has been read.

NOTE
If a program containing a <> operator is called with no command-line arguments, the <> operator reads input from the standard input file. In this case, the <> operator is equivalent to <STDIN>.
If a file named in a command-line argument does not exist, the Perl interpreter writes the following message to the standard error file:
Can't open name: No such file or directory
Here, name is a placeholder for the name of the file that the Perl interpreter cannot find. In this case, the Perl interpreter ignores name and continues on with the next file in the command line.

To see how the <> operator works, look at Listing 6.12, which displays the contents of the files specified on the command line. (If you are familiar with UNIX, you will recognize this as the behavior of the UNIX utility cat.) The output from Listing 6.12 assumes that files file1 and file2 are specified on the command line and that each file contains one line.


Listing 6.12. A program that displays the contents of one or more files.

1:  #!/usr/local/bin/perl

2:  

3:  while ($inputline = <>) {

4:         print ($inputline);

5:  }



$ program6_12 file1 file2

This is a line from file1.

This is a line from file2.

$

Once again, you can see how powerful and useful Perl is. This entire program consists of only five lines, including the header comment and a blank line.

Line 3 both reads a line from a file and tests to see whether the line is the empty string. Because the assignment operator = returns the value assigned, the expression


$inputline = <>

has the value "" (the null string) if and only if <> returns the null string, which happens only when there are no more lines to read from any of the input files. This is exactly the point at which the program wants to stop looping. (Recall that a "blank line" in a file is not the same as the null string because the blank line contains the newline character.) Because the null string is equivalent to false in a conditional expression, there is no need to use a conditional operator such as ne.

When line 3 is executed for the first time, the first line in the first input file, file1, is read and stored in the scalar variable $inputline. Because file1 contains only one line, the second pass through the loop, and the second execution of line 3, reads the first line of the second input file, file2.

After this, there are no more lines in either file1 or file2, so line 3 assigns the null string to $inputline, which terminates the loop.

When it reaches the end of the last file on the command line, the <> operator returns the empty string. However, if you use the <> operator after it has returned the empty string, the Perl interpreter assumes that you want to start reading input from the standard input file. (Recall that <> reads from the standard input file if there are no files on the command line.)
This means that you have to be a little more careful when you use <> than when you are reading using <MYFILE> (where MYFILE is a file variable). If MYFILE has been exhausted, repeated attempts to read using <MYFILE> continue to return the null string because there isn't anything left to read.
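The behavior of an exhausted file variable can be sketched as follows. The scratch file eoftest.tmp is a hypothetical name used only for this example; it holds a single line, so the second and third reads have nothing left to return.

```perl
#!/usr/local/bin/perl

# eoftest.tmp is a hypothetical scratch file containing one line.
$file = "eoftest.tmp";
open (OUT, ">$file") || die ("Cannot create $file\n");
print OUT ("only line\n");
close (OUT);

open (MYFILE, $file) || die ("Cannot open $file\n");
$first = <MYFILE>;      # reads the only line in the file
$second = <MYFILE>;     # nothing left to read
$third = <MYFILE>;      # still nothing; no other file is consulted
close (MYFILE);

if (!$second && !$third) {
        print ("MYFILE stays exhausted after the first read.\n");
}

unlink ($file);         # remove the scratch file
```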

Reading into an Array Variable

As you have seen, if you read from a file using <STDIN> or <MYFILE> in an assignment to an array variable, the Perl interpreter reads the entire contents of the file into the array, as follows:


@array = <MYFILE>;

This also works with <>. For example, the statement


@array = <>;

reads the contents of all of the files on the command line into the array variable @array.

As always, be careful when you use this because you might end up with a very large array.
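The sketch below shows one way to try this out without typing a command line. It assumes two hypothetical scratch files, part1.tmp and part2.tmp, which it creates itself, and it assigns their names to the array @ARGV (the array that holds the command-line arguments) so that <> reads them as if they had been named on the command line.

```perl
#!/usr/local/bin/perl

# part1.tmp and part2.tmp are hypothetical scratch files.
foreach $name ("part1.tmp", "part2.tmp") {
        open (OUT, ">$name") || die ("Cannot create $name\n");
        print OUT ("This is a line from $name.\n");
        close (OUT);
}

# Stand-in for: $ programname part1.tmp part2.tmp
@ARGV = ("part1.tmp", "part2.tmp");

@array = <>;            # every line of every file, in order
$count = @array;        # number of lines read
print ("Read $count lines in total.\n");
print (@array);

unlink ("part1.tmp", "part2.tmp");      # remove the scratch files
```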

Using Command-Line Arguments as Values

As you've seen, the <> operator assumes that its command-line arguments are files. For example, if you start up the program shown in Listing 6.12 with the command


$ program6_12 myfile1 myfile2

the Perl interpreter assumes that the command-line arguments myfile1 and myfile2 are files and displays their contents.

Perl enables you to use the command-line arguments any way you want by defining a special array variable called @ARGV. When a Perl program starts up, this variable contains a list consisting of the command-line arguments. For example, the command


$ program6_12 myfile1 myfile2

sets @ARGV to the list


("myfile1", "myfile2")

NOTE
The shell you are running (sh, csh, or whatever you are using) is responsible for turning a command line such as
program6_12 myfile1 myfile2
into arguments. Normally, any spaces or tab characters are assumed to be separators that indicate where one command-line argument stops and the next begins. For example, the following are identical:
program6_12 myfile1 myfile2
program6_12    myfile1      myfile2
In each case, the command-line arguments are myfile1 and myfile2.
See your shell documentation for details on how to put blank spaces or tab characters into your command-line arguments.

As with all other array variables, you can access individual elements of @ARGV. For example, the statement


$var = $ARGV[0];

assigns the first element of @ARGV to the scalar variable $var.

You even can assign to some or all of @ARGV if you like. For example:


$ARGV[0] = 43;

If you assign to any or all of @ARGV, you overwrite what was already there, which means that any command-line arguments overwritten are lost.

To determine the number of command-line arguments, assign the array variable to a scalar variable, as follows:


$numargs = @ARGV;

As with all array variables, using an array variable in a place where the Perl interpreter expects a scalar variable means that the length of the array is used. In this case, $numargs is assigned the number of command-line arguments.

C programmers should take note that the first element of @ARGV, unlike argv[0] in C, does not contain the name of the program. In Perl, the first element of @ARGV is the first command-line argument.
To get the name of the program, use the system variable $0, which is discussed on Day 17, "System Variables."
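The points above can be pulled together in a short sketch. So that it can be run without typing any arguments, it assigns a stand-in list to @ARGV; the names myfile1 and myfile2 are only the sample arguments used earlier in this section.

```perl
#!/usr/local/bin/perl

# Stand-in for running: $ programname myfile1 myfile2
@ARGV = ("myfile1", "myfile2");

$numargs = @ARGV;       # array used where a scalar is expected: its length
$first = $ARGV[0];      # the first argument, not the program name

print ("This program is called $0.\n");
print ("It was given $numargs arguments.\n");
print ("The first argument is $first.\n");
```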

To see how you can use @ARGV in a program, examine Listing 6.13. This program assumes that its first argument is a word to look for. The remaining arguments are assumed to be files in which to look for the word. The program prints out the searched-for word, the number of occurrences in each file, and the total number of occurrences.

This example assumes that the files file1 and file2 exist and that each file contains the single line


This file contains a single line of input.

This example is then run with the command


$ programname single file1 file2

where programname is a placeholder for the name of the program. (If you are running the program yourself, you can name the program anything you like.)


Listing 6.13. A word-search and counting program.

1:  #!/usr/local/bin/perl

2:  

3:  print ("Word to search for: $ARGV[0]\n");

4:  $filecount = 1;

5:  $totalwordcount = 0;

6:  while ($filecount <= @ARGV-1) {

7:          unless (open (INFILE, $ARGV[$filecount])) {

8:                 die ("Can't open input file $ARGV[$filecount]\n");

9:          }

10:         $wordcount = 0;

11:         while ($line = <INFILE>) {

12:                 chop ($line);

13:                 @words = split(/ /, $line);

14:                 $w = 1;

15:                 while ($w <= @words) {

16:                         if ($words[$w-1] eq $ARGV[0]) {

17:                                 $wordcount += 1;

18:                         }

19:                         $w++;

20:                 }

21:         }

22:         print ("occurrences in file $ARGV[$filecount]: ");

23:         print ("$wordcount\n");

24:         $filecount++;

25:         $totalwordcount += $wordcount;

26: }

27: print ("total number of occurrences: $totalwordcount\n");



$ program6_13 single file1 file2

Word to search for: single

occurrences in file file1: 1

occurrences in file file2: 1

total number of occurrences: 2

$

Line 3 prints the word to search for. The program assumes that this word is the first argument in the command line and, therefore, is the first element of the array @ARGV.

Lines 7-9 open a file named on the command line. The first time line 7 is executed, the variable $filecount has the value 1, and the file whose name is in $ARGV[1] is opened. The next time through, $filecount is 2 and the file named in $ARGV[2] is opened, and so on. If a file cannot be opened, the program terminates.

Line 11 reads a line from a file. As before, the conditional expression


$line = <INFILE>

reads a line from the file represented by the file variable INFILE and assigns it to $line. When no lines remain to be read (which happens immediately if the file is empty), $line is assigned the null string, the conditional expression is false, and the loop in lines 11-21 is terminated.

Line 13 splits the line into words, and lines 15-20 compare each word with the search word. If the word matches, the word count for this file is incremented. This word count is reset when a new file is opened.

ARGV and the <> Operator

In Perl, the <> operator actually contains a hidden reference to the array @ARGV. Here's how it works:

  1. When the Perl interpreter sees the <> for the first time, it opens the file whose name is stored in $ARGV[0].
  2. After opening the file, the Perl interpreter executes the following library function:
    shift(@ARGV);
    This library function gets rid of the first element of @ARGV and moves every other element over one. This means that element x of @ARGV becomes element x-1.
  3. The <> operator then reads all of the lines of the file opened in step 1.
  4. When the <> operator exhausts an input file, the Perl interpreter goes back to step 1 and repeats the cycle again.
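The steps above can be imitated with ordinary statements, which may make the hidden work easier to see. This is only a sketch of the idea, not how the Perl interpreter is actually implemented. The scratch files step1.tmp and step2.tmp and the file variable CURRENT are hypothetical names chosen for this example.

```perl
#!/usr/local/bin/perl

# step1.tmp and step2.tmp are hypothetical scratch files.
foreach $name ("step1.tmp", "step2.tmp") {
        open (OUT, ">$name") || die ("Cannot create $name\n");
        print OUT ("This is a line from $name.\n");
        close (OUT);
}
@ARGV = ("step1.tmp", "step2.tmp");

@lines = ();
while (@ARGV > 0) {
        $current = $ARGV[0];                    # step 1: name is in $ARGV[0]
        unless (open (CURRENT, $current)) {
                print STDERR ("Can't open $current\n");
                shift (@ARGV);                  # skip this name and go on
                next;
        }
        shift (@ARGV);                          # step 2: discard the name
        while ($line = <CURRENT>) {             # step 3: read every line
                push (@lines, "$current: $line");
        }
        close (CURRENT);                        # step 4: back to step 1
}
print (@lines);

unlink ("step1.tmp", "step2.tmp");              # remove the scratch files
```

When the outer loop ends, @ARGV is empty, which is also the state <> leaves it in after the last file has been read.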

If you like, you can modify your program to retrieve a value from the command line and then fix @ARGV so that the <> operator can work properly. If you modify Listing 6.13 to do this, the result is Listing 6.14.


Listing 6.14. A word-search and counting program that uses <>.

1:  #!/usr/local/bin/perl

2:  

3:  $searchword = $ARGV[0];

4:  print ("Word to search for: $searchword\n");

5:  shift (@ARGV);

6:  $totalwordcount = $wordcount = 0;

7:  $filename = $ARGV[0];

8:  while ($line = <>) {

9:          chop ($line);

10:         @words = split(/ /, $line);

11:         $w = 1;

12:         while ($w <= @words) {

13:                 if ($words[$w-1] eq $searchword) {

14:                         $wordcount += 1;

15:                 }

16:                 $w++;

17:         }

18:         if (eof) {

19:                 print ("occurrences in file $filename: ");

20:                 print ("$wordcount\n");

21:                 $totalwordcount += $wordcount;

22:                 $wordcount = 0;

23:                 $filename = $ARGV[0];

24:         }

25: }

26: print ("total number of occurrences: $totalwordcount\n");



$ program6_14 single file1 file2

Word to search for: single

occurrences in file file1: 1

occurrences in file file2: 1

total number of occurrences: 2

$

Line 3 assigns the first command-line argument, the search word, to the scalar variable $searchword. This is necessary because the call to shift in line 5 destroys the initial value of $ARGV[0].

Line 5 adjusts the array @ARGV so that the <> operator can use it. To do this, it calls the library function shift. This function "shifts" the elements of the list stored in @ARGV. The element in $ARGV[1] is moved to $ARGV[0], the element in $ARGV[2] is moved to $ARGV[1], and so on. After shift is called, @ARGV contains the files to be searched, which is exactly what the <> operator is looking for.

Line 7 assigns the current value of $ARGV[0] to the scalar variable $filename. Because the <> operator in line 8 calls shift, the value of $ARGV[0] is lost unless the program does this.

Line 8 uses the <> operator to open the file named in $ARGV[0] and to read a line from the file. The array variable @ARGV is shifted at this point.

Lines 9-17 behave like lines 12-20 of Listing 6.13. The only difference is that the search word is now in $searchword, not in $ARGV[0].

Line 18 introduces the library function eof. This function indicates whether the program has reached the end of the file being read by <>. If eof returns true, the next use of <> opens a new file and shifts @ARGV again.

Lines 19-23 prepare for the opening of a new file. The number of occurrences of the search word is printed, the current word count is added to the total word count, and the word count is reset to 0. Because the new filename to be opened is in $ARGV[0], line 23 preserves this filename by assigning it to $filename.

NOTE
You can use the <> operator to open and read any file you like by setting the value of @ARGV yourself. For example:
@ARGV = ("myfile1", "myfile2");
while ($line = <>) {
...
}
Here, when the statement containing the <> is executed for the first time, the file myfile1 is opened and its first line is read. Subsequent executions of <> each read another line of input from myfile1. When myfile1 is exhausted, myfile2 is opened and read one line at a time.

Opening Pipes

On machines running the UNIX operating system, two commands can be linked using a pipe. In this case, the standard output from the first command is linked, or piped, to the standard input to the second command.

Perl enables you to establish a pipe that links a Perl output file to the standard input file of another command. To do this, associate the file with the command by calling open, as follows:


open (MYPIPE, "| cat >hello");

The | character tells the Perl interpreter to establish a pipe. When MYPIPE is opened, output sent to MYPIPE becomes input to the command


cat >hello

Because the cat command copies the standard input file to the standard output file when called with no arguments, and >hello redirects the standard output file to the file hello, the open statement given here has the same effect as the statement


open (MYPIPE, ">hello");

You can use a pipe to send mail from within a Perl program. For example:


open (MESSAGE, "| mail dave");

print MESSAGE ("Hi, Dave!  Your Perl program sent this!\n");

close (MESSAGE);

The call to open establishes a pipe to the command mail dave. The file variable MESSAGE is now associated with this pipe. The call to print adds the line


Hi, Dave!  Your Perl program sent this!

to the message to be sent to user ID dave.

The call to close closes the pipe referenced by MESSAGE, which tells the system that the message is complete and can be sent. As you can see, the call to close is useful here because you can control exactly when the message is to be sent. (If you do not call close, MESSAGE is closed, and the message is sent, when the program terminates.)
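If you would rather experiment with pipes without sending mail, the following sketch pipes three lines to the UNIX sort command instead. It assumes that sort is available on your system; the file variable SORTPIPE and the scratch file sorted.tmp are hypothetical names used only for this example.

```perl
#!/usr/local/bin/perl

# sorted.tmp is a hypothetical scratch file that receives sort's output.
$outfile = "sorted.tmp";

# Everything printed to SORTPIPE becomes the standard input of sort.
open (SORTPIPE, "| sort >$outfile") || die ("Cannot start sort\n");
print SORTPIPE ("pear\n");
print SORTPIPE ("apple\n");
print SORTPIPE ("mango\n");
close (SORTPIPE);       # tells sort that its input is complete

# Read back what sort wrote.
open (RESULT, $outfile) || die ("Cannot open $outfile\n");
@sorted = <RESULT>;
close (RESULT);
print (@sorted);

unlink ($outfile);      # remove the scratch file
```

As with the mail example, the call to close matters: sort cannot produce its output until it knows that no more input is coming.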

Summary

Perl accesses files by means of file variables. File variables are associated with files by the open statement.

Files can be opened in any of three modes: read mode, write mode, and append mode. A file opened in read mode cannot be written to; a file opened in either of the other modes cannot be read. Opening a file in write mode destroys the existing contents of the file.

To read from an opened file, reference it using <name>, where name is a placeholder for the name of the file variable associated with the file. To write to a file, specify its file variable when calling print.

Perl defines three built-in file variables: STDIN, STDOUT, and STDERR.

You can redirect STDIN and STDOUT by specifying < and >, respectively, on the command line. Messages sent to STDERR appear on the screen even if STDOUT is redirected to a file.

The close function closes the file associated with a particular file variable. close never needs to be called unless you want to control exactly when a file is to be made inaccessible.

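The file-test operators provide a way of retrieving information on a particular file. The file-test operators covered in this lesson are -e (the file exists), -r, -w, and -x (you have read, write, or execute permission), -z (the file is empty), and -s (the size of the file in bytes).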

You can use -w and -z to ensure that you do not overwrite a non-empty file.

The <> operator enables you to read data from files specified on the command line. This operator uses the built-in array variable @ARGV, whose elements consist of the items specified on the command line.

Perl enables you to open pipes. A pipe links the output from your Perl program to the input to another program.

Q&A

Q: How many files can I have open at one time?
A: Basically, as many as you like. The actual limit depends on the limitations of your operating system.
Q: Why does adding a closing newline character to the text string affect how die behaves?
A: Perl enables you to choose whether you want the filename and line number of the error message to appear. If you add a closing newline character to the string, the Perl interpreter assumes that you want to control how your error message is to appear.
Q: Which is better: to use <>, or to use @ARGV and shift when appropriate?
A: As is often the case, the answer is "It depends." If your program treats almost all of the command-line arguments as files, it is better to use <> because the mechanics of opening and closing files are taken care of for you. If you are doing a lot of unusual things with @ARGV, it is better not to manipulate it to use <>, because things can get complicated and confusing.
Q: Can I open more than one pipe at a time?
A: Yes. Your operating system keeps all of the various commands and processes organized and keeps track of which output goes with which input.
Q: Can I redirect STDERR?
A: Yes, but there is (normally) no reason why you should. STDERR's job is to report extraordinary conditions, and you usually want to see these, not have them buried in a file somewhere.
Q: How many command-line arguments can I specify?
A: Basically, as many as your command-line shell can handle.
Q: Can I write to a file and then read from it later?
A: Yes, but you can't do both at the same time. To read from a file you have written to, close the file by calling close and then open the file in read mode.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. Define the following terms:
    a.    file variable
    b.    reserved word
    c.    file mode
    d.    append mode
    e.    pipe
  2. From where does the <> operator read its data?
  3. What do the following file-test operators do?
    a.  -r
    b.  -x
    c.  -s
  4. What are the contents of the array @ARGV when a Perl program is run with the following command?
    $ myprog file1 file2 file3
  5. How do you indicate that a file is to be opened:
    a.    In write mode?
    b.    In append mode?
    c.    In read mode?
    d.    As a pipe?
  6. What is the relationship between @ARGV and the <> operator?

Exercises

  1. Write a program that takes the values on the command line, adds them together, and prints the result.
  2. Write a program that takes a list of files from the command line and examines their size. If a file is bigger than 10,000 bytes, print
    File name is a big file!
    where name is a placeholder for the name of the big file.
  3. Write a program that copies a file named file1 to file2, and then appends another copy of file1 to file2.
  4. Write a program that counts the total number of words in the files specified on the command line. When it has counted the words, it sends a message to user ID dave indicating the total number of words.
  5. Write a program that takes a list of files and indicates, for each file, whether the user has read, write, or execute permission.
  6. BUG BUSTER: What is wrong with the following program?
    #!/usr/local/bin/perl

    open (OUTFILE, "outfile");
    print OUTFILE ("This is my message\n");