Texas A&M Supercomputing Facility, Texas A&M University

Exercise 5 : Processing input

Laboratory Exercise for Introduction to Perl

This exercise assumes you have already set up your initial directory with the init-perllab command (see the setup instructions).


Printing selected fields (loginpid.pl)

  1. Change to the Lab5 directory:

    cd /scratch/mylogin/PerlLab/Lab5

    The directory path may be different if you installed the files in another location.
  2. If this directory does not exist, run the init-perllab command again:

    /g/software/bin/init-perllab

    The setup program may warn that it is only installing Lab5 and skipping existing directories. Simply hit Enter to confirm. Once complete, change to the newly created directory:

    cd /scratch/mylogin/PerlLab/Lab5

    The directory path may be different if you installed the files in another location.
  3. List the directory contents:

    ls -l

    You should see the files: INSTRUCTIONS README Solutions countlogin.pl loginpid.pl psout
  4. Look at the loginpid.pl file:

    cat loginpid.pl

    The contents of this file are as follows:

    #! /usr/bin/perl

    # loginpid.pl - print the login name and process id (PID) from
    # each line of "psout". Print a count of processes.
    #
    # The "psout" file contains the output of the ps command. The
    # first line contains column headers. Each line after that contains
    # an entry for each process running on the machine. The first field
    # is the username (login id), the second field is the process id (PID),
    # and the remaining fields have additional information. For the purpose
    # of this program, you only need to capture the first two columns and
    # keep a total of the number of processes. Print the header row
    # (the first two titles) but don't count it.
    #
    # USER PID START TIME COMMAND
    # krish 396 Oct01 0:03 sshd: krish@pts/3
    # krish 397 Oct01 0:00 -ksh
    # bedros 1299 Oct06 0:00 /bin/bash /g/home/pbs/torq...
    #
    # This selection has 3 processes and the first two columns are:
    #
    # USER PID
    # krish 396
    # krish 397
    # bedros 1299

    use strict;
    use warnings;

    use IO::File;

    # open the file "psout" for reading, $fh contains a filehandle
    my $fh = IO::File->new("<psout") or die $!;

    my $count = 0;

    # read the file
    # for each line (including header), print only the first 2 fields
    # suggestion: use s/// substitution to replace full line with
    # the first two fields, using grouping ( )
    # count the number of lines (not including the header line)
    # suggestion: use a while(<$fh>) loop

    # close the file
    $fh->close;

    # print total
  5. Look at the psout file:

    less psout

    The less command allows you to view the file a page at a time.
  6. Edit the loginpid.pl program so that it prints only the first two columns of psout and counts the total number of processes. Note that the first line contains column headers. Print the headers for the first two columns, but do not count the header line as a process. Each remaining line contains an entry for a process running on the machine. The first column is the username (login id) of the owner of the process. The second column is a unique process id (PID).

    See "Extracting matches" in the Perl regular expression tutorial, or search for this section in the man page available on eos:

    man perlretut
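
    As a minimal sketch of the grouping technique suggested in the program comments (one possible approach, not the lab's official solution), the snippet below applies an s/// substitution with two capture groups to a few sample lines held in memory rather than read from psout. The helper name first_two_fields and the in-memory sample data are assumptions made for illustration, and the pattern assumes that fields are separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the lab files): reduce a ps-style line
# to its first two whitespace-separated fields.
sub first_two_fields {
    my ($line) = @_;
    chomp $line;
    # Capture the first two fields in groups 1 and 2; .* consumes the rest.
    $line =~ s/^\s*(\S+)\s+(\S+).*$/$1 $2/;
    return $line;
}

# Sample lines taken from the comments in loginpid.pl.
my @lines = (
    "USER PID START TIME COMMAND\n",
    "krish 396 Oct01 0:03 sshd: krish\@pts/3\n",
    "krish 397 Oct01 0:00 -ksh\n",
    "bedros 1299 Oct06 0:00 /bin/bash /g/home/pbs/torq...\n",
);

my $count       = 0;
my $seen_header = 0;
for my $line (@lines) {
    print first_two_fields($line), "\n";
    # Count every line except the header row.
    if ($seen_header) { $count++ } else { $seen_header = 1 }
}
print "Total processes: $count\n";
```

    In the lab program the same substitution would run inside the while (<$fh>) loop, where each line arrives in $_ instead of a named variable.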
EXPECTED OUTPUT -- SOLUTION


Building a table of counts (countlogin.pl)

  1. Look at the countlogin.pl file:

    cat countlogin.pl

    The contents of this file are as follows:

    #! /usr/bin/perl

    # countlogin.pl - count the number of login processes for each user name
    # in the ps output saved in the file "psout"
    #
    # The "psout" file contains the output of the ps command. The
    # first line contains column headers. Each line after that contains
    # an entry for each process running on the machine. The first field
    # is the username (login id), the second field is the process id (PID),
    # and the last field is the command. You can determine that a process
    # is a login shell because it starts with a minus sign ("-"), followed
    # by the name of the shell.
    #
    # USER PID START TIME COMMAND
    # krish 396 Oct01 0:03 sshd: krish@pts/3
    # krish 397 Oct01 0:00 -ksh
    # bedros 1299 Oct06 0:00 /bin/bash /g/home/pbs/torq...
    #
    # In this sample, only process # 397, which is owned by user "krish"
    # is a login shell. For this program, you can ignore all the other
    # processes which do not match the pattern for a login shell.
    #

    use strict;
    use warnings;

    use IO::File;

    # open the file "psout" for reading, $fh contains a filehandle
    my $fh = IO::File->new("<psout") or die $!;

    # read the first line and ignore it (scalar prevents the read operation
    # from defaulting to reading the whole file as a list)
    scalar <$fh>;

    # initialize the hash table you will use to count the logins
    my %count = ();

    # read the remainder of the file
    # skip lines which are not a shell (a dash "-" followed by a word at
    # the end of the line
    # check to see if the hash entry exists using:
    # exists $count{$login}
    # if it does not exist, create it and initialize it to 0
    # increment the count for that user name
    #
    # Suggestion: while (<$fh>) loop

    # close the file
    $fh->close;

    # print a list of the usernames and the process count for each
  2. Edit the program so it will open the file "psout", read the contents, keep a running count of login shell processes for each user in a hash table, and, once the file has been completely read, print the count for each user. The output should look something like:
       amrish   1
      aya3706   1
       bedros   1
      c0s2008   1
        chao1   1
      dba1359   1
      diego07   1
       donzis   1
      eosagus   3
      etrufan   1
       kjacks   4
        krish   1
        lmkli   1
      m0m391a   1
      natesal   2
          ntp   1
      pingluo   1
      qlf1582   1
      rlb3511   2
      s0j3095   1
        tskim   1
     vtunesag   1
          xfs   1
    

    Note: To test whether the hash table already contains a value for a given key, use the function exists(). To get a list of the contents of a hash table, use the function keys(). To sort that list, use the function sort().
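
    The three functions named in the note can be combined as in the following sketch, which is one possible approach rather than the lab's official solution. It counts login shells in a few sample lines held in memory instead of reading psout. The helper name count_logins, the sample rows for kjacks, and the pattern used to recognise a login shell (a dash followed by a word at the end of the line, as the program comments describe) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the lab files): count login shells per user.
sub count_logins {
    my @lines = @_;
    my %count = ();
    for my $line (@lines) {
        chomp $line;
        # Skip lines that do not end in a dash followed by a word.
        next unless $line =~ /\s-\w+$/;
        # The user name is the first whitespace-separated field.
        my ($login) = $line =~ /^\s*(\S+)/;
        # Create the entry the first time a user is seen, then increment it.
        $count{$login} = 0 unless exists $count{$login};
        $count{$login}++;
    }
    return %count;
}

# Sample lines with the header already removed. The first three rows come
# from the comments in countlogin.pl; the kjacks rows are made up.
my @lines = (
    "krish 396 Oct01 0:03 sshd: krish\@pts/3\n",
    "krish 397 Oct01 0:00 -ksh\n",
    "bedros 1299 Oct06 0:00 /bin/bash /g/home/pbs/torq...\n",
    "kjacks 2001 Oct07 0:00 -bash\n",
    "kjacks 2002 Oct07 0:00 -bash\n",
);

my %count = count_logins(@lines);

# Print the user names in sorted order, right-aligned in a fixed-width column.
for my $login (sort keys %count) {
    printf "%10s %3d\n", $login, $count{$login};
}
```

    Strictly speaking, the ++ operator would create a missing hash entry by itself; the explicit exists() test is kept here because it mirrors the steps listed in the program comments.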
SOLUTION