Advanced Tutorial

File properties and default permissions.

An "inode" is a data structure that describes a file. Within any file system, the number of inodes, and hence the maximum number of files, is set when the filesystem is created. An inode holds most of the important information about the file, including the on-disk address of the file's data blocks. Each inode has its own unique identification number, called an "i-number". An inode also stores the file's ownership, access mode, timestamp, and type.

% stat F                - Display the status of file F, including:
                          device number, device type, inode number,
                          access rights, number of hard links, UID, GID,
                          total size in bytes, number of blocks allocated,
                          time of last access, time of last modification,
                          and time of last change.

The following is the output of the command "stat welcome.html":

  File: "welcome.html"
  Size: 7970       Blocks: 16        Regular File
Access: (0644/-rw-r--r--)         Uid: (11374/  abatko)  Gid: (   10/ wheel)
Device: 4          Inode: 1627279    Links: 1
Access: Thu Sep  9 11:44:37 1999
Modify: Mon Sep  6 15:47:12 1999
Change: Mon Sep  6 15:47:12 1999

Links.

It is possible to have special files called "links" that point to other files. UNIX provides two different kinds of links, namely hard links and soft links. A soft link is merely a pointer that stores the path of another file, whereas a hard link is another name for the set of data blocks associated with the file to which it points; in essence, a hard link is as much "the file" as the original name is.

Amongst other information, the filesystem associates a file's data blocks with an inode, a name, and the number of links to the data blocks.

Consider a regular file "foo". Upon creation, foo has one link to its data blocks. By making a hard link to foo, called "fooH", we create another regular file entry, making another link to foo's data blocks, thus increasing the link count to 2. Next, if we make a soft link to foo, called "fooS", we create a special file known as a symbolic link, which refers to foo but does not increment the link count; this is because a symbolic link has its own inode. If we were to delete foo, then fooS would become a "dangling link", since what it once pointed to, namely the file foo, is now gone; however, fooH will still have full access to foo's data blocks, and the original inode's link count will be reduced by 1.

Assuming we had not deleted anything yet, if we delete fooH, then foo will still exist, and only the number of links to foo will be decremented, totalling 1. This is because data blocks are only "lost" when the link count goes to zero. Thus fooS will still be a valid symbolic link.

When speaking of links, the default is a hard link; thus if speaking of soft links, or symbolic links, an explicit distinction must be made.

In summary, when a file is created it has one link (a hard link) to its data blocks, namely its own name. When a hard link is created, another link is made to the same data blocks, thus increasing the number of hard links to a particular set of data blocks. However when a soft link (a symbolic link) is made, the number of links to the data blocks does not increase, instead the symbolic link acts as a pointer.

There are some important differences between hard and symbolic links. A hard link refers directly to the file's data blocks, so those blocks remain allocated for as long as any hard link to them exists. A symbolic link contains only the path to the file it points to. Unlike hard links, symbolic links (aka symlinks) can span filesystems, or even computer systems if a network file system is being used.

% ln F1 F2              - Create a hard link to file F1, naming it F2.
% ln -s F1 F2           - Create a soft link to file F1, naming it F2.
% ls -l F               - List in long format, information for file F.
                          If the file is a symbolic link, the filename is
                          printed followed by "->" and the path name of the
                          referenced file.
% ls -Ll F              - List in long format, information for the file
                          referenced by symbolic link F.  The -L flag can
                          be thought of as one that dereferences a
                          symbolic link.
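
As a quick sketch of the behaviour described above, using the same
illustrative names foo, fooH, and fooS:

% touch foo             - Create an empty file foo; it has one link.
% ln foo fooH           - Make a hard link; `ls -l foo fooH` now shows a
                          link count of 2 for both names, and `ls -i`
                          shows that they share the same i-number.
% ln -s foo fooS        - Make a soft link; `ls -l fooS` prints
                          "fooS -> foo" and foo's link count stays at 2.
% rm foo                - fooH still reaches the data blocks (the link
                          count drops back to 1), but fooS is now a
                          dangling link.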

Advanced commands.

Advanced commands are ones that an intermediate user is typically not concerned with. Not knowing about the existence of such commands does not prevent you from getting work done; however, they can be considered power tools.

% tee F                 - Copy standard input to standard output and to
                          file F.  This command is useful in a pipe,
                          since one copy of the output continues down the
                          pipe, while the second copy goes to file F.

% script F              - Make a typescript of a terminal session, saving the
                          dialogue in file F.  If no file name is provided,
                          the typescript is saved in a file called typescript.
                          Press Ctrl-D or type `exit` to quit script.

% stty -a               - Write to standard output all of the option
                          settings for the terminal.

% split -l n F          - Split a file F into a set of files having at most
                          n lines each.  The original file F is left unchanged.

% splitvt               - Run two shells in a split window.  Use Ctrl-W to
                          toggle between the windows.

% xargs -n num U A...   - Construct a command line consisting of the
                          utility U, the argument(s) A(...), and arguments
                          read from standard input.  Invoke the
                          constructed command line and wait for its
                          completion.  The -n flag specifies how many
                          standard input arguments to use per invocation.

% basename F s          - Strip all directory components from the file name F,
                          as well as the possible suffix s.
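
A brief sketch of a few of these power tools in use; the file names
listing.txt and bigfile are only illustrative:

% ls -l | tee listing.txt | wc -l
                        - The long listing is written to listing.txt and,
                          at the same time, continues down the pipe to
                          wc -l.
% split -l 1000 bigfile part_
                        - Split bigfile into part_aa, part_ab, ..., each
                          at most 1000 lines long; the optional prefix
                          (here part_) names the pieces.
% basename /u1/abatko/welcome.html .html
                        - Prints "welcome".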

Regular Expressions:

A regular expression is a pattern that describes a set of strings. Used in combination with the grep utility (described later), regular expressions aid in searching files for character patterns.

% man 7 regex           - User's regular expressions manual.
% man 3 regex           - Programmer's regular expressions manual.

Do a man on grep, and search for the section REGULAR EXPRESSIONS.

Most of the following is a shameless transcription of selected portions of the aforementioned. Note that regular expressions are defined in POSIX 1003.2, and come in two forms: modern or "extended", and obsolete or "basic".

% grep P F              - Search file F for all occurrences of pattern P.

Most characters, including all letters and digits, are regular expressions that match themselves.

  a                     - Match the single character 'a'.
  hello                 - Match the sequence of characters 'hello'.
  i85                   - Match the sequence of characters 'i85'.

Any metacharacter with special meaning may be quoted by preceding it with a backslash. The metacharacters (described later) are

  .   ?   *   +   ^   $   {   |   (   )   [   \

  \\                    - Match the single character '\'.
  \.\*                  - Match the two characters '.' and '*'.

A "bracket" expressions is a list of characters enclosed by [ and ] matches any single character in that list; if the first character in the list is the caret ^ then it matches any character not in the list.

  [234567]              - Match any single digit from '2' to '7'.
  [^3x]                 - Match any single character other than '3' or 'x'.

A range of ASCII characters may be specified by giving the first and last characters, separated by a hyphen.

  [2-7]                 - Match any single character in the ASCII range from
                          '2' to '7'.
  [a-z]                 - Match any single character in the ASCII range from
                          'a' to 'z'.
  [0-9A-Za-z]           - With the ASCII character encoding, match any
                          single character that is a digit or an upper or
                          lower case letter.  Note that ranges depend on
                          the collating sequence and cannot share
                          endpoints.  There are predefined classes of
                          characters (such as [[:alnum:]]) that are
                          independent of the encoding, and are thus
                          portable.

A collating element (a sequence of characters that collates as a single unit, such as 'ch' in some locales) can be enclosed in ``[.'' and ``.]''.

  [[.ch.]]*c            - Matches the first five characters of 'chchcc'.

Most metacharacters lose their special meaning inside lists, or "bracket" expressions. To include a literal ] place it first in the list (after a possible leading caret ^). Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.

  []a-d]                - Match any single character in the list of ']' and
                          the range 'a' to 'd'.
  [ab^d]                - Match any single character in the list of 'a', 'b',
                          '^', and 'd'.
  [ad2-]                - Match any single character in the list of 'a', 'd',
                          '2', and '-'.

The period . matches any single character.

A regular expression matching a single character may be followed by one of several repetition operators:

  ?                     - The preceding item will be matched 0 or 1 times.
  *                     - The preceding item will be matched 0 or more times.
  +                     - The preceding item will be matched 1 or more times.
  {n}                   - The preceding item is matched exactly n times.
  {n,}                  - The preceding item is matched n or more times.
  {,m}                  - The preceding item is optional and is matched at
                          most m times.
  {n,m}                 - The preceding item is matched at least n times, but
                          not more than m times.

Two regular expressions may be concatenated. Two regular expressions may be joined by the infix operator | resulting in a regular expression matching any string in either subexpression.

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. Parentheses ( ) override these precedence rules.

Note that "basic" regular expressions are somewhat different the "extended" regular expressions. Two differences to keep in mind are that delimiters for bounds are \{ and \}, and parentheses for nested subexpressions are \( and \).

Basic regular expressions also have one additional type of atom: the back reference, a backslash \ followed by a non-zero decimal digit d. It matches the same sequence of characters matched by the dth parenthesized subexpression.

  \([bc]\)\1            - Matches bb or cc but not bc.
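
A few of these patterns in action with grep; test.txt is a hypothetical
input file:

% grep 'ab*c' test.txt  - Basic RE: an 'a', zero or more 'b's, then a
                          'c'; matches lines containing "ac", "abc",
                          "abbc", ...
% grep -E '(ab)+c' test.txt
                        - Extended RE: one or more repetitions of "ab"
                          followed by a 'c'; matches "abc", "ababc", ...
% grep '\([bc]\)\1' test.txt
                        - Basic RE with a back reference; selects lines
                          containing "bb" or "cc".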

Matching patterns in files.

Grep searches the named input file(s) for lines containing a match to the given pattern. Grep understands both "basic" (obsolete) regular expressions and "extended" (modern) regular expressions. The pattern given to grep is by default (implicitly) interpreted as a basic regular expression; this can also be made explicit with the -G flag. To interpret the pattern as an extended regular expression, use the -E flag.

% grep P F              - Search the input file F for lines containing a match
                          to pattern P, a basic regular expression.
% grep -G P F           - Interpret pattern P as a basic regular expression
                          (default).
% grep -E P F           - Interpret pattern P as an extended regular
                          expression.
% grep -N P F           - Grep.  If a match is found, print N lines of leading
                          and trailing context.
% grep -c P F           - Grep.  Suppress normal output; Print a count of
                          matching lines.
% grep -i P F           - Grep.  Ignore case distinctions in both the
                          pattern and the input file.
% grep -n P F           - Grep.  Prefix each line of output with the line
                          number within input file F.
% grep -v P F           - Grep.  Invert the sense of matching, to select
                          non-matching lines.
% grep P                - Grep standard input for pattern P.
% grep P -              - Grep standard input for pattern P.
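
These flags combine naturally; for example, assuming a hypothetical log
file access.log:

% grep -in error access.log
                        - Case-insensitive search for 'error', prefixing
                          each matching line with its line number.
% grep -c error access.log
                        - Print only the number of matching lines.
% ps axwww | grep vim | grep -v grep
                        - Grep standard input (the output of ps) for
                          'vim', then discard the line for the grep
                          process itself.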

Command-line Operability.

Luc Boulianne's Theorem (aka Luc's Theorem):

"The study of Computer Science is the study of Minimizing Keystrokes."

bash:

  C-a                   - Move to the (s)tart of the current line.
  C-e                   - Move to the (e)nd of the current line.
  C-f                   - Move (f)orward a character.
  C-b                   - Move (b)ack a character.
  M-f                   - Move (f)orward to the end of the next word.
  M-b                   - Move (b)ack to the start of this, or the previous
                          word.

  C-p                   - Fetch the (p)revious command from the history list,
                          moving back in this list.
  C-n                   - Fetch the (n)ext command from the history list,
                          moving forward in the list.

  C-h                   - delete the character behind the cursor.
  C-d                   - (d)elete the character under the cursor.
  M-d                   - (d)elete from the cursor to the end of the current
                          word, or if between words, to the end of the next
                          word.

  C-w                   - kill the (w)ord behind the cursor.
  C-k                   - (k)ill from the cursor to the end of the line.
  C-u                   - (u)nix-line-discard from cursor to beginning of line.

  C-y                   - (y)ank the top of the kill ring into the buffer at
                          the cursor.

  C-l                   - Clear the screen, leaving the current line at the top
                          of the screen.

  C-t                   - (t)ranspose characters:  drag the character before
                          point forward over the character at point.
  M-t                   - (t)ranspose words:  drag the word behind the cursor
                          past the word in front of the cursor.

  C-_                   - Incremental undo, separately remembered for each
                          line.
  C-x C-u               - Incremental undo, separately remembered for each
                          line.

  M-#                   - Make the current line a shell comment.

make.

make is a utility for maintaining, updating, and regenerating groups of related programs and files. The purpose of the make utility is to automatically determine which pieces of a large program need to be recompiled, and issue the commands to recompile them. make can be used with any programming language whose compiler can be invoked from the shell. Note that make is not limited to programs; it can be used to update files from others, whenever the others change.

The command ``make'' relies on a file named "Makefile", which you must write to describe the relationships among the files in your program and the commands for updating each file.

make executes commands in the makefile associated with each target, typically to create or update a file of the same name.

A target entry has the form:

            target [:|::] [dependency] ... [; command ] ...
                           [command ]
                           ...

If no target is specified upon invocation of make, the first target in the makefile is built, and its dependencies are checked recursively.

Once a makefile exists, typing `make` suffices to perform all necessary recompilations.

% make                  - Perform all necessary recompilations to programs
                          specified in the file called Makefile.
                          The make program uses the makefile database and the
                          last-modification times of the files to decide which
                          of the files need to be updated.

The following is a great example of a short Makefile. If the first non-TAB character of a command line is a ``@'', the command will not be echoed (printed) before being executed.

<MAKEFILE>

PODFILE = hash.pod
TITLE   = 'Perl Hash Howto'

OUTFILE = index.html
JUNK    = pod2htm*

all: $(PODFILE)
        @pod2html --infile=$(PODFILE) --outfile=$(OUTFILE) --title=$(TITLE)
        @rm -f $(JUNK)
        @if [ -r $(OUTFILE) ] ; then \
            chmod 644 $(OUTFILE); \
        fi

</MAKEFILE>

The following is another example of a Makefile:

<MAKEFILE>

COMPILER = gcc
MAIN_SOURCE = str2wrd.c
OBJ = someobjectfile.o
LIB = -lm
OUT_NAME = a.out

str2wrd: $(MAIN_SOURCE)
#       $(COMPILER) -o $(OUT_NAME) $(OBJ) $(LIB) $(MAIN_SOURCE)
        $(COMPILER) -o $(OUT_NAME) $(MAIN_SOURCE)

clean:
        \rm -f $(OBJ) $(OUT_NAME)

</MAKEFILE>

Revision Control System.

Programs, documentation, projects, and other such files that undergo frequent revisions or updates can be managed using the Revision Control System (RCS).

% man 1 rcsintro        - Manual containing an introduction to rcs.

Someone new to RCS need only learn two commands.

% ci                    - (c)heck (i)n.  Deposit the contents of a file into
                          an archival file called an RCS file.
% co                    - (c)heck (o)ut.  Retrieve revisions from an RCS file.

Consider an assignment that will undergo frequent revisions. Let the file be called foo.c. Let's assume that foo.c resides at ~/courses/2000.1/cs537/ass/ass04/foo.c.

% cd ~/courses/2000.1/cs537/ass/ass04/
                        - Change directory to the place where foo.c lives.
% mkdir RCS             - Make an RCS directory called RCS.
% ci foo.c              - Check in file foo.c, thereby creating a corresponding
                          RCS file in the RCS directory, storing foo.c into it
                          as revision 1.1, and deleting foo.c.
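
A typical edit cycle from then on needs only co and ci (the revision
numbers shown are the defaults RCS assigns):

% co -l foo.c           - Check out the latest revision with a lock,
                          recreating a writable foo.c in the working
                          directory.
  (edit foo.c)
% ci foo.c              - Check in the changes as revision 1.2 and remove
                          the working copy again; use `ci -u foo.c` to
                          check in but keep a read-only working copy.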

Useful tricks.

Use xargs to help kill processes:

% ps axwww | grep http | awk '{print $1}' | xargs -n1 kill -9
                        - Run ps, pipe it to grep.  Pipe the grep output
                          to awk, sending field 1 of each line to xargs.
                          xargs executes `kill -9` on each incoming line
                          of awk's output.

% tar -cvf - D1 | (cd /tmp; tar -xvf -)
                        - Tar directory D1 to standard output, piping it
                          to a tar extraction running in /tmp that reads
                          from standard input; the net effect is to copy
                          D1 into /tmp.
% tar -cvf - jsse1.0 | (cd /usr/local ; tar -xvf -)

% for i in `grep "u1/" /var/etc/teaching.cs.mcgill.ca/passwd | grep \* | awk \
  -F: '{ print $1 }' ` ; do echo $i ; done
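                        - For each entry in the passwd file whose line
                          contains "u1/" and a '*', print the login name
                          (the first colon-separated field).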

% perl -pi -e 's/hello/goodbye/g' F
                        - In-place substitution of every occurrence of
                          the word 'hello' with the word 'goodbye' in
                          file F.

  ^Z                    - Suspend the current job.  Some programs don't
                          allow suspension.  For example `pine` must be
                          invoked with -z to enable suspension.
  ~^Z                   - Suspend current login session.

Vim.

Vim is Vi IMproved. Vi stands for "visual editor"; it grew out of the line editor ex, itself a descendant of ed. Both Vi and Vim have two modes of operation: command mode and insert mode.

Range selection in vim can be done using v, V, and ^V.

  v                     - visual text selection per character
  V                     - visual text selection per line
  ^V                    - visual text selection per block (rectangular shape)

After making the desired selection, press 'y' to 'yank' (copy) the text into vim's buffer. Next move the cursor to a desired location and press 'p' to paste the selected text after the cursor, or 'P' to paste it before the cursor.

Searching and replacing. Over the whole document (%), substitute (s) 'hello' with 'goodbye', confirming (c) each replacement: :%s/hello/goodbye/c

replace with goodbye (y/n/a/q/^E/^Y)?

  y                     - yes
  n                     - no
  a                     - all
  q                     - quit
  ^E                    - scroll up one line
  ^Y                    - scroll down one line

Both vi and vim can be invoked with a -r flag followed by the name of a .swp file. This recovers the file from the swap file, which may have been left behind after a system crash or an interrupted session.

% vim -r F.swp          - Recover file F using swap file F.swp.
                          Note: after recovering the file using the swap file
                          you should delete the swap file.  Especially before
                          attempting to vim the recovered file again.

Type ":help" in Vim to get started. Type ":help subject" to get help on a specific subject.

Some useful vim commands follow:

:set paste              - Turn on pasting mode.
:set nopaste            - Turn off pasting mode.
:set ai                 - Turn on autoindenting.
:set noai               - Turn off autoindenting.
:set textwidth=0        - Disable maximum text width.
:set textwidth=78       - Set the maximum width of inserted text to 78.
:set list               - Show tabs as '^I' and end of line characters as '$'.
:set nolist             - Turn off list mode.
:syntax on              - Turn on syntax highlighting.
:syntax off             - Turn off syntax highlighting.

Variation of an Emacs/vi joke:

``Daddy! Daddy! Why are we hiding from the police?''
``Because they use Emacs, son, and we use vi.''

Subshells.

Just as a process can start a subprocess, a shell can start another shell; the new shell is called a subshell. The child process (the subshell) inherits its parent's environment; however, changes to the child's environment do not affect the parent. A shell script runs in a subshell.
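
A minimal sketch in bash (the variable name GREETING is arbitrary):

% GREETING=hello        - Set a variable in the current (parent) shell.
% (GREETING=goodbye; echo $GREETING)
                        - The parentheses run a subshell; it prints
                          "goodbye".
% echo $GREETING        - Back in the parent this still prints "hello";
                          the subshell's change did not propagate.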


Shell scripting (programming).

You can read Tom Christiansen's essay "Csh Programming Considered Harmful", to learn why csh (and by extension, tcsh) should not be used for writing shell scripts: http://www.cs.mcgill.ca/socsinfo/seminars/csh-whynot