
Techucation

A Blog by Malcolm Yoke Hean Low


Quick Guide to nawk - Examples, Field Separators, Arrays

Posted on Tuesday, January 10, 2006 at 11:56 PM by Malcolm

Here is a quick guide to nawk. I prefer nawk to awk because it has more features. Most systems now have both programs installed.


To run nawk

  • From the command line: nawk 'program' inputfile1 inputfile2 …
  • From a file: nawk -f programfile inputfile1 inputfile2 …
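Both ways of running a program can be checked side by side. A small sketch, assuming a POSIX shell with `mktemp -d`; the file names `data.txt` and `field2.awk` are made up for the demo. It calls `awk`, so substitute `nawk` on systems (Solaris, for instance) where plain awk is the old version.

```shell
#!/bin/sh
# Scratch directory so no real files are touched.
tmp=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$tmp/data.txt"

# 1. Program on the command line, in single quotes so the shell
#    does not expand $2 itself.
cli_out=$(awk '{ print $2 }' "$tmp/data.txt")

# 2. The same program saved in a file (hypothetical name) and loaded with -f.
printf '{ print $2 }\n' > "$tmp/field2.awk"
file_out=$(awk -f "$tmp/field2.awk" "$tmp/data.txt")

printf '%s\n%s\n' "$cli_out" "$file_out"
rm -rf "$tmp"
```

Both runs print the second field of each line, so the two outputs are identical.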

Structure of a nawk program

  • A nawk program can consist of three sections: nawk 'BEGIN{…} {… body …} END{…}' inputfile
  • Both the BEGIN and END blocks are optional. BEGIN is executed once before any input is read, and END once after all input has been read.
  • The body is executed once for each line of input.
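To see when each section runs, the sketch below feeds three lines through a program that has all three parts. It is written for `awk`, which accepts the same syntax as nawk on most current systems.

```shell
#!/bin/sh
summary=$(printf 'a\nb\nc\n' | awk '
  BEGIN { print "start" }         # runs once, before any input is read
        { lines++ }               # body: runs once per input line
  END   { print "end:", lines }   # runs once, after the last line
')
printf '%s\n' "$summary"
```

The body counts the three lines, and END reports the total after BEGIN has already printed its line.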

Field Separators

  • The following example adds '=' as a field separator in addition to blanks: nawk 'BEGIN{FS = "[ =]+"}{print $2}' inputfile
    Because FS is a regular expression, any run of spaces and equals signs counts as a single separator.
  • For example, if the input file contains the line "Total = 500", the output will be 500.
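A quick way to check a separator before using it on real data is to wrap it in a shell function. This sketch assumes the bracket-expression form `FS = "[ =]+"`, which treats any run of blanks and equals signs as one separator; `get_value` is a hypothetical helper name, and the program runs under `awk` (swap in `nawk` if needed).

```shell
#!/bin/sh
# Hypothetical helper: print field 2 using "[ =]+" as the separator.
# "Total = 500", "Total=500" and "Total  =500" all split into
# $1 = "Total" and $2 = "500".
get_value() {
  awk 'BEGIN { FS = "[ =]+" } { print $2 }'
}

printf 'Total = 500\nTotal=500\nTotal  =500\n' | get_value
```

One difference from the default separator: a regular-expression FS does not skip leading blanks, so a line that starts with a space gets an empty $1 and "Total" moves to $2.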

Printing Environment Variables

  • The following example prints every entry of a long directory listing as a full path, prefixing the file name (field 8 of ls -alg output) with the current directory:
    ls -alg | nawk '{print ENVIRON["PWD"] "/" $8}'

  • ENVIRON is an array of environment variables indexed by the variable name, without the leading "$".

  • The variable FILENAME holds the name of the input file nawk is currently reading.
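A sketch of both variables at work, using throwaway files `a.txt` and `b.txt` and a made-up environment variable `GREETING`. It also prints `FNR`, the standard per-file line counter, to show where one file ends and the next begins. It runs with `awk`; nawk provides the same variables.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'one\n' > "$tmp/a.txt"
printf 'two\nthree\n' > "$tmp/b.txt"

# ENVIRON is indexed by the bare variable name (GREETING is a made-up example).
greeting=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')

# FILENAME changes as awk moves from one input file to the next;
# FNR restarts at 1 in each file.
listing=$(cd "$tmp" && awk '{ print FILENAME ":" FNR ":" $0 }' a.txt b.txt)

printf '%s\n%s\n' "$greeting" "$listing"
rm -rf "$tmp"
```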

Examples of usage

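  • To kill all the processes of the current user: kill -9 `ps -ef | nawk -v u="$LOGNAME" '$1 == u {print $2}'`
    Testing the first (UID) column for an exact match is safer than grep $LOGNAME, which also picks up other users' processes whose command lines merely contain your login name. This includes the shell you run it from.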
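Because `kill -9` cannot be undone, it is worth testing the selection on sample text first. A minimal sketch, assuming `ps -ef` style input with the user name in column 1 and the PID in column 2; `pids_of_user` is a hypothetical helper and the sample lines are invented. Note that `bob`'s process mentions `alice` in its command and would be caught by `grep alice`, but not by the column test.

```shell
#!/bin/sh
# Hypothetical helper: print column 2 (PID) of every line whose
# column 1 (UID) equals the given user name. Reads ps -ef style text.
pids_of_user() {
  awk -v u="$1" '$1 == u { print $2 }'
}

# Invented sample in ps -ef layout.
sample='     UID   PID  PPID  C STIME TTY  TIME CMD
   alice  101     1  0 09:00 ?    0:00 sshd: alice
     bob  202     1  0 09:01 ?    0:00 grep alice
   alice  303   101  0 09:02 pts/1 0:00 -bash'

printf '%s\n' "$sample" | pids_of_user alice
```

Once the output looks right, the same function can feed kill, for example `kill -9 $(ps -ef | pids_of_user "$LOGNAME")`.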

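Multi-dimensional arrays

  • To use a 2D or multi-dimensional array, separate the indices with commas: matrix[3, 5] = $(i+5)
    nawk joins the indices with the SUBSEP character into a single string key, and (3, 5) in matrix tests whether that element exists.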
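As an illustration of the comma syntax, the sketch below (a hypothetical `transpose` helper, run with `awk`) stores a whole table in `matrix[row, column]` and prints it with rows and columns swapped.

```shell
#!/bin/sh
# Hypothetical helper: transpose a whitespace-separated table.
# matrix[NR, i] uses a two-part subscript joined with SUBSEP.
transpose() {
  awk '
    {
      for (i = 1; i <= NF; i++)
        matrix[NR, i] = $i
      if (NF > cols)
        cols = NF
    }
    END {
      for (c = 1; c <= cols; c++) {
        line = ""
        for (r = 1; r <= NR; r++) {
          cell = ""
          # (r, c) in matrix tests for the key without creating it
          if ((r, c) in matrix)
            cell = matrix[r, c]
          line = (r == 1) ? cell : line " " cell
        }
        print line
      }
    }'
}

printf '1 2 3\n4 5 6\n' | transpose
```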

Another example

  • The program below calculates the average of each of 16 items over 10 sets of readings. Save it in a file and run it with nawk -f programfile inputfile.
  • Example of an input line the program is looking for: Total elapsed time is 560
    BEGIN{
      printf("--------- Execution Time -----------\n");
      item=16;
      set=10;
    }
    {
      # all new variables are initialized to 0
      for(;j < set;j++)
        for(i=0;i < item; i++)
        {
          # skip input until the second word matches "elapsed";
          # give up (jump to END) if the input runs out first
          while($2 != "elapsed")
            if((getline) <= 0)
              exit;
          # notice the use of array without declaring its
          # dimension
          sum[i]+=$5;
          # read the next line; at end of input, clear $0 so the
          # same reading is not counted twice
          if((getline) <= 0)
            $0 = "";
        }

      if(j==set){
        for(i=0;i < item;i++){
          # this and the next 2 lines are comments
          # you can use either print or printf for output
          # print sum[i]/set;
          printf("Item %d : %6.3f\n",i,sum[i]/set);
        }
        j++;
      }
    }
    END{
      printf("-------------- End --------------\n");
    }
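For comparison, the same per-item averages can be computed without explicit `getline` calls by letting awk's pattern matching pick out the readings. This is an alternative sketch, not the program above: `averages` is a hypothetical shell helper that takes the number of items and sets as arguments (so it can be tried on a small sample), and it reports an error instead of printing partial averages when too few readings are found. It runs with `awk`.

```shell
#!/bin/sh
# Hypothetical helper: averages ITEMS SETS < input
# Every line whose second word is "elapsed" is one reading; readings
# arrive set by set, so reading number n belongs to item n % item.
averages() {
  awk -v item="$1" -v set="$2" '
    $2 == "elapsed" && n < item * set {
      sum[n % item] += $5
      n++
    }
    END {
      if (n < item * set) {
        printf("only %d of %d readings found\n", n, item * set)
        exit 1
      }
      for (i = 0; i < item; i++)
        printf("Item %d : %6.3f\n", i, sum[i] / set)
    }'
}

# 2 items x 3 sets: item 0 gets 10, 30, 50 and item 1 gets 20, 40, 60.
printf 'Total elapsed time is %s\n' 10 20 30 40 50 60 | averages 2 3
```

With the original data the call would be `averages 16 10 < inputfile`.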
    

Examples from the man page

  • Write to the standard output all input lines for which field 3 is greater than 5:
    $3 > 5

  • Write every tenth line:
    (NR % 10) == 0

  • Write any line with a substring matching the regular expression:
    /(G|D)(2[0-9][[:alpha:]]*)/

  • Write any line with a substring containing a G or D, followed by a sequence of digits and letters:
    /(G|D)([[:digit:][:alpha:]]*)/

  • Write any line in which the second field contains a backslash:
    $2 ~ /\\/

  • Write any line in which the second field contains a backslash (alternate method). Note that backslash escapes are interpreted twice, once in lexical processing of the string and once in processing the regular expression.
    $2 ~ "\\\\"

  • Write the second-to-last and last fields of each line, separated by a colon:
    {OFS=":";print $(NF-1), $NF}

  • Write lines longer than 72 characters:
    length($0) > 72

  • Write first two fields in opposite order separated by the OFS:
    { print $2, $1 }

  • Same, with input fields separated by a comma, by blanks (spaces and tabs), or by both:
    BEGIN { FS = ",[ \t]*|[ \t]+" }{ print $2, $1 }

  • Add up first column, print sum and average:
    {s += $1 }END{print "sum is ", s, " average is", s/NR}

  • Write fields in reverse order, one per line (many lines out for each line in):
    { for (i = NF; i > 0; --i) print $i }

  • Write all lines between occurrences of the strings "start" and "stop":
    /start/, /stop/

  • Write all lines whose first field is different from the previous one:
    $1 != prev { print; prev = $1 }

  • Simulate the echo command:
    BEGIN { for (i = 1; i < ARGC; ++i) printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ") }

  • Write the path prefixes contained in the PATH environment variable, one per line:
    BEGIN { n = split(ENVIRON["PATH"], path, ":"); for (i = 1; i <= n; ++i) print path[i] }
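Several of the pattern examples above can be tried on sample text by wrapping them in shell functions. The function names are hypothetical; the awk programs are the man-page one-liners, run with `awk`.

```shell
#!/bin/sh
# Hypothetical wrappers around man-page one-liners.
# Pattern only, so the default action prints each matching line.
long_lines()     { awk 'length($0) > 72'; }
# Range pattern: from a line matching /start/ through the next /stop/.
between()        { awk '/start/, /stop/'; }
# Print a line only when its first field differs from the previous line's.
first_changes()  { awk '$1 != prev { print; prev = $1 }'; }
# One field per line, last field first.
reverse_fields() { awk '{ for (i = NF; i > 0; --i) print $i }'; }

printf 'a\nstart\nb\nstop\nc\n' | between
```

Both ends of the range are printed, so the call above writes start, b and stop.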
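The two BEGIN-only examples never read input, so they can be wrapped as commands of their own. A sketch with hypothetical function names `awk_echo` and `path_entries`, run with `awk`:

```shell
#!/bin/sh
# Hypothetical wrapper: echo simulation. With only a BEGIN block awk
# reads no input; the shell words after the program are ARGV[1] .. ARGV[ARGC-1].
awk_echo() {
  awk 'BEGIN { for (i = 1; i < ARGC; ++i) printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ") }' "$@"
}

# Hypothetical wrapper: one PATH entry per line, read through ENVIRON.
path_entries() {
  awk 'BEGIN { n = split(ENVIRON["PATH"], path, ":"); for (i = 1; i <= n; ++i) print path[i] }'
}

awk_echo hello from awk
path_entries
```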

Edited on: Sunday, May 27, 2012 12:18 PM

Posted in General