notes - engineering - sem-2 - unix and shell programming - unit 4 simple filters - visvesvaraya technological university karnataka

USP

UNIT - 4

Simple Filters

4.1 Filters: Display, Beginning and End of File

Some UNIX commands accept input from standard input or from files, perform some manipulation on it, and write the result to standard output. Because these commands filter the data passing through them, they are called "filters". Filters are used, for example, to display the contents of a file in stored order or to extract the lines of a file that contain a specific pattern.

Filters are the central tools of the UNIX tool kit, and each filter performs one simple function. Many filters work well with delimited fields, and some simply won't work without them; common delimiters are the pipe (|) and the colon (:). The piping mechanism allows the standard output of one filter to serve as the standard input of another. A filter reads data from standard input when used without a filename argument, and from the named file otherwise.
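The two input modes and the piping mechanism can be sketched as follows. This is a minimal illustration only; the file fruits.txt and its contents are made up for the example and are not part of the course material.

```shell
# Create a small sample file in a scratch directory (illustrative data only).
tmp=$(mktemp -d)
printf '%s\n' 'banana' 'apple' 'cherry' > "$tmp/fruits.txt"

# 1. The filter reads the file named as its argument.
from_file=$(sort "$tmp/fruits.txt")

# 2. The same filter reads standard input when no filename is given.
from_stdin=$(sort < "$tmp/fruits.txt")

# 3. A pipe feeds the output of one filter to the next filter.
first_sorted=$(sort "$tmp/fruits.txt" | head -n 1)

printf '%s\n' "$from_file"
echo "first after sorting: $first_sorted"
rm -rf "$tmp"
```

Both forms of sort produce the same result; only the source of the data differs.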

The Simple Database: Several of the commands in this unit are demonstrated on a sample file, emp.lst, which holds a personnel database in a fixed format. The details of one employee are stored on a single line, and each line has six fields separated by five | delimiters. There are 15 lines in the file.

$ cat emp.lst

2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000

9876 | jai Sharma | director | production | 12/03/50 | 7000

5678 | sanika sar | d.g.m. | marketing | 19/04/43 | 6000

2365 | barun sengupta | director | personnel | 11/05/47 | 7800

5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400

1006 | chanchal singhvi | director | sales | 03/09/38 | 6700

6213 | karuna ganguly |g.m. | accounts | 05/06/62 | 6300

1265 | s.n. dasgupta | manager | sales | 12/09/63 | 5600

4290 | jayant choudhary | executive | production | 07/09/50 | 6000

2476 | anil aggarwal | manager | sales | 01/05/59 | 5000

6521 | lalit choudhury | director | marketing | 26/09/45 | 8200

3212 | shyam saksena | d.g.m. | account | 12/12/55 | 6000

3564 | sudhir agarwal |executive | personnel | 06/07/47 | 7500

2345 | j.b.saxsena | g.m. | marketing | 12/03/45 | 8000

0110 | v.k. agrawal | g.m. | marketing | 31/12/40 | 9000

1. pr: paginating files: The pr command prepares a file for printing by adding suitable headers, footers and formatted text. pr adds five lines of margin at the top and five at the bottom. The header shows the date and time of last modification of the file along with the filename and page number.

Syntax:

$ pr option filename

$ pr dept.lst

...blank lines...

May 06 10:38 1997 dept.lst page 1

01:accounts:6213

02:progs:5423

03:marketing:6521

05:production:9876

06:sales:1006

...blank lines...

pr options: The different options for pr command are:

-k prints k (integer) columns

-t to suppress the header and footer

-h to have a header of user's choice

-d double spaces input

-n numbers each line, which helps in debugging

-o n offsets the lines by n spaces, increasing the left margin of the page

For example, if a file xyz contains a series of 20 numbers, one on each line, then the -k and -t options together print the output as follows:

$ cat xyz | pr -t -5

1 5 9 13 17

2 6 10 14 18

3 7 11 15 19

4 8 12 16 20

$ pr +10 chap01 # starts printing from page 10

$ pr -l 54 chap01 # this option sets the page length to 54 lines
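The column layout can be tried without preparing a file. The sketch below generates the numbers 1 to 20 in a loop as a stand-in for xyz and assumes a pr that balances columns on the last page, as GNU pr does. The awk step is only there to squeeze the column padding so that the result is easy to read.

```shell
# Generate the numbers 1..20, one per line (stand-in for the file xyz).
nums=$(i=1; while [ "$i" -le 20 ]; do echo "$i"; i=$((i + 1)); done)

# -t drops the header and footer, -5 lays the lines out in five columns.
# awk '{$1=$1; print}' replaces the padding between columns by single spaces.
cols=$(printf '%s\n' "$nums" | pr -t -5 | awk '{$1=$1; print}')

printf '%s\n' "$cols"
```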

2. head: displaying the beginning of a file: The head command displays the top of a file. It displays the first 10 lines of the file when used without an option.

Syntax:

$ head option filename

$ head emp.lst

Option:

-n to specify a line count

$ head -n 3 emp.lst

2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000

9876 | jai Sharma | director | production | 12/03/50 | 7000

5678 | sanika sar | d.g.m. | marketing | 19/04/43 | 6000

3. tail: displaying the end of a file : This command displays the end of the file. It displays the last 10 lines of the file, when used without an option.

Syntax:

$ tail option filename

$ tail emp.lst

Option : -n to specify a line count

$ tail -n 3 emp.lst

3564 | sudhir agarwal |executive | personnel | 06/07/47 | 7500

2345 | j.b.saxsena | g.m. | marketing | 12/03/45 | 8000

0110 | v.k. agrawal | g.m. | marketing | 31/12/40 | 9000

The command above displays the last three lines of the file. We can also address lines from the beginning of the file instead of the end. The +count option does that, where count is the line number from which the selection should begin.

$ tail +11 emp.lst

This displays the file from the 11th line onwards. POSIX-conforming versions of tail write this form as tail -n +11 emp.lst.

Different options for tail are:

(i) Monitoring the file growth (-f)

(ii) Extracting bytes rather than lines (-c)

Use tail -f when a program is continuously writing to a file and we want to watch the file grow. The command keeps running, so we have to terminate it with the interrupt key.
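head and tail can be combined to pick out a range of lines from the middle of a file. The following is a small sketch on a made-up ten-line file (sample.txt is a hypothetical name); it also shows the -n +count form and the -c option.

```shell
tmp=$(mktemp -d)
# Ten numbered lines stand in for a real file (illustrative data).
i=1
while [ "$i" -le 10 ]; do echo "line $i"; i=$((i + 1)); done > "$tmp/sample.txt"

top2=$(head -n 2 "$tmp/sample.txt")              # first two lines
last2=$(tail -n 2 "$tmp/sample.txt")             # last two lines
from9=$(tail -n +9 "$tmp/sample.txt")            # line 9 to the end
mid=$(head -n 6 "$tmp/sample.txt" | tail -n 3)   # lines 4 to 6
last_bytes=$(tail -c 3 "$tmp/sample.txt")        # last 3 bytes: "10" and the newline

printf '%s\n' "$mid"
rm -rf "$tmp"
```

The range 4 to 6 is obtained by first keeping the top six lines and then keeping the last three of those.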

4.2 Cut and Paste, Sorting

1. cut: slitting a file vertically: The cut command slits a file vertically, selecting columns or fields from each line. To prepare a small sample, head -n 5 emp.lst | tee shortlist selects the first five lines of emp.lst and saves them in shortlist. To cut columns, use the -c option with a list of column numbers or ranges, separated by commas.

Syntax:

$ cut option filename

Options:

-c for cutting columns

-d for delimiters/field separator

-f for field

$ cut -c 6-22,24-32 shortlist

a.k.shukla g.m

jai Sharma director

sanika sar d.g.m.

barun sengupta director

n.k.gupta chairman

$ cut -c-3,6-22,28-34,55- shortlist

The expression 55- indicates column number 55 to end of line. Similarly, -3 is the same as 1-3. Most files don't contain fixed length lines, so we have to cut fields rather than columns (cutting fields).

$ cut -d "|" -f 2,3 shortlist |tee cutlist1

a.k.shukla |g.m

jai Sharma |director

sanika sar |d.g.m.

barun sengupta |director

n.k.gupta |chairman

This displays the second and third fields of shortlist and saves the output in cutlist1.

(i) To save the remaining fields in cutlist2, we use

$ cut -d \| -f 1,4- shortlist > cutlist2
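The field and column forms of cut can be tried on a two-line sample. The sketch below assumes records in the emp.lst layout written without padding around the | delimiter; the file name short is hypothetical.

```shell
tmp=$(mktemp -d)
# Two records in the emp.lst layout, with no spaces around "|".
printf '%s\n' '2233|a.k.shukla|g.m.|sales|12/12/52|6000' \
              '9876|jai sharma|director|production|12/03/50|7000' > "$tmp/short"

names=$(cut -d '|' -f 2,3 "$tmp/short")    # fields 2 and 3
rest=$(cut -d '|' -f 1,4- "$tmp/short")    # field 1, then field 4 to the end
ids=$(cut -c 1-4 "$tmp/short")             # character columns 1 to 4

printf '%s\n' "$names" "$rest" "$ids"
rm -rf "$tmp"
```

Quoting the delimiter as '|' has the same effect as escaping it with a backslash: both stop the shell from treating it as a pipe.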

2. paste: pasting files: What we cut with cut can be pasted back with the paste command, but vertically rather than horizontally. We can view two files side by side by pasting them. In the previous topic, cut was used to create the two files cutlist1 and cutlist2, which contain two cut-out portions of the same file.

Syntax:

$ paste option filename

Options:

-d for adding delimiter

-s for joining lines

$ paste cutlist1 cutlist2

a.k.shukla | g.m 2233 | sales | 12/12/52 | 6000

jai Sharma | director 9876 | production | 12/03/50 | 7000

sanika sar | d.g.m. 5678 | marketing | 19/04/43 | 6000

barun sengupta | director 2365 | personnel | 11/05/47 | 7800

n.k.gupta | chairman 5423 | admin | 30/08/56 | 5400

We can specify one or more delimiters with -d

$ paste -d "|" cutlist1 cutlist2

a.k.shukla | g.m | 2233 | sales | 12/12/52 | 6000

jai Sharma | director | 9876 | production | 12/03/50 | 7000

sanika sar | d.g.m. | 5678 | marketing | 19/04/43 | 6000

barun sengupta | director | 2365 | personnel | 11/05/47 | 7800

n.k.gupta | chairman | 5423 | admin | 30/08/56 | 5400

Here each field is separated by the delimiter |. Even though paste needs at least two files for concatenating lines, the data for one file can be supplied through standard input.

Let us consider that the file addressbook contains the details of three persons:

$ cat addressbook

Sudhakar

vvsr.sudhakar@gmail.com

7689567860

Prateek

pratsin@yahoo.com

9128465857

Manisha

mani.vara@gmail.com

9763745348

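With -s, paste joins the lines of a single file. The delimiters in the list are used in a circular manner: the first two lines of each record are followed by |, and the third by a newline.

$ paste -s -d "||\n" addressbook

Sudhakar|vvsr.sudhakar@gmail.com|7689567860

Prateek|pratsin@yahoo.com|9128465857

Manisha|mani.vara@gmail.com|9763745348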
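The three uses of paste described in this topic (joining two files, taking one input from standard input, and joining lines serially with a circular delimiter list) can be sketched as follows. All file names and values here are invented for the illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a.k.shukla' 'jai sharma' > "$tmp/names"
printf '%s\n' 'g.m.' 'director'         > "$tmp/desig"

# Join the two files line by line, separated by "|".
side=$(paste -d '|' "$tmp/names" "$tmp/desig")

# "-" makes paste read one of its inputs from standard input.
piped=$(printf '%s\n' '2233' '9876' | paste -d '|' - "$tmp/names")

# -s joins the lines of one input; the list "||\n" is reused for every record.
serial=$(printf '%s\n' 'Asha' 'asha@example.com' '111' \
                       'Ravi' 'ravi@example.com' '222' | paste -s -d '||\n' -)

printf '%s\n' "$side" "$piped" "$serial"
rm -rf "$tmp"
```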

3. sort: ordering a file: Sorting is the ordering of data in ascending or descending sequence. The sort command orders a file, and by default the entire line is used as the sort key.

Syntax:

$ sort option filename

$sort shortlist

2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000

2365 | barun sengupta | director | personnel | 11/05/47 | 7800

5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400

5678 | sanika sar | d.g.m. | marketing | 19/04/43 | 6000

9876 | jai Sharma | director | production | 12/03/50 | 7000

This default sorting sequence can be altered by using certain options. We can also sort one or more keys (fields) or use a different ordering rule.

sort options:

The important sort options are:

-t char uses delimiter char to identify fields

-k n sorts on nth field

-k m,n starts sort on mth field and ends sort on nth field

-k m.n starts sort on nth column of mth field

-u removes repeated lines

-n sorts numerically

-r reverses sort order

-f folds lowercase to equivalent uppercase

-m list merges sorted files in list

-c checks if file is sorted

-o filename places output in file filename

$sort -t"|" -k 2 shortlist

2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000

2365 | barun sengupta | director | personnel | 11/05/47 | 7800

9876 | jai Sharma | director | production | 12/03/50 | 7000

5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400

5678 | sanika sar | d.g.m. | marketing | 19/04/43 | 6000

$ sort -t"|" -r -k 2 shortlist

$ sort -t"|" -k 2r shortlist

5678 | sanika sar | d.g.m. | marketing | 19/04/43 | 6000

5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400

9876 | jai Sharma | director | production | 12/03/50 | 7000

2365 | barun sengupta | director | personnel | 11/05/47 | 7800

2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000

$ sort -t"|" -k 3,3 -k 2,2 shortlist

5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400

5678 | sanika sar | d.g.m. | marketing | 19/04/43 | 6000

2365 | barun sengupta | director | personnel | 11/05/47 | 7800

9876 | jai Sharma | director | production | 12/03/50 | 7000

2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000

$ sort -t"|" -k 5.7,5.8 shortlist

5678 | sanika sar | d.g.m. | marketing | 19/04/43 | 6000

2365 | barun sengupta | director | personnel | 11/05/47 | 7800

9876 | jai Sharma | director | production | 12/03/50 | 7000

2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000

5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400

Here -k 5.7,5.8 selects the seventh and eighth characters of the fifth field, which hold the year of birth. Character positions are counted from the start of the field, immediately after the delimiter, so any padding spaces around | shift the positions.

When sort acts on numerals, strange things can happen. When we sort a file containing only numbers, we get a curious result: the lines are compared character by character, so 10 is placed before 2. This can be overridden with the -n (numeric) option.

$sort numfile

$sort -n numfile
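The difference between the two commands can be seen on a small numfile. The five numbers below are made up for the illustration; paste -s is used only to show each result on one line.

```shell
tmp=$(mktemp -d)
printf '%s\n' 10 2 27 4 100 > "$tmp/numfile"

# Default sort compares characters, so "10" and "100" come before "2".
lexical=$(sort "$tmp/numfile" | paste -s -d ' ' -)

# -n compares the lines as numbers.
numeric=$(sort -n "$tmp/numfile" | paste -s -d ' ' -)

echo "default: $lexical"
echo "numeric: $numeric"
rm -rf "$tmp"
```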

$ cut -d "|" -f3 emp.lst | sort -u | tee desigx.lst

chairman

d.g.m.

director

executive

g.m.

manager

Repeated lines can be removed with the -u option, as shown above. If we cut out the designation field from emp.lst, we can pipe it to sort to find the unique designations that occur in the file.

Other sort options are:

sort -o sortedlist -k 3 shortlist #output stored in sortedlist

sort -o shortlist shortlist #output stored in same file

The -c option is used to check whether the file has actually been sorted in the default order.

sort -c shortlist

The -m option is used to merge two or more files that are sorted individually.

sort -m foo1 foo2 foo3
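The options -k, -c, -o and -m can be combined in a short experiment. The sketch below is only an illustration: it uses a three-line list with four unpadded fields (id, name, designation, salary) and hypothetical file names.

```shell
tmp=$(mktemp -d)
printf '%s\n' '9876|jai sharma|director|7000' \
              '2233|a.k.shukla|g.m.|6000' \
              '2365|barun sengupta|director|7800' > "$tmp/list"

# Primary key: field 3 (designation). Secondary key: field 4, numeric, descending.
by_desig=$(sort -t '|' -k 3,3 -k 4,4nr "$tmp/list" | cut -d '|' -f 1 | paste -s -d ' ' -)

# -c reports through its exit status whether the input is already sorted.
if sort -c "$tmp/list" 2>/dev/null; then before=sorted; else before=unsorted; fi

# -o may name the input file itself as the output file.
sort -o "$tmp/list" "$tmp/list"
if sort -c "$tmp/list" 2>/dev/null; then after=sorted; else after=unsorted; fi

# -m merges inputs that are already sorted individually.
printf '%s\n' a c e > "$tmp/odd"
printf '%s\n' b d   > "$tmp/even"
merged=$(sort -m "$tmp/odd" "$tmp/even" | tr -d '\n')

echo "$by_desig / $before / $after / $merged"
rm -rf "$tmp"
```

Limiting each key with the m,n form (3,3 and 4,4) keeps the comparison from running on into the following fields.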

4.3 grep, tr, uniq and cmp Commands

1. grep command [Global Regular Expression Print]:

This command searches a specified file for a specified pattern and displays the lines that contain the pattern.

Syntax:-

grep [-option] pattern <filename>

Where the options are:

-b Precedes each displayed line with its byte offset in the file (GNU grep; older UNIX versions show the block number).

-i Ignores case.

-v Displays only the lines that do not match the specified pattern.

-c Displays the number of lines that contain the pattern.

-n Displays the matching lines along with their line numbers.

Example:-

$ cat emp.txt

1001 Ram Computer CS

1002 Merry Electronics ET

1003 John Computer CS

$ grep "CS" emp.txt

o/p:- 1001 Ram Computer CS

1003 John Computer CS

Regular Expression Character Set

*: Matches zero or more occurrences of the previous character.

.: Matches any single character.

[r1-r2]: Matches any single character in the range r1 to r2.

[^abcd]: Matches a single character which is not a, b, c or d.

^pattern: Matches the lines that begin with pattern.

pattern$: Matches the lines that end with pattern.

Example:- The pattern Com* matches Co followed by zero or more m characters.

$ grep "Com*" emp.txt

o/p:- 1001 Ram Computer CS

1003 John Computer CS

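Related commands with grep:-

1. egrep [Extended grep]

2. fgrep [Fixed-string grep]

egrep :- This command offers more features than grep. Multiple patterns can be searched for at once by separating them with the pipe symbol (|).

fgrep :- This command searches for fixed strings only; it does not interpret regular expression characters.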
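The options and the related commands can be tried on the three-line emp.txt used above. The sketch below uses grep -E and grep -F, which are the standard spellings of egrep and fgrep; the two-line input for the fixed-string case is made up for the example.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1001 Ram Computer CS' \
              '1002 Merry Electronics ET' \
              '1003 John Computer CS' > "$tmp/emp.txt"

count=$(grep -c 'CS' "$tmp/emp.txt")               # number of matching lines
numbered=$(grep -n 'Electronics' "$tmp/emp.txt")   # prefix the line number
inverse=$(grep -v 'CS' "$tmp/emp.txt")             # lines that do NOT match
nocase=$(grep -i 'john' "$tmp/emp.txt")            # ignore case

# Extended grep: "|" separates alternative patterns.
either=$(grep -E 'Ram|Merry' "$tmp/emp.txt" | cut -d ' ' -f 1 | paste -s -d ' ' -)

# Fixed-string grep: "." is an ordinary character, so "abc" does not match.
fixed=$(printf '%s\n' 'a.c' 'abc' | grep -F 'a.c')

echo "$count lines contain CS; ids matching Ram or Merry: $either"
rm -rf "$tmp"
```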

2. uniq command: locating repeated and non-repeated lines: When we concatenate or merge files, we face the problem of duplicate entries creeping in. We saw how sort removes them with the -u option. UNIX also offers a special tool to handle these lines: the uniq command.

Syntax:

$ uniq option filename

Consider a sorted dept.lst that includes repeated lines:

$ cat dept.lst

01|accounts|6213

02|admin|5423

03|marketing|6521

03|marketing|6521

04|personnel|2365

05|production|9876

06|sales|1006

displays all lines, including the duplicate. Whereas,

$ uniq dept.lst

01|accounts|6213

02|admin|5423

03|marketing|6521

04|personnel|2365

05|production|9876

06|sales|1006

simply fetches one copy of each line and writes it to standard output. uniq compares only adjacent lines, so it requires sorted input; the general procedure is to sort a file and pipe its output to uniq. The following pipeline produces the same output, except that the output is saved in the file uniqlist:

sort dept.lst | uniq - uniqlist

Options : Selecting the non-repeated lines (-u):

cut -d "|" -f3 emp.lst | sort | uniq -u

chairman

Selecting the duplicate lines (-d):

cut -d "|" -f3 emp.lst | sort | uniq -d

d.g.m.

director

executive

g.m.

manager

Counting frequency of occurrence (-c):

cut -d "|" -f3 emp.lst | sort | uniq -c

1 chairman

2 d.g.m.

4 director

2 executive

4 g.m.

2 manager
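The three options can be compared on one small input. The sketch below uses a made-up list of seven designations instead of emp.lst; the awk step only removes the leading padding that uniq -c puts in front of each count.

```shell
# Seven designations, deliberately unsorted (illustrative data).
desig=$(printf '%s\n' director g.m. chairman director manager g.m. director)

# uniq compares adjacent lines only, so the input is sorted first.
counts=$(printf '%s\n' "$desig" | sort | uniq -c | awk '{print $1, $2}')
dups=$(printf '%s\n' "$desig" | sort | uniq -d | paste -s -d ' ' -)
singles=$(printf '%s\n' "$desig" | sort | uniq -u | paste -s -d ' ' -)

# Most frequent value: sort the counts numerically in reverse order.
top=$(printf '%s\n' "$counts" | sort -rn | head -n 1)

printf '%s\n' "$counts"
echo "repeated: $dups; not repeated: $singles; most frequent: $top"
```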

3. tr command: translating characters: The tr filter manipulates the individual characters in a line. It translates characters using one or two compact expressions.

Syntax:

tr options expn1 expn2 standard input

tr takes input only from standard input; it doesn't take a filename as argument. By default, it translates each character in expression1 to its mapped counterpart in expression2. The first character in the first expression is replaced with the first character in the second expression, and similarly for the other characters.

$ tr '|/' '~-' < emp.lst | head -n 3

2233 ~ a.k.shukla ~ g.m ~ sales ~ 12-12-52 ~ 6000

9876 ~ jai sharma ~ director ~ production ~ 12-03-50 ~ 7000

5678 ~ sanika sar ~ d.g.m. ~ marketing ~ 19-04-43 ~ 6000

It is easy to define the two expressions as two separate variables and then evaluate them in double quotes.

exp1='|/' ; exp2='~-'

tr "$exp1" "$exp2" < emp.lst

The case of text can be changed from lower to upper, shown here for the first three lines of the file.

$ head -n 3 emp.lst | tr '[a-z]' '[A-Z]'

2233 | A.K.SHUKLA | G.M | SALES | 12/12/52 | 6000

9876 | JAI SHARMA | DIRECTOR | PRODUCTION | 12/03/50 | 7000

5678 | SANIKA SAR | D.G.M. | MARKETING | 19/04/43 | 6000

Deleting characters (-d):

tr -d '|/' < emp.lst | head -n 3

2233a.k.shukla g.m sales 1212526000

9876jai Sharma director production 1203507000

5678sanika sar d.g.m. marketing 1904436000

Compressing multiple consecutive characters (-s):

tr -s ' ' < emp.lst | head -n 3

2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000

9876 | jai Sharma | director | production | 12/03/50 | 7000

5678 | sanika sar | d.g.m. | marketing | 19/04/43 | 6000

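Complementing values of expression (-c): Used with -d, the -c option deletes every character except those in the expression, including the newlines.

head -n 3 emp.lst | tr -cd '|/'

||||//|||||//|||||//|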
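The tr operations described above can be checked on a single record. The following sketch assumes one line in the emp.lst layout without padding around the delimiter; it uses the character classes [:lower:] and [:upper:] for the case change, which is an equivalent way of writing the ranges.

```shell
line='2233|a.k.shukla|g.m.|sales|12/12/52|6000'

swapped=$(printf '%s\n' "$line" | tr '|/' '~-')                 # | becomes ~, / becomes -
upper=$(printf '%s\n' "$line" | tr '[:lower:]' '[:upper:]')     # lower case to upper case
deleted=$(printf '%s\n' "$line" | tr -d '|/')                   # remove both characters
squeezed=$(printf 'a    b   c\n' | tr -s ' ')                   # runs of spaces become one
only_delims=$(printf '%s\n' "$line" | tr -cd '|/')              # keep nothing but | and /

printf '%s\n' "$swapped" "$upper" "$deleted" "$squeezed" "$only_delims"
```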

4. cmp command : Comparing two files : The cmp command is used to compare files. The syntax is as follows:

Syntax:

$ cmp option filename filename

$ cmp file1 file2

file1 file2 differ: char 4, line 1

The file1 and file2 are compared byte by byte, and the location of first mismatch is echoed to the screen.

$ cat file1 file2

Sumit

Sudhakar

Yogiraj

Mrinal

_._._._._._._._._._

Sumet

Sudhakur

Yogeraj

Mrunal

If we want to list out all the differences in the two files then we will use

$ cmp -l file1 file2

4 151 145 //Fourth character has the octal values 151 and 145

13 141 165

19 151 145

26 151 165
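The byte offsets reported by cmp -l can be verified on the first two names of the files above. This is a small sketch; it also uses the -s option, which suppresses all output so that only the exit status tells whether the files are identical.

```shell
tmp=$(mktemp -d)
printf '%s\n' Sumit Sudhakar > "$tmp/file1"
printf '%s\n' Sumet Sudhakur > "$tmp/file2"

# Exit status 0 means identical, 1 means different.
if cmp -s "$tmp/file1" "$tmp/file2"; then same=yes; else same=no; fi

# -l lists every differing byte: its offset and the two octal values.
# awk only normalises the column padding.
diffs=$(cmp -l "$tmp/file1" "$tmp/file2" | awk '{print $1, $2, $3}')

echo "identical: $same"
printf '%s\n' "$diffs"
rm -rf "$tmp"
```

The second difference is at byte 13 because the six bytes of the first line, including its newline, are counted too.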

The comm command: listing common records: The comm command compares two sorted files, file1 and file2, line by line. It produces three-column output: the first column shows lines unique to the first file, the second column shows lines unique to the second file, and the third column shows lines common to both files.

Syntax:

$ comm [options] <file1> <file2>

$ cat file1 file2

Australia

China

India

Japan

New Zealand

_._._._._._._._._._

California

China

India

Nepal

Tanzania

$ comm file1 file2

Australia

        California

                China

                India

Japan

        Nepal

New Zealand

        Tanzania

Options:

-1 Suppress printing of column 1

-2 Suppress printing of column 2

-3 Suppress printing of column 3

-12 prints only lines in column 3

-13 prints only lines in column 2

-23 prints only lines in column 1

$ comm -12 file1 file2

China

India
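The three two-digit option combinations can be checked on shortened versions of the two files. The sketch below uses four lines per file, already sorted; paste -s only joins each result onto one line for display.

```shell
tmp=$(mktemp -d)
printf '%s\n' Australia China India Japan  > "$tmp/file1"
printf '%s\n' California China India Nepal > "$tmp/file2"

common=$(comm -12 "$tmp/file1" "$tmp/file2" | paste -s -d ' ' -)   # column 3 only
only1=$(comm -23 "$tmp/file1" "$tmp/file2" | paste -s -d ' ' -)    # column 1 only
only2=$(comm -13 "$tmp/file1" "$tmp/file2" | paste -s -d ' ' -)    # column 2 only

echo "common: $common"
echo "only in file1: $only1"
echo "only in file2: $only2"
rm -rf "$tmp"
```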

The diff command: displaying the changes needed to make two files identical

The diff command displays file differences. The output consists of the differing lines from each file, with lines of file1 tagged by a < symbol and lines of file2 tagged by a > symbol. Each group of lines is preceded by one of the following instructions, which tell how to change file1 so that it becomes identical to file2.

a-append, d-delete, c-change

Syntax:

$ diff [options] <file1> <file2>

$ cat file1 file2

c.k. shukla

chanchal singhvi

s.n. dasgupta

Sanika Sar

_._._._._._._._._._

anil aggarwal

barun sengupta

c.k. shukla

lalit chowdhury

s.n. dasgupta

$ diff file1 file2

0a1,2 // append lines 1 to 2 of second file after line 0 of first file

> anil aggarwal

> barun sengupta

2c4 // change line 2 of first file to line 4 of second file

< chanchal singhvi

---

> lalit chowdhury

4d5 // delete line 4 of first file

< Sanika Sar
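The append and delete instructions can be reproduced on a smaller pair of files. The following sketch uses three-line files built from the same names; it also records the exit status of diff, which is 0 when the files are identical and 1 when they differ.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'c.k. shukla' 'chanchal singhvi' 's.n. dasgupta' > "$tmp/file1"
printf '%s\n' 'anil aggarwal' 'c.k. shukla' 's.n. dasgupta'    > "$tmp/file2"

# Capture the instructions and the exit status of diff.
out=$(diff "$tmp/file1" "$tmp/file2")
status=$?

printf '%s\n' "$out"
echo "exit status: $status"
rm -rf "$tmp"
```

Here one line has to be appended at the top of file1 and one line of file1 has to be deleted, so the output contains one a instruction and one d instruction.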

