Linux's Three Musketeers (awk)

Original post, last modified 2022-07-31 23:33:41


This content is licensed under CC 4.0 BY-SA.

Tags

#linux #server #bash

First published 2022-07-31 19:38:32


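This article walks through the Linux awk command: basic syntax, printing, BEGIN and END blocks, setting the field separator, variables, the if statement, the printf function, the while loop, operators, and regular-expression examples. awk is a powerful text-analysis tool for processing and analyzing text files; by combining patterns with actions you can handle file contents flexibly. The article also shows practical tasks such as appending strings, line numbering, filtering, and formatted output.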

What you'll learn

The Linux awk command

Command details

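AWK is a language for processing text files and a powerful text-analysis tool.
It is called AWK because the name is made of the first letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.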

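Basic syntax

awk [options] 'pattern {action}' file1 file2 ... fileN

Input files are separated by spaces, not commas; if no file is given, awk reads standard input. awk runs action for every input line that matches pattern.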
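To make the pattern/action split concrete, here is a minimal sketch that feeds three made-up lines through printf instead of reading a file. The pattern NR==2 selects only the second line and the action prints its first field; a pattern with no action prints each matching line ($0).

```shell
# Three sample lines (made up for illustration), piped in instead of a file
printf 'alpha 1\nbeta 2\ngamma 3\n' | awk 'NR==2 {print $1}'
# prints: beta

# Pattern only: the default action is {print $0}
printf 'alpha 1\nbeta 2\ngamma 3\n' | awk '$2 > 1'
# prints:
# beta 2
# gamma 3
```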

Examples

Basic printing

Test file

#test.txt
this is a test file 
this is a test file 
this is a test file 
this is a test file 
this is a test file

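awk '{print $1,$2}' prints the first and second columns of the file. When no separator is specified, awk splits each line on whitespace (spaces and tabs): leading blanks are ignored, and a run of several blanks counts as a single separator.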

[root@hadooptest01 opt]# cat test.txt | awk '{print $1,$2}'
this is
this is
this is
this is
this is
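To see the default splitting rule on input with leading blanks, several spaces, and a tab, here is a small sketch with made-up printf input; NF reports how many fields awk found.

```shell
# Leading blanks are skipped; runs of spaces and tabs act as one separator
printf '  a   b\tc\n' | awk '{print $1, $2, $3, NF}'
# prints: a b c 3
```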

With awk '{print $0}', $0 stands for the entire current line, so every line is printed in full:

[root@hadooptest01 opt]# cat test.txt | awk '{print $0}'
this is a test file 
this is a test file 
this is a test file 
this is a test file 
this is a test file

Appending a string

[root@hadooptest01 opt]# cat test.txt | awk '{print $1,$2,"pig"}'
this is pig
this is pig
this is pig
this is pig
this is pig
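In print, a comma and plain juxtaposition behave differently: commas insert the output field separator (a single space by default), while writing values next to each other concatenates them with nothing in between. A minimal sketch on one sample line (the variable name s is just for illustration):

```shell
# Comma: items are joined with OFS (a space by default)
echo 'this is a test file' | awk '{print $1,$2,"pig"}'
# prints: this is pig

# Juxtaposition: true string concatenation, no separator
echo 'this is a test file' | awk '{print $1 $2 "pig"}'
# prints: thisispig

# Concatenation stored in a variable first
echo 'this is a test file' | awk '{s = $1 "-" $2; print s}'
# prints: this-is
```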

BEGIN and END

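Adding text before and after the output
When the pattern is BEGIN or END, its action runs once before the first input line is read (BEGIN) or once after the last line has been processed (END), which makes it easy to add a header and a footer to the output. Note that BEGIN and END must be written in upper case; begin and end will not work.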

BEGIN and END:
BEGIN { statements to run before any input is read }
END { statements to run after all lines have been processed }
{ statements to run for each input line }

[root@hadooptest01 opt]# cat test.txt | awk 'BEGIN{print "this is awk begin"} {print $1,$2,"pig"} END{print "this is awk end"}'
this is awk begin
this is pig
this is pig
this is pig
this is pig
this is pig
this is awk end
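A common use of END is reporting a total once all input has been read. In this sketch (sample lines made up), n is a hypothetical counter incremented for lines matching /test/; NR still holds the number of lines read when END runs, and (n + 0) forces a 0 if nothing matched.

```shell
printf 'this is a test file\nthis is a test file\nanother line\n' |
awk '/test/ {n++} END {print NR " lines, " (n + 0) " contain test"}'
# prints: 3 lines, 2 contain test
```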

Field separator

With awk -F, (an ASCII comma right after -F), awk splits each line of the file on commas.

#test.txt
this,is,a,dog
this,is,a,dog
this,is,a,dog
this,is,a,dog
# Note: the comma comes right after -F, followed by a space before the program
[root@hadooptest01 opt]# cat test.txt | awk -F, '{print $1}'
this
this
this
this
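-F accepts other separators too. This sketch uses a made-up passwd-style line (colon-separated, like /etc/passwd) and a tab-separated line; the sample data is an assumption for illustration.

```shell
# Colon-separated; $NF is the last field (the login shell here)
echo 'alice:x:1000:1000::/home/alice:/bin/bash' | awk -F: '{print $1, $NF}'
# prints: alice /bin/bash

# Tab-separated: the space inside "b c" is no longer a separator
printf 'a\tb c\td\n' | awk -F'\t' '{print $2}'
# prints: b c
```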

Multiple separators

#log.txt
2 this is a test
3 Do you like awk
This's a test
10 There are orange,apple,mongo

// split on spaces (the default)
[root@hadooptest01 opt]# awk '{print $1,$2,$5}' log.txt 
2 this test
3 Do awk
This's a 
10 There 

// use several separators: the bracket expression [ ,] lets either a space or a comma separate fields
[root@hadooptest01 opt]# awk -F '[ ,]' '{print $1,$2,$5}' log.txt 
2 this test
3 Do awk
This's a 
10 There apple
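One caveat: [ ,] matches exactly one character, so a comma followed by a space produces an empty field between them. Adding + treats any run of spaces and commas as one separator. A sketch on a made-up line:

```shell
# One separator per character: ", " yields an empty second field
echo 'orange, apple,mongo' | awk -F '[ ,]' '{print NF}'
# prints: 4   (orange, "", apple, mongo)

# Runs of spaces/commas collapse into one separator
echo 'orange, apple,mongo' | awk -F '[ ,]+' '{print NF, $2}'
# prints: 3 apple
```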

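Setting variables

awk -v name=value  # set a variable before the program runs
In arithmetic, a non-numeric value such as a or b counts as 0, so $1+a gives 3 on those lines.

#log1.txt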
1
3
a
5
b
# a=3
[root@hadooptest01 opt]# awk -va=3 '{print $1,$1+a}' log1.txt 
1 4
3 6
a 3
5 8
b 3
# a=3 b=1
[root@hadooptest01 opt]# awk -va=3 -vb=1 '{print $1,$1+a,$1+b}' log1.txt 
1 4 2
3 6 4
a 3 1
5 8 6
b 3 1
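-v is also the usual way to hand a shell variable to awk. In this sketch offset is a hypothetical shell variable. It also shows a timing difference: a -v value is already set inside BEGIN, whereas a var=value operand placed after the program is only assigned when awk reaches that operand, just before reading the next input (here "-" for standard input).

```shell
# Pass a shell variable into awk
offset=10
printf '1\n3\na\n' | awk -v a="$offset" '{print $1, $1 + a}'
# prints:
# 1 11
# 3 13
# a 10

# b=2 is an operand: unset in BEGIN, set for the input lines
echo x | awk 'BEGIN {print "b=" b} {print "b=" b}' b=2 -
# prints:
# b=
# b=2
```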

NR and FNR: NR is the running record (line) number across all input; FNR is the line number within the current file

# log.txt 
this is a test
Do you like awki
This's a test
There are orange,apple,mongo

[root@hadooptest01 opt]# awk '{print NR,FNR,$0}' log.txt 
1 1 this is a test
2 2 Do you like awki
3 3 This's a test
4 4 There are orange,apple,mongo

Difference between NR and FNR: FNR restarts at 1 for each input file, while NR keeps counting across all files.

// NR vs. FNR
[root@hadooptest01 opt]# cat test1.txt 
this is a test program
this is a shell test program
[root@hadooptest01 opt]# cat test2.txt 
this#is#a#test#program
this#is#a#shell#test#program

[root@hadooptest01 opt]# awk '{print NR ,$0}' test1.txt test2.txt 
1 this is a test program
2 this is a shell test program
3 this#is#a#test#program
4 this#is#a#shell#test#program

[root@hadooptest01 opt]# awk '{print FNR ,$0}' test1.txt test2.txt 
1 this is a test program
2 this is a shell test program
1 this#is#a#test#program
2 this#is#a#shell#test#program

// print lines whose line number is greater than 1
lzx ➤ awk 'NR>1' test1.txt
this is a shell test program
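Because FNR resets for each file, FNR==1 is a handy pattern for "first line of a new file". This sketch pairs it with the built-in FILENAME variable to print a header per file. The two files and their contents are made up, and it assumes mktemp -d is available to create a scratch directory.

```shell
# Scratch directory with two small sample files
dir=$(mktemp -d)
printf 'x1\nx2\n' > "$dir/a.txt"
printf 'y1\n'     > "$dir/b.txt"

# Header on each file's first line; NR keeps counting, FNR restarts
(cd "$dir" && awk 'FNR==1 {print "==> " FILENAME} {print NR, FNR, $0}' a.txt b.txt)
# prints:
# ==> a.txt
# 1 1 x1
# 2 2 x2
# ==> b.txt
# 3 1 y1

rm -r "$dir"
```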

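ARGC and ARGV
ARGV[0] refers to the awk command itself (awk defines it this way); the remaining elements are the arguments that follow the program, here the names of the files to process.
ARGC is the number of elements in ARGV; in this example its value is 3.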

[root@dev01 yeyz_shell]# awk 'BEGIN{print "aaa",ARGV[1]}' test1 test2
aaa test1

[root@dev01 yeyz_shell]# awk 'BEGIN{print "aaa",ARGV[2]}' test1 test2
aaa test2

[root@dev01 yeyz_shell]# awk 'BEGIN{print "aaa",ARGV[0],ARGV[1],ARGV[2]}' test1 test2
aaa awk test1 test2

[root@dev01 yeyz_shell]# awk 'BEGIN{print "aaa",ARGV[0],ARGV[1],ARGV[2],ARGC}' test1 test2
aaa awk test1 test2 3
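Rather than indexing ARGV by hand, a BEGIN block can loop up to ARGC. This sketch starts at 1 and skips ARGV[0], since its exact value can depend on how awk was invoked. Since the program has only a BEGIN block, awk exits without opening test1 or test2, so those files need not exist.

```shell
awk 'BEGIN {print "ARGC =", ARGC; for (i = 1; i < ARGC; i++) print i, ARGV[i]}' test1 test2
# prints:
# ARGC = 3
# 1 test1
# 2 test2
```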

OFS: set the output field separator (placed between the comma-separated items of print)

[root@hadooptest01 opt]# awk '{print $1,$2}' OFS="***" log.txt 
this***is
Do***you
This's***a
There***are
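OFS only appears where print uses commas or where awk rebuilds $0. Printing $0 unchanged keeps the original spacing; assigning any field (the idiom $1 = $1) makes awk rebuild the line with the new OFS. A sketch on a sample line:

```shell
# $0 not rebuilt: original spacing is kept
echo 'this is a test' | awk -v OFS=',' '{print $0}'
# prints: this is a test

# $1 = $1 rebuilds $0 using OFS
echo 'this is a test' | awk -v OFS=',' '{$1 = $1; print $0}'
# prints: this,is,a,test
```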

NF: the number of fields in the current line ($NF prints the last column)

#test.txt
this is a test file
this is a test file
this is a test file
this is a test file
this is a test file

// print the first and the last column
lzx ➤ awk '{print $1,$NF}' test.txt
this file
this file
this file
this file
this file
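Since NF is just a number, expressions work inside $( ); the parentheses matter because $ binds tighter than minus. A sketch on one sample line:

```shell
# NF, the last field, and the second-to-last field
echo 'this is a test file' | awk '{print NF, $NF, $(NF-1)}'
# prints: 5 file test

# Without parentheses this is ($NF) - 1, i.e. "file" - 1 = 0 - 1
echo 'this is a test file' | awk '{print $NF-1}'
# prints: -1
```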

if statement

Print the length of the longest value in the first column

#test.txt
1
11
111
11111

lzx ➤ awk 'BEGIN{cnt=0} {if(length($1)>cnt){cnt=length($1)}} END{print cnt}' test.txt
5
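if also takes an else branch. This sketch labels each value from the same sample data as "long" or "short"; the threshold of 2 characters is arbitrary.

```shell
printf '1\n11\n111\n11111\n' |
awk '{ if (length($1) > 2) print $1, "long"; else print $1, "short" }'
# prints:
# 1 short
# 11 short
# 111 long
# 11111 long
```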

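printf function

printf format specifiers:
%s  string
%d  integer
%15s  a string padded to a width of 15
'-'  left-aligns within the width, e.g. %-15s (right alignment is the default)
printf does not add a newline at the end of the line; append \n yourself
Depending on the awk implementation and locale, the width may be counted in bytes rather than characters, so multi-byte names such as 小刘 can get fewer padding spaces than ASCII text would.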

# test.txt
小刘 111111
小左 222222

lzx ➤ awk '{printf "username:%s user id %d\n",$1,$2}' test.txt
username:小刘 user id 111111
username:小左 user id 222222

lzx ➤ awk '{printf "|%-15s| %-10s|\n", $1,$2}' test.txt
|小刘         | 111111    |
|小左         | 222222    |
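Widths and precision combine for numbers as well. In this sketch (made-up fruit data), %-8s left-aligns the name, %4d right-aligns the count, and %6.2f prints the computed price with two decimals in a 6-character column.

```shell
printf 'apple 3 0.5\nbanana 12 0.25\n' |
awk '{printf "%-8s|%4d|%6.2f\n", $1, $2, $2 * $3}'
# prints:
# apple   |   3|  1.50
# banana  |  12|  3.00
```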

while loop

Print each line 3 times

# test.txt
小刘 111111
小左 222222

lzx ➤ awk '{i=0;while(i<3) {print $0;i++}}' test.txt
小刘 111111
小刘 111111
小刘 111111
小左 222222
小左 222222
小左 222222

lzx ➤ awk '{i=0;while(i<3) {print "Name: UserID:",$1,$2;i++}}' test.txt
Name: UserID: 小刘 111111
Name: UserID: 小刘 111111
Name: UserID: 小刘 111111
Name: UserID: 小左 222222
Name: UserID: 小左 222222
Name: UserID: 小左 222222
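The same "repeat each line 3 times" loop can be written with for, which keeps the initialization, condition, and increment on one line. The sample line below is made up.

```shell
printf 'alice 111111\n' | awk '{for (i = 0; i < 3; i++) print $0}'
# prints:
# alice 111111
# alice 111111
# alice 111111
```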

Operators


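1. Rows whose first column is greater than 2 (a and b match too: they are not numbers, so awk compares them as strings, and "a" and "b" sort after "2")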
[root@hadooptest01 opt]# awk '$1>2' log1.txt 
3
a
5
b

2. Rows whose first column equals 1
[root@hadooptest01 opt]# awk '$1==1' log1.txt 
1

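3. Rows whose first column is greater than 2 and whose second column equals e (log1.txt now has an e on its second line)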
#log1.txt
1
3 e
a
5
b
[root@hadooptest01 opt]# awk '$1>2 && $2=="e"' log1.txt 
3 e
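Besides &&, patterns can use || (or) and ! (not). This sketch reuses the modified log1.txt contents, fed through printf:

```shell
# Either condition is enough
printf '1\n3 e\na\n5\nb\n' | awk '$1 == 1 || $2 == "e"'
# prints:
# 1
# 3 e

# Negation: rows where "$1 > 2" is false (a and b compare as strings, so they are excluded)
printf '1\n3 e\na\n5\nb\n' | awk '!($1 > 2)'
# prints: 1
```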

Using regular expressions

~ and !~: matches and does not match a regular expression

# log.txt
this is a test
Do you like awki
This's a test
There are orange,apple,mongo

1. First column contains th (case-sensitive, so This's and There do not match)
[root@hadooptest01 opt]# awk '$1 ~ /th/ {print $1,$2}' log.txt 
this is

2. First column starts with Th
[root@hadooptest01 opt]# awk '$1 ~ /^Th/ {print $1,$2}' log.txt 
This's a
There are

3. First column does not start with Th
[root@hadooptest01 opt]# awk '$1 !~ /^Th/ {print $1,$2}' log.txt 
this is
Do you

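4. First column does not contain is (the second command, !/is/, tests the whole line $0 instead of $1; for this file the result is the same)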
[root@hadooptest01 opt]# awk '$1 !~/is/ {print $1,$2}' log.txt 
Do you
There are
[root@hadooptest01 opt]# awk '!/is/ {print $1,$2}' log.txt 
Do you
There are
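Two variations on the examples above, using the same log.txt lines via printf: lower-casing the field with tolower makes the match case-insensitive, and the regex can come from a variable (a "dynamic" regex) passed with -v. The variable name pat is just for illustration.

```shell
# Case-insensitive "first column starts with th"
printf "this is a test\nDo you like awki\nThis's a test\nThere are orange,apple,mongo\n" |
awk 'tolower($1) ~ /^th/ {print $1, $2}'
# prints:
# this is
# This's a
# There are

# Regex supplied at run time
printf "this is a test\nThis's a test\n" | awk -v pat='^Th' '$1 ~ pat {print $1}'
# prints: This's
```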

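Handy examples

Summing file sizes (column 5 of ls -l). The first command prints a running total on every line; moving print into END prints only the final sum.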

[root@hadooptest01 opt]# ls -l *.txt
-rw-r--r--. 1 root root 12 Jul 31 18:35 log1.txt
-rw-r--r--. 1 root root 75 Jul 31 18:40 log.txt
-rw-r--r--. 1 root root 42 Jul 31 17:47 test.txt
[root@hadooptest01 opt]# ls -l *.txt | awk '{print sum+=$5}'
12
87
129

[root@hadooptest01 opt]# ls -l *.txt | awk '{sum+=$5} END{print sum}'
129
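The same END pattern extends to a count and an average. To keep the sketch reproducible, it feeds the three ls -l lines shown above through printf instead of running ls; n is a hypothetical line counter, and the if (n) guard avoids dividing by zero on empty input.

```shell
printf '%s\n' \
  '-rw-r--r--. 1 root root 12 Jul 31 18:35 log1.txt' \
  '-rw-r--r--. 1 root root 75 Jul 31 18:40 log.txt' \
  '-rw-r--r--. 1 root root 42 Jul 31 17:47 test.txt' |
awk '{sum += $5; n++} END {if (n) printf "%d files, %d bytes, avg %.1f\n", n, sum, sum / n}'
# prints: 3 files, 129 bytes, avg 43.0
```

Parsing ls output like this works for quick checks, but the column positions depend on the ls format.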