Sed Substitute Command: Basics, Backreferences, and File Editing

Sed stands for “Stream EDitor”: it is a free and open source utility available by default on virtually every Linux distribution and Unix-like operating system. It performs text manipulation on files, can be used as part of a pipeline, and supports regular expressions. In this tutorial, we learn the basics of the sed substitute command.

In this tutorial you will learn:

  • The sed “substitute” command basic syntax
  • How to use backreferences
  • Some of the most used “substitute” command flags
  • How to modify a file in place and optionally create a backup of it
Introduction to sed substitute command
Category     Requirements, Conventions or Software Version Used
System       Distribution agnostic
Software     sed
Other        None
Conventions  # – requires given linux-commands to be executed with root privileges, either directly as the root user or by use of the sudo command
             $ – requires given linux-commands to be executed as a regular non-privileged user

The “substitute” command

Substitute is probably the best-known and most used sed command: it lets us replace text patterns in a file non-interactively. The first step in learning how to use it is to look at its syntax. Let’s see an example: suppose we have a file called lotr.txt containing the famous Ring poem written by John Ronald Reuel Tolkien:

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all, and in the darkness bind them,
In the Land of Mordor where the Shadows lie.



Now, imagine we want to replace all occurrences of the word “Ring” with the word “foo” without using a full-fledged text editor, perhaps from a shell script. To perform such an action using sed, we would run:

$ sed 's/Ring/foo/g' lotr.txt

As soon as we launch the command, the processed content of the file will appear on the standard output:

Three foos for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One foo to rule them all, One foo to find them,
One foo to bring them all, and in the darkness bind them,
In the Land of Mordor where the Shadows lie.

As expected, each occurrence of the word “Ring” was replaced with “foo”; the original file, however, remained unchanged, because by default sed simply writes to standard output. Let’s analyze what we did in the example above. The sed utility accepts a series of commands, each represented by a single letter and sometimes followed by arguments. In this case, we used the s command (substitute). The syntax of the command is the following:

s/regexp/replacement/flags

The utility tries to match the specified regular expression (“regexp”) against each line of the specified file (or stream); by default, the first match in each line is substituted with “replacement”, while flags such as g change which matches are affected.
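Since sed works on streams as well as files, the same syntax can be tried without any file at all. The following is a minimal sketch, assuming a POSIX shell; the variable name result is only illustrative:

```shell
# Feed a single line through a pipeline instead of reading a file.
result=$(printf 'One Ring to rule them all\n' | sed 's/Ring/foo/')

# Print the processed line.
echo "$result"
```

The input text is left untouched here as well: sed only writes the processed stream to standard output, which the command substitution captures.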

Using backreferences

As part of both the regexp and the replacement, we can use backreferences, which let us reference the text matched by sub-parts of a regular expression. The sub-parts are enclosed in escaped parentheses, which create capturing groups. Let’s see an example:

$ sed 's/\([a-z]\)\1\([ ,.]\)/\1\2/g' lotr.txt

In the example above, we specified the following regular expression:

\([a-z]\)\1\([ ,.]\)

The first [a-z] pattern matches any lowercase letter from “a” to “z”; as you can see, it is written between escaped parentheses: this creates a capturing group which allows us to reference the matched text later using \1, where “1” means: first capturing group.

This is what we did immediately after, to match “any double character”: the \1 that follows the first group matches a second copy of whatever letter the group captured. Finally, we defined another capturing group containing the [ ,.] expression: this (sub)expression matches a space, a comma or a dot. As you can imagine, we can reference the second capturing group by using \2. The whole expression therefore means: “any double character followed by a space, a comma, or a dot”.

In the “substitution” part of the command, we used backreferences again, to replace the whole matched pattern with a single occurrence of the character matched by the expression contained in the first capturing group, followed by the match of the expression contained in the second one (again: a space, a comma or a dot).

You can also notice that after the “substitution” part of the command, we used the g letter: this is a flag which causes all matches in a line to be substituted instead of just the first one (the latter is the default sed behavior). We will talk specifically about flags in the next section. As a result of the command, any double lowercase letter followed by a space, a comma or a dot is replaced by a single occurrence of that letter. This is the result:

Thre Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them al, One Ring to find them,
One Ring to bring them al, and in the darknes bind them,
In the Land of Mordor where the Shadows lie.
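Backreferences in the replacement do not have to appear in the order in which the groups were captured. As a small sketch, assuming a POSIX shell and a made-up two-word input (the variable name swapped is illustrative), two capturing groups can be used to swap words:

```shell
# Group 1 captures the first word, group 2 the second one;
# the replacement emits them in reverse order.
swapped=$(printf 'Dark Lord\n' | sed 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2 \1/')

# Print the reordered words.
echo "$swapped"
```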

In the “substitution” part of the command, we can also use the unescaped & character, which references the whole regular expression match.
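A short sketch of the & character, assuming a POSIX shell; the bracket decoration and the variable name result are just illustrative choices:

```shell
# & in the replacement stands for the whole text matched by the regexp,
# so the match is kept and wrapped in square brackets.
result=$(printf 'One Ring to find them\n' | sed 's/Ring/[&]/')

# Print the decorated line.
echo "$result"
```

To insert a literal ampersand in the replacement, it has to be escaped as \&.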

Sed “substitute” command flags

When using the s command, we can specify a series of flags which can modify its behavior. Let’s see some of them, and their effect.

The “g” flag

The g flag modifies the behavior of the sed “substitute” command so that all matches of a regexp in a line are substituted, instead of just the first one. As an example, take a look at the 6th line of the Ring poem: the word “Ring” appears two times. Let’s run the “substitute” command as we did in the first example, but without the g flag:

$ sed 's/Ring/foo/' lotr.txt

We obtain:

One foo to rule them all, One Ring to find them,

As you can see, only the first occurrence of the word “Ring” was substituted with “foo”. If we use the g flag, instead, both occurrences are affected by the substitution.

The “i” flag

The i flag (a GNU sed extension) makes the regular expression used in the s command case-insensitive. For example, the regex in the following command will match “Ring” even though “ring” is specified:

$ sed 's/ring/foo/i' lotr.txt
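One side effect worth keeping in mind is that a case-insensitive pattern also matches inside longer words. The sketch below assumes GNU sed, since the i flag may not be available in other implementations; the variable names are illustrative:

```shell
# Assumes GNU sed: the i flag is an extension and may be missing elsewhere.
line='One Ring to bring them all'

# g + i: replace every match, ignoring case. "ring" inside "bring" matches too.
result=$(printf '%s\n' "$line" | sed 's/ring/foo/gi')

# Print the processed line.
echo "$result"
```

Here both “Ring” and the “ring” inside “bring” are replaced, which is usually not what we want; a more specific regexp would be needed to avoid it.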

Using a number as a flag

When a number n is used as a flag, the behavior of the sed s command changes so that only the nth match of the regex in a line is substituted by “replacement” (sed works on a line basis). With 1, the result is therefore the same as the default behavior. If we run:

$ sed 's/Ring/foo/1' lotr.txt

We obtain the following output:

Three foos for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One foo to rule them all, One Ring to find them,
One foo to bring them all, and in the darkness bind them,
In the Land of Mordor where the Shadows lie.
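The effect of the numeric flag is easier to see with a value other than 1. As a minimal sketch, assuming a POSIX shell (the variable names are illustrative), we can target only the second match in the 6th line of the poem:

```shell
# The 6th line of the poem contains the word "Ring" twice.
line='One Ring to rule them all, One Ring to find them,'

# The 2 flag replaces only the second match; the first is left alone.
second=$(printf '%s\n' "$line" | sed 's/Ring/foo/2')

# Print the processed line.
echo "$second"
```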

The “p” flag

The p flag changes the behavior of sed, so that it outputs the lines in which a substitution was performed. This is particularly useful when sed is used with the -n option, which makes it silent. When the two things are combined, only changed lines are printed:

$ sed -n 's/Ring/foo/gp' lotr.txt

The command returns:

Three foos for the Elven-kings under the sky,
One foo to rule them all, One foo to find them,
One foo to bring them all, and in the darkness bind them,
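To see why the p flag is normally paired with -n, we can compare the two invocations on a tiny made-up input. This is only a sketch, assuming a POSIX shell; the variable names are illustrative:

```shell
# Without -n, sed prints every line by default, and p prints changed lines again,
# so the modified line shows up twice.
without_n=$(printf 'One Ring\nDark Lord\n' | sed 's/Ring/foo/p')

# With -n, automatic printing is suppressed: only lines printed by p remain.
with_n=$(printf 'One Ring\nDark Lord\n' | sed -n 's/Ring/foo/p')

echo "$without_n"
echo "---"
echo "$with_n"
```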



The ones above are only some of the flags which can be used with the sed “substitute” command; the complete and detailed list can be found by reading the official documentation of the “s” command.

Substituting text in place

As we already said, the output of the sed “substitute” command is printed on the standard output, so the original file is not altered. If we want to change the content of a file in place, all we have to do is invoke sed with the -i option (short for --in-place), e.g.:

$ sed -i 's/Ring/foo/g' lotr.txt

It is also possible to provide a suffix as an argument to the option. When we do so, a backup file is created with the suffix we specified. For example:

$ sed --in-place='.bk' 's/Ring/foo/g' lotr.txt

If we were to launch the command above, the target file would be modified in place; its original content, however, would be saved in a backup file called lotr.txt.bk.
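The backup behavior can be checked safely in a throwaway directory. The following is a sketch under a few assumptions: a POSIX shell, the mktemp utility, and the short form -i.bk of the option with the suffix attached directly; the directory and variable names are illustrative:

```shell
# Work in a temporary directory so no real file is touched.
workdir=$(mktemp -d)
printf 'One Ring to rule them all\n' > "$workdir/lotr.txt"

# Edit in place, keeping the original content as lotr.txt.bk.
sed -i.bk 's/Ring/foo/g' "$workdir/lotr.txt"

# Read back both files to compare them.
edited=$(cat "$workdir/lotr.txt")
backup=$(cat "$workdir/lotr.txt.bk")
echo "edited: $edited"
echo "backup: $backup"

# Clean up the temporary directory.
rm -r "$workdir"
```

Trying a new expression with a backup suffix first is a cheap safeguard, since an in-place edit without one cannot be undone by sed itself.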

Conclusions

In this article, we learned the basics of the sed “substitute” command. We saw the syntax of the command and how to use regular expressions and backreferences. We also saw some of the flags that can be used with the “s” command, like i to make the regexp case-insensitive, and g to substitute all regexp matches in a line. Finally, we saw how to modify a file in place and optionally create a backup of it before the changes are written.


