sed.py

A full implementation of sed in Python


Contents


 

General Information

Usage as a command line utility

Usage as a Python module

Sed dialect

Testing

Timing

To do list

 


General Information


Description

sed.py is a full and working Python implementation of sed. Its reference is GNU sed 4.2 of which it implements almost all commands and features. It may be used as a command line utility or it can be used as a module to bring sed functionality to Python scripts.

A complete set of tests is available, together with a testing utility. These tests include scripts from various origins and cover all aspects of sed functionality.

 


Platform

sed.py is a Python script and should run on any platform where a recent version of Python is installed.

Detailed compatibility status:

Python 3                  Fully compatible
Python 2.7.4 and above    Fully compatible
Python 2.7 to 2.7.3       Fully compatible except for regexps of the form ((.*)*). This causes one of the scripts from the Chang suite to fail.
Python 2.6                Fully compatible except for regexps of the form ((.*)*). The argparse module must be installed.
Python 2.5 and below      Not tested

The same compatibility status applies to the testing utility test-suite.py.

 


License

sed.py is released under the MIT license.

 


Links and contact

 


Usage as a command line utility


 

sed.py may be used as a console program that takes its arguments from the command line. The format of the command line is:

sed.py [options] -e<script expression> <input text file>
sed.py [options] -f<script file> <input text file>

Note that sed.py accepts only one script file or expression, and only one input file. The options may be one or both of:

-n    disable automatic printing
-r    use extended regular expressions

sed.py may also use redirection to receive its input or send its output with the usual syntax:

cat myfile | sed.py -f myscript1 | sed.py -f myscript2 > myresultfile

It is also possible for sed.py to receive its input from the keyboard by omitting any input file:

sed.py -f myscript

It is a limitation of the Windows command line that redirection does not work when a Python script is called directly. In that case, python must be called explicitly. Assuming python is in the path:

> type myfile | python sed.py -f myscript > myresultfile

It is also possible to hide the call in a batch file and even call it without any extension:

> type sed.bat
python sed.py %1 %2 %3 %4 %5
> type myfile | sed -f myscript > myresultfile
...
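The explicit call to python can also be made from another Python script. The following is a minimal sketch, not part of sed.py: the helper names sed_command and run_sed are hypothetical, the location of sed.py is an assumption, and only the -n, -r and -f options documented above are used.

```python
import subprocess
import sys


def sed_command(script_file, input_file=None, no_autoprint=False,
                extended=False, sed_path="sed.py"):
    """Build the argument list for an explicit 'python sed.py ...' call.

    sed_path is an assumption: adjust it to where sed.py actually lives.
    """
    cmd = [sys.executable, sed_path]
    if no_autoprint:
        cmd.append("-n")          # disable automatic printing
    if extended:
        cmd.append("-r")          # use extended regular expressions
    cmd += ["-f", script_file]    # a single script file
    if input_file is not None:
        cmd.append(input_file)    # when omitted, sed.py reads stdin
    return cmd


def run_sed(text, script_file, **options):
    """Feed text to sed.py on stdin and return what it writes to stdout."""
    result = subprocess.run(sed_command(script_file, **options),
                            input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Because the interpreter is named explicitly through sys.executable, this form does not depend on how the shell associates .py files with Python.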

 


Usage as a Python module


 

An example covering all necessary symbols:

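from sed import Sed, SedException

sed = Sed()
try:
    sed.no_autoprint = True       # disable automatic printing
    sed.regexp_extended = False   # use basic regular expressions
    sed.load_script('myscript.sed')
    sed.apply('myinput.txt')
except SedException as e:
    # print() with parentheses works with both Python 2 and Python 3
    print(e.message)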

Note that sed.apply() returns the list of lines printed by the script. By default, these lines are also printed to stdout. sed.apply() has an output parameter which can be used to suppress printing (output=None) or to redirect the output to a text file (output=somefile.txt).

The script may also be read from a string by using sed.load_string(my_script_string).
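As an illustration of the two notes above, the sketch below loads a script from a string and collects the resulting lines without printing them. It relies only on load_string() and the output parameter of apply() as described here; the helper name run_script_quietly is hypothetical, and the Sed instance is passed in so that the helper itself does not depend on where sed.py is installed.

```python
def run_script_quietly(sed, script_text, input_path):
    """Apply a sed script given as a string and return the output lines.

    `sed` is expected to be a Sed instance (from sed import Sed).
    Nothing is printed because output=None is passed to apply().
    """
    sed.load_string(script_text)
    return sed.apply(input_path, output=None)


# Intended use, assuming sed.py is importable:
#     from sed import Sed
#     lines = run_script_quietly(Sed(), 's/an/AN/g', 'myinput.txt')
```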

 


sed dialect


sed.py implements all standard commands and regular expression features of sed. Its reference is GNU sed 4.2, of which it implements almost all features except the most specific ones.

The GNU sed manual page can serve as a reference for sed.py, subject to the differences described below.

 


Addresses

number             standard behavior
$                  standard behavior
/regexp/           standard behavior
/regexp/I          implemented
\%regexp%          standard behavior
address,address    standard behavior
address!           standard behavior
0,/regexp/         not implemented
first~step         not implemented
addr1,+N           not implemented
addr1,~N           not implemented
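For readers less familiar with the address,address form, the sketch below reproduces in plain Python what a script such as sed -n '/start/,/end/p' selects. It only illustrates the standard sed range semantics and is not taken from sed.py; the function name select_range is hypothetical and the patterns use Python re syntax rather than sed syntax.

```python
import re


def select_range(lines, start_re, end_re):
    """Return the lines selected by the range /start_re/,/end_re/."""
    start = re.compile(start_re)
    end = re.compile(end_re)
    selected = []
    in_range = False
    for line in lines:
        if in_range:
            selected.append(line)
            # The end address is only checked on lines after the start line.
            if end.search(line):
                in_range = False
        elif start.search(line):
            selected.append(line)
            in_range = True
    return selected
```

A range can open again later in the input, and a range whose end address never matches runs to the last line.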

 


Regular expressions

char                    standard behavior
*                       standard behavior
\+                      standard behavior
\?                      standard behavior
\{i\} \{i,j\} \{i,\}    standard behavior
\(regexp\)              standard behavior
.                       standard behavior
^                       standard behavior. When not at start of regexp, matches as itself
$                       standard behavior. When not at end of regexp, matches as itself
[list] [^list]          standard behavior. [.ch.], [=a=], [:space:] are not implemented
regexp1\|regexp2        standard behavior
regexp1regexp2          standard behavior
\digit                  standard behavior (back reference)
\n \t                   standard behavior (extensions \s, \S, etc. are not handled)
\char                   standard behavior (disable special regexp characters)

Note that, for any combination of quantifiers (*, +, ?, {}), consecutive quantifiers or a quantifier at the start of a regexp raise an error. This is true in both basic and extended regular expression modes.

 


Extended regular expressions

The -r switch simplifies regular expressions by removing the backslash before the special characters +, ?, (, ), |, { and }. If these characters must appear as regular characters in a regexp, they must be escaped with a backslash.
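The relation between the two modes can be sketched as a swap of the escaping of these seven characters. The function below is an illustration only, not the way sed.py handles -r; the name swap_escapes is hypothetical, and bracket expressions such as [+?] are deliberately not treated specially.

```python
SPECIALS = set("+?()|{}")


def swap_escapes(pattern):
    """Convert a basic regexp to its extended form, or the reverse.

    A backslash is removed in front of the seven special characters,
    and added in front of them when they appear unescaped.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Escaped special: drop the backslash. Other escapes are kept.
            out.append(nxt if nxt in SPECIALS else ch + nxt)
            i += 2
        elif ch in SPECIALS:
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, the basic regexp \(ab\)\+c\? becomes (ab)+c? in extended mode, while a literal a+b has to be written a\+b.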

 


Commands

a\
text          Compliant with GNU sed description (including one liner syntax and double address extensions)
b label       Compliant with GNU sed description
: label       Compliant with GNU sed description
c\
text          Compliant with GNU sed description (including single line and double address extensions)
d             Compliant with GNU sed description
D             Compliant with GNU sed description
=             Compliant with GNU sed description (including double address extension)
g             Compliant with GNU sed description
G             Compliant with GNU sed description
h             Compliant with GNU sed description
H             Compliant with GNU sed description
i\
text          Compliant with GNU sed description (including single line and double address extensions)
l             Compliant with GNU sed description (length parameter not implemented)
n             Compliant with GNU sed description
N             Compliant with GNU sed description
p             Compliant with GNU sed description
P             Compliant with GNU sed description
q             Compliant with GNU sed description (except exit code extension)
r filename    Compliant with GNU sed description (including double address extension but not reading from stdin)
s             Compliant with GNU sed description (except escape sequences in replacement (\L, \l, \U, \u, \E), modifiers e and M/m, and combination of modifier g and number)
t label       Compliant with GNU sed description
w filename    Compliant with GNU sed description (including double address extension but not writing to stdout or stderr)
x             Compliant with GNU sed description
y             Compliant with GNU sed description
#             Compliant with GNU sed description (comments start anywhere in the line)

The other commands specific to GNU sed are not implemented.


Testing


 

Description

The behavior of sed.py is tested against that of GNU sed using a set of tests and a testing utility.

The tests are either coded in text files with the .suite extension or stored in test directories as standard sed scripts.

The test suites are:

unit.suite     a text file containing unit tests
chang.suite    a text file containing scripts from Roger Chang's web site
test-suite1    a set of scripts from the GNU sed test suite
test-suite2    a set of scripts from the seder's grab-bag, the Rosetta Code web site and GitHub (lisp!)
test-suite3    additional unit tests better stored in a folder with some extra data text files
test-suite4    a set of scripts from the sed $HOME

Note that the goal of these tests is not to check the correctness of the scripts but to verify that sed.py and GNU sed have the same behavior.

 


Testing utility

Tests are launched and checked with the test-suite.py Python script. This script runs the sed scripts with either sed.py or any sed executable, which makes it possible to compare the behavior of sed.py with that of GNU sed.

The calling syntax is:

test-suite.py <testsuite> [number] [-b executable] [-x list of script references]

testsuite                    either a text file with .suite extension or a test directory
number                       an optional reference number of a test; when present, only this test is run
executable                   an optional name or path of a sed executable to use for testing
list of script references    an optional list of tests to exclude, for instance when a feature is not implemented. A script reference is either the title of the test, for tests stored in modules, or the name of the script file.

 


Text file test suites

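When tests are stored in a text file (with .suite extension), each test is made of four elements. Judging from the example below, these are a title, the sed script, the input text and the expected output.

The four elements of a test are separated by lines made of three identical characters, for instance: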

---
Test substitution with global flag
---
s/an/AN/g
---
In Xanadu did Kubhla Khan
---
In XANadu did Kubhla KhAN
---
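The layout shown above can be read with a few lines of Python. This is only a sketch of one way to parse it, not the code used by test-suite.py. It assumes that every test starts with its own separator line, that the same separator is used throughout one test, and that the four elements are a title, a script, an input and an expected output; the names parse_suite and is_separator are hypothetical.

```python
def is_separator(line):
    """True for a line made of exactly three identical characters."""
    return len(line) == 3 and len(set(line)) == 1


def parse_suite(text):
    """Split the text of a .suite file into a list of test dictionaries."""
    names = ("title", "script", "input", "expected")
    lines = text.splitlines()
    tests = []
    i = 0
    while i < len(lines):
        if not is_separator(lines[i]):
            i += 1                      # skip anything outside a test
            continue
        sep = lines[i]                  # separator used by this test
        i += 1
        fields = []
        while len(fields) < 4 and i < len(lines):
            chunk = []
            while i < len(lines) and lines[i] != sep:
                chunk.append(lines[i])
                i += 1
            if i == len(lines):
                break                   # unterminated element: drop the test
            fields.append("\n".join(chunk))
            i += 1                      # step over the closing separator
        if len(fields) == 4:
            tests.append(dict(zip(names, fields)))
    return tests
```

Applied to the example above, parse_suite returns a single test whose script is s/an/AN/g. Elements spanning several lines are kept as one string with embedded newlines.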

Note also that:

 


Directory test suites

When tests are stored in a directory, they are represented by three or four files with same name but different extensions:

Additional files may be needed when the scripts use reading or writing commands. In that case, the expected written files must be named with the extension '.wgoodN', where N is the number of the expected written file.


Timing


 

A Python implementation of sed has to face legitimate questions about timing. Fortunately, the results are not bad. Unfortunately, run times seem to increase with the Python version number. Timings are given in seconds.

Platform                                         GNU sed 4.2.1    sed.py (Python 2.6)    sed.py (Python 2.7)    sed.py (Python 3.4)
Windows 7, Intel Xeon 3.2 GHz, 6 GB RAM          19.4             19.1                   22.6                   26.9
Windows XP, Intel Pentium 4 3.2 GHz, 4 GB RAM    47.5             50.7                   56.5                   71.2
Linux, Intel Pentium 4 3.2 GHz, 4 GB RAM         -                -                      51.0                   -
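To make the trend easier to read, the timings can be expressed relative to GNU sed on the same platform. The short sketch below does this arithmetic on the two Windows rows of the table; the Linux row is left out because it has no GNU sed timing. The dictionary layout and the function name slowdown are illustrative only.

```python
BASELINE = "GNU sed 4.2.1"

# Timings in seconds, copied from the table above.
TIMINGS = {
    "Windows 7": {BASELINE: 19.4, "Python 2.6": 19.1,
                  "Python 2.7": 22.6, "Python 3.4": 26.9},
    "Windows XP": {BASELINE: 47.5, "Python 2.6": 50.7,
                   "Python 2.7": 56.5, "Python 3.4": 71.2},
}


def slowdown(row):
    """Return each sed.py timing divided by the GNU sed timing."""
    base = row[BASELINE]
    return {name: round(seconds / base, 2)
            for name, seconds in row.items() if name != BASELINE}


for platform, row in TIMINGS.items():
    print(platform, slowdown(row))
```

On Windows 7, for example, the Python 3.4 run takes 26.9 / 19.4, or about 1.39 times as long as GNU sed.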

Test conditions:

 


To do list


 

At some point, one has to decide what goes into the upcoming release and what can wait. Here are some features which would be nice to have but can be postponed to a future version.