Remove common lines from files with Python


I am digging Python. I am writing small pieces of code that do one thing and do it well, kind of like building solid, reliable Lego pieces. When I have a collection of them, I can snap ’em together to make something useful. In fact, I’ve used Python to generate some of the content behind the wiki I built, http://www.haidongji.com/wiki

One useful thing I wrote recently solves this problem: suppose you have two files, 1.txt and 2.txt, and you want to remove from 1.txt every line that also appears in 2.txt. I came up with 4 lines of Python code (including the import statement) to solve it. I am a bit amused by this, although I don’t necessarily like this style of programming: it is clever, but it can be hard to understand and maintain later on. Here is the code. It is just for demo purposes, so there is no error handling!

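[sourcecode language="python"]
#!/usr/bin/env python3

import fileinput

# inplace=True redirects stdout into 1.txt, so every printed line is kept.
for line in fileinput.input("1.txt", inplace=True):
    # Keep the line only if no identical line (newline included) exists in 2.txt.
    if line not in open("2.txt", "r"):
        print(line, end="")
[/sourcecode]

Note the end="" argument to print(). It is necessary because each line read from 1.txt already ends with a newline. Without it, you will have extra newline characters in your file. (The Python 2 equivalent is a trailing comma: print line,)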
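Since the short version is clever but harder to maintain, and it reopens and rescans 2.txt for every line of 1.txt, here is a minimal sketch of a more explicit alternative. It is not the code from this post. The function name remove_common_lines is hypothetical, and the sketch assumes both files fit in memory. It reads 2.txt only once into a set, and it compares lines with the trailing newline stripped, so a final line without a newline still matches.

```python
#!/usr/bin/env python3
"""Remove from one file every line that also appears in another file."""


def remove_common_lines(target_path, other_path):
    """Rewrite target_path without the lines that also occur in other_path.

    Hypothetical helper for illustration. Lines are compared with their
    trailing newline stripped, so a last line lacking "\n" still matches.
    """
    # Read the "lines to drop" file once into a set for fast lookups.
    with open(other_path) as f:
        common = {line.rstrip("\n") for line in f}

    # Collect the lines of the target file that are not in the set.
    with open(target_path) as f:
        kept = [line for line in f if line.rstrip("\n") not in common]

    # Everything is in memory now, so it is safe to overwrite the target.
    with open(target_path, "w") as f:
        f.writelines(kept)
```

Call it as remove_common_lines("1.txt", "2.txt"). Reading 1.txt fully before reopening it for writing avoids the stdout-redirection trick, at the cost of holding the whole file in memory.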

For a simple test, create 1.txt containing the English alphabet, one letter per line. Then create 2.txt containing, say, the letters of the word “haidong”, again one letter per line. Run the code and see what happens: 1.txt should be left with only the letters that do not appear in “haidong”.
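Here is a sketch that sets up exactly this test automatically, assuming Python 3. It writes 1.txt and 2.txt into a temporary directory, so any existing files with those names are left alone. It then runs the same fileinput loop against them and prints the letters that survive.

```python
#!/usr/bin/env python3
"""Build the alphabet / "haidong" test case and run the fileinput loop on it."""
import fileinput
import os
import string
import tempfile

# Work in a scratch directory so no existing 1.txt or 2.txt is overwritten.
workdir = tempfile.mkdtemp()
path1 = os.path.join(workdir, "1.txt")
path2 = os.path.join(workdir, "2.txt")

# 1.txt: the English alphabet, one letter per line.
with open(path1, "w") as f:
    f.write("\n".join(string.ascii_lowercase) + "\n")

# 2.txt: the letters of "haidong", one per line.
with open(path2, "w") as f:
    f.write("\n".join("haidong") + "\n")

# Same loop as in the post, pointed at the scratch files.
for line in fileinput.input(path1, inplace=True):
    if line not in open(path2, "r"):
        print(line, end="")

# Read back what is left in 1.txt.
with open(path1) as f:
    remaining = f.read().split()

print("".join(remaining))  # bcefjklmpqrstuvwxyz
```

The 7 letters h, a, i, d, o, n and g are gone, leaving 19 of the original 26 lines.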
