Today, we will discuss removing duplicate lines from a file in Python.
You may also wish to read about Object Oriented Concepts in Python, as listed below:
Python Class and Object Example
Inheritance in Python
Packages in Python
Exceptions in Python
How to remove duplicate lines from a file in Perl
Let's discuss two ways of doing this, as mentioned below:
1) Removing duplicate lines and printing the lines in order (when order is important)
- Using the normal way
2) Removing duplicate lines and printing the lines in any order (when order is NOT important)
- Using the SET concept in Python. A set in Python does not preserve order.
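As a quick illustration of the difference between the two ways, here is a minimal sketch that works on an in-memory list rather than a file. The names `lines`, `ordered` and `unordered` are made up for this example and are not part of the scripts in this post.

```python
# Sample data: the same names as in file_with_duplicates.txt
lines = ["Mother Teresa", "Winston Churchill", "Abraham Lincoln",
         "Mahatma Gandhi", "Winston Churchill", "Mother Teresa",
         "Abraham Lincoln"]

# Way 1: keep the first occurrence of each line, in input order
ordered = []
for line in lines:
    if line not in ordered:
        ordered.append(line)

# Way 2: a set keeps one copy of each line but does not remember the order
unordered = set(lines)

print(ordered)    # first occurrences, in input order
print(unordered)  # the same four names, order not guaranteed
```

Both results hold the same four names; only the first one is guaranteed to follow the order of the input.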
Note:
Both scripts read the duplicated content from file_with_duplicates.txt,
remove the duplicate lines, and finally
write the result to file_without_duplicates.txt
Input: file_with_duplicates.txt
1) remove_duplicate_lines_from_file_with_order.py
- Using Normal Way - it takes care of the order of the lines
Output: file_without_duplicates.txt
2) remove_duplicate_lines_from_file_without_order.py
- Using SET concept - it does not consider the order of the lines
Output: file_without_duplicates.txt
Input: file_with_duplicates.txt
Mother Teresa
Winston Churchill
Abraham Lincoln
Mahatma Gandhi
Winston Churchill
Mother Teresa
Abraham Lincoln
1)
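# Open the input for reading and the output for writing
infile = open('file_with_duplicates.txt', 'r')
outfile = open('file_without_duplicates.txt', 'w')
lines_seen = set()  # lines already written to the output
for line in infile:
    if line not in lines_seen:  # write only the first copy of each line
        outfile.write(line)
        lines_seen.add(line)
infile.close()
outfile.close()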
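The snippet above compares lines together with their trailing newline, so a final line that has no newline at the end of the file would not match an earlier copy of itself. One possible variation, sketched below, strips the newline before comparing and uses `with` so that both files are closed automatically. The function name `remove_duplicates_keep_order` and its parameters are hypothetical, chosen only for this example, and the demonstration writes its files to a temporary directory.

```python
import os
import tempfile


def remove_duplicates_keep_order(in_path, out_path):
    """Copy in_path to out_path, keeping only the first copy of each line."""
    lines_seen = set()
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        for line in infile:
            line = line.rstrip("\n")  # compare lines without the newline
            if line not in lines_seen:
                lines_seen.add(line)
                outfile.write(line + "\n")
    return len(lines_seen)  # number of unique lines written


# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file_with_duplicates.txt")
    dst = os.path.join(tmp, "file_without_duplicates.txt")
    with open(src, "w") as f:
        # Note: the last line has no trailing newline
        f.write("Mother Teresa\nAbraham Lincoln\nMother Teresa\nAbraham Lincoln")
    count = remove_duplicates_keep_order(src, dst)
    with open(dst) as f:
        result = f.read()

print(count)
print(result, end="")
```

With this input the function reports two unique lines, even though the last "Abraham Lincoln" has no newline after it.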
1) remove_duplicate_lines_from_file_with_order.py
- Using Normal Way - it takes care of the order of the lines
#!/usr/bin/env python3
import sys

try:
    input_file = open("file_with_duplicates.txt", "r")
    output_file = open("file_without_duplicates.txt", "w")
    unique = []  # unique lines, in the order they were first seen
    for line in input_file:
        line = line.strip()  # drop the newline and surrounding whitespace
        if line not in unique:
            unique.append(line)
    input_file.close()
    # Put the newline back on every line except the last one
    for i in range(0, len(unique) - 1):
        unique[i] += "\n"
    output_file.writelines(unique)
    output_file.close()
except FileNotFoundError:
    print('\n File NOT Found Error')
    sys.exit(1)
except IOError:
    print('\n IO Error')
    sys.exit(1)
Output: file_without_duplicates.txt
Mother Teresa
Winston Churchill
Abraham Lincoln
Mahatma Gandhi
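Checking `line not in unique` against a list scans the whole list every time, which can become slow on large files. A possible alternative, assuming Python 3.7 or later where dictionaries keep insertion order, is `dict.fromkeys`. This is only a sketch on in-memory data, and the variable names are illustrative.

```python
lines = ["Mother Teresa", "Winston Churchill", "Abraham Lincoln",
         "Mahatma Gandhi", "Winston Churchill", "Mother Teresa",
         "Abraham Lincoln"]

# Dictionary keys are unique and (in Python 3.7+) keep insertion order,
# so this keeps the first occurrence of every line
unique = list(dict.fromkeys(lines))

for line in unique:
    print(line)
```

The result is the same ordered list as in the script above, built with a single pass over the input.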
2) remove_duplicate_lines_from_file_without_order.py
- Using SET concept - it does not consider the order of the lines
#!/usr/bin/env python3
import sys

try:
    input_file = open("file_with_duplicates.txt", "r")
    output_file = open("file_without_duplicates.txt", "w")
    # The main drawback of using sets is that the order of the lines may not be the same as in the input file
    # splitlines() avoids an extra empty entry when the file ends with a newline
    uniquelines = set(input_file.read().splitlines())
    output_file.write("".join([line + "\n" for line in uniquelines]))
    input_file.close()
    output_file.close()
except FileNotFoundError:
    print('\n File NOT Found Error')
    sys.exit(1)
except IOError:
    print('\n IO Error')
    sys.exit(1)
Output: file_without_duplicates.txt
Abraham Lincoln
Winston Churchill
Mother Teresa
Mahatma Gandhi
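Because a set has no defined order, the output of this script can differ from one run to the next. If any stable order is acceptable, one option is to sort the unique lines before writing them. The sketch below works on a string instead of a file, and `text`, `unique_sorted` and `output` are names made up for the example.

```python
text = ("Mother Teresa\nWinston Churchill\nAbraham Lincoln\n"
        "Mahatma Gandhi\nWinston Churchill\nMother Teresa\nAbraham Lincoln\n")

# splitlines() avoids the empty string that split("\n") would produce
# after the final newline
unique_sorted = sorted(set(text.splitlines()))

# Build the text that would be written to the output file
output = "".join(line + "\n" for line in unique_sorted)
print(output, end="")
```

Sorting gives alphabetical order, not the order of the input file, so this is only a fit when reproducible output matters more than the original order.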