Mar 14, 2013

How to remove duplicate lines from a file in Perl

Today we cover the following:
1) How to open a directory and read all the files in it
2) How to remove duplicate lines from each file and write the result to the same path/file or to a different path/file
3) How to create a time-stamp in Perl
4) How to create a log file and log the changes

Set the input, output, and log directory paths at the top of the script before running it.

e.g., to remove duplicate lines and write the result back to the same file:
my $input_dir  = "/nfs/fm/disks/my_files";
my $output_dir = "/nfs/fm/disks/my_files";
my $log_dir    = "/nfs/fm/disks/my_files";

e.g., to write the de-duplicated files and the log to different locations:
my $input_dir  = "/nfs/fm/disks/input_dir";
my $output_dir = "/nfs/fm/disks/output_dir";
my $log_dir    = "/nfs/fm/disks/log_dir";

remove_duplicates.pl

#!/usr/bin/perl

use strict;
use warnings;

my $input_dir  = "/nfs/fm/disks/my_files";
my $output_dir = "/nfs/fm/disks/my_files"; 
my $log_dir    = "/nfs/fm/disks/my_files";

# localtime returns a 0-based month and the year as an offset from 1900,
# which is why 1 and 1900 are added below
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime(time);
$sec  = sprintf("%02d",$sec);
$min  = sprintf("%02d",$min);
$hour = sprintf("%02d",$hour);
$mday = sprintf("%02d", $mday);
$mon  = sprintf("%02d", $mon+1);
$year = sprintf("%04d", $year+1900);

my $output_tag = $mday .'_'. $mon .'_'. $year .'_'. $hour .'_'. $min;

opendir(IN_DIR, $input_dir) or die $!; 
opendir(OUT_DIR, $output_dir) or die $!; 
opendir(LOG_DIR, $log_dir) or die $!; 

open(LOG_FILE, ">", "$log_dir/remove_duplicate_lines.log") or die "Can't open log file for writing: $!";

print LOG_FILE "------------- START ------------ \n";

my (@invalid_list_files, %file_without_dup );

while (my $file = readdir(IN_DIR)) {

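    next if $file =~ /^\./;

    # Skip the script's own log file when the log and input directories are the same
    next if $file eq 'remove_duplicate_lines.log';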

    print LOG_FILE "\n Folder Path : " , "$input_dir/$file";    

    unless (-f "$input_dir/$file") {
        print LOG_FILE "\n Not a File : " . $file;
        next;
    }
        
    print LOG_FILE "\n File Name : " . $file  . "\n";

    #Reading the File 
    open(IN_FILE, "<", "$input_dir/$file") || die "\n Can't open file for reading: $input_dir/$file: $!";
    my @lines = <IN_FILE>;
    close(IN_FILE);

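    # Reset the lookup for each file so lines are compared only within the same file
    %file_without_dup = ();
    my @newlines;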
    foreach my $each_line (@lines) {
        if (not defined $file_without_dup{$each_line}) {
            push(@newlines, $each_line);
        }
        $file_without_dup{$each_line} = 1;
    }

    #Writing into Output Files
    open(OUT_FILE, ">", "$output_dir/$file") || die "-W- Can't open file for writing: $output_dir/$file: $!";
    print OUT_FILE @newlines;
    close(OUT_FILE);
    
}
close (LOG_FILE);
closedir(IN_DIR);
closedir(OUT_DIR);
closedir(LOG_DIR);

1;
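
The de-duplication loop in the script can also be written more compactly with grep and a hash. The following is only a minimal sketch, not part of the original script: the subroutine name unique_lines is hypothetical, and it assumes two lines count as duplicates only when they match exactly, including the trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the given
# lines with later exact duplicates dropped, keeping first-seen order.
sub unique_lines {
    my @lines = @_;
    my %seen;    # fresh hash on every call, so nothing carries over between files
    # $seen{$_}++ is 0 (false) the first time a line appears, so the line is kept;
    # on later appearances it is already non-zero and the line is filtered out.
    return grep { !$seen{$_}++ } @lines;
}

my @sample = ("apple\n", "banana\n", "apple\n", "cherry\n", "banana\n");
print unique_lines(@sample);    # prints apple, banana, cherry, one per line
```

Because the hash lives inside the subroutine, calling it once per file compares lines only within that file.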

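The time-stamp can also be built in one call with strftime from the core POSIX module, instead of formatting each field with sprintf. This is a sketch under assumptions: the subroutine name make_tag is hypothetical, and the format string is chosen to reproduce the same DD_MM_YYYY_HH_MM layout as $output_tag in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not in the original script): formats an epoch time
# as DD_MM_YYYY_HH_MM in the local time zone.
sub make_tag {
    my ($epoch) = @_;
    # strftime takes the same field list that localtime returns, so the
    # month (+1) and year (+1900) adjustments are handled for us.
    return strftime('%d_%m_%Y_%H_%M', localtime($epoch));
}

print make_tag(time), "\n";
```

Such a tag could be appended to a log or output file name to keep the results of separate runs apart, although the script above does not do that.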

