Jul 10, 2013

Perl Remove Special Characters From File

Some log files, like the one shown below, contain special characters (terminal color codes) that we need to get rid of before reading them.

These special characters make the developer's job tough when trying:
To parse the content of a file
To convert the special characters from the file

Let us see how to get rid of these special characters with a simple example.

test.log
^[[1;31mTest 1^[[0m
^[[1;31mTest 2^[[0m
^[[1;31mTest 3^[[0m
^[[1;31mTest 4^[[0m
^[[1;31mTest 5^[[0m  


In the above test.log file, what is displayed as ^[ is not the two characters ^ and [.
It is the single ASCII ESC character (decimal 27), produced by Esc or Ctrl+[ (the ^ notation means the Ctrl key).
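
To check that a line really starts with the ESC character rather than a literal caret, a quick test along these lines may help. This is only a minimal sketch: the sample string is made up to mirror the first line of test.log, and the variable name is just for illustration.

```perl
use strict;
use warnings;

# Sample line mirroring test.log (an assumption, not read from a real file).
# In a double-quoted Perl string, "\e" is the ESC character.
my $line = "\e[1;31mTest 1\e[0m";

# Character code of the first character: ESC is 27.
printf "first char code: %d\n", ord(substr($line, 0, 1));

# A literal caret would be 94 instead.
printf "caret code: %d\n", ord('^');
```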

We can use the following regular expression:

s/\e\[[\d;]*[a-zA-Z]//g;

Note:
\e represents the escape character in the above regular expression (used in place of ^[).
We can shorten [a-zA-Z] to just [mK] if only the color (m) and erase-line (K) sequences need to be removed.
You can also use the above regular expression while parsing the file line by line.
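
For the line-by-line case, the substitution could be wrapped in a small helper, roughly as sketched here. The helper name strip_ansi and the in-memory sample lines are assumptions made for illustration; only the regular expression comes from this post.

```perl
use strict;
use warnings;

# Hypothetical helper: remove escape sequences such as ESC[1;31m from a string.
sub strip_ansi {
    my ($text) = @_;
    $text =~ s/\e\[[\d;]*[a-zA-Z]//g;
    return $text;
}

# Sample lines standing in for the contents of test.log.
my @lines = ("\e[1;31mTest 1\e[0m\n", "\e[1;31mTest 2\e[0m\n");

# Clean each line before any further parsing.
for my $line (@lines) {
    print strip_ansi($line);
}

# With a real file, the same helper can be applied per line:
# open my $fh, '<', 'test.log' or die "Cannot open test.log: $!";
# while (my $line = <$fh>) { print strip_ansi($line); }
# close $fh;
```

Keeping the substitution in one helper means the parsing code only ever sees clean text.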

If you want to keep a backup of the original file (test.log.bak) while changing test.log in place, use the following:
perl -pi.bak -e 's/\e\[[\d;]*[a-zA-Z]//g' test.log

The following will remove the special characters in test.log in place, without keeping a backup:
perl -pi -e 's/\e\[[\d;]*[a-zA-Z]//g' test.log

Output after removing the special characters:
Test 1
Test 2
Test 3
Test 4
Test 5
  


