Mar 17, 2013

Python Regular Expressions (re.compile Vs re.match)

Today we discuss the following topics:
1) Pre-compiled Regular Expressions - Uses & Advantages (General Concept)
2) Explain the difference between Python re.compile Vs re.match with an example.

Let's dive into the topic & continue the fun :

1) Pre-compiled Regular Expressions - Uses & Advantages (General Concept)
This is a general topic that applies to any language with regex support (Perl, Python, or any other).
The advantages of using a pre-compiled regular expression are:

     a) When you have to run the same regular expression pattern over millions of lines (say, while reading lines from a file), a pre-compiled regular expression can noticeably reduce the execution time.

     b) Since the regex pattern is compiled once in advance, it does not have to be processed again for each line read from the file.

     c) This assumes that the regex pattern stays constant. If the pattern has to change on every iteration, pre-compiling it will not help much.
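The points above can be sketched in Python as follows. This is a minimal illustration, not a benchmark: the names WORD_PAIR and count_matching_lines and the sample data are hypothetical, and an in-memory io.StringIO stands in for a real file with millions of lines.

```python
import io
import re

# Hypothetical pattern, compiled once before any line is read.
WORD_PAIR = re.compile(r'(\w+)\s+(\w+)')

def count_matching_lines(lines, pattern=WORD_PAIR):
    """Count the lines that match `pattern` at the start of the line."""
    return sum(1 for line in lines if pattern.match(line))

# Stand-in for a large file; a real script would use open(path) instead.
sample_file = io.StringIO("Mother Teresa\nsingleword\nAlbert Einstein\n")
matching = count_matching_lines(sample_file)
print("Matching lines : ", matching)
```

The compiled pattern object is created once at import time and reused for every line, which is the situation described in points a) and b).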

2) Explain the difference between Python re.compile Vs re.match with an example.
Let us illustrate the difference with a simple example.
Here we run the same regex pattern in a loop of 1,000,000 iterations, first with a pre-compiled pattern (re.compile) and then by calling re.match directly with the pattern string.
Compare the two timings in the output below.

E.g., regex_compile_vs_match.py

import re
import time

input_str       = "Mother Teresa"
count_compile   = 0
count_match     = 0

# Case 1: compile the pattern once, then reuse the compiled object.
time_val_1 = time.time()

compiled_regex = re.compile(r'(\w+)\s+(\w+)')

for i in range(1000000):
    if compiled_regex.match(input_str):
        count_compile += 1

print("Count Compile Value : ", count_compile)
print("Time taken by Using Compile : ", time.time() - time_val_1)

# Case 2: pass the pattern string to re.match on every iteration.
time_val_2 = time.time()

for i in range(1000000):
    if re.match(r'(\w+)\s+(\w+)', input_str):
        count_match += 1

print("Count Match Value : ", count_match)
print("Time taken by Using Match   : ", time.time() - time_val_2)


Output:

Count Compile Value : 1000000
Time taken by Using Compile :  1.340999994277954
Count Match Value : 1000000
Time taken by Using Match   :  2.994999885559082  
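
For a more repeatable comparison, the same measurement can be sketched with the standard timeit module. Python's re module keeps an internal cache of recently compiled patterns, so re.match most likely does not recompile the pattern on every call; the gap it shows comes largely from the per-call cache lookup. The helper names with_compile and with_module_match and the run count are assumptions made for this sketch, and the timings will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r'(\w+)\s+(\w+)'
input_str = "Mother Teresa"
compiled_regex = re.compile(PATTERN)

def with_compile():
    # Reuses the pattern object compiled once above.
    return compiled_regex.match(input_str)

def with_module_match():
    # Passes the pattern string on every call; re looks it up in its cache.
    return re.match(PATTERN, input_str)

runs = 100_000  # hypothetical run count, kept small so the sketch finishes quickly
t_compile = timeit.timeit(with_compile, number=runs)
t_match = timeit.timeit(with_module_match, number=runs)

print(f"Compiled pattern : {t_compile:.3f} s")
print(f"re.match         : {t_match:.3f} s")
```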




