Nov 18, 2019

Nohup is not writing log to output file

nohup python long_running_task.py &

Using '-u' with 'nohup' worked for me: '-u' makes Python's stdout and stderr unbuffered, so output reaches the "nohup.out" file as it is produced instead of sitting in a buffer.

nohup python -u long_running_task.py &
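If changing the command line is not an option, a similar effect can be had from inside the script by flushing each write. A minimal sketch, assuming the script logs with print(); the log helper is hypothetical and not part of the original task:

```python
import sys
import time


def log(message, stream=sys.stdout):
    """Write one line and flush it immediately so it is not held in a buffer."""
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    for step in range(3):
        log(f"step {step} done")
        time.sleep(0.1)  # stand-in for real work
```

Setting the PYTHONUNBUFFERED environment variable to a non-empty value is the environment equivalent of '-u'.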

Nov 16, 2019

Python record linkage


Python fuzzywuzzy

#Ref: https://www.datacamp.com/community/tutorials/fuzzy-string-python

#pip install fuzzywuzzy
#pip install python-Levenshtein

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

print('-------------string matching')
Str1 = "Apple Inc."
Str2 = "apple Inc"
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
print(Ratio)            #95

print('-------------substring matching')
Str1 = "Los Angeles Lakers"
Str2 = "Lakers"
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
print(Ratio)            #50
print(Partial_Ratio)    #100

print('-------------string different order match - same length')
#The token_* functions tokenize the strings and preprocess them by lowercasing them and removing punctuation
Str1 = "united states v. nixon"
Str2 = "Nixon v. United States"
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
Token_Sort_Ratio = fuzz.token_sort_ratio(Str1,Str2)
print(Ratio)            #59
print(Partial_Ratio)    #74
print(Token_Sort_Ratio) #100

print('-------------string different order match - different length')
Str1 = "The supreme court case of Nixon vs The United States"
Str2 = "Nixon v. United States"
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
Token_Sort_Ratio = fuzz.token_sort_ratio(Str1,Str2)
Token_Set_Ratio = fuzz.token_set_ratio(Str1,Str2)
print(Ratio)            #57
print(Partial_Ratio)    #77
print(Token_Sort_Ratio) #58 
print(Token_Set_Ratio)  #95

print('-------------search string in a list of strings with score/ratio')
str2Match = "apple inc"
strOptions = ["Apple Inc.","apple park","apple incorporated","iphone"]
Ratios = process.extract(str2Match,strOptions)
print(Ratios)
#[('Apple Inc.', 100), ('apple incorporated', 90), ('apple park', 67), ('iphone', 40)]
# You can also select the string with the highest matching percentage
highest = process.extractOne(str2Match,strOptions)
print(highest)
#('Apple Inc.', 100)
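To see why token sorting makes word order irrelevant, here is a rough sketch of the idea using only the standard library. It is not fuzzywuzzy's actual implementation; the normalize and token_sort_similarity names are made up for illustration, and difflib.SequenceMatcher stands in for the library's scorer.

```python
import string
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and sort the tokens."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = text.lower().translate(table).split()
    return " ".join(sorted(tokens))


def token_sort_similarity(a, b):
    """Similarity from 0 to 100 of the two strings after token sorting."""
    matcher = SequenceMatcher(None, normalize(a), normalize(b))
    return round(100 * matcher.ratio())


print(normalize("Nixon v. United States"))  # nixon states united v
print(token_sort_similarity("united states v. nixon",
                            "Nixon v. United States"))  # 100
```

Both inputs normalize to the same token string, so the comparison is between identical strings regardless of the original word order.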


Nov 14, 2019

ElasticSearch


Python Black


Newman tool Postman


Newman
  • Newman is a command-line collection runner for Postman
  • It allows you to effortlessly run and test a Postman collection directly from the command-line.
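A small sketch of how a Newman run might be driven from Python. The file names are placeholders and build_newman_command is a hypothetical helper; the sketch only builds the "newman run" command and leaves the actual call commented out, since it needs Newman installed.

```python
def build_newman_command(collection, environment=None):
    """Return the argument list for running a Postman collection with Newman."""
    command = ["newman", "run", collection]
    if environment is not None:
        command += ["--environment", environment]
    return command


# Placeholder file names, not real exports.
command = build_newman_command("my_collection.json", "staging_env.json")
print(" ".join(command))
# To actually run it (requires Newman to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```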

Nov 11, 2019

Python AsyncIO

Ref: https://realpython.com/async-io-python/
  • Threading Vs Multi-Processing
    • Threading is better for I/O-bound tasks
    • Multi-processing is better for CPU-bound tasks
  • Concurrency Vs Parallelism
    • Concurrency is when two tasks can start, run, and complete in overlapping time periods, e.g., Threading, AsyncIO
    • Parallelism is when tasks literally run at the same time, e.g., multi-processing
  • While a CPU-bound task is characterised by the computer’s cores continually working hard from start to finish, an IO-bound job is dominated by a lot of waiting on input/output to complete.
  • Preemptive multitasking Vs Cooperative multitasking
    • In preemptive multitasking, the OS preempts a thread, forcing it to give up the CPU (e.g., Threading)
    • In cooperative multitasking, on the other hand, the running task voluntarily gives up the CPU to other tasks (e.g., AsyncIO)
  • Coroutine Vs Method/Function/Subroutine
    • A method or function returns a value and does not remember its state between invocations
    • A coroutine is a special function that can give up control to its caller without losing its state
  • Coroutine Vs Generator
    • A generator yields values back to its invoker
    • A coroutine yields control to another coroutine and can resume execution from the point where it gave up control
    • A plain generator only produces values once started, whereas a coroutine can also accept values after it has started (e.g., through send())
  • AsyncIO is a single-threaded, single-process design: it uses cooperative multitasking
  • AsyncIO gives a feeling of concurrency despite using a single thread in a single process
  • Coroutines (a central feature of async IO) can be scheduled concurrently, but they are not inherently concurrent.
  • Asynchronous routines are able to “pause” while waiting on their ultimate result and let other routines run in the meantime.
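The generator/coroutine distinction in the notes above can be sketched with plain generators: one that only yields values, and one used as a coroutine that keeps its state and receives values through send() after it has started. The function names are illustrative only.

```python
def countdown(n):
    """Plain generator: yields values back to whoever iterates over it."""
    while n > 0:
        yield n
        n -= 1


def running_total():
    """Generator-based coroutine: receives values through send()."""
    total = 0
    while True:
        received = yield total  # pause here; resume when a value is sent
        total += received


print(list(countdown(3)))  # [3, 2, 1]

acc = running_total()
next(acc)                  # advance to the first yield
print(acc.send(10))        # 10
print(acc.send(5))         # 15
```

The running total survives between send() calls, which is the "without losing its state" part of the coroutine definition.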


import asyncio
import time

async def count_func():
    print("Line One")
    await asyncio.sleep(1) # await non-blocking call
    print("Line Two")


async def main():
    await asyncio.gather(count_func(), count_func(), count_func())


if __name__ == "__main__":
    t1 = time.time()
    asyncio.run(main())
    elapsed = time.time() - t1

    #Run one after another, the three calls would take more than 3 secs;
    #gather() runs them concurrently, so the total is about 1 sec
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")


Output:
#####
Line One
Line One
Line One
Line Two
Line Two
Line Two
main.py executed in 1.10 seconds.
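For comparison with the asyncio run above, the same kind of I/O-bound wait can be overlapped with preemptive threads. A minimal sketch, assuming time.sleep() as a stand-in for a blocking I/O call; blocking_task and run_in_threads are illustrative names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(task_id, delay=0.2):
    """Simulate an I/O wait with a blocking sleep."""
    time.sleep(delay)
    return task_id


def run_in_threads(count=3):
    """Run the blocking tasks in a thread pool and return (results, elapsed)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(pool.map(blocking_task, range(count)))
    return results, time.perf_counter() - start


results, elapsed = run_in_threads()
print(results)  # [0, 1, 2]
print(f"finished in {elapsed:0.2f} seconds")
```

The three sleeps overlap, so the elapsed time should be close to one delay rather than three. Here the OS decides when each thread runs, whereas in the asyncio version each coroutine hands over control at its await.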



Python Disadvantages

  • It's an interpreted language, so it is not as fast as a compiled language
  • Slower than C & C++. Since Python is a high-level language, unlike C and C++, it's not close to the hardware
  • Not good for gaming, mobile development, or desktop UI applications
  • Not good for memory-intensive tasks: due to the flexibility of its data types, Python's memory consumption is high
  • Python's database access layer is found to be a bit underdeveloped and primitive compared with JDBC/ODBC
  • GIL
    • Global interpreter lock: only one thread at a time can execute Python bytecode in a single interpreter
    • The workaround is multi-processing, but creating, managing and tearing down processes (which are heavier than threads) is more expensive than doing the same for threads. Furthermore, inter-process communication is slower than inter-thread communication.
    • Both these drawbacks may make Python an impractical technology choice for super-critical or time-sensitive use-cases.
    • The Python community is working to remove the GIL from CPython. One such attempt is known as the Gilectomy.
    • The GIL belongs to CPython, the original Python implementation (PyPy also has one); Jython and IronPython do not have a GIL.
    • Python has multiple interpreter implementations. CPython, Jython, IronPython and PyPy, written in C, Java, C# and Python respectively, are the most popular ones.
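Since the GIL depends on the interpreter implementation, it can help to check which one a script is running on. A short sketch using the standard library; describe_interpreter is an illustrative name.

```python
import platform
import sys


def describe_interpreter():
    """Return the implementation name and version of the running interpreter."""
    name = platform.python_implementation()  # e.g. CPython, PyPy, Jython, IronPython
    version = platform.python_version()
    return f"{name} {version}"


print(describe_interpreter())
print(sys.implementation.name)  # lowercase identifier, e.g. cpython
```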


Python fstrings

elem = 10
elem_index = 5

def to_lowercase(st):
    return st.lower()

#Formatted string literals (f-strings)
#The idea behind f-strings is to make string interpolation simpler.
#f-strings are expressions evaluated at runtime rather than constant values
print(f'Index of element {elem} is {elem_index}')

#call functions from f-strings
print(f'String to lowercase: {to_lowercase("TEST_STRING")}')
print(f'String to lowercase: {"TEST_STRING".lower()}')

#str.format() - used before f-strings were introduced in Python 3.6
print('Index of element {} is {}'.format(elem, elem_index))
elapsed = 23.5678
print(f"{__file__} executed in {elapsed:0.2f} seconds.")


Output:
Index of element 10 is 5
String to lowercase: test_string
String to lowercase: test_string
Index of element 10 is 5
main.py executed in 23.57 seconds.
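The ":0.2f" part in the last print is a format specifier. A few more specifiers, as a quick sketch; the variable names and values are made up for illustration.

```python
amount = 1234567.891
ratio = 0.256
count = 42

print(f"{amount:,.2f}")  # 1,234,567.89  (thousands separator, 2 decimals)
print(f"{ratio:.1%}")    # 25.6%         (percentage with 1 decimal)
print(f"[{count:>6}]")   # [    42]      (right-aligned in 6 characters)
print(f"{count:#x}")     # 0x2a          (hexadecimal with prefix)
```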