Mar 27, 2020

Python multiprocessing fork

Fork
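# When we fork, the entire Python process is duplicated in memory, including the Python interpreter, code, libraries, current stack, etc. (the OS shares the pages copy-on-write, so nothing is physically copied until one side modifies it).
# The child therefore starts with its own copy of the Python interpreter.
# After a fork there are two separate processes, each with its own interpreter and its own GIL.
# Fork is faster than spawn: a forked child inherits the parent's state directly, while a spawned child starts a fresh interpreter and re-imports the main module (all module-level code outside the `if __name__ == '__main__':` block runs again).
# Fork is the default start method on POSIX systems other than macOS, where spawn has been the default since Python 3.8 (Python 3.14 switches that remaining default to forkserver).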
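To see the "duplicated in memory" point directly, here is a minimal sketch that calls `os.fork()` without going through multiprocessing (Unix only). The function name `fork_demo` and the pipe used to report back to the parent are illustrative choices, not part of the original post. The child appends to a list, and the parent's list is unchanged because after the fork each process has its own memory.

```python
import os


def fork_demo():
    """Return (parent's list, child's list as text) after the child mutates it.

    Unix only: os.fork() does not exist on Windows.
    """
    data = [1, 2, 3]
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: starts with a copy of the parent's memory, including `data`
        try:
            os.close(read_fd)
            data.append(4)  # modifies only the child's copy
            os.write(write_fd, repr(data).encode())
        finally:
            os._exit(0)  # never fall through into the parent's code path

    # Parent: read the child's report until EOF, then reap the child
    os.close(write_fd)
    with os.fdopen(read_fd) as r:
        child_view = r.read()
    os.waitpid(pid, 0)
    return data, child_view


if __name__ == "__main__":
    parent_view, child_view = fork_demo()
    print("parent sees:", parent_view)  # parent sees: [1, 2, 3]
    print("child saw:", child_view)     # child saw: [1, 2, 3, 4]
```

The child exits with `os._exit()` so that it never returns into code meant only for the parent.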
 
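# Disadvantages of Fork
# It does not work on Windows, which has no fork() system call.
# The child inherits the parent's memory as-is, including locks: if another thread in the parent held a lock at the moment of the fork, the child's copy of that lock stays locked forever and the child hangs when it tries to acquire it.
# This is very hard to debug when a third-party module/library you import uses threads behind the scenes.
# Fork and multithreading don't mix well: only the thread that called fork() exists in the child.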
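The lock problem can be reproduced in a few lines. This is a sketch assuming a Unix system, and `inherited_lock_demo` is a hypothetical helper name. A background thread holds a `threading.Lock` while the main thread forks; in the child that thread does not exist, so the copied lock is never released. The child uses `acquire(timeout=...)` so the demo reports the problem instead of hanging the way a plain `acquire()` would. On Python 3.12+ you may also see a DeprecationWarning about forking a multi-threaded process, which warns about exactly this hazard.

```python
import os
import threading


def inherited_lock_demo(timeout=1.0):
    """Return True if a forked child could acquire a lock held by a parent thread at fork time."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        # Background thread: take the lock and keep it until told to let go
        with lock:
            held.set()
            release.wait()

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # make sure the lock is really held before forking

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so nothing will ever
        # release the copied lock; a plain acquire() would block forever
        try:
            os.close(read_fd)
            got_it = lock.acquire(timeout=timeout)
            os.write(write_fd, b"1" if got_it else b"0")
        finally:
            os._exit(0)

    # Parent: collect the child's answer, then let the holder thread finish
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as r:
        result = r.read()
    os.waitpid(pid, 0)
    release.set()
    t.join()
    return result == b"1"


if __name__ == "__main__":
    print("child acquired lock:", inherited_lock_demo())  # child acquired lock: False
```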


from multiprocessing import Process
import multiprocessing
import os

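# Module-level handle; it is only assigned in the parent's __main__ block.
# A forked child inherits that assignment, whereas a spawned child would
# re-import this module and see None here.
file_desc = None

def process_task1():
    # Runs in the child: writes through the file object inherited from the parent
    file_desc.write(f"\nWritten by child process with id {os.getpid()}")
    file_desc.flush()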


if __name__ == '__main__':
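    # Request fork explicitly: it is not the default on macOS (Python 3.8+)
    # or, from Python 3.14, on Linux, and it is unavailable on Windows
    multiprocessing.set_start_method('fork')

    # Create the file in the parent process
    file_desc = open("sample.txt", "w")
    file_desc.write(f"\nWritten by parent process with id {os.getpid()}")
    # Flush before starting the child; otherwise the unflushed buffer is
    # copied into the child as well and the parent's line may be written twice
    file_desc.flush()

    p = Process(target=process_task1)
    p.start()
    p.join()
    file_desc.close()

    # Read back what both processes wrote
    with open("sample.txt", "r") as f:
        print(f.read())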

    os.remove("sample.txt")


Output:
Written by parent process with id 288
Written by child process with id 294
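The example above works only because the child inherits the global `file_desc`; under spawn the child re-imports the module, sees `file_desc = None`, and `process_task1` fails with an AttributeError. A more portable pattern is to pass the file path as an argument and open the file in whichever process needs it. The sketch below uses a temporary file instead of `sample.txt`, and `append_line` and `run_demo` are hypothetical names. It picks the start method with `multiprocessing.get_context()` rather than the global `set_start_method()`; since nothing depends on inherited state, it should also work with `run_demo("spawn")` when run as a script.

```python
import multiprocessing
import os
import tempfile


def append_line(path, who):
    # Open the file in the calling process instead of relying on a file
    # object inherited from the parent, so any start method can run this
    with open(path, "a") as f:
        f.write(f"Written by {who} process with id {os.getpid()}\n")


def run_demo(method="fork"):
    """Write one line from the parent and one from a child; return the lines."""
    ctx = multiprocessing.get_context(method)
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        append_line(path, "parent")
        p = ctx.Process(target=append_line, args=(path, "child"))
        p.start()
        p.join()
        with open(path) as f:
            return f.read().splitlines()
    finally:
        os.remove(path)


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Because the parent's `with` block closes the file before the child starts, there is no unflushed buffer for the child to inherit.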

