Mar 29, 2019

Deploy a Docker container on Google Cloud - Docker + Google Cloud + Kubernetes

Prerequisites:

  • Set up the Google Cloud SDK (gcloud) on your system
  • Have your container image ready on Docker Hub


Steps:
  • gcloud container clusters create kubecluster

#The port 80 below should match the port exposed in the Dockerfile (EXPOSE 80)

  • kubectl run kubecluster --image=prabhathkota/test-docker:tag1 --port=80 --image-pull-policy=IfNotPresent

O/P:
deployment.apps "kubecluster" created


#Create a service object that exposes the deployment

  • kubectl expose deployment kubecluster --type="LoadBalancer"

O/P:
service "kubecluster" exposed


  • kubectl get services kubecluster

O/P:
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
kubecluster   LoadBalancer   10.23.246.XXX   35.244.47.XXX   80:31607/TCP   3m
#Test
curl http://35.244.47.XXX:80




  • Cleanup
kubectl delete services kubecluster
kubectl delete deployment kubecluster
gcloud container clusters delete kubecluster



Mar 28, 2019

Run a Docker container on Amazon EC2


I created a sample Docker image on Docker Hub and deployed it on an EC2 instance.
It's super easy and simple.

sudo yum install docker -y
sudo service docker start
docker run -p 4000:80 prabhathkota/test-docker:tag1   #maps host port 4000 to container port 80

Test: 
http://ec2-18-209-XXX-XXX.compute-1.amazonaws.com:4000

Output:
Hello World!
Hostname: f156f8eeb177
Visits: cannot connect to Redis, counter disabled
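The output above matches the Flask + Redis sample app from the Docker getting-started tutorial. A minimal sketch of such an app (an assumption about what the image contains, not taken from it; requires Flask and redis-py, and listens on port 80 to match the -p 4000:80 mapping):

#app.py - illustrative sketch only
from flask import Flask
from redis import Redis, RedisError
import socket

redis = Redis(host="redis", socket_connect_timeout=2, socket_timeout=2)
app = Flask(__name__)

@app.route("/")
def hello():
    try:
        visits = redis.incr("counter")          #increment a visit counter in Redis
    except RedisError:
        visits = "cannot connect to Redis, counter disabled"
    return "Hello World!\nHostname: {}\nVisits: {}\n".format(socket.gethostname(), visits)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)            #container port 80, published as host port 4000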



Mar 25, 2019

Python Pickle/UnPickle Serialize/DeSerialize

#The Python pickle module is used for serializing and de-serializing Python objects.
#Any object in Python can be pickled so that it can be saved on disk.
#Pickling "serializes" the object before writing it to a file.
#Pickling converts a Python object into a byte stream.


#Serializing - pickle
import pickle
emp = {"employees":[
    {"name":"Shyam", "email":"shyam@mail.com"},
    {"name":"Bob", "email":"bob32@mail.com"},
    {"name":"Jai", "email":"jai87@mail.com"}
]}
pickling_on = open("Emp.pickle","wb")
pickle.dump(emp, pickling_on)
pickling_on.close()


#De-Serializing - unpickle
pickle_off = open("Emp.pickle","rb")
emp = pickle.load(pickle_off)
pickle_off.close()
print(emp)


#Output
{'employees': [{'name': 'Shyam', 'email': 'shyam@mail.com'}, {'name': 'Bob', 'email': 'bob32@mail.com'}, {'name': 'Jai', 'email': 'jai87@mail.com'}]}
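A minimal sketch (not from the original notes) showing the same round trip in memory with pickle.dumps()/pickle.loads(), plus the file-based version using with blocks so the files are closed automatically:

import pickle

emp = {"name": "Shyam", "email": "shyam@mail.com"}

data = pickle.dumps(emp)          #serialize to a bytes object (no file needed)
print(pickle.loads(data))         #de-serialize back to a dict

with open("Emp.pickle", "wb") as fh:    #file-based pickling
    pickle.dump(emp, fh)
with open("Emp.pickle", "rb") as fh:    #file-based unpickling
    print(pickle.load(fh))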

Python Monkey Patching

Monkey Patching:
  • Monkey Patching refers to dynamic (run-time) modifications of a class or module
  • A MonkeyPatch is a piece of Python code which extends or modifies other code at runtime
  • The unittest.mock library makes use of monkey patching to replace parts of the code under test with mock objects (see the sketch after this list)
  • It provides functionality for writing clever unit tests
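
A minimal self-contained sketch (not from the original notes) of how unittest.mock monkey patches a method during a test:

### test_mock.py
from unittest import mock

class A:
    def func(self):
        return "real func()"

def test_func_is_mocked():
    #mock.patch.object temporarily replaces A.func with a MagicMock for the duration of the block
    with mock.patch.object(A, 'func', return_value='mocked!') as mocked:
        obj = A()
        assert obj.func() == 'mocked!'   #the real func() is never called
        mocked.assert_called_once()

test_func_is_mocked()
print("A.func was patched with a mock and restored afterwards")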


### test1.py 
class A: 
    def func(self): 
        print("func() is being called")


### test2.py
from test1 import A

def monkey_func(self): 
     print("monkey_func() is being called...")
   
# Replacing address of "func" with "monkey_func" 
A.func = monkey_func

if __name__ == '__main__':
    obj = A()
  
    # calling function "func" whose address got replaced with "monkey_func()" 
    obj.func() 


Output:

monkey_func() is being called...

Mar 22, 2019

Switching Java Versions on MacOS

Add these to ~/.profile

alias java9="export JAVA_HOME=`/usr/libexec/java_home -v 9`; java -version"

alias java8="export JAVA_HOME=`/usr/libexec/java_home -v 1.8`; java -version"

alias java7="export JAVA_HOME=`/usr/libexec/java_home -v 1.7`; java -version"

Mar 21, 2019

Git commands

# Git Config

git init

git config user.email test@test.com

git config user.name 'Prabhath Kota'

git clone https://github.com/prabhathkota/Django-user-registration.git


Checkout Branch:

  • git checkout -b test_branch
  • git branch
  • git push --set-upstream origin test_branch

git status -s #check if any modified files

modified:   common_utils.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
../api/log/
misc/test_rl.py

# stash without a name
git stash

# stash 
git stash save <name_of_stash>
stash@{0}: On develop: new_changes

# -u if you want to stash untracked files as well
# git stash save -u <name_of_stash>  # Don't use this

#git stash list
stash@{0}: On develop: new_changes

git status -s

# stash apply
git stash apply stash@{0}

# Takes the latest and apply
git stash pop

# Clear stash
git stash clear

# gitignore




Python Virtual Environment


Install Python 3 alongside Python 2 (using a virtual environment)


python3 -m venv <python3env>
or
pip3 install virtualenv
virtualenv -p /usr/local/bin/python3 python3env


Activate:
    source python3env/bin/activate
    
    #Install modules within the virtual ENV
    pip install requests

    #Install packages from requirements.txt (python3)
    pip3 install -r requirements.txt

Deactivate:
python3env>> deactivate
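
A quick sanity check (not in the original notes) to confirm which interpreter the activated environment is using:

#run inside the activated virtualenv
import sys
print(sys.executable)   #e.g. .../python3env/bin/python3
print(sys.prefix)       #points at the virtualenv directory while it is active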

Mar 12, 2019

java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

Error:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver


Solution:
Both the Spark driver and the executor need the MySQL driver on the classpath, so specify:
  • spark.driver.extraClassPath = /usr/share/java/mysql-connector-java.jar 
  • spark.executor.extraClassPath = /usr/share/java/mysql-connector-java.jar

#Initialize SparkSession and SparkContext
from pyspark.sql import SparkSession
from pyspark import SparkContext

#Create a Spark Session
SpSession = SparkSession \
    .builder \
    .master("local[2]") \
    .appName("prabhath") \
    .config("spark.executor.memory", "1g") \
    .config("spark.cores.max","2") \
    .config("spark.driver.extraClassPath", "/usr/share/java/mysql-connector-java.jar") \
    .config("spark.executor.extraClassPath", "/usr/share/java/mysql-connector-java.jar") \
    .config("spark.sql.warehouse.dir", "/Users/jlyang/Spark/spark-warehouse")\
    .getOrCreate()

#Get the Spark Context from Spark Session
SpContext = SpSession.sparkContext

demoDf = SpSession.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/testpraba1",
    driver = "com.mysql.jdbc.Driver",
    dbtable = "users_userprofile",
    user="root",
    password="XXXXX").load()
demoDf.show()


ERROR MasterWebUI: Failed to bind MasterWebUI

Problem: 
While starting start-master.sh, I get "ERROR MasterWebUI: Failed to bind MasterWebUI"

Solution:
Check that SPARK_LOCAL_IP is set to the correct IP address:
export SPARK_LOCAL_IP=192.168.254.122

Error:
Spark Command: /opt/java/jdk1.8.0_201/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host 127.0.0.1 --port 7077 --webui-port 8080
========================================
19/03/12 14:03:50 ERROR MasterWebUI: Failed to bind MasterWebUI
java.net.BindException: Cannot assign requested address: Service 'MasterUI' failed after 16 retries (starting from 8080)! Consider explicitly setting the appropriate port for the service 'MasterUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.spark_project.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:351)
    at org.spark_project.jetty.server.ServerConnector.open(ServerConnector.java:319)
    at org.spark_project.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
    at org.spark_project.jetty.server.ServerConnector.doStart(ServerConnector.java:235)
    at org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$newConnector$1(JettyUtils.scala:353)
    at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$httpConnect$1(JettyUtils.scala:380)
    at org.apache.spark.ui.JettyUtils$$anonfun$7.apply(JettyUtils.scala:383)
    at org.apache.spark.ui.JettyUtils$$anonfun$7.apply(JettyUtils.scala:383)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2269)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2261)
    at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:383)
    at org.apache.spark.ui.WebUI.bind(WebUI.scala:132)
    at org.apache.spark.deploy.master.Master.onStart(Master.scala:140)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:122)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Mar 9, 2019

python reduce using lambda

#Using reduce, get the average MPG-CITY (miles per gallon, city) from sample car data
from functools import reduce

car_data = ['MAKE,FUELTYPE,ASPIRE,4,BODY,DRIVE,CYLINDERS,HP,RPM,MPG-CITY,MPG-HWY,PRICE',\
 'mercedes-benz,gas,std,2,convertible,RWD,eight,155,4750,16,18,35056', \
 'jaguar,gas,std,4,sedan,RWD,six,176,4750,15,19,35550', \
 'jaguar,gas,std,2,sedan,RWD,twelve,262,5000,13,17,36000', \
 'bmw,gas,std,4,sedan,RWD,six,182,5400,15,20,36880', \
 'porsche,gas,std,2,convertible,RWD,six,207,5900,17,25,37028', \
 'mercedes-benz,gas,std,4,sedan,RWD,eight,184,4500,14,16,40960', \
 'bmw,gas,std,2,sedan,RWD,six,182,5400,16,22,41315', \
 'mercedes-benz,gas,std,2,hardtop,RWD,eight,184,4500,14,16,45400']

#Use a function to perform reduce
def getMPGCity(autoStr):
    if isinstance(autoStr, int):
        return autoStr
    attList = autoStr.split(",")
    if attList[9].isdigit():  #this will ignore header
        return int(attList[9])
    else:
        return 0

total_mgp = reduce(lambda x,y : getMPGCity(x) + getMPGCity(y), car_data)
print 'Total MPG: {}'.format(total_mgp)

data_length_without_header = len(car_data)-1.0  # account for header line
print 'Total lines without header: {}'.format(data_length_without_header)

print 'Average MPG: {} '.format(total_mgp/data_length_without_header)

Output:
Total MPG: 120
Total lines without header: 8.0
Average MPG: 15.0
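
An alternative sketch (not in the original): the same average computed with a generator expression and sum(), using the car_data list above, which avoids the isinstance() check that reduce needs:

rows = car_data[1:]                                   #skip the header line
total_mpg = sum(int(r.split(",")[9]) for r in rows)   #120
print('Average MPG: {}'.format(total_mpg / float(len(rows))))   #15.0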



python reduce

from functools import reduce

input_list = [10, 20, 30, 5, 7, 1, 34, 45, 566]

#Find smallest no
print reduce(lambda x,y: x if x < y else y, input_list)  #1

#Find largest no
print reduce(lambda x,y: x if x > y else y, input_list)  #566


Mar 7, 2019

python change conf variables

conf.py
x=10

a.py
import conf
print 'Inside A: %d ' % (conf.x)
conf.x=20  #Here overwriting conf.x
print 'Inside A after changing value to 20 : %d ' % (conf.x)

b.py
import a
import conf
print 'Inside B: %d' % (conf.x)

Output:
python a.py
Inside A: 10
Inside A after changing value to 20 : 20

python b.py
Inside A: 10
Inside A after changing value to 20 : 20
Inside B: 20
#Here you get 20, since importing a runs a.py, which sets conf.x = 20 on the shared conf module


There are 2 ways to import:
1) from conf import x
#this creates a new name x in the importing module's local scope

2) import conf
#conf.x refers to the attribute on the original conf module


a.py
from conf import x   #This creates variable in local scope
print 'Inside A: %d ' % (x)
x=20
print 'Inside A after changing value to 20 : %d ' % (x)

Output: python a.py
Inside A: 10
Inside A after changing value to 20 : 20

python b.py
Inside A: 10
Inside A after changing value to 20 : 20
Inside B: 10


Conclusion:
It is safer to use 
from conf import x
instead of 
import conf
when you only rebind the value: x = 20 changes just the local name, while conf.x = 20 modifies the shared module attribute that every other importer sees.





python ConfigParser

#file.ini
#[HEADER]
#name = Matt

import ConfigParser
config = ConfigParser.ConfigParser()
config.read('file.ini')
name = config.get('HEADER', 'NAME') #Matt
print name
name = 'Will'
config.set('HEADER', 'NAME', name)
print config.get('HEADER', 'NAME') #Will

# Writing the configuration back to 'file.ini'
with open('file.ini', 'w') as configfile:
    config.write(configfile)
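
In Python 3 the module was renamed to configparser (lowercase); a minimal sketch of the same read/update/write, assuming the same file.ini:

import configparser

config = configparser.ConfigParser()
config.read('file.ini')
print(config.get('HEADER', 'name'))      #option names are case-insensitive, section names are not
config.set('HEADER', 'name', 'Will')
with open('file.ini', 'w') as configfile:
    config.write(configfile)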



Mar 6, 2019

python open very big file, python memory


Python open very big file: 

#Say you are looping through a big 2 TB file
logfile = open("huge_log_file.txt","r")
info_lines = [(line,len(line)) for line in logfile if line.startswith("INFO")]
#Here a huge list is built - it costs RAM; the list could hold 2 TB of content

logfile = open("huge_log_file.txt","r")
info_lines = ((line,len(line)) for line in logfile if line.startswith("INFO"))
#Here you get a generator object - memory efficient


Opening a file in read mode does NOT implicitly read or load its contents into memory,
even when you open it using Python's context management protocol (the with keyword).

e.g.,

with open('huge_log_file.txt', 'r') as f:
   for each_line in f:
     do_something_each_line(each_line)

Then your peak memory utilization shouldn't be much larger than the longest line of the file.

If you really are reading the full content of the file into a data structure like a list, then it's no wonder that your RAM usage peaks like that.
It's not that Python puts the full contents of the file in RAM, but that you do.
e.g.,

#Here it will load all into memory as you are storing/dumping into a list called info_lines
info_lines = [(line,len(line)) for line in logfile if line.startswith("INFO")] 


#Memory efficient - this returns a generator
info_lines = ((line,len(line)) for line in logfile if line.startswith("INFO"))



file = '/tmp/huge_log_file.txt'

#This is fine
with open(file, 'r') as fh:
   for each in fh:
       print each

#this is a blunder - it will crash, since it loads everything into memory
#with open(file, 'r') as fh:
#    lines = fh.readlines()
#    print len(lines)
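
A small sketch (assuming the same huge_log_file.txt) that aggregates over the file with a generator expression, so only one line is held in memory at a time:

with open('/tmp/huge_log_file.txt', 'r') as fh:
    info_lines = ((line, len(line)) for line in fh if line.startswith("INFO"))
    count = 0
    total_len = 0
    for line, length in info_lines:   #the generator pulls one line at a time
        count += 1
        total_len += length
print('INFO lines: {}, total length: {}'.format(count, total_len))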