Mar 29, 2019

Deploy a Docker container on Google Cloud - Docker + Google Cloud + Kubernetes

Prerequisites:

  • Set up the Google Cloud SDK (gcloud) on your system
  • Have your container image ready on Docker Hub


Steps:
  • gcloud container clusters create kubecluster

#The port 80 below should match the port exposed in the Dockerfile (EXPOSE 80)

  • kubectl run kubecluster --image=prabhathkota/test-docker:tag1 --port=80 --image-pull-policy=IfNotPresent

O/P:
deployment.apps "kubecluster" created


#Create a service object that exposes the deployment

  • kubectl expose deployment kubecluster --type="LoadBalancer"

O/P:
service "kubecluster" exposed


  • kubectl get services kubecluster

O/P:
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
kubecluster   LoadBalancer   10.23.246.XXX   35.244.47.XXX   80:31607/TCP   3m
#Test
curl http://35.244.47.XXX:80




  • Cleanup
kubectl delete services kubecluster
kubectl delete deployment kubecluster
gcloud container clusters delete kubecluster



Mar 28, 2019

Run a Docker container on Amazon EC2


I created a sample Docker image on Docker Hub and deployed it on an EC2 instance.
It's super easy and simple.

sudo yum install docker -y
sudo service docker start
docker run -p 4000:80 prabhathkota/test-docker:tag1   #maps host port 4000 to container port 80

Test: 
http://ec2-18-209-XXX-XXX.compute-1.amazonaws.com:4000

Output:
Hello World!
Hostname: f156f8eeb177
Visits: cannot connect to Redis, counter disabled
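The output above matches the Flask + Redis sample app from the Docker getting-started tutorial. A minimal sketch of such an app (an assumption about what the image contains, not taken from it; requires Flask and redis-py, and listens on port 80 to match the -p 4000:80 mapping):

#app.py - illustrative sketch only
from flask import Flask
from redis import Redis, RedisError
import socket

redis = Redis(host="redis", socket_connect_timeout=2, socket_timeout=2)
app = Flask(__name__)

@app.route("/")
def hello():
    try:
        visits = redis.incr("counter")          #increment a visit counter in Redis
    except RedisError:
        visits = "cannot connect to Redis, counter disabled"
    return "Hello World!\nHostname: {}\nVisits: {}\n".format(socket.gethostname(), visits)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)            #container port 80, published as host port 4000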



Mar 25, 2019

Python Pickle/UnPickle Serialize/DeSerialize

#The Python pickle module is used for serializing and de-serializing Python objects.
#Any object in Python can be pickled so that it can be saved on disk.
#Pickling "serializes" the object before writing it to a file.
#Pickling converts a Python object into a byte stream.


#Serializing - pickle
import pickle
emp = {"employees":[
    {"name":"Shyam", "email":"shyam@mail.com"},
    {"name":"Bob", "email":"bob32@mail.com"},
    {"name":"Jai", "email":"jai87@mail.com"}
]}
pickling_on = open("Emp.pickle","wb")
pickle.dump(emp, pickling_on)
pickling_on.close()


#De-Serializing - unpickle
pickle_off = open("Emp.pickle","rb")
emp = pickle.load(pickle_off)
pickle_off.close()
print(emp)


#Output
{'employees': [{'name': 'Shyam', 'email': 'shyam@mail.com'}, {'name': 'Bob', 'email': 'bob32@mail.com'}, {'name': 'Jai', 'email': 'jai87@mail.com'}]}
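A minimal sketch (not from the original notes) showing the same round trip in memory with pickle.dumps()/pickle.loads(), plus the file-based version using with blocks so the files are closed automatically:

import pickle

emp = {"name": "Shyam", "email": "shyam@mail.com"}

data = pickle.dumps(emp)          #serialize to a bytes object (no file needed)
print(pickle.loads(data))         #de-serialize back to a dict

with open("Emp.pickle", "wb") as fh:    #file-based pickling
    pickle.dump(emp, fh)
with open("Emp.pickle", "rb") as fh:    #file-based unpickling
    print(pickle.load(fh))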

Python Monkey Patching

Monkey Patching:
  • Monkey Patching refers to dynamic (run-time) modifications of a class or module
  • A MonkeyPatch is a piece of Python code which extends or modifies other code at runtime
  • The unittest.mock library makes use of monkey patching to replace parts of the code under test with mock objects (see the sketch after this list)
  • It provides functionality for writing clever unit tests
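
A minimal self-contained sketch (not from the original notes) of how unittest.mock monkey patches a method during a test:

### test_mock.py
from unittest import mock

class A:
    def func(self):
        return "real func()"

def test_func_is_mocked():
    #mock.patch.object temporarily replaces A.func with a MagicMock for the duration of the block
    with mock.patch.object(A, 'func', return_value='mocked!') as mocked:
        obj = A()
        assert obj.func() == 'mocked!'   #the real func() is never called
        mocked.assert_called_once()

test_func_is_mocked()
print("A.func was patched with a mock and restored afterwards")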


### test1.py 
class A: 
    def func(self): 
        print("func() is being called")


### test2.py
from test1 import A

def monkey_func(self): 
     print("monkey_func() is being called...")
   
# Replacing address of "func" with "monkey_func" 
A.func = monkey_func

if __name__ == '__main__':
    obj = A()
  
    # calling function "func" whose address got replaced with "monkey_func()" 
    obj.func() 


Output:

monkey_func() is being called...

Mar 22, 2019

Switching Java Versions on MacOS

Add these to ~/.profile

alias java9="export JAVA_HOME=`/usr/libexec/java_home -v 9`; java -version"

alias java8="export JAVA_HOME=`/usr/libexec/java_home -v 1.8`; java -version"

alias java7="export JAVA_HOME=`/usr/libexec/java_home -v 1.7`; java -version"

Mar 21, 2019

Git commands

# Git Config

git init

git config user.email test@test.com

git config user.name 'Prabhath Kota'

git clone https://github.com/prabhathkota/Django-user-registration.git


Checkout Branch:

  • git checkout -b test_branch
  • git branch
  • git push --set-upstream origin test_branch

git status -s #check if any modified files

modified:   common_utils.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
../api/log/
misc/test_rl.py

# stash without a name
git stash

# stash 
git stash save <name_of_stash>
stash@{0}: On develop: new_changes

# -u if you want to stash untracked files as well
# git stash save -u <name_of_stash>  # Don't use this

#git stash list
stash@{0}: On develop: new_changes

git status -s

# stash apply
git stash apply stash@{0}

# Takes the latest and apply
git stash pop

# Clear stash
git stash clear

# gitignore




Python Virtual Environment


Install Python 3 alongside Python 2 (using a virtual environment)


python3 -m venv <python3env>
or
pip3 install virtualenv
virtualenv -p /usr/local/bin/python3 python3env


Activate:
    source python3env/bin/activate
    
    #Install modules within the virtual ENV
    pip install requests

    #Install packages from requirements.txt (python3)
    pip3 install -r requirements.txt

Deactivate:
python3env>> deactivate
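
A quick sanity check (not in the original notes) to confirm which interpreter the activated environment is using:

#run inside the activated virtualenv
import sys
print(sys.executable)   #e.g. .../python3env/bin/python3
print(sys.prefix)       #points at the virtualenv directory while it is active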

Mar 12, 2019

java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

Error:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver


Solution:
Both the Spark driver and the executor need the MySQL driver on the classpath, so specify:
  • spark.driver.extraClassPath = /usr/share/java/mysql-connector-java.jar 
  • spark.executor.extraClassPath = /usr/share/java/mysql-connector-java.jar

#Initialize SparkSession and SparkContext
from pyspark.sql import SparkSession
from pyspark import SparkContext

#Create a Spark Session
SpSession = SparkSession \
    .builder \
    .master("local[2]") \
    .appName("prabhath") \
    .config("spark.executor.memory", "1g") \
    .config("spark.cores.max","2") \
    .config("spark.driver.extraClassPath", "/usr/share/java/mysql-connector-java.jar") \
    .config("spark.executor.extraClassPath", "/usr/share/java/mysql-connector-java.jar") \
    .config("spark.sql.warehouse.dir", "/Users/jlyang/Spark/spark-warehouse")\
    .getOrCreate()

#Get the Spark Context from Spark Session
SpContext = SpSession.sparkContext

demoDf = SpSession.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/testpraba1",
    driver = "com.mysql.jdbc.Driver",
    dbtable = "users_userprofile",
    user="root",
    password="XXXXX").load()
demoDf.show()


ERROR MasterWebUI: Failed to bind MasterWebUI

Problem: 
While starting start-master.sh, I get "ERROR MasterWebUI: Failed to bind MasterWebUI"

Solution:
Check that SPARK_LOCAL_IP is set to the correct IP address:
export SPARK_LOCAL_IP=192.168.254.122

Error:
Spark Command: /opt/java/jdk1.8.0_201/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host 127.0.0.1 --port 7077 --webui-port 8080
========================================
19/03/12 14:03:50 ERROR MasterWebUI: Failed to bind MasterWebUI
java.net.BindException: Cannot assign requested address: Service 'MasterUI' failed after 16 retries (starting from 8080)! Consider explicitly setting the appropriate port for the service 'MasterUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.spark_project.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:351)
    at org.spark_project.jetty.server.ServerConnector.open(ServerConnector.java:319)
    at org.spark_project.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
    at org.spark_project.jetty.server.ServerConnector.doStart(ServerConnector.java:235)
    at org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$newConnector$1(JettyUtils.scala:353)
    at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$httpConnect$1(JettyUtils.scala:380)
    at org.apache.spark.ui.JettyUtils$$anonfun$7.apply(JettyUtils.scala:383)
    at org.apache.spark.ui.JettyUtils$$anonfun$7.apply(JettyUtils.scala:383)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2269)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2261)
    at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:383)
    at org.apache.spark.ui.WebUI.bind(WebUI.scala:132)
    at org.apache.spark.deploy.master.Master.onStart(Master.scala:140)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:122)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Mar 9, 2019

python reduce using lambda

#Using reduce, get the average MPG-CITY (miles per gallon, city) from sample car data
from functools import reduce

car_data = ['MAKE,FUELTYPE,ASPIRE,4,BODY,DRIVE,CYLINDERS,HP,RPM,MPG-CITY,MPG-HWY,PRICE',\
 'mercedes-benz,gas,std,2,convertible,RWD,eight,155,4750,16,18,35056', \
 'jaguar,gas,std,4,sedan,RWD,six,176,4750,15,19,35550', \
 'jaguar,gas,std,2,sedan,RWD,twelve,262,5000,13,17,36000', \
 'bmw,gas,std,4,sedan,RWD,six,182,5400,15,20,36880', \
 'porsche,gas,std,2,convertible,RWD,six,207,5900,17,25,37028', \
 'mercedes-benz,gas,std,4,sedan,RWD,eight,184,4500,14,16,40960', \
 'bmw,gas,std,2,sedan,RWD,six,182,5400,16,22,41315', \
 'mercedes-benz,gas,std,2,hardtop,RWD,eight,184,4500,14,16,45400']

#Use a function to perform reduce
def getMPGCity(autoStr):
    if isinstance(autoStr, int):
        return autoStr
    attList = autoStr.split(",")
    if attList[9].isdigit():  #this will ignore header
        return int(attList[9])
    else:
        return 0

total_mgp = reduce(lambda x,y : getMPGCity(x) + getMPGCity(y), car_data)
print 'Total MPG: {}'.format(total_mgp)

data_length_without_header = len(car_data)-1.0  # account for header line
print 'Total lines without header: {}'.format(data_length_without_header)

print 'Average MPG: {} '.format(total_mgp/data_length_without_header)

Output:
Total MPG: 120
Total lines without header: 8.0
Average MPG: 15.0
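
An alternative sketch (not in the original): the same average computed with a generator expression and sum(), using the car_data list above, which avoids the isinstance() check that reduce needs:

rows = car_data[1:]                                   #skip the header line
total_mpg = sum(int(r.split(",")[9]) for r in rows)   #120
print('Average MPG: {}'.format(total_mpg / float(len(rows))))   #15.0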



python reduce

from functools import reduce

input_list = [10, 20, 30, 5, 7, 1, 34, 45, 566]

#Find smallest no
print reduce(lambda x,y: x if x < y else y, input_list)  #1

#Find largest no
print reduce(lambda x,y: x if x > y else y, input_list)  #566


Mar 7, 2019

python change conf variables

conf.py
x=10

a.py
import conf
print 'Inside A: %d ' % (conf.x)
conf.x=20  #Here overwriting conf.x
print 'Inside A after changing value to 20 : %d ' % (conf.x)

b.py
import a
import conf
print 'Inside B: %d' % (conf.x)

Output:
python a.py
Inside A: 10
Inside A after changing value to 20 : 20

python b.py
Inside A: 10
Inside A after changing value to 20 : 20
Inside B: 20
#Here you get 20, since importing a runs a.py, which sets conf.x = 20 on the shared conf module


There are 2 ways to import:
1) from conf import x
#this creates a new name x in the importing module's local scope

2) import conf
#conf.x refers to the attribute on the original conf module


a.py
from conf import x   #This creates variable in local scope
print 'Inside A: %d ' % (x)
x=20
print 'Inside A after changing value to 20 : %d ' % (x)

Output: python a.py
Inside A: 10
Inside A after changing value to 20 : 20

python b.py
Inside A: 10
Inside A after changing value to 20 : 20
Inside B: 10


Conclusion:
It is safer to use 
from conf import x
instead of 
import conf
when you only rebind the value: x = 20 changes just the local name, while conf.x = 20 modifies the shared module attribute that every other importer sees.





python ConfigParser

#file.ini
#[HEADER]
#name = Matt

import ConfigParser
config = ConfigParser.ConfigParser()
config.read('file.ini')
name = config.get('HEADER', 'NAME') #Matt
print name
name = 'Will'
config.set('HEADER', 'NAME', name)
print config.get('HEADER', 'NAME') #Will

# Writing the configuration back to 'file.ini'
with open('file.ini', 'w') as configfile:
    config.write(configfile)
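
In Python 3 the module was renamed to configparser (lowercase); a minimal sketch of the same read/update/write, assuming the same file.ini:

import configparser

config = configparser.ConfigParser()
config.read('file.ini')
print(config.get('HEADER', 'name'))      #option names are case-insensitive, section names are not
config.set('HEADER', 'name', 'Will')
with open('file.ini', 'w') as configfile:
    config.write(configfile)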



Mar 6, 2019

python open very big file, python memory


Python open very big file: 

#Say you are looping through a big 2 TB file
logfile = open("huge_log_file.txt","r")
info_lines = [(line,len(line)) for line in logfile if line.startswith("INFO")]
#Here a huge list is built - it costs RAM; the list could hold 2 TB of content

logfile = open("huge_log_file.txt","r")
info_lines = ((line,len(line)) for line in logfile if line.startswith("INFO"))
#Here you get a generator object - memory efficient


Opening a file in read mode does NOT implicitly read or load its contents into memory,
even when you open it using Python's context management protocol (the with keyword).

e.g.,

with open('huge_log_file.txt', 'r') as f:
   for each_line in f:
     do_something_each_line(each_line)

Then your peak memory utilization shouldn't be much larger than the longest line of the file.

If you really are reading the full content of the file into a data structure like a list, then it's no wonder that your RAM usage peaks like that.
It's not that Python puts the full contents of the file in RAM, but that you do.
e.g.,

#Here it will load all into memory as you are storing/dumping into a list called info_lines
info_lines = [(line,len(line)) for line in logfile if line.startswith("INFO")] 


#Memory efficient - this returns a generator
info_lines = ((line,len(line)) for line in logfile if line.startswith("INFO"))



file = '/tmp/huge_log_file.txt'

#This is fine
with open(file, 'r') as fh:
   for each in fh:
       print each

#this is a blunder - it will crash, since it loads everything into memory
#with open(file, 'r') as fh:
#    lines = fh.readlines()
#    print len(lines)
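
A small sketch (assuming the same huge_log_file.txt) that aggregates over the file with a generator expression, so only one line is held in memory at a time:

with open('/tmp/huge_log_file.txt', 'r') as fh:
    info_lines = ((line, len(line)) for line in fh if line.startswith("INFO"))
    count = 0
    total_len = 0
    for line, length in info_lines:   #the generator pulls one line at a time
        count += 1
        total_len += length
print('INFO lines: {}, total length: {}'.format(count, total_len))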