Feb 19, 2019

How to install PySpark on CentOS

Install Spark ref:
http://devopspy.com/python/apache-spark-pyspark-centos-rhel/

cd /opt
wget http://www-eu.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
tar -xzf spark-2.2.1-bin-hadoop2.7.tgz
ln -s spark-2.2.1-bin-hadoop2.7 spark
check /etc/hosts
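The hostname must resolve to an address; a hypothetical /etc/hosts entry (the IP and test.com are placeholders for your own host):

```
127.0.0.1      localhost
192.168.1.10   test.com test
```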

How to set path?
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$SPARK_HOME/python/lib/pyspark.zip:$PYTHONPATH
export PATH=$SPARK_HOME/python:$PATH
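To make these settings permanent they can go in ~/.bashrc; a minimal sketch with a sanity check (the /opt/spark layout is assumed from the steps above — Spark itself need not be running for the check to work):

```shell
# assumed layout from the steps above; adjust if Spark lives elsewhere
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$SPARK_HOME/python/lib/pyspark.zip:$PYTHONPATH

# sanity check that the variable is set
echo "SPARK_HOME=$SPARK_HOME"
```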

How to start master?
./sbin/start-master.sh

    1) If you get an error like below when running start-master.sh:
"hostname: Unknown host"
Set the hostname properly:
hostname test.com
hostname -f  # should give you some output

    2) If you get an error like below:
"Unsupported major.minor version 52.0" exception while using the
        Spark web application framework
Check which Java version the jar files (/opt/spark/jars) were compiled for against your installed Java.
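The "major.minor version" in that exception comes from bytes 6-7 of the class file header (52 = Java 8, 51 = Java 7, 50 = Java 6). A quick way to read it, sketched here against a fabricated header rather than a real Spark jar:

```shell
# Class files begin with the magic CA FE BA BE, then a 2-byte minor
# and a 2-byte major version. Write a fake header with major = 0x34 (52).
printf '\312\376\272\276\000\000\000\064' > /tmp/Sample.class

# skip 6 bytes, read 2, print as unsigned decimal bytes, combine big-endian
major=$(od -An -j6 -N2 -tu1 /tmp/Sample.class | awk '{print $1*256+$2}')
echo "major version: $major"   # 52 means the class needs Java 8 or newer
```

On a real class extracted from a jar, `javap -verbose` also prints the same "major version" line.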

How to start spark master?
cd /opt/spark
./sbin/start-master.sh
This internally runs a command like the one below:
Spark Command: /opt/java/jdk1.8.0_201/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host test.com --port 7077 --webui-port 8080

How to access the web UI?
test.com:8080 (port: 8080)

How to start spark shell?
cd /opt/spark
./bin/pyspark
FYI, this internally runs a command like the one below:
/opt/java/jdk1.8.0_201/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --name PySparkShell pyspark-shell

How to find the Spark process with ps?
ps -ef | grep spark
      e.g., root     13770     1  0 14:18 pts/0    00:00:10 /opt/java/jdk1.8.0_201/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host test.com --port 7077 --webui-port 8080
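One wrinkle with `ps -ef | grep spark` is that grep matches its own command line too. Wrapping the first letter in a character class avoids that; a sketch using a throwaway `sleep` process as a stand-in for the Spark master:

```shell
# start a stand-in process (for Spark you would grep for the Master class)
sleep 60 &
stand_in=$!

# the pattern '[s]leep 60' matches the sleep process, but the grep
# command line contains '[s]leep 60' literally, so grep skips itself
matches=$(ps -ef | grep -c '[s]leep 60')
echo "matches: $matches"

kill "$stand_in"
```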

pip modules to install
pip install py4j
(Note: Spark also bundles py4j under $SPARK_HOME/python/lib, which the PYTHONPATH above already references, so the pip install is only needed outside that setup.)

How to access the web UI locally?
http://localhost:8080


