Prabhath Kota: Spark Intro

May 18, 2020

Spark Intro

Spark

Apache Spark is a unified analytics engine for large-scale data processing.
Latest version at the time of writing: 2.4.5 (February 2020).

Speed

Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
According to the project, it runs some workloads up to 100x faster than Hadoop MapReduce.
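
To make the DAG idea concrete, here is a minimal pure-Python sketch of lazy evaluation. It is a toy and not Spark's API or implementation: the names `Node`, `parallelize`, `lineage` and `collect` are assumptions chosen for illustration. Transformations only record a step in a lineage chain, and nothing is computed until `collect()` is called.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass(frozen=True)
class Node:
    """One step in a toy lineage chain (hypothetical; not a Spark class)."""
    name: str
    parent: Optional["Node"] = None
    fn: Optional[Callable[[Iterator], Iterator]] = None
    source: Optional[tuple] = None

    def map(self, f: Callable) -> "Node":
        # Record the step only; no element is touched here.
        return Node("map", parent=self, fn=lambda it: (f(x) for x in it))

    def filter(self, pred: Callable) -> "Node":
        # Record the step only; no element is touched here.
        return Node("filter", parent=self, fn=lambda it: (x for x in it if pred(x)))

    def lineage(self) -> List[str]:
        # Walk back to the source and report the recorded steps in order.
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def collect(self) -> list:
        # The "action": gather the recorded steps, then run them in order.
        steps, node = [], self
        while node.parent is not None:
            steps.append(node.fn)
            node = node.parent
        it = iter(node.source)
        for step in reversed(steps):
            it = step(it)
        return list(it)


def parallelize(data) -> Node:
    """Create a source node from any iterable (toy stand-in for a dataset)."""
    return Node("source", source=tuple(data))


seen = []  # records which elements the map function has actually processed


def square(x):
    seen.append(x)
    return x * x


plan = parallelize(range(6)).filter(lambda x: x % 2 == 0).map(square)
print(plan.lineage())  # ['source', 'filter', 'map']
print(seen)            # [] because nothing has run yet
print(plan.collect())  # [0, 4, 16]
```

Because the whole chain is known before anything runs, an engine built this way has the chance to plan the work as a whole, which is the role the DAG scheduler and query optimizer play in Spark.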

Ease of use

Write applications quickly in Java, Scala, Python, R and SQL

Generality

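Spark includes a stack of libraries that can be combined in the same application:
Spark SQL (SQL and DataFrames)
Spark Streaming (stream processing)
MLlib (machine learning)
GraphX (graph processing)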

Runs everywhere

Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources.
You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes.
Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
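
As a rough sketch of how the choice of cluster manager shows up in practice, the helper below builds a `spark-submit` argument list for each deployment option listed above. The `--master` flag and the URL schemes (`local[*]`, `spark://`, `yarn`, `mesos://`, `k8s://`) come from Spark itself; the function names, host names, ports and the `app.py` file are assumptions for illustration only. Real deployments usually need further options, for example a container image on Kubernetes.

```python
from typing import List

# Master URL formats accepted by spark-submit's --master option.
MASTER_URL_FORMATS = {
    "local": "local[*]",                        # all cores on this machine
    "standalone": "spark://{host}:{port}",      # Spark's own cluster manager
    "yarn": "yarn",                             # cluster located via Hadoop config
    "mesos": "mesos://{host}:{port}",
    "kubernetes": "k8s://https://{host}:{port}",
}


def master_url(manager: str, host: str = "", port: int = 0) -> str:
    """Return the master URL for a cluster manager (hypothetical helper)."""
    if manager not in MASTER_URL_FORMATS:
        raise ValueError(f"unknown cluster manager: {manager}")
    return MASTER_URL_FORMATS[manager].format(host=host, port=port)


def spark_submit_args(manager: str, app: str, host: str = "", port: int = 0) -> List[str]:
    """Build a minimal spark-submit command line as a list of arguments."""
    return ["spark-submit", "--master", master_url(manager, host, port), app]


# Hypothetical host names and ports, shown only to illustrate the formats.
print(spark_submit_args("local", "app.py"))
print(spark_submit_args("standalone", "app.py", "master.example", 7077))
print(spark_submit_args("kubernetes", "app.py", "k8s.example", 6443))
```

The application code stays the same across these options; only the master URL passed at submit time changes.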
