| Spark Streaming | Structured Streaming |
| --- | --- |
| Available since Spark 1.x | Introduced in Spark 2.x |
| A separate Spark library for processing continuously flowing streaming data | Built on top of the Spark SQL library |
| Uses the DStream API, powered by Spark RDDs; works on micro-batches, where each batch is represented as an RDD | Based on the DataFrame and Dataset APIs; there is no explicit batch concept in the programming model |
| DStreams deliver the data in chunks (RDDs) received from the streaming source and output batches of processed data | Incoming stream data is continuously appended to a DataFrame that acts as an unbounded table |
| SQL-style operations are not easy to apply | SQL queries and Scala operations can be applied directly to the streaming data |
| | The result produced from the unbounded table/DataFrame depends on the output mode of your query: Complete, Append, or Update |
| Works with RDDs | DataFrames/Datasets are more optimized, faster, easier to understand, and well suited to aggregations |
| No event-time support; it works only with the timestamp at which the data is received. Based on this ingestion timestamp, Spark Streaming places a record in the current batch even if the event was generated earlier and belongs to an earlier batch, which can yield less accurate results, effectively comparable to data loss | With event-time windowing and handling of late data, Structured Streaming outweighs Spark Streaming |
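To make the API difference concrete, here is a minimal word-count sketch in both models. It assumes a socket source on `localhost:9999`; the host, port, and app name are illustrative, not from the original post.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SparkSession

// --- Spark Streaming (DStream API): explicit micro-batches of RDDs ---
val conf = new SparkConf().setAppName("DStreamWordCount")
val ssc  = new StreamingContext(conf, Seconds(10)) // each 10s batch is an RDD
val lines = ssc.socketTextStream("localhost", 9999)
lines.flatMap(_.split(" "))
     .map(word => (word, 1))
     .reduceByKey(_ + _)
     .print()
ssc.start()
ssc.awaitTermination()

// --- Structured Streaming: the same logic as a query on an unbounded table ---
val spark = SparkSession.builder.appName("StructuredWordCount").getOrCreate()
import spark.implicits._
val streamLines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", "9999")
  .load()
val counts = streamLines.as[String]
  .flatMap(_.split(" "))
  .groupBy("value")      // rows keep accumulating in the unbounded table
  .count()
counts.writeStream
  .outputMode("complete") // Complete mode: emit the full updated result table
  .format("console")
  .start()
  .awaitTermination()
```

Note how the DStream version manages batches and RDD transformations explicitly, while the Structured Streaming version reads like an ordinary DataFrame query whose result is refreshed according to the chosen output mode.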
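The event-time advantage in the last row can also be sketched. This assumes an input DataFrame `events` with columns `eventTime` (timestamp) and `word`; those column names and the watermark/window durations are illustrative choices, not from the original post.

```scala
import org.apache.spark.sql.functions.{window, col}

// Group by when the event was GENERATED (event time), not when it arrived.
// The watermark tells Spark how long to keep state around for late records:
// events up to 10 minutes late are still folded into the correct window.
val windowedCounts = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy(
    window(col("eventTime"), "5 minutes"), // 5-minute event-time windows
    col("word")
  )
  .count()
```

Spark Streaming has no equivalent: a late record would simply land in whichever micro-batch was open when it arrived, which is exactly the accuracy problem the table describes.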