May 13, 2023

Spark Shuffling & Combiner

 


Avoid shuffling wherever possible. If a shuffle is necessary, use an operation that applies a combiner.

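  • groupByKey does not use a combiner, so it is not optimized and should be avoided for aggregations

    • Every key-value pair is sent over the network, so the amount of data shuffled is large

  • reduceByKey uses a combiner (map-side combine), so it is optimized

    • Values are merged per key within each partition first, so less data is sent over the network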


Combiner:
  • It computes intermediate values per key within each partition before the shuffle. The shuffle still happens, but far less data has to move
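
To make the difference concrete, here is a minimal sketch in plain Python. It is only a model, not Spark code: the two partitions, the word-count data, and the function names (`shuffle_without_combiner`, `shuffle_with_combiner`, `reduce_side`) are all made up for illustration. The first function stands in for what groupByKey sends across the network, the second for what reduceByKey sends after its map-side combine.

```python
from collections import defaultdict

# Hypothetical input: two partitions of (word, 1) pairs, as in a word count.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1), ("c", 1)],
]

def shuffle_without_combiner(parts):
    """Model of groupByKey: every record crosses the network unchanged."""
    return [record for part in parts for record in part]

def shuffle_with_combiner(parts):
    """Model of reduceByKey: sum per key inside each partition, then send."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value          # map-side combine
        shuffled.extend(local.items())   # one record per key per partition
    return shuffled

def reduce_side(shuffled):
    """Final merge on the receiving side; the same for both strategies."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)

plain = shuffle_without_combiner(partitions)
combined = shuffle_with_combiner(partitions)

# Number of records shuffled without and with the combiner.
print(len(plain), len(combined))
# Both strategies end up with the same totals per key.
print(reduce_side(plain) == reduce_side(combined))
```

In this toy model the final result is the same either way; only the number of records that has to move changes. The saving grows with the number of repeated keys per partition, which is why the combiner pays off most on data with many duplicates.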


Shuffling Vs Combiner

