May 13, 2023

Broadcast Vs Accumulator (RDD)

 

Accumulator Variables | Broadcast Variables
Can be added to, e.g., a counter | Read-only variable
Used for accumulate operations (e.g., summing error records) | Used only for read-only data (e.g., arrays)
Does not cause shuffling | Does not cause shuffling
Value is kept on the driver (e.g., the edge node in client mode) | Cached in memory on each executor

Try it on a small dataset.


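// Accumulator: tasks add to it, the driver reads the total (run in spark-shell, where sc is predefined)
val acc1 = sc.longAccumulator("abc")
val rdd = sc.parallelize(Array(1, 2, 3))
rdd.foreach(x => acc1.add(x))
println(acc1.value)  // prints 6

// Broadcast: read-only value shipped once to each executor
val bcast = sc.broadcast(Array(0, 1, 2, 3, 4))
bcast.value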
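
The snippet above only reads the broadcast value on the driver. As a rough sketch of how it is usually read inside a task, the example below looks up each RDD element in a small broadcast map. It assumes a local SparkSession rather than spark-shell, and the names `BroadcastLookupSketch`, `run`, and `codes` are hypothetical, not from the notes above.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastLookupSketch {
  // Hypothetical helper: runs a tiny local job and returns the looked-up labels.
  def run(): Array[String] = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("broadcast-lookup-sketch")
      .getOrCreate()
    val sc = spark.sparkContext
    try {
      // Small read-only lookup table, shipped once to each executor.
      val codes = sc.broadcast(Map(1 -> "one", 2 -> "two", 3 -> "three"))
      sc.parallelize(Seq(1, 2, 3, 4))
        .map(x => codes.value.getOrElse(x, "unknown")) // read-only access inside the task
        .collect()
    } finally {
      spark.stop()
    }
  }
}
```

Calling `BroadcastLookupSketch.run()` should return the labels in input order, with "unknown" for the key that is missing from the map.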

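Accumulator updates are only reliable inside actions.

e.g.,

foreach() is an action, so you can update an accumulator inside it.

map() is a transformation, not an action. It is lazy, so an update inside it is not applied until an action runs, and it may be applied more than once if a task or stage is re-executed.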
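
To illustrate the point about actions, here is a minimal sketch that counts error records with an accumulator updated inside map(), and reads the value before and after an action runs. It assumes a local SparkSession and a run without task retries. The names `AccumulatorLazinessSketch`, `run`, `errors`, and `records` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object AccumulatorLazinessSketch {
  // Hypothetical helper: returns the accumulator value before and after an action.
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("accumulator-laziness-sketch")
      .getOrCreate()
    val sc = spark.sparkContext
    try {
      val errors = sc.longAccumulator("errorRecords")
      val records = sc.parallelize(Seq("10", "x", "30", "y"))

      // Transformation only: nothing is executed yet, so nothing is added.
      val parsed = records.map { r =>
        if (!r.forall(_.isDigit)) errors.add(1) // count non-numeric records
        r
      }

      val before: Long = errors.value // map() has not run yet
      parsed.count()                  // the action triggers the map() above
      val after: Long = errors.value  // updates are now visible on the driver
      (before, after)
    } finally {
      spark.stop()
    }
  }
}
```

In this sketch `before` is 0 and `after` is 2. If `parsed` were evaluated by a second action without being cached, the map() would run again and the accumulator would be incremented a second time, which is why updates made inside transformations should be treated with care.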




