key value

  • parentid [ { childid1, child_id2 }]

flatMap(_.children): repartition occurs

but if we know that all childid1 fall into the same partition as the parentid (AA => AA1, AA2 and a custom algo)

right now: groupByKey().aggregate() (repartition!) better, custom: transformValues() + custom aggregate ourselves in store (no repar)

better: groupByKeyNoRepar() // reuse same partitions

Ready to work with me?

Tell me everything!
© Copyright 2018 · Stéphane Derosiaux · All Rights Reserved.