Random Walk on Large-Scale Graphs with Spark (LinkedIn)

[Figure: random walks vertex representation]

A layered lookup table is used here, with each entry stored as an offset plus a length. This could easily be turned into a design question.
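To make the offset + length idea concrete, here is a minimal single-layer sketch in plain Python. It is my own reading of the idea, not the structure from the talk: all neighbor lists are packed into one flat array, and each vertex maps to an (offset, length) pair into that array. The names `build_lookup`, `neighbors_of`, and `random_step` are hypothetical.

```python
import random

def build_lookup(adjacency):
    """Pack all neighbor lists into one flat list plus an (offset, length) index."""
    neighbors = []   # every neighbor list, stored back to back
    index = {}       # vertex -> (offset, length) into `neighbors`
    for vertex, nbrs in adjacency.items():
        index[vertex] = (len(neighbors), len(nbrs))
        neighbors.extend(nbrs)
    return neighbors, index

def neighbors_of(vertex, neighbors, index):
    """Slice the flat list using the vertex's offset and length."""
    offset, length = index[vertex]
    return neighbors[offset:offset + length]

def random_step(vertex, neighbors, index, rng):
    """Take one random-walk step; a vertex with no neighbors stays in place."""
    offset, length = index[vertex]
    if length == 0:
        return vertex
    return neighbors[offset + rng.randrange(length)]

adjacency = {0: [1, 2], 1: [2], 2: []}
neighbors, index = build_lookup(adjacency)
print(index)                                   # offsets and lengths per vertex
print(random_step(0, neighbors, index, random.Random(7)))
```

A random step then costs one index lookup and one array read, with no per-vertex list objects, which is presumably why this layout is attractive at scale.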

Start with a subset of the walks rather than all of them.

Spark’s zipPartitions operator makes efficient use of the routing table.
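As a rough illustration of why zipping co-partitioned data helps, the sketch below simulates the idea in plain Python. It does not use Spark. It assumes walkers and graph data are partitioned by the same function, here a simple modulo standing in for the routing table, so each walker partition can be advanced using only the matching graph partition. All function names are hypothetical.

```python
import random

def partition_of(vertex, num_partitions):
    # Stand-in for a routing table: which partition owns this vertex (assumption).
    return vertex % num_partitions

def partition_graph(adjacency, num_partitions):
    """Split the adjacency map so each partition holds the vertices it owns."""
    parts = [dict() for _ in range(num_partitions)]
    for vertex, nbrs in adjacency.items():
        parts[partition_of(vertex, num_partitions)][vertex] = nbrs
    return parts

def partition_walkers(walks, num_partitions):
    """Place each walk in the partition that owns its current (last) vertex."""
    parts = [[] for _ in range(num_partitions)]
    for walk in walks:
        parts[partition_of(walk[-1], num_partitions)].append(walk)
    return parts

def step_partition(walks, local_adjacency, rng):
    """Advance every walk in one partition using only that partition's graph data."""
    stepped = []
    for walk in walks:
        nbrs = local_adjacency.get(walk[-1], [])
        if nbrs:
            stepped.append(walk + [rng.choice(nbrs)])
        else:
            stepped.append(walk)  # dead end: the walk stays where it is
    return stepped

def step_all(walker_parts, graph_parts, rng):
    """Zip matching partitions, step locally, then re-route walks (the shuffle)."""
    stepped = []
    for walks, local_adjacency in zip(walker_parts, graph_parts):
        stepped.extend(step_partition(walks, local_adjacency, rng))
    return partition_walkers(stepped, len(graph_parts))

adjacency = {0: [1], 1: [2], 2: [0], 3: []}
graph_parts = partition_graph(adjacency, 2)
walker_parts = partition_walkers([[0], [1], [3]], 2)
print(step_all(walker_parts, graph_parts, random.Random(1)))
```

The pairwise `zip` plays the role that zipPartitions plays in Spark: the step itself needs no lookup outside the local partition, and data only moves when walkers are re-routed afterwards.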

Running only p% of the random walkers in each stage helps co-locate results and reduces shuffle.
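One way to picture the staging is to split the start vertices into batches of roughly p% each and launch one batch per stage. The sketch below shows only that batching arithmetic, under my own assumptions; how the talk actually picks walkers for each stage is not stated here, and `stage_batches` is a hypothetical helper.

```python
import math

def stage_batches(start_vertices, p_percent):
    """Split start vertices into consecutive batches of about p_percent each."""
    if not 0 < p_percent <= 100:
        raise ValueError("p_percent must be in (0, 100]")
    # Round up so that a small p still launches at least one walker per stage.
    batch_size = max(1, math.ceil(len(start_vertices) * p_percent / 100))
    return [start_vertices[i:i + batch_size]
            for i in range(0, len(start_vertices), batch_size)]

for stage, batch in enumerate(stage_batches(list(range(10)), 25)):
    print(stage, batch)
```

Each stage then handles a bounded number of walkers, so the data moved between partitions per stage is smaller than if every walk were started at once.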

Reference: