Saturday, October 11, 2014

Combiners in Hadoop MapReduce

A combiner function in MapReduce has the same form as the reduce function (and is an implementation of the Reducer interface), except its output types are the intermediate key and value types (K2 and V2), so they can feed the reduce function:
combiner: (K2, list(V2)) → list(K2, V2)
Often the combiner and reduce functions are the same, in which case K3 is the same as K2, and V3 is the same as V2. On the other hand, Hadoop reserves the right to use combiners at its discretion. This means that a combiner may be invoked zero, one, or multiple times. Hence, the correctness of a MapReduce algorithm shouldn't depend on computations performed by the combiner or depend on them even being run at all.

No comments:

Post a Comment