public final class DataSkewHashPartitioner extends Object implements Partitioner
A Partitioner that hashes the output data of a source task finely enough to detect data skew.
It hashes data into finer ranges than HashPartitioner.
Each element is hashed by its key, and a modulo operation is applied to the hash value.
So that the output data of a task can be split or recombined after it is stored,
the hash range is multiplied by a multiplier that is known to both the source and destination tasks.
This avoids an extra deserialize, rehash, and serialize pass.
For more information, please check JobConf.HashRangeMultiplier.

| Constructor and Description |
|---|
| DataSkewHashPartitioner(int hashRangeMultiplier) |
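
The hashing scheme in the class description can be illustrated with a minimal sketch. It is not the class's actual source. It assumes the hash range is the product of `dstParallelism` and `hashRangeMultiplier`, and it uses `java.util.function.Function` and nested lists as hypothetical stand-ins for `KeyExtractor` and `Partition`. The class name `SkewHashSketch` and the use of `Math.floorMod` to keep the block index non-negative are also choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative stand-in only; not the real DataSkewHashPartitioner.
final class SkewHashSketch {
  private SkewHashSketch() { }

  /**
   * Hashes each element by its key into (dstParallelism * hashRangeMultiplier) blocks.
   * The product as the hash range is an assumption based on the class description.
   */
  static <T> List<List<T>> partition(final Iterable<T> elements,
                                     final int dstParallelism,
                                     final int hashRangeMultiplier,
                                     final Function<? super T, ?> keyOf) {
    final int hashRange = dstParallelism * hashRangeMultiplier;

    // One empty block per fine-grained hash value.
    final List<List<T>> blocks = new ArrayList<>(hashRange);
    for (int i = 0; i < hashRange; i++) {
      blocks.add(new ArrayList<>());
    }

    for (final T element : elements) {
      // Hash by key, then reduce modulo the hash range.
      // floorMod keeps the index in [0, hashRange) even for negative hash codes.
      final int index = Math.floorMod(keyOf.apply(element).hashCode(), hashRange);
      blocks.get(index).add(element);
    }
    return blocks;
  }
}
```

With `dstParallelism = 2` and `hashRangeMultiplier = 3`, for instance, this sketch produces six fine-grained blocks. Under the assumption above, whole blocks could later be regrouped among destination tasks without rehashing any element.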

| Modifier and Type | Method and Description |
|---|---|
| List<Partition> | partition(Iterable elements, int dstParallelism, KeyExtractor keyExtractor) - Divides the output data from a task into multiple blocks. |

public List<Partition> partition(Iterable elements, int dstParallelism, KeyExtractor keyExtractor)
Specified by: partition in interface Partitioner
Parameters:
elements - the output data from a source task.
dstParallelism - the number of destination tasks.
keyExtractor - extracts keys from elements.

Copyright © 2018. All rights reserved.