public final class DataSkewHashPartitioner extends Object implements Partitioner
A Partitioner that hashes the output data of a source task in a way suited to detecting data skew.
It hashes data at a finer granularity than HashPartitioner.
Elements are hashed by their key, and the hash value is reduced with a modulo operation.
Because the output data of a task may need to be split or recombined after it is stored,
the hash range is multiplied by a multiplier that is known to both the source and destination tasks,
which avoids an extra deserialize, rehash, and serialize pass.
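The following is a minimal sketch of the index computation described above, not the class's actual implementation. It assumes the hash range equals `dstParallelism * multiplier` and that, in the uniform case, each destination task takes `multiplier` consecutive hash indices. The class `HashRangeSketch` and both method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names come from the library.
final class HashRangeSketch {
  // Fine-grained hash index in [0, dstParallelism * multiplier).
  static int fineIndex(final Object key, final int dstParallelism, final int multiplier) {
    final int hashRange = dstParallelism * multiplier; // assumed definition of the hash range
    // floorMod keeps the result non-negative even for negative hash codes.
    return Math.floorMod(key.hashCode(), hashRange);
  }

  // Assumed uniform recombination: each destination task owns
  // `multiplier` consecutive fine-grained indices.
  static int uniformDestination(final int fineIndex, final int multiplier) {
    return fineIndex / multiplier;
  }

  public static void main(final String[] args) {
    final int dstParallelism = 4;
    final int multiplier = 3;
    for (final String key : Arrays.asList("apple", "banana", "cherry")) {
      final int idx = fineIndex(key, dstParallelism, multiplier);
      System.out.println(key + " -> fine index " + idx
          + ", destination task " + uniformDestination(idx, multiplier));
    }
  }
}
```

Because the stored data is already bucketed by the fine-grained index, a skewed destination could in principle be given fewer consecutive indices and a lightly loaded one more, without touching the elements again.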
For more information, see JobConf.HashRangeMultiplier.

Constructor and Description |
---|
DataSkewHashPartitioner(int hashRangeMultiplier) |
Modifier and Type | Method and Description |
---|---|
List<Partition> | partition(Iterable elements, int dstParallelism, KeyExtractor keyExtractor) Divides the output data from a task into multiple blocks. |
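As a rough illustration of what dividing a task's output into blocks could look like, the sketch below groups elements into `dstParallelism * hashRangeMultiplier` lists by key hash. It is an assumption-laden stand-in: plain `List<T>` replaces `Partition`, `java.util.function.Function` replaces `KeyExtractor`, and the multiplier is passed as an argument instead of through the constructor. The class name `PartitionSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the partitioner; not the library's code.
final class PartitionSketch {
  static <T, K> List<List<T>> partition(final Iterable<T> elements,
                                        final int dstParallelism,
                                        final int hashRangeMultiplier,
                                        final Function<T, K> keyExtractor) {
    // Assumed hash range: finer than the number of destination tasks.
    final int hashRange = dstParallelism * hashRangeMultiplier;
    final List<List<T>> blocks = new ArrayList<>(hashRange);
    for (int i = 0; i < hashRange; i++) {
      blocks.add(new ArrayList<>());
    }
    for (final T element : elements) {
      // Hash by key, then reduce to a non-negative block index.
      final int idx = Math.floorMod(keyExtractor.apply(element).hashCode(), hashRange);
      blocks.get(idx).add(element);
    }
    return blocks;
  }

  public static void main(final String[] args) {
    final List<String> words = Arrays.asList("a", "bb", "ccc", "dd", "e");
    // Use the word length as the key, so equal-length words share a block.
    final List<List<String>> blocks = partition(words, 2, 2, String::length);
    for (int i = 0; i < blocks.size(); i++) {
      System.out.println("block " + i + ": " + blocks.get(i));
    }
  }
}
```

Returning one block per hash index, including empty ones, keeps block positions stable, so a reader of the stored data can address a block by index alone.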
public List<Partition> partition(Iterable elements, int dstParallelism, KeyExtractor keyExtractor)
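Description copied from interface: Partitioner
Divides the output data from a task into multiple blocks.
Specified by: partition in interface Partitioner
Parameters:
elements - the output data from a source task.
dstParallelism - the number of destination tasks.
keyExtractor - extracts keys from elements.

Copyright © 2018. All rights reserved.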