@InterfaceAudience.Public
@InterfaceStability.Unstable
public class TezBytesWritableSerialization
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.io.serializer.Serialization<org.apache.hadoop.io.Writable>
When using BytesWritable, data is serialized in memory (4 bytes per key and 4 bytes per value)
and then written to IFile, where it is serialized again (another 4 bytes per key and 4 bytes per value).
This adds an overhead of 8 bytes per key-value pair written. This class reduces that overhead
by providing a fast serializer/deserializer that speeds up the inner loops of sort,
spill, and merge.
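To make the 8-byte figure concrete, here is a minimal, Hadoop-free sketch. It assumes the "4 bytes" quoted above is a 4-byte integer length prefix written before each key and each value. The class and method names (`LengthPrefixOverhead`, `lengthPrefixed`, `overheadPerPair`) are hypothetical and are not part of Tez or Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Tez or Hadoop.
public class LengthPrefixOverhead {

    // Frames a payload as a 4-byte big-endian length followed by the raw bytes
    // (assumed framing, matching the "4 bytes per key / per value" figure).
    public static byte[] lengthPrefixed(byte[] payload) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(payload.length);           // 4-byte length prefix
            out.write(payload, 0, payload.length);  // the payload itself
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Extra bytes that one round of framing adds to a key-value pair.
    public static int overheadPerPair(byte[] key, byte[] value) {
        int framed = lengthPrefixed(key).length + lengthPrefixed(value).length;
        return framed - (key.length + value.length);
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] value = "some payload".getBytes(StandardCharsets.UTF_8);
        System.out.println(overheadPerPair(key, value)); // prints 8
    }
}
```

The overhead is independent of payload size, so it weighs most heavily on workloads with many small records.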
Usage example:
OrderedPartitionedKVEdgeConfig edgeConf = OrderedPartitionedKVEdgeConfig
    .newBuilder(keyClass, valClass, MRPartitioner.class.getName(), partitionerConf)
    .setFromConfiguration(conf)
    .setKeySerializationClass(TezBytesWritableSerialization.class.getName(),
        TezBytesComparator.class.getName())
    .build();
| Constructor and Description |
|---|
| `TezBytesWritableSerialization()` |

| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `accept(Class<?> c)` |
| `org.apache.hadoop.io.serializer.Deserializer<org.apache.hadoop.io.Writable>` | `getDeserializer(Class<org.apache.hadoop.io.Writable> c)` |
| `org.apache.hadoop.io.serializer.Serializer<org.apache.hadoop.io.Writable>` | `getSerializer(Class<org.apache.hadoop.io.Writable> c)` |

public boolean accept(Class<?> c)
Specified by: accept in interface org.apache.hadoop.io.serializer.Serialization<org.apache.hadoop.io.Writable>

public org.apache.hadoop.io.serializer.Serializer<org.apache.hadoop.io.Writable> getSerializer(Class<org.apache.hadoop.io.Writable> c)
Specified by: getSerializer in interface org.apache.hadoop.io.serializer.Serialization<org.apache.hadoop.io.Writable>

public org.apache.hadoop.io.serializer.Deserializer<org.apache.hadoop.io.Writable> getDeserializer(Class<org.apache.hadoop.io.Writable> c)
Specified by: getDeserializer in interface org.apache.hadoop.io.serializer.Serialization<org.apache.hadoop.io.Writable>

Copyright © 2015 Apache Software Foundation. All rights reserved.