@InterfaceAudience.Public
@InterfaceStability.Unstable
public class TezBytesWritableSerialization
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.io.serializer.Serialization<org.apache.hadoop.io.Writable>
When using BytesWritable, data is serialized in memory (4 bytes per key and 4 bytes per value) and then written to IFile, where it is serialized again (4 bytes per key and 4 bytes per value). This adds an overhead of 8 bytes per key-value pair written. This class reduces that overhead by providing a fast serializer/deserializer that speeds up the inner loop of sort, spill, and merge.

Usage, e.g.:

    OrderedPartitionedKVEdgeConfig edgeConf = OrderedPartitionedKVEdgeConfig
        .newBuilder(keyClass, valClass, MRPartitioner.class.getName(), partitionerConf)
        .setFromConfiguration(conf)
        .setKeySerializationClass(TezBytesWritableSerialization.class.getName(),
            TezBytesComparator.class.getName())
        .build();
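The 8-byte figure in the class description can be illustrated with a plain-Java model of a length-prefixed record. This is only a sketch of the general technique (a 4-byte length written ahead of the payload); it does not use or reproduce the Tez or Hadoop implementation, and the class and method names (`LengthPrefixOverhead`, `withLengthPrefix`, `overheadPerPair`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative model only: shows how a 4-byte length prefix per key and per
// value adds up to 8 bytes per key-value pair. Not the Tez implementation.
public class LengthPrefixOverhead {

    // Hypothetical helper: writes a 4-byte big-endian length, then the payload.
    public static byte[] withLengthPrefix(byte[] payload) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(payload.length);           // 4 bytes of overhead
            out.write(payload, 0, payload.length);  // the actual data
        } catch (IOException e) {
            // An in-memory stream is not expected to fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Bytes added by the length prefixes for one key-value pair.
    public static int overheadPerPair(byte[] key, byte[] value) {
        int keyOverhead = withLengthPrefix(key).length - key.length;
        int valueOverhead = withLengthPrefix(value).length - value.length;
        return keyOverhead + valueOverhead;
    }

    public static void main(String[] args) {
        byte[] key = {1, 2, 3};
        byte[] value = {4, 5, 6, 7, 8};
        System.out.println("prefixed key bytes: " + withLengthPrefix(key).length);
        System.out.println("overhead per pair: " + overheadPerPair(key, value));
    }
}
```

With a 3-byte key and a 5-byte value, the prefixed key occupies 7 bytes and the pair carries 8 bytes of prefix overhead, independent of payload size. For small keys and values this fixed cost is a large fraction of the record, which is the case the class description targets.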
Constructor and Description |
---|
TezBytesWritableSerialization() |
Modifier and Type | Method and Description |
---|---|
boolean | accept(Class<?> c) |
org.apache.hadoop.io.serializer.Deserializer<org.apache.hadoop.io.Writable> | getDeserializer(Class<org.apache.hadoop.io.Writable> c) |
org.apache.hadoop.io.serializer.Serializer<org.apache.hadoop.io.Writable> | getSerializer(Class<org.apache.hadoop.io.Writable> c) |
public boolean accept(Class<?> c)
Specified by: accept in interface org.apache.hadoop.io.serializer.Serialization<org.apache.hadoop.io.Writable>
public org.apache.hadoop.io.serializer.Serializer<org.apache.hadoop.io.Writable> getSerializer(Class<org.apache.hadoop.io.Writable> c)
Specified by: getSerializer in interface org.apache.hadoop.io.serializer.Serialization<org.apache.hadoop.io.Writable>
public org.apache.hadoop.io.serializer.Deserializer<org.apache.hadoop.io.Writable> getDeserializer(Class<org.apache.hadoop.io.Writable> c)
Specified by: getDeserializer in interface org.apache.hadoop.io.serializer.Serialization<org.apache.hadoop.io.Writable>
Copyright © 2024 Apache Software Foundation. All rights reserved.