@InterfaceAudience.Public @InterfaceStability.Unstable public class MRInputHelpers extends Object
| Constructor and Description | 
|---|
MRInputHelpers()  | 
| Modifier and Type | Method and Description | 
|---|---|
static org.apache.tez.dag.api.DataSourceDescriptor | 
configureMRInputWithLegacySplitGeneration(org.apache.hadoop.conf.Configuration conf,
                                         org.apache.hadoop.fs.Path splitsDir,
                                         boolean useLegacyInput)
Setup split generation on the client, with splits being distributed via the traditional
 MapReduce mechanism of distributing splits via the Distributed Cache. 
 | 
protected static org.apache.tez.dag.api.UserPayload | 
createMRInputPayload(org.apache.hadoop.conf.Configuration conf,
                    MRRuntimeProtos.MRSplitsProto mrSplitsProto)  | 
protected static org.apache.tez.dag.api.UserPayload | 
createMRInputPayloadWithGrouping(org.apache.hadoop.conf.Configuration conf)
Called to specify that grouping of input splits be performed by Tez
 The conf should have the input format class configuration
 set to the TezGroupedSplitsInputFormat. 
 | 
static org.apache.hadoop.mapreduce.InputSplit | 
createNewFormatSplitFromUserPayload(MRRuntimeProtos.MRSplitProto splitProto,
                                   org.apache.hadoop.io.serializer.SerializationFactory serializationFactory)
Create an instance of  
InputSplit from the MRInput representation of a split. | 
static org.apache.hadoop.mapred.InputSplit | 
createOldFormatSplitFromUserPayload(MRRuntimeProtos.MRSplitProto splitProto,
                                   org.apache.hadoop.io.serializer.SerializationFactory serializationFactory)
Create an instance of  
InputSplit from the MRInput representation of a split. | 
static MRRuntimeProtos.MRSplitProto | 
createSplitProto(org.apache.hadoop.mapred.InputSplit oldSplit)  | 
static <T extends org.apache.hadoop.mapreduce.InputSplit>  | 
createSplitProto(T newSplit,
                org.apache.hadoop.io.serializer.SerializationFactory serializationFactory)  | 
static InputSplitInfoMem | 
generateInputSplitsToMem(org.apache.hadoop.conf.Configuration conf,
                        boolean groupSplits,
                        int targetTasks)
Generates Input splits and stores them in a  
MRRuntimeProtos.MRSplitsProto instance. | 
static MRRuntimeProtos.MRInputUserPayloadProto | 
parseMRInputPayload(org.apache.tez.dag.api.UserPayload payload)
Parse the payload used by MRInputPayload 
 | 
@InterfaceStability.Unstable
public static org.apache.tez.dag.api.DataSourceDescriptor configureMRInputWithLegacySplitGeneration(org.apache.hadoop.conf.Configuration conf,
                                                                                                                org.apache.hadoop.fs.Path splitsDir,
                                                                                                                boolean useLegacyInput)
MRInput.MRInputConfigBuilder
 
 Note: Attempting to use this method to add multiple Inputs to a Vertex is not supported.
 This mechanism of propagating splits may be removed in a subsequent release, and is not recommended.conf - configuration to be used by MRInput.
                       This is expected to be fully configured.splitsDir - the path to which splits will be generated.useLegacyInput - whether to use MRInputLegacy or
                       MRInputDataSourceDescriptor which can be added
 as a data source to a Vertex@InterfaceStability.Evolving public static MRRuntimeProtos.MRInputUserPayloadProto parseMRInputPayload(org.apache.tez.dag.api.UserPayload payload) throws IOException
payload - the UserPayload instanceMRRuntimeProtos.MRInputUserPayloadProto,
 which provides access to the underlying configuration bytesIOException@InterfaceStability.Evolving public static org.apache.hadoop.mapred.InputSplit createOldFormatSplitFromUserPayload(MRRuntimeProtos.MRSplitProto splitProto, org.apache.hadoop.io.serializer.SerializationFactory serializationFactory) throws IOException
InputSplit from the MRInput representation of a split.splitProto - The MRRuntimeProtos.MRSplitProto
                             instance representing the splitserializationFactory - the serialization mechanism used to write out the splitIOException@InterfaceStability.Evolving public static org.apache.hadoop.mapreduce.InputSplit createNewFormatSplitFromUserPayload(MRRuntimeProtos.MRSplitProto splitProto, org.apache.hadoop.io.serializer.SerializationFactory serializationFactory) throws IOException
InputSplit from the MRInput representation of a split.splitProto - The MRRuntimeProtos.MRSplitProto
                             instance representing the splitserializationFactory - the serialization mechanism used to write out the splitIOException@InterfaceStability.Evolving public static <T extends org.apache.hadoop.mapreduce.InputSplit> MRRuntimeProtos.MRSplitProto createSplitProto(T newSplit, org.apache.hadoop.io.serializer.SerializationFactory serializationFactory) throws IOException, InterruptedException
IOExceptionInterruptedException@InterfaceStability.Evolving public static MRRuntimeProtos.MRSplitProto createSplitProto(org.apache.hadoop.mapred.InputSplit oldSplit) throws IOException
IOException@InterfaceStability.Unstable public static InputSplitInfoMem generateInputSplitsToMem(org.apache.hadoop.conf.Configuration conf, boolean groupSplits, int targetTasks) throws IOException, ClassNotFoundException, InterruptedException
MRRuntimeProtos.MRSplitsProto instance.
 Returns an instance of InputSplitInfoMem
 With grouping enabled, the eventual configuration used by the tasks, will have
 the user-specified InputFormat replaced by either TezGroupedSplitsInputFormat
 or TezGroupedSplitsInputFormatconf - an instance of Configuration which is used to determine whether
          the mapred of mapreduce API is being used. This Configuration
          instance should also contain adequate information to be able to
          generate splits - like the InputFormat being used and related
          configuration.groupSplits - whether to group the splits or nottargetTasks - the number of target tasks if grouping is enabled. Specify as 0 otherwise.InputSplitInfoMem which supports a subset of
         the APIs defined on InputSplitInfoIOExceptionClassNotFoundExceptionInterruptedException@InterfaceAudience.Private
protected static org.apache.tez.dag.api.UserPayload createMRInputPayloadWithGrouping(org.apache.hadoop.conf.Configuration conf)
                                                                              throws IOException
TezGroupedSplitsInputFormat
 or TezGroupedSplitsInputFormatIOException@InterfaceAudience.Private
protected static org.apache.tez.dag.api.UserPayload createMRInputPayload(org.apache.hadoop.conf.Configuration conf,
                                                                                MRRuntimeProtos.MRSplitsProto mrSplitsProto)
                                                                  throws IOException
IOExceptionCopyright © 2015 Apache Software Foundation. All rights reserved.