@InterfaceAudience.Public @InterfaceStability.Unstable public class MRInputHelpers extends Object
Constructor and Description |
---|
MRInputHelpers() |
Modifier and Type | Method and Description |
---|---|
static org.apache.tez.dag.api.DataSourceDescriptor |
configureMRInputWithLegacySplitGeneration(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path splitsDir,
boolean useLegacyInput)
Setup split generation on the client, with splits being distributed via the traditional
MapReduce mechanism of distributing splits via the Distributed Cache.
|
protected static org.apache.tez.dag.api.UserPayload |
createMRInputPayload(org.apache.hadoop.conf.Configuration conf,
MRRuntimeProtos.MRSplitsProto mrSplitsProto) |
protected static org.apache.tez.dag.api.UserPayload |
createMRInputPayloadWithGrouping(org.apache.hadoop.conf.Configuration conf)
Called to specify that grouping of input splits be performed by Tez
The conf should have the input format class configuration
set to the TezGroupedSplitsInputFormat.
|
static org.apache.hadoop.mapreduce.InputSplit |
createNewFormatSplitFromUserPayload(MRRuntimeProtos.MRSplitProto splitProto,
org.apache.hadoop.io.serializer.SerializationFactory serializationFactory)
Create an instance of
InputSplit from the MRInput representation of a split. |
static org.apache.hadoop.mapred.InputSplit |
createOldFormatSplitFromUserPayload(MRRuntimeProtos.MRSplitProto splitProto,
org.apache.hadoop.io.serializer.SerializationFactory serializationFactory)
Create an instance of
InputSplit from the MRInput representation of a split. |
static MRRuntimeProtos.MRSplitProto |
createSplitProto(org.apache.hadoop.mapred.InputSplit oldSplit) |
static <T extends org.apache.hadoop.mapreduce.InputSplit> |
createSplitProto(T newSplit,
org.apache.hadoop.io.serializer.SerializationFactory serializationFactory) |
static InputSplitInfoMem |
generateInputSplitsToMem(org.apache.hadoop.conf.Configuration conf,
boolean groupSplits,
int targetTasks)
Generates Input splits and stores them in a
MRRuntimeProtos.MRSplitsProto instance. |
static String |
getApplicationIdString(org.apache.hadoop.conf.Configuration conf) |
static int |
getDagAttemptNumber(org.apache.hadoop.conf.Configuration conf) |
static int |
getDagIndex(org.apache.hadoop.conf.Configuration conf) |
static String |
getDagName(org.apache.hadoop.conf.Configuration conf) |
static int |
getInputIndex(org.apache.hadoop.conf.Configuration conf) |
static String |
getInputName(org.apache.hadoop.conf.Configuration conf) |
static int |
getTaskAttemptIndex(org.apache.hadoop.conf.Configuration conf) |
static int |
getTaskIndex(org.apache.hadoop.conf.Configuration conf) |
static String |
getUniqueIdentifier(org.apache.hadoop.conf.Configuration conf) |
static int |
getVertexIndex(org.apache.hadoop.conf.Configuration conf)
* @see
TaskContext.getTaskVertexIndex() |
static String |
getVertexName(org.apache.hadoop.conf.Configuration conf) |
static MRRuntimeProtos.MRInputUserPayloadProto |
parseMRInputPayload(org.apache.tez.dag.api.UserPayload payload)
Parse the payload used by MRInputPayload
|
@InterfaceStability.Unstable public static org.apache.tez.dag.api.DataSourceDescriptor configureMRInputWithLegacySplitGeneration(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path splitsDir, boolean useLegacyInput)
MRInput.MRInputConfigBuilder
Note: Attempting to use this method to add multiple Inputs to a Vertex is not supported.
This mechanism of propagating splits may be removed in a subsequent release, and is not recommended.conf
- configuration to be used by MRInput
.
This is expected to be fully configured.splitsDir
- the path to which splits will be generated.useLegacyInput
- whether to use MRInputLegacy
or
MRInput
DataSourceDescriptor
which can be added
as a data source to a Vertex
@InterfaceStability.Evolving public static MRRuntimeProtos.MRInputUserPayloadProto parseMRInputPayload(org.apache.tez.dag.api.UserPayload payload) throws IOException
payload
- the UserPayload
instanceMRRuntimeProtos.MRInputUserPayloadProto
,
which provides access to the underlying configuration bytesIOException
@InterfaceStability.Evolving public static org.apache.hadoop.mapred.InputSplit createOldFormatSplitFromUserPayload(MRRuntimeProtos.MRSplitProto splitProto, org.apache.hadoop.io.serializer.SerializationFactory serializationFactory) throws IOException
InputSplit
from the MRInput
representation of a split.splitProto
- The MRRuntimeProtos.MRSplitProto
instance representing the splitserializationFactory
- the serialization mechanism used to write out the splitIOException
@InterfaceStability.Evolving public static org.apache.hadoop.mapreduce.InputSplit createNewFormatSplitFromUserPayload(MRRuntimeProtos.MRSplitProto splitProto, org.apache.hadoop.io.serializer.SerializationFactory serializationFactory) throws IOException
InputSplit
from the MRInput
representation of a split.splitProto
- The MRRuntimeProtos.MRSplitProto
instance representing the splitserializationFactory
- the serialization mechanism used to write out the splitIOException
@InterfaceStability.Evolving public static <T extends org.apache.hadoop.mapreduce.InputSplit> MRRuntimeProtos.MRSplitProto createSplitProto(T newSplit, org.apache.hadoop.io.serializer.SerializationFactory serializationFactory) throws IOException, InterruptedException
IOException
InterruptedException
@InterfaceStability.Evolving public static MRRuntimeProtos.MRSplitProto createSplitProto(org.apache.hadoop.mapred.InputSplit oldSplit) throws IOException
IOException
@InterfaceStability.Unstable public static InputSplitInfoMem generateInputSplitsToMem(org.apache.hadoop.conf.Configuration conf, boolean groupSplits, int targetTasks) throws IOException, ClassNotFoundException, InterruptedException
MRRuntimeProtos.MRSplitsProto
instance.
Returns an instance of InputSplitInfoMem
With grouping enabled, the eventual configuration used by the tasks, will have
the user-specified InputFormat replaced by either TezGroupedSplitsInputFormat
or TezGroupedSplitsInputFormat
conf
- an instance of Configuration which is used to determine whether
the mapred of mapreduce API is being used. This Configuration
instance should also contain adequate information to be able to
generate splits - like the InputFormat being used and related
configuration.groupSplits
- whether to group the splits or nottargetTasks
- the number of target tasks if grouping is enabled. Specify as 0 otherwise.InputSplitInfoMem
which supports a subset of
the APIs defined on InputSplitInfo
IOException
ClassNotFoundException
InterruptedException
@InterfaceAudience.Private protected static org.apache.tez.dag.api.UserPayload createMRInputPayloadWithGrouping(org.apache.hadoop.conf.Configuration conf) throws IOException
TezGroupedSplitsInputFormat
or TezGroupedSplitsInputFormat
IOException
@InterfaceAudience.Private protected static org.apache.tez.dag.api.UserPayload createMRInputPayload(org.apache.hadoop.conf.Configuration conf, MRRuntimeProtos.MRSplitsProto mrSplitsProto) throws IOException
IOException
@InterfaceAudience.Public public static int getDagIndex(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getDagIdentifier}
@InterfaceAudience.Public public static int getVertexIndex(org.apache.hadoop.conf.Configuration conf)
TaskContext.getTaskVertexIndex()
conf
- configuration instance@InterfaceAudience.Public public static int getTaskIndex(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getTaskIndex}
@InterfaceAudience.Public public static int getTaskAttemptIndex(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getTaskAttemptNumber}
@InterfaceAudience.Public public static int getInputIndex(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getInputIndex}
@InterfaceAudience.Public public static String getDagName(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getDAGName}
@InterfaceAudience.Public public static String getVertexName(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getTaskVertexName}
@InterfaceAudience.Public public static String getInputName(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getSourceVertexName}
@InterfaceAudience.Public public static String getApplicationIdString(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getApplicationId}
@InterfaceAudience.Public public static String getUniqueIdentifier(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getUniqueIdentifier}
@InterfaceAudience.Public public static int getDagAttemptNumber(org.apache.hadoop.conf.Configuration conf)
conf
- configuration instanceInputContext#getDAGAttemptNumber}
Copyright © 2016 Apache Software Foundation. All rights reserved.