@InterfaceAudience.Public public class DataSourceDescriptor extends Object
Modifier and Type | Method and Description |
---|---|
DataSourceDescriptor | addURIsForCredentials(Collection<URI> uris): This method can be used to specify a list of URIs for which Credentials need to be obtained so that the job can run. |
static DataSourceDescriptor | create(InputDescriptor inputDescriptor, InputInitializerDescriptor initializerDescriptor, org.apache.hadoop.security.Credentials credentials): Create a DataSourceDescriptor when the data shard calculation happens in the App Master at runtime. |
static DataSourceDescriptor | create(InputDescriptor inputDescriptor, InputInitializerDescriptor initializerDescriptor, int numShards, org.apache.hadoop.security.Credentials credentials, VertexLocationHint locationHint, Map<String,org.apache.hadoop.yarn.api.records.LocalResource> additionalLocalFiles): Create a DataSourceDescriptor when the data shard calculation happens in the client at compile time. |
InputDescriptor | getInputDescriptor(): Get the InputDescriptor for this DataSourceDescriptor. |
InputInitializerDescriptor | getInputInitializerDescriptor(): Get the InputInitializerDescriptor for this DataSourceDescriptor. |
Collection<URI> | getURIsForCredentials(): Get the URIs for which credentials will be obtained. |
public static DataSourceDescriptor create(InputDescriptor inputDescriptor, @Nullable InputInitializerDescriptor initializerDescriptor, @Nullable org.apache.hadoop.security.Credentials credentials)
Create a DataSourceDescriptor when the data shard calculation happens in the App Master at runtime.
Parameters:
inputDescriptor - An InputDescriptor for the Input
credentials - Credentials needed to access the data
initializerDescriptor - An initializer for this Input which may run within the AM. This can be used to set the parallelism for this vertex and generate InputDataInformationEvents for the actual Input. If this is not specified, the parallelism must be set for the vertex. In addition, the Input should know how to access data for each of its tasks. If an InputInitializer is meant to determine the parallelism of the vertex, the initial vertex parallelism should be set to -1. Can be null.
public static DataSourceDescriptor create(InputDescriptor inputDescriptor, @Nullable InputInitializerDescriptor initializerDescriptor, int numShards, @Nullable org.apache.hadoop.security.Credentials credentials, @Nullable VertexLocationHint locationHint, @Nullable Map<String,org.apache.hadoop.yarn.api.records.LocalResource> additionalLocalFiles)
Create a DataSourceDescriptor when the data shard calculation happens in the client at compile time.
Parameters:
inputDescriptor - An InputDescriptor for the Input
initializerDescriptor - An initializer for this Input which may run within the AM. This can be used to set the parallelism for this vertex and generate InputDataInformationEvents for the actual Input. If this is not specified, the parallelism must be set for the vertex. In addition, the Input should know how to access data for each of its tasks. If an InputInitializer is meant to determine the parallelism of the vertex, the initial vertex parallelism should be set to -1. Can be null.
numShards - Number of shards of data
credentials - Credentials needed to access the data
locationHint - Location hints for the vertex tasks
additionalLocalFiles - Additional local files required by this Input. An attempt will be made to add these files to the Vertex as Private resources. If a name conflict occurs, a TezUncheckedException will be thrown.
public InputDescriptor getInputDescriptor()
Get the InputDescriptor for this DataSourceDescriptor.
Returns:
InputDescriptor
@Nullable public InputInitializerDescriptor getInputInitializerDescriptor()
Get the InputInitializerDescriptor for this DataSourceDescriptor.
Returns:
InputInitializerDescriptor
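Because this getter is @Nullable, callers need to handle the case where no initializer was supplied. The parallelism rule stated for the initializerDescriptor parameter of create(...) can be sketched roughly as follows. This is only an illustration: `ParallelismRule` and `initialParallelism` are hypothetical names that do not exist in Tez, `Object` stands in for InputInitializerDescriptor so the block compiles without Tez on the classpath, and treating a negative value as "not set" is an assumption of the sketch.

```java
public final class ParallelismRule {
    private ParallelismRule() {}

    /**
     * Picks the initial vertex parallelism following the documented rule.
     *
     * @param initializer stand-in for the @Nullable InputInitializerDescriptor
     * @param initializerSetsParallelism true if the initializer is meant to
     *        determine the vertex parallelism at runtime
     * @param knownParallelism parallelism chosen by the caller; ignored when
     *        the initializer decides
     */
    public static int initialParallelism(Object initializer,
                                         boolean initializerSetsParallelism,
                                         int knownParallelism) {
        // An initializer that determines parallelism: start the vertex at -1.
        if (initializer != null && initializerSetsParallelism) {
            return -1;
        }
        // Otherwise the caller must set the parallelism explicitly.
        // Assumption: a negative value means "not set".
        if (knownParallelism < 0) {
            throw new IllegalArgumentException(
                "parallelism must be set when no initializer determines it");
        }
        return knownParallelism;
    }
}
```

For example, `ParallelismRule.initialParallelism(null, false, 8)` returns 8, while passing a non-null initializer with `initializerSetsParallelism` set to true returns -1.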
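public DataSourceDescriptor addURIsForCredentials(Collection<URI> uris)
This method can be used to specify a list of URIs for which Credentials need to be obtained so that the job can run. Credentials can be fetched only for FileSystem implementations that support credentials.
Parameters:
uris - a list of URIs
public Collection<URI> getURIsForCredentials()
Get the URIs for which credentials will be obtained.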
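Since addURIsForCredentials returns a DataSourceDescriptor, calls can be chained after create(...). The sketch below illustrates that call shape only. To stay compilable without Tez on the classpath it declares small stand-in classes with the same method names; they are stubs, not the Tez implementation. The class name `com.example.MyInput`, the two `hdfs://` URIs, and the assumption that repeated calls accumulate URIs are illustrative and not taken from this page.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public final class DataSourceSketch {
    private DataSourceSketch() {}

    /** Stub standing in for InputDescriptor; not the Tez class. */
    public static final class InputDescriptor {
        public final String className;
        public InputDescriptor(String className) { this.className = className; }
    }

    /** Stub standing in for DataSourceDescriptor; only a few methods are modeled. */
    public static final class DataSourceDescriptor {
        private final InputDescriptor inputDescriptor;
        private final Set<URI> urisForCredentials = new LinkedHashSet<>();

        private DataSourceDescriptor(InputDescriptor inputDescriptor) {
            this.inputDescriptor = inputDescriptor;
        }

        // Mirrors the three-argument create(...). The initializer and the
        // credentials are typed as Object because both are documented as @Nullable.
        public static DataSourceDescriptor create(InputDescriptor inputDescriptor,
                                                  Object initializerDescriptor,
                                                  Object credentials) {
            return new DataSourceDescriptor(inputDescriptor);
        }

        public InputDescriptor getInputDescriptor() { return inputDescriptor; }

        // Assumed behavior: each call adds to the URIs registered so far.
        public DataSourceDescriptor addURIsForCredentials(Collection<URI> uris) {
            urisForCredentials.addAll(uris);
            return this;
        }

        public Collection<URI> getURIsForCredentials() {
            return Collections.unmodifiableCollection(urisForCredentials);
        }
    }

    /** Builds a descriptor and registers two URIs through chained calls. */
    public static DataSourceDescriptor build() {
        InputDescriptor input = new InputDescriptor("com.example.MyInput"); // hypothetical
        return DataSourceDescriptor.create(input, null, null)
            .addURIsForCredentials(Arrays.asList(URI.create("hdfs://nn1/data")))
            .addURIsForCredentials(Arrays.asList(URI.create("hdfs://nn2/logs")));
    }
}
```

With the real library, the stubs would be dropped in favor of the Tez classes, and the null arguments replaced by an actual InputInitializerDescriptor and Credentials where they are needed.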
Copyright © 2024 Apache Software Foundation. All rights reserved.