@InterfaceAudience.Public public class DataSourceDescriptor extends Object
| Modifier and Type | Method and Description |
|---|---|
| `DataSourceDescriptor` | `addURIsForCredentials(Collection<URI> uris)` This method can be used to specify a list of URIs for which Credentials need to be obtained so that the job can run. |
| `static DataSourceDescriptor` | `create(InputDescriptor inputDescriptor, InputInitializerDescriptor initializerDescriptor, org.apache.hadoop.security.Credentials credentials)` Create a DataSourceDescriptor when the data shard calculation happens in the App Master at runtime. |
| `static DataSourceDescriptor` | `create(InputDescriptor inputDescriptor, InputInitializerDescriptor initializerDescriptor, int numShards, org.apache.hadoop.security.Credentials credentials, VertexLocationHint locationHint, Map<String,org.apache.hadoop.yarn.api.records.LocalResource> additionalLocalFiles)` Create a DataSourceDescriptor when the data shard calculation happens in the client at compile time. |
| `InputDescriptor` | `getInputDescriptor()` Get the InputDescriptor for this DataSourceDescriptor. |
| `InputInitializerDescriptor` | `getInputInitializerDescriptor()` Get the InputInitializerDescriptor for this DataSourceDescriptor. |
| `Collection<URI>` | `getURIsForCredentials()` Get the URIs for which credentials will be obtained. |

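
The six-argument create listed above takes an additionalLocalFiles map, and its parameter description says a name conflict while adding these files to the Vertex results in a TezUncheckedException. The sketch below only models that rule with plain Java collections. Everything in it is a hypothetical stand-in and not part of Tez: the class LocalFileMergeSketch, the use of String values in place of LocalResource, IllegalStateException in place of TezUncheckedException, and the assumption that any repeated name counts as a conflict (the real check may be more lenient). The file names and paths are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not a Tez class. It models the documented rule:
// additional local files are added to the vertex, and a name conflict fails.
class LocalFileMergeSketch {

    // Returns a new map holding vertexFiles plus additional.
    // Values are plain strings here; the real API uses LocalResource.
    // IllegalStateException stands in for TezUncheckedException.
    static Map<String, String> merge(Map<String, String> vertexFiles,
                                     Map<String, String> additional) {
        Map<String, String> merged = new HashMap<>(vertexFiles);
        if (additional == null) {
            return merged; // the parameter is documented as @Nullable
        }
        for (Map.Entry<String, String> entry : additional.entrySet()) {
            // Assumption: any repeated name is treated as a conflict.
            if (merged.containsKey(entry.getKey())) {
                throw new IllegalStateException(
                        "Name conflict for local file: " + entry.getKey());
            }
            merged.put(entry.getKey(), entry.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> vertexFiles = new HashMap<>();
        vertexFiles.put("app.jar", "hdfs:///apps/app.jar");

        Map<String, String> extra = new HashMap<>();
        extra.put("lookup.dat", "hdfs:///data/lookup.dat");

        // Two distinct names, so the merge succeeds.
        System.out.println(merge(vertexFiles, extra).keySet());
    }
}
```

Returning a fresh map rather than mutating the input keeps a failed merge from leaving the vertex's file set half updated, which is a choice made for this sketch and not a statement about how Tez behaves.
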
public static DataSourceDescriptor create(InputDescriptor inputDescriptor, @Nullable InputInitializerDescriptor initializerDescriptor, @Nullable org.apache.hadoop.security.Credentials credentials)

Create a DataSourceDescriptor when the data shard calculation happens in the App Master at runtime.

Parameters:

inputDescriptor - An InputDescriptor for the Input.

initializerDescriptor - An initializer for this Input which may run within the AM. This can be used to set the parallelism for this vertex and generate InputDataInformationEvents for the actual Input. If this is not specified, the parallelism must be set for the vertex. In addition, the Input should know how to access data for each of its tasks. If an InputInitializer is meant to determine the parallelism of the vertex, the initial vertex parallelism should be set to -1. Can be null.

credentials - Credentials needed to access the data.

public static DataSourceDescriptor create(InputDescriptor inputDescriptor, @Nullable InputInitializerDescriptor initializerDescriptor, int numShards, @Nullable org.apache.hadoop.security.Credentials credentials, @Nullable VertexLocationHint locationHint, @Nullable Map<String,org.apache.hadoop.yarn.api.records.LocalResource> additionalLocalFiles)

Create a DataSourceDescriptor when the data shard calculation happens in the client at compile time.

Parameters:

inputDescriptor - An InputDescriptor for the Input.

initializerDescriptor - An initializer for this Input which may run within the AM. This can be used to set the parallelism for this vertex and generate InputDataInformationEvents for the actual Input. If this is not specified, the parallelism must be set for the vertex. In addition, the Input should know how to access data for each of its tasks. If an InputInitializer is meant to determine the parallelism of the vertex, the initial vertex parallelism should be set to -1. Can be null.

numShards - Number of shards of data.

credentials - Credentials needed to access the data.

locationHint - Location hints for the vertex tasks.

additionalLocalFiles - Additional local files required by this Input. An attempt will be made to add these files to the Vertex as Private resources. If a name conflict occurs, a TezUncheckedException will be thrown.

public InputDescriptor getInputDescriptor()

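Get the InputDescriptor for this DataSourceDescriptor.

Returns: InputDescriptor

@Nullable public InputInitializerDescriptor getInputInitializerDescriptor()

Get the InputInitializerDescriptor for this DataSourceDescriptor.

Returns: InputInitializerDescriptor

public DataSourceDescriptor addURIsForCredentials(Collection<URI> uris)

This method can be used to specify a list of URIs for which Credentials need to be obtained so that the job can run. Credentials can be fetched only for FileSystem implementations that support credentials.

Parameters:

uris - a list of URIs

public Collection<URI> getURIsForCredentials()

Get the URIs for which credentials will be obtained.
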
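
addURIsForCredentials takes a Collection<URI>, so callers typically need to turn location strings into java.net.URI values first. The following is a minimal sketch of that preparation step using only the JDK. The class CredentialUriSketch, the helper toUris, and the host names are hypothetical and chosen for illustration. The Tez call itself appears only as a comment, because it needs the Tez libraries on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not a Tez class.
class CredentialUriSketch {

    // Converts location strings into the Collection<URI> form that
    // addURIsForCredentials accepts. URI.create throws
    // IllegalArgumentException if a string is not a valid URI.
    static Collection<URI> toUris(List<String> locations) {
        Collection<URI> uris = new ArrayList<>();
        for (String location : locations) {
            uris.add(URI.create(location));
        }
        return uris;
    }

    public static void main(String[] args) {
        // Illustrative addresses only; substitute the real cluster locations.
        Collection<URI> uris = toUris(Arrays.asList(
                "hdfs://namenode.example.com:8020/warehouse/input",
                "hdfs://namenode.example.com:8020/warehouse/lookup"));

        // With Tez available, the collection would then be handed over as:
        //   dataSourceDescriptor.addURIsForCredentials(uris);
        for (URI uri : uris) {
            System.out.println(uri.getScheme() + " " + uri.getHost() + " " + uri.getPath());
        }
    }
}
```

Parsing the strings up front means a malformed location fails before the descriptor is touched, which keeps the error close to its cause.
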
Copyright © 2024 Apache Software Foundation. All rights reserved.