DataSourceDescriptor (tez-api 0.10.3 API)

java.lang.Object
- org.apache.tez.dag.api.DataSourceDescriptor

@InterfaceAudience.Public
public class DataSourceDescriptor
extends Object

Defines the input and input initializer for a data source

Method Summary

Modifier and Type	Method and Description
`DataSourceDescriptor`	`addURIsForCredentials(Collection<URI> uris)` This method can be used to specify a list of URIs for which Credentials need to be obtained so that the job can run.
`static DataSourceDescriptor`	`create(InputDescriptor inputDescriptor, InputInitializerDescriptor initializerDescriptor, org.apache.hadoop.security.Credentials credentials)` Create a `DataSourceDescriptor` when the data shard calculation happens in the App Master at runtime
`static DataSourceDescriptor`	`create(InputDescriptor inputDescriptor, InputInitializerDescriptor initializerDescriptor, int numShards, org.apache.hadoop.security.Credentials credentials, VertexLocationHint locationHint, Map<String,org.apache.hadoop.yarn.api.records.LocalResource> additionalLocalFiles)` Create a `DataSourceDescriptor` when the data shard calculation happens in the client at compile time
`InputDescriptor`	`getInputDescriptor()` Get the `InputDescriptor` for this `DataSourceDescriptor`
`InputInitializerDescriptor`	`getInputInitializerDescriptor()` Get the `InputInitializerDescriptor` for this `DataSourceDescriptor`
`Collection<URI>`	`getURIsForCredentials()` Get the URIs for which credentials will be obtained

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - create
```
public static DataSourceDescriptor create(InputDescriptor inputDescriptor,
                                          @Nullable
                                          InputInitializerDescriptor initializerDescriptor,
                                          @Nullable
                                          org.apache.hadoop.security.Credentials credentials)
```
    Create a DataSourceDescriptor when the data shard calculation happens in the App Master at runtime
    
    Parameters:
    
    inputDescriptor - An InputDescriptor for the Input
    
    credentials - Credentials needed to access the data
    
    initializerDescriptor - An initializer for this Input which may run within the AM. This can be used to set the parallelism for this vertex and generate InputDataInformationEvents for the actual Input.
    If this is not specified, the parallelism must be set for the vertex. In addition, the Input should know how to access data for each of its tasks.
    If an InputInitializer is meant to determine the parallelism of the vertex, the initial vertex parallelism should be set to -1. Can be null.
  - create
```
public static DataSourceDescriptor create(InputDescriptor inputDescriptor,
                                          @Nullable
                                          InputInitializerDescriptor initializerDescriptor,
                                          int numShards,
                                          @Nullable
                                          org.apache.hadoop.security.Credentials credentials,
                                          @Nullable
                                          VertexLocationHint locationHint,
                                          @Nullable
                                          Map<String,org.apache.hadoop.yarn.api.records.LocalResource> additionalLocalFiles)
```
    Create a DataSourceDescriptor when the data shard calculation happens in the client at compile time
    
    Parameters:
    
    inputDescriptor - An InputDescriptor for the Input
    
    initializerDescriptor - An initializer for this Input which may run within the AM. This can be used to set the parallelism for this vertex and generate InputDataInformationEvents for the actual Input.
    If this is not specified, the parallelism must be set for the vertex. In addition, the Input should know how to access data for each of its tasks.
    If an InputInitializer is meant to determine the parallelism of the vertex, the initial vertex parallelism should be set to -1. Can be null.
    
    numShards - Number of shards of data
    
    credentials - Credentials needed to access the data
    
    locationHint - Location hints for the vertex tasks
    
    additionalLocalFiles - additional local files required by this Input. An attempt will be made to add these files to the Vertex as Private resources. If a name conflict occurs, a TezUncheckedException will be thrown
  - getInputDescriptor
```
public InputDescriptor getInputDescriptor()
```
    Get the InputDescriptor for this DataSourceDescriptor
    
    Returns:
    
    InputDescriptor
  - getInputInitializerDescriptor
```
@Nullable
public InputInitializerDescriptor getInputInitializerDescriptor()
```
    Get the InputInitializerDescriptor for this DataSourceDescriptor
    
    Returns:
    
    InputInitializerDescriptor
  - addURIsForCredentials
```
public DataSourceDescriptor addURIsForCredentials(Collection<URI> uris)
```
    This method can be used to specify a list of URIs for which Credentials need to be obtained so that the job can run. An incremental list of URIs can be provided by making multiple calls to the method. Currently, credentials can only be fetched for HDFS and other FileSystem implementations that support credentials.
    
    Parameters:
    
    uris - a list of URIs
    
    Returns:
    
    this
  - getURIsForCredentials
```
public Collection<URI> getURIsForCredentials()
```
    Get the URIs for which credentials will be obtained
    
    Returns:
    
    an unmodifiable list representing the URIs for which credentials are required.
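
The contract described for `addURIsForCredentials` and `getURIsForCredentials` (incremental calls, a `this` return value for chaining, and a read-only result) can be illustrated with a minimal stand-in. This is a sketch only: `UriCredentialListSketch`, its `ArrayList` storage, the null check, and the example `hdfs://` URIs are assumptions made for illustration, not the Tez implementation.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Objects;

// Hypothetical stand-in, NOT the Tez class: it only mirrors the documented
// behavior of addURIsForCredentials / getURIsForCredentials.
final class UriCredentialListSketch {
    // Assumption: a plain list; the real class may store URIs differently.
    private final Collection<URI> uris = new ArrayList<>();

    // Each call appends to what earlier calls provided and returns this,
    // so calls can be chained.
    UriCredentialListSketch addURIsForCredentials(Collection<URI> more) {
        Objects.requireNonNull(more, "uris must not be null");
        uris.addAll(more);
        return this;
    }

    // Callers get a read-only view; attempts to modify it throw
    // UnsupportedOperationException.
    Collection<URI> getURIsForCredentials() {
        return Collections.unmodifiableCollection(uris);
    }

    public static void main(String[] args) {
        UriCredentialListSketch sketch = new UriCredentialListSketch()
            .addURIsForCredentials(Arrays.asList(URI.create("hdfs://nn1:8020/data/in")))
            .addURIsForCredentials(Arrays.asList(URI.create("hdfs://nn2:8020/data/lookup")));

        System.out.println(sketch.getURIsForCredentials().size());

        try {
            sketch.getURIsForCredentials().add(URI.create("hdfs://nn3:8020/other"));
        } catch (UnsupportedOperationException e) {
            System.out.println("returned collection is read-only");
        }
    }
}
```

With the real class, the same chaining style would apply to the `DataSourceDescriptor` returned by `create`, since `addURIsForCredentials` is documented to return `this`.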
