@InterfaceAudience.Public public class Vertex extends Object
Modifier and Type | Class and Description |
---|---|
static class |
Vertex.VertexExecutionContext
The execution context for a running vertex.
|
Modifier and Type | Method and Description |
---|---|
Vertex |
addDataSink(String outputName,
DataSinkDescriptor dataSinkDescriptor)
Specifies an external data sink for a Vertex.
|
Vertex |
addDataSource(String inputName,
DataSourceDescriptor dataSourceDescriptor)
Specifies an external data source for a Vertex.
|
Vertex |
addTaskLocalFiles(Map<String,org.apache.hadoop.yarn.api.records.LocalResource> localFiles)
Set the files etc that must be provided to the tasks of this vertex
|
static Vertex |
create(String vertexName,
ProcessorDescriptor processorDescriptor)
Create a new vertex with the given name.
|
static Vertex |
create(String vertexName,
ProcessorDescriptor processorDescriptor,
int parallelism)
Create a new vertex with the given name and parallelism.
|
static Vertex |
create(String vertexName,
ProcessorDescriptor processorDescriptor,
int parallelism,
org.apache.hadoop.yarn.api.records.Resource taskResource)
Create a new vertex with the given name.
|
boolean |
equals(Object obj) |
Map<String,String> |
getConf() |
List<Vertex> |
getInputVertices()
Get the input vertices for this vertex
|
String |
getName()
Get the vertex name
|
List<Vertex> |
getOutputVertices()
Get the output vertices for this vertex
|
int |
getParallelism()
Get the specified number of tasks specified to run in this vertex.
|
ProcessorDescriptor |
getProcessorDescriptor()
Get the vertex task processor descriptor
|
Map<String,String> |
getTaskEnvironment()
Get the environment variables of the tasks
|
String |
getTaskLaunchCmdOpts()
Get the launch command opts for tasks in this vertex
|
Map<String,org.apache.hadoop.yarn.api.records.LocalResource> |
getTaskLocalFiles()
Get the files etc that must be provided by the tasks of this vertex
|
org.apache.hadoop.yarn.api.records.Resource |
getTaskResource()
Get the resources for the vertex
|
int |
hashCode() |
Vertex |
setConf(String property,
String value)
This is currently used to setup additional configuration parameters which will be available
in the Vertex specific configuration used in the AppMaster.
|
Vertex |
setExecutionContext(Vertex.VertexExecutionContext vertexExecutionContext)
Sets the execution context for this Vertex - i.e.
|
Vertex |
setLocationHint(VertexLocationHint locationHint)
Specify location hints for the tasks of this vertex.
|
Vertex |
setTaskEnvironment(Map<String,String> environment)
Set the Key-Value pairs of environment variables for tasks of this vertex.
|
Vertex |
setTaskLaunchCmdOpts(String cmdOpts)
Set the command opts for tasks of this vertex.
|
Vertex |
setVertexManagerPlugin(VertexManagerPluginDescriptor vertexManagerPluginDescriptor)
Specifies a
VertexManagerPlugin for the vertex. |
String |
toString() |
public static Vertex create(String vertexName, ProcessorDescriptor processorDescriptor, int parallelism, org.apache.hadoop.yarn.api.records.Resource taskResource)
vertexName
- Name of the vertexprocessorDescriptor
- Description of the processor that is executed in every task of
this vertexparallelism
- Number of tasks in this vertex. Set to -1 if this is going to be
decided at runtime. Parallelism may change at runtime due to graph
reconfigurations.taskResource
- Physical resources like memory/cpu thats used by each task of this
vertex.public static Vertex create(String vertexName, ProcessorDescriptor processorDescriptor)
Vertex(String, ProcessorDescriptor, int)
with the
parallelism set to -1.vertexName
- Name of the vertexprocessorDescriptor
- Description of the processor that is executed in every task of
this vertexpublic static Vertex create(String vertexName, ProcessorDescriptor processorDescriptor, int parallelism)
TezConfiguration.TEZ_TASK_RESOURCE_MEMORY_MB
&
TezConfiguration.TEZ_TASK_RESOURCE_CPU_VCORES
Applications that
want more control over their task resource specification may create their
own logic to determine task resources and use
Vertex(String, ProcessorDescriptor, int, Resource)
to create
the Vertex.vertexName
- Name of the vertexprocessorDescriptor
- Description of the processor that is executed in every task of
this vertexparallelism
- Number of tasks in this vertex. Set to -1 if this is going to be
decided at runtime. Parallelism may change at runtime due to graph
reconfigurations.public String getName()
public ProcessorDescriptor getProcessorDescriptor()
public int getParallelism()
public org.apache.hadoop.yarn.api.records.Resource getTaskResource()
public Vertex setLocationHint(VertexLocationHint locationHint)
locationHint
- list of locations for each task in the vertexpublic Vertex addTaskLocalFiles(Map<String,org.apache.hadoop.yarn.api.records.LocalResource> localFiles)
localFiles
- files that must be available locally for each task. These files
may be regular files, archives etc. as specified by the value
elements of the map.public Map<String,org.apache.hadoop.yarn.api.records.LocalResource> getTaskLocalFiles()
public Vertex setTaskEnvironment(Map<String,String> environment)
environment
- environment
is null
public Map<String,String> getTaskEnvironment()
public Vertex setTaskLaunchCmdOpts(String cmdOpts)
cmdOpts
- public Vertex addDataSource(String inputName, DataSourceDescriptor dataSourceDescriptor)
addEdge
method.
If a vertex needs to use data generated by another vertex in the DAG and
also from an external source, a combination of this API and the DAG.addEdge
API can be used.
Note: If more than one RootInput exists on a vertex, which generates events
which need to be routed, or generates information to set parallelism, a
custom vertex manager should be setup to handle this. Not using a custom
vertex manager for such a scenario will lead to a runtime failure.inputName
- the name of the input. This will be used when accessing the input
in the LogicalIOProcessor
dataSourceDescriptor
- the @{link DataSourceDescriptor} for this input.public Vertex addDataSink(String outputName, DataSinkDescriptor dataSinkDescriptor)
addEdge
method.
If a vertex needs generate data to an external source as well as for
another Vertex in the DAG, a combination of this API and the DAG.addEdge
API can be used.outputName
- the name of the output. This will be used when accessing the
output in the LogicalIOProcessor
dataSinkDescriptor
- the DataSinkDescriptor
for this outputpublic Vertex setVertexManagerPlugin(VertexManagerPluginDescriptor vertexManagerPluginDescriptor)
VertexManagerPlugin
for the vertex. This plugin can be
used to modify the parallelism or reconfigure the vertex at runtime using
user defined code embedded in the pluginvertexManagerPluginDescriptor
- public String getTaskLaunchCmdOpts()
@InterfaceStability.Unstable public Vertex setConf(String property, String value)
property
- the property namevalue
- the value for the propertypublic Vertex setExecutionContext(Vertex.VertexExecutionContext vertexExecutionContext)
vertexExecutionContext
- the execution context for the vertex.public List<Vertex> getInputVertices()
public List<Vertex> getOutputVertices()
Copyright © 2024 Apache Software Foundation. All rights reserved.