TezConfiguration


Property Name Default Value Description Type Is Private? Is Unstable? Is Evolving?
tez.dag.recovery.enabled true Boolean value. Enable recovery of DAGs. This allows a restarted app master to recover the incomplete DAGs from the previous instance of the app master. boolean false false false
tez.dag.recovery.io.buffer.size 8192 Int value. Size, in bytes, of the IO buffer used while processing the recovery file. Expert level setting. integer false false false
tez.dag.recovery.flush.interval.secs 30 Int value. Interval, in seconds, between flushing recovery data to the recovery log. integer false false false
tez.dag.recovery.max.unflushed.events 100 Int value. Number of recovery events to buffer before flushing them to the recovery log. integer false false false
tez.task.heartbeat.timeout.check-ms 30000 Int value. Time interval, in milliseconds, between checks for lost tasks. Expert level setting. integer false false false
tez.task.timeout-ms 300000 Int value. Time interval, in milliseconds, within which a task must heartbeat to the app master before it is considered lost. Expert level setting. integer false false false
tez.am.acls.enabled true Boolean value. Configuration to enable/disable ACL checks. boolean false false false
tez.allow.disabled.timeline-domains false Boolean value. Allow disabling of Timeline Domains even if Timeline is being used. boolean true false false
tez.am.client.am.port-range null String value. Range of ports that the AM can use when binding for client connections. Leave blank to use all possible ports. Expert level setting. Uses the standard Hadoop range format, for example 50000-50050,50100-50200. string false false false
tez.am.client.am.thread-count 1 Int value. Number of threads to handle client RPC requests. Expert level setting. integer false false false
tez.am.commit-all-outputs-on-dag-success true Boolean value. Determines when the final outputs to data sinks are committed. Commit is an output-specific operation and typically involves making the output visible for consumption. If the config is true, the outputs are committed at the end of DAG completion, after all constituent vertices have completed. If false, outputs for each vertex are committed after that vertex succeeds. Choose this value according to the desired output visibility and the dependencies of downstream consumers. Defaults to the safe choice of true. boolean false false false
tez.am.containerlauncher.thread-count-limit 500 Int value. Upper limit on the number of threads used to launch containers in the app master. Expert level setting. integer false false false
tez.am.container.idle.release-timeout-max.millis 10000 Long value. The maximum amount of time to hold on to a container if no task can be assigned to it immediately. Only active when reuse is enabled. The value must be positive and >= TezConfiguration#TEZ_AM_CONTAINER_IDLE_RELEASE_TIMEOUT_MIN_MILLIS. Containers will have an expire time set to a random value between TezConfiguration#TEZ_AM_CONTAINER_IDLE_RELEASE_TIMEOUT_MIN_MILLIS and TezConfiguration#TEZ_AM_CONTAINER_IDLE_RELEASE_TIMEOUT_MAX_MILLIS. This creates a graceful reduction in the amount of idle resources held. long false false false
tez.am.container.idle.release-timeout-min.millis 5000 Int value. The minimum amount of time to hold on to a container that is idle. Only active when reuse is enabled. Set to -1 to never release idle containers (not recommended). integer false false false
tez.am.container.reuse.enabled true Boolean value. Configuration to specify whether containers should be reused across tasks. This improves performance by avoiding repeated container launch overheads. boolean false false false
tez.am.container.reuse.locality.delay-allocation-millis 250 Long value. The amount of time to wait before assigning a container to the next level of locality (NODE -> RACK -> NON_LOCAL). Delay scheduling parameter. Expert level setting. long false false false
tez.am.container.reuse.non-local-fallback.enabled false Boolean value. Whether to reuse containers for non-local tasks. Active only if reuse is enabled. Turning this on can severely affect locality and can be bad for jobs with high data volume being read from the primary data sources. boolean false false false
tez.am.container.reuse.rack-fallback.enabled true Boolean value. Whether to reuse containers for rack local tasks. Active only if reuse is enabled. boolean false false false
tez.am.dag.scheduler.class org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrder String value. The class to be used for DAG Scheduling. Expert level setting. string false false false
tez.am.disable.client-version-check false Boolean value. Disable version check between client and AM/DAG. Default false. boolean true false false
tez.am.inline.task.execution.enabled false Boolean value. Tez AM inline mode flag. Not valid until Tez-684 is checked in. boolean true false false
tez.am.inline.task.execution.max-tasks 1 Int value. The maximum number of tasks running in parallel within the app master process. integer false false false
tez.am.launch.cluster-default.cmd-opts -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN String value. Command line options which will be prepended to TEZ_AM_LAUNCH_CMD_OPTS (tez.am.launch.cmd-opts) during the launch of the AppMaster process. This property will typically be configured to include default options meant to be used by all jobs in a cluster. If required, the values can be overridden per job. string false false false
tez.am.launch.cmd-opts -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC String value. Command line options provided during the launch of the Tez AppMaster process. It is recommended not to set any Xmx or Xms in these launch opts so that Tez can determine them automatically. string false false false
tez.am.launch.env null String value. Env settings for the Tez AppMaster process. Should be specified as a comma-separated list of key-value pairs, where each pair is defined as KEY=VAL, e.g. "LD_LIBRARY_PATH=.,USERNAME=foo". These take the least precedence compared to other methods of setting env. They are added to the app master environment prior to launching it. string false false false
tez.am.legacy.speculative.slowtask.threshold null Float value. Specifies how many standard deviations away from the mean task execution time should be considered as an outlier/slow task. float false false true
tez.am.log.level INFO Root logging level passed to the Tez app master. Simple configuration: set the log level for all loggers, e.g. INFO, which sets the log level to INFO for all loggers. Advanced configuration: set the log level for all classes, along with a different level for some, e.g. DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO, which sets the log level for all loggers to DEBUG, except for org.apache.hadoop.ipc and org.apache.hadoop.security, which are set to INFO. Note: the global log level must always be the first parameter. DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO is valid; org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO is not valid. string false false false
tez.am.max.allowed.time-sec.for-read-error 300 Int value. Represents the maximum time, in seconds, for which a consumer attempt can report a read error against its producer attempt, after which the producer attempt will be re-run to regenerate the output. Other heuristics also determine the retry and mainly try to guard against a flurry of re-runs caused by intermittent read errors (e.g. network issues). This configuration puts a time limit on those heuristics to ensure jobs do not hang indefinitely when those heuristics never reach closure. Expert level setting. integer false false false
tez.am.max.app.attempts 2 Int value. Specifies the number of times the app master can be launched in order to recover from app master failure. Typically app master failures are non-recoverable. This parameter is for cases where the app master is not at fault but is lost due to system errors. Expert level setting. integer false false false
tez.am.maxtaskfailures.per.node 10 Int value. Specifies the number of task failures on a node before the node is considered faulty. integer false false false
tez.am.modify-acls null String value. AM modify ACLs. This allows the specified users/groups to run modify operations on the AM, such as submitting DAGs, pre-warming the session, killing DAGs or shutting down the session. Comma-separated list of users, followed by whitespace, followed by a comma-separated list of groups. string false false false
tez.am.node-blacklisting.enabled true Boolean value. Enables blacklisting of nodes that are considered faulty. These nodes will not be used to execute tasks. boolean false false false
tez.am.node-blacklisting.ignore-threshold-node-percent 33 Int value. Specifies the percentage of nodes in the cluster that may be considered faulty. This limits the number of nodes that are blacklisted in an effort to minimize the effects of temporary surges in failures (e.g. due to network outages). integer false false false
tez.am.preemption.heartbeats-between-preemptions 3 Int value. The number of RM heartbeats to wait after preempting running tasks before preempting more running tasks. After preempting a task, we need to wait at least 1 heartbeat so that the RM can act on the released resources and assign new ones to us. Expert level setting. integer false false false
tez.am.preemption.max.wait-time-ms 60000 Int value. Time, in milliseconds, that an unsatisfied request will wait before preempting other resources. In rare cases, the cluster reports enough free resources, but the job never actually receives enough on a single node to satisfy the request. This configuration puts a deadline on such waits to prevent indefinite job hangs. integer false false false
tez.am.preemption.percentage 10 Int value. Specifies the percentage of tasks eligible to be preempted that will actually be preempted in a given round of Tez internal preemption. This slows down preemption and gives more time for free resources to be allocated by the cluster (if any) and gives more time for preemptable tasks to finish. Valid values are 0-100. Higher values will preempt quickly at the cost of losing work. Setting to 0 turns off preemption. Expert level setting. integer false false false
tez.am.resource.cpu.vcores 1 Int value. The number of virtual cores to be used by the app master. integer false false false
tez.am.resource.memory.mb 1024 Int value. The amount of memory in MB to be used by the AppMaster. integer false false false
tez.am.am-rm.heartbeat.interval-ms.max 1000 Int value. The maximum heartbeat interval between the AM and RM, in milliseconds. Increasing this reduces the communication between the AM and the RM and can help in scaling up. Expert level setting. integer false false false
tez.am.session.min.held-containers 0 Int value. The minimum number of containers that will be held in session mode. Not active in non-session mode. Enables an idle session (not running any DAG) to hold on to a minimum number of containers to provide fast response times for the next DAG. integer false false false
tez.am.mode.session false Boolean value. Execution mode for the Tez application. True implies session mode. If the client code is written according to best practices, the same code can execute in either mode based on this configuration. Session mode is more aggressive in reserving execution resources and is typically used for interactive applications where multiple DAGs are submitted in quick succession by the same user. For long running applications, one-off executions, batch jobs, etc., non-session mode is recommended. If session mode is enabled, container reuse is recommended. boolean false false false
tez.am.speculation.enabled false boolean false false true
tez.staging-dir null String value. Specifies a directory where Tez can create temporary job artifacts. string false false false
tez.am.staging.scratch-data.auto-delete true Boolean value. If true then Tez will try to automatically delete temporary job artifacts that it creates within the specified staging dir. Does not affect any user data. boolean false false false
tez.am.task.listener.thread-count 30 Int value. The number of threads used to listen to task heartbeat requests. Expert level setting. integer false false false
tez.am.task.max.failed.attempts 4 Int value. The maximum number of attempts that can fail for a particular task before the task is failed. This does not count killed attempts. Task failure results in DAG failure. integer false false false
tez.am.tez-ui.history-url.template __HISTORY_URL_BASE__/#/tez-app/__APPLICATION_ID__ String value. Tez UI URL template for the application. Expert level setting. The AM will redirect the user to the Tez UI via this URL. The template supports the following parameters, which are replaced with the actual runtime information: __APPLICATION_ID__ is replaced with the application ID; __HISTORY_URL_BASE__ is replaced with TEZ_HISTORY_URL_BASE. For example, "http://uihost:9001/#/tez-app/__APPLICATION_ID__/" will be replaced with "http://uihost:9001/#/tez-app/application_1421880306565_0001/". string false false false
tez.am.view-acls null String value. AM view ACLs. This allows the specified users/groups to view the status of the AM and all DAGs that run within this AM. Comma-separated list of users, followed by whitespace, followed by a comma-separated list of groups. string false false false
tez.am.tez-ui.webservice.enable true Boolean value. Allows disabling of the Tez AM webservice. If set to false, the Tez UI won't show progress updates for running applications. boolean false false false
tez.aux.uris null Auxiliary resources to be localized for the Tez AM and all its containers. The value is a comma-separated list of fully-resolved directories or file paths. All resources are made available in the working directory of the AM and/or containers, i.e. $CWD. If directories are specified, they are not traversed recursively; only files directly under the specified directory are localized. All duplicate resources are ignored. string false false false
tez.cancel.delegation.tokens.on.completion true boolean true false false
tez.client.asynchronous-stop true Boolean value. Backwards compatibility setting. Setting this to false changes TezClient stop to a synchronous call that waits until the AM is in a final state before returning to the user. Expert level setting. boolean false false false
tez.client.diagnostics.wait.timeout-ms 3000 Long value. Time to wait, in milliseconds, for the YARN application's diagnostics to become available. Workaround for YARN-2560. long true false false
tez.client.timeout-ms 30000 Long value. Time interval, in milliseconds, for client to wait during client-requested AM shutdown before issuing a hard kill to the RM for this application. Expert level setting. long false false false
tez.java.opts.checker.class null String value. Ability to provide a different implementation to check/verify java opts defined for vertices/tasks. The class must be a JavaOptsChecker or a subclass of it. string true false false
tez.java.opts.checker.enabled true Boolean value. Default true. Ability to disable the Java Opts Checker. boolean true false false
tez.container.max.java.heap.fraction 0.8 Double value. Tez automatically determines the Xmx for the JVMs used to run Tez tasks and app masters. This feature is enabled if the user has not specified Xmx or Xms values in the launch command opts. Automatic Xmx calculation is preferred because Tez can determine the best value based on the actual allocation of memory to tasks in the cluster. The value is a factor applied to the container memory size to size Xmx. Value should be greater than 0 and less than 1. float false false false
tez.counters.counter-name.max-length 64 Int value. Configuration to limit the length of counter names. This can be used to limit the amount of memory being used in the app master to store the counters. Expert level setting. integer false false true
tez.counters.group-name.max-length 256 Int value. Configuration to limit the length of counter group names. This can be used to limit the amount of memory being used in the app master to store the counters. Expert level setting. integer false false true
tez.counters.max 1200 Int value. Configuration to limit the number of counters per DAG (AppMaster and Task). This can be used to limit the amount of memory being used in the app master to store the counters. Expert level setting. integer false false true
tez.counters.max.groups 500 Int value. Configuration to limit the number of counter groups for a DAG. This can be used to limit the amount of memory being used in the app master to store the counters. Expert level setting. integer false false true
tez.credentials.path null String value that is a file path. Path to a credentials file (with serialized credentials) located on the local file system. string false false false
tez.dag.status.pollinterval-ms 500 Long value. Status poll interval, in milliseconds, used when getting DAG status with a timeout. long false false false
tez.generate.debug.artifacts false boolean false false true
tez.history.logging.service.class org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService String value that is a class name. Specifies the class to use for logging history data. string false false false
tez.tez-ui.history-url.base null String value. Tez UI URL base. This replaces __HISTORY_URL_BASE__ in TEZ_AM_TEZ_UI_HISTORY_URL_TEMPLATE, e.g. http://ui-host:9001, or, if the UI is hosted with a prefix, http://ui-host:9001/~user. If the UI is hosted on the default port (80 for http and 443 for https), the port should not be specified. string false false false
tez.ignore.lib.uris null Boolean value. Allows ignoring 'tez.lib.uris'. Useful during development as well as for raw Tez applications where the classpath is propagated with the application via LocalResources. This is mainly useful for developer/debugger scenarios. boolean false false true
tez.lib.uris null String value that is a file path. The location of the Tez libraries which will be localized for DAGs. The value follows these semantics:
  1. To use a single .tar.gz or .tgz file (generated by the tez build), the full path to this file (including filename) should be specified. The internal structure of the uncompressed tgz will be retained under $CWD/tezlib.
  2. If a single file is specified without the above mentioned extensions - it will be treated as a regular file. This means it will not be uncompressed during runtime.
  3. If multiple entries exist:
    • Files: will be treated as regular files (not uncompressed during runtime)
    • Directories: all files under the directory (non-recursive) will be made available (but not uncompressed during runtime).
    • All files / contents of directories are flattened into a single directory - $CWD
string false false false
tez.local.mode false Boolean value. Enable local mode execution in Tez. Enables tasks to run in the same process as the app master. Primarily used for debugging. boolean false false false
tez.queue.name null String value. The queue name for all jobs being submitted from a given client. string false false false
tez.session.am.dag.submit.timeout.secs 300 Int value. Time (in seconds) for which the Tez AM should wait for a DAG to be submitted before shutting down. Only relevant in session mode. integer false false false
tez.session.client.timeout.secs 120 Int value. Time, in seconds, to wait for the AM to come up when trying to submit a DAG from the client. Only relevant in session mode. If the cluster is busy and cannot launch the AM, this timeout may be hit. In those cases, using non-session mode is recommended if applicable. Otherwise, increase the timeout (set to -1 for infinity; not recommended). integer false false false
tez.simple.history.logging.dir null String value. The directory into which history data will be written. This defaults to the container logging directory. This is relevant only when SimpleHistoryLoggingService is configured as TezConfiguration#TEZ_HISTORY_LOGGING_SERVICE_CLASS (tez.history.logging.service.class). string false false false
tez.simple.history.max.errors 10 Int value. Maximum errors allowed while logging history data. After crossing this limit history logging gets disabled. The job continues to run after this. integer false false false
tez.task.am.heartbeat.counter.interval-ms.max 4000 Int value. Interval, in milliseconds, after which counters are sent to the AM in heartbeats from tasks. This reduces the amount of network traffic between the AM and tasks needed to send high-volume counters, which improves AM scalability. Expert level setting. integer false false false
tez.task.am.heartbeat.interval-ms.max 100 Int value. The maximum heartbeat interval, in milliseconds, between the app master and tasks. Increasing this can help improve app master scalability for a large number of concurrent tasks. Expert level setting. integer false false false
tez.task.generate.counters.per.io false Whether to generate counters per IO or not. Enabling this renames CounterGroups / CounterNames to make them unique per Vertex + Src|Destination. boolean true false true
tez.task.get-task.sleep.interval-ms.max 200 Int value. The maximum amount of time, in milliseconds, to wait before a task asks an AM for another task. Increasing this can help improve app master scalability for a large number of concurrent tasks. Expert level setting. integer false false false
tez.task.launch.cluster-default.cmd-opts -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN String value. Command line options which will be prepended to TEZ_TASK_LAUNCH_CMD_OPTS (tez.task.launch.cmd-opts) during the launch of Tez tasks. This property will typically be configured to include default options meant to be used by all jobs in a cluster. If required, the values can be overridden per job. string false false false
tez.task.launch.cmd-opts -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC String value. Command line options provided during the launch of Tez Task processes. It is recommended not to set any Xmx or Xms in these launch opts so that Tez can determine them automatically. string false false false
tez.task.launch.env null String value. Env settings for the Tez Task processes. Should be specified as a comma-separated list of key-value pairs, where each pair is defined as KEY=VAL, e.g. "LD_LIBRARY_PATH=.,USERNAME=foo". These take the least precedence compared to other methods of setting env. They are added to the task environment prior to launching it. string false false false
tez.task.log.level INFO Root logging level passed to the Tez tasks. Simple configuration: set the log level for all loggers, e.g. INFO, which sets the log level to INFO for all loggers. Advanced configuration: set the log level for all classes, along with a different level for some, e.g. DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO, which sets the log level for all loggers to DEBUG, except for org.apache.hadoop.ipc and org.apache.hadoop.security, which are set to INFO. Note: the global log level must always be the first parameter. DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO is valid; org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO is not valid. string false false false
tez.task.max-events-per-heartbeat 500 Int value. Maximum number of events to fetch from the AM by the tasks in a single heartbeat. Expert level setting. integer false false false
tez.task.resource.calculator.process-tree.class null string true false true
tez.task.resource.cpu.vcores 1 Int value. The number of virtual cores to be used by tasks. integer false false false
tez.task.resource.memory.mb 1024 Int value. The amount of memory in MB to be used by tasks. This applies to all tasks across all vertices. Setting it to the same value for all tasks is helpful for container reuse and thus good for performance typically. integer false false false
tez.task.scale.memory.additional-reservation.fraction.max null float true false true
tez.task.scale.memory.additional-reservation.fraction.per-io null Fraction of available memory to reserve per input/output. This amount is removed from the total available pool before allocation and is for factoring in overheads. float true false true
tez.task.scale.memory.allocator.class org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor The allocator to use for initial memory allocation string true false true
tez.task.scale.memory.enabled true Whether to scale down memory requested by each component if the total exceeds the available JVM memory boolean true false true
tez.task.scale.memory.reserve-fraction 0.3 The fraction of the JVM memory which will not be considered for allocation. No defaults, since there are pre-existing defaults based on different scenarios. double true false true
tez.task.scale.memory.ratios null string true false true
tez.task-specific.launch.cmd-opts null Additional launch command options to be added for specific tasks. __VERTEX_NAME__ and __TASK_INDEX__ can be specified; they are replaced at runtime by the vertex name and task index, e.g. tez.task-specific.launch.cmd-opts="-agentpath:libpagent.so,dir=/tmp/__VERTEX_NAME__/__TASK_INDEX__" string false false true
tez.task-specific.launch.cmd-opts.list null Set of tasks for which specific launch command options need to be added. Format: "vertexName[csv of task ids];vertexName[csv of task ids];..." Valid examples: v[0,1,2] adds the options for tasks 0, 1 and 2 of vertex v; v[1,2,3];v2[5,6,7] adds them for the listed tasks of vertices v and v2; v[1:5,20,30];v2[2:5,60,7] adds them for tasks 1, 2, 3, 4, 5, 20 and 30 of vertex v and tasks 2, 3, 4, 5, 60 and 7 of vertex v2; v[] adds them for all tasks in vertex v. Partial ranges such as :5 or 1: are not supported. string false false true
tez.task-specific.log.level null Task-specific log level. Simple configuration: set the log level for all loggers, e.g. INFO, which sets the log level to INFO for all loggers. Advanced configuration: set the log level for all classes, along with a different level for some, e.g. DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO, which sets the log level for all loggers to DEBUG, except for org.apache.hadoop.ipc and org.apache.hadoop.security, which are set to INFO. Note: the global log level must always be the first parameter. DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO is valid; org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO is not valid. string false false true
tez.test.minicluster.app.wait.on.shutdown.secs 30 Long value. Time to wait (in seconds) for apps to complete on MiniTezCluster shutdown. long true false false
tez.use.cluster.hadoop-libs false Boolean value. Specify whether hadoop libraries required to run Tez should be the ones deployed on the cluster. This is disabled by default - with the expectation being that tez.lib.uris has a complete tez-deployment which contains the hadoop libraries. boolean false false false
tez.yarn.ats.acl.domains.auto-create true boolean false false false
tez.yarn.ats.event.flush.timeout.millis -1 Long value. Time, in milliseconds, to wait while flushing YARN ATS data during shutdown. Expert level setting. long false false false
tez.yarn.ats.max.events.per.batch 5 Int value. Maximum number of events to send in a single batch to ATS. Expert level setting. integer false false false
tez.yarn.ats.max.polling.time.per.event.millis 10 Int value. Time, in milliseconds, to wait for an event before sending a batch to ATS. Expert level setting. integer false false false
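The two recovery flush settings, tez.dag.recovery.flush.interval.secs and tez.dag.recovery.max.unflushed.events, bound how long and how many recovery events can stay buffered before reaching the recovery log. The Java sketch below assumes a flush is triggered as soon as either limit is reached when a new event is recorded. RecoveryFlushPolicy is a hypothetical class, not part of Tez, and it ignores any timer-driven flushing the real app master may perform.

```java
/**
 * Hypothetical sketch of the two recovery flush triggers: a buffered-event count
 * and an elapsed-time interval. Not Tez's actual implementation.
 */
final class RecoveryFlushPolicy {
    private final long flushIntervalMillis;
    private final int maxUnflushedEvents;
    private int unflushedEvents = 0;
    private long lastFlushMillis;

    RecoveryFlushPolicy(int flushIntervalSecs, int maxUnflushedEvents, long nowMillis) {
        this.flushIntervalMillis = flushIntervalSecs * 1000L;
        this.maxUnflushedEvents = maxUnflushedEvents;
        this.lastFlushMillis = nowMillis;
    }

    /** Buffers one event and returns true if the log should be flushed now. */
    boolean onEvent(long nowMillis) {
        unflushedEvents++;
        boolean flush = unflushedEvents >= maxUnflushedEvents
                || nowMillis - lastFlushMillis >= flushIntervalMillis;
        if (flush) {
            // The caller is expected to flush; both triggers start over.
            unflushedEvents = 0;
            lastFlushMillis = nowMillis;
        }
        return flush;
    }
}
```

With the defaults (30 seconds, 100 events), the 100th event inside a 30-second window triggers a flush, as does any event arriving 30 seconds or more after the previous flush.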
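The tez.am.client.am.port-range value is a comma-separated list of single ports or low-high ranges. The following is a minimal sketch of expanding such a value into candidate ports, assuming both bounds are inclusive. PortRanges is a hypothetical helper, not the parser Tez or Hadoop actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands "50000-50050,50100-50200" into individual ports. */
final class PortRanges {
    private PortRanges() {}

    static List<Integer> expand(String spec) {
        List<Integer> ports = new ArrayList<>();
        if (spec == null || spec.trim().isEmpty()) {
            return ports; // blank: any port may be used
        }
        for (String part : spec.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            int dash = p.indexOf('-');
            // A bare number is treated as a one-port range.
            int low = Integer.parseInt(dash < 0 ? p : p.substring(0, dash).trim());
            int high = dash < 0 ? low : Integer.parseInt(p.substring(dash + 1).trim());
            if (low > high) {
                throw new IllegalArgumentException("Invalid range: " + p);
            }
            for (int port = low; port <= high; port++) {
                ports.add(port);
            }
        }
        return ports;
    }
}
```

Under this reading, 50000-50050,50100-50200 yields 152 candidate ports (51 + 101).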
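tez.am.container.idle.release-timeout-min.millis and tez.am.container.idle.release-timeout-max.millis together define a window from which each idle container's expire time is drawn at random. A minimal sketch of that selection, assuming a uniform draw and treating -1 for the minimum as "never release". IdleRelease and its signature are hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical sketch: picks the time at which an idle, reusable container would be
 * released, uniformly between the configured min and max timeouts.
 */
final class IdleRelease {
    private IdleRelease() {}

    /** Returns an absolute release time in ms, or -1 if idle containers are never released. */
    static long expiryTime(long nowMillis, long minMillis, long maxMillis, Random random) {
        if (minMillis == -1) {
            return -1; // -1 keeps idle containers forever (not recommended)
        }
        if (minMillis <= 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("Require 0 < min <= max");
        }
        long spread = maxMillis - minMillis;
        // nextDouble() is in [0, 1); scaling by spread + 1 and clamping gives an offset in [0, spread].
        long offset = Math.min((long) (random.nextDouble() * (spread + 1)), spread);
        return nowMillis + minMillis + offset;
    }
}
```

Spreading expire times across the window releases idle containers gradually rather than all at once, which is the graceful reduction the property description refers to.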
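tez.am.launch.env and tez.task.launch.env take comma-separated KEY=VAL pairs. The sketch below shows one way such a string could be split into an ordered map. LaunchEnv is hypothetical, and it assumes values contain no commas, since a plain comma split cannot tell a separator from a comma inside a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses "KEY=VAL,KEY2=VAL2" strings such as tez.am.launch.env. */
final class LaunchEnv {
    private LaunchEnv() {}

    static Map<String, String> parse(String spec) {
        Map<String, String> env = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return env;
        }
        for (String pair : spec.split(",")) {
            String p = pair.trim();
            int eq = p.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected KEY=VAL but got: " + p);
            }
            // Split on the first '=' only, so a value may itself contain '='.
            env.put(p.substring(0, eq).trim(), p.substring(eq + 1));
        }
        return env;
    }
}
```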
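tez.am.legacy.speculative.slowtask.threshold expresses a slow-task cutoff as a number of standard deviations from the mean task execution time. A minimal sketch of that comparison, assuming the mean and population standard deviation are computed over the runtimes of already completed tasks. SlowTaskCheck is hypothetical; the legacy speculator's actual estimation logic is not shown here and may differ.

```java
/**
 * Hypothetical sketch of a "mean + k standard deviations" slow-task test, where k
 * plays the role of tez.am.legacy.speculative.slowtask.threshold.
 */
final class SlowTaskCheck {
    private SlowTaskCheck() {}

    static boolean isSlow(double[] completedRuntimesMs, double runtimeMs, double threshold) {
        int n = completedRuntimesMs.length;
        if (n == 0) {
            return false; // no baseline to compare against yet
        }
        double sum = 0;
        for (double r : completedRuntimesMs) {
            sum += r;
        }
        double mean = sum / n;
        double squares = 0;
        for (double r : completedRuntimesMs) {
            squares += (r - mean) * (r - mean);
        }
        double stddev = Math.sqrt(squares / n); // population standard deviation
        // Only the slow side counts as an outlier here.
        return runtimeMs > mean + threshold * stddev;
    }
}
```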
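The log level format shared by tez.am.log.level, tez.task.log.level and tez.task-specific.log.level can be checked mechanically: a bare global level first, then optional logger=LEVEL overrides separated by semicolons. LogLevelSpec is a hypothetical validator sketch; it does not check that the level names themselves are valid.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical validator for values such as "DEBUG;org.apache.hadoop.ipc=INFO".
 */
final class LogLevelSpec {
    final String rootLevel;
    final Map<String, String> overrides = new LinkedHashMap<>();

    private LogLevelSpec(String rootLevel) {
        this.rootLevel = rootLevel;
    }

    static LogLevelSpec parse(String value) {
        String[] parts = value.trim().split(";");
        String root = parts[0].trim();
        // The global level must come first and must not be a logger=LEVEL pair.
        if (root.isEmpty() || root.contains("=")) {
            throw new IllegalArgumentException("Global log level must be the first parameter: " + value);
        }
        LogLevelSpec spec = new LogLevelSpec(root);
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            if (kv.length != 2 || kv[0].trim().isEmpty() || kv[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Expected logger=LEVEL but got: " + parts[i]);
            }
            spec.overrides.put(kv[0].trim(), kv[1].trim());
        }
        return spec;
    }
}
```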
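tez.am.view-acls and tez.am.modify-acls share the format "users, whitespace, groups". A minimal parsing sketch, assuming the first whitespace run separates the two comma-separated lists, so a value that starts with whitespace names groups only. AclSpec and the sample names alice, bob, analysts and admins are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical parser for ACL strings of the form "user1,user2 group1,group2". */
final class AclSpec {
    final List<String> users;
    final List<String> groups;

    private AclSpec(List<String> users, List<String> groups) {
        this.users = users;
        this.groups = groups;
    }

    static AclSpec parse(String value) {
        String v = value == null ? "" : value;
        // Users precede the first whitespace run and groups follow it. The string is
        // deliberately not trimmed, so a leading space yields an empty user list.
        String[] halves = v.split("\\s+", 2);
        List<String> groups = halves.length > 1 ? csv(halves[1]) : Collections.<String>emptyList();
        return new AclSpec(csv(halves[0]), groups);
    }

    private static List<String> csv(String s) {
        return s.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(s.split(","));
    }
}
```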
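tez.am.node-blacklisting.ignore-threshold-node-percent caps how much of the cluster may be treated as faulty. The sketch assumes blacklisting is suspended once the blacklisted share of known nodes reaches the configured percentage. BlacklistThreshold is a hypothetical helper, and the exact comparison Tez uses (>= versus >) is an assumption.

```java
/** Hypothetical sketch of the node-blacklisting ignore-threshold check. */
final class BlacklistThreshold {
    private BlacklistThreshold() {}

    /** True if blacklisting should be ignored because too many nodes are blacklisted. */
    static boolean shouldIgnoreBlacklisting(int blacklistedNodes, int clusterNodes, int thresholdPercent) {
        if (clusterNodes <= 0) {
            return false; // nothing known about the cluster yet
        }
        // Compare blacklisted / cluster >= percent / 100 using integer arithmetic only.
        return blacklistedNodes * 100L >= (long) thresholdPercent * clusterNodes;
    }
}
```

Under this assumption, with the default of 33 on a 30-node cluster, blacklisting would be ignored once 10 nodes are blacklisted (10 / 30 is about 33.3%).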
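tez.am.preemption.percentage limits how many of the tasks eligible for preemption are actually preempted in one round. A minimal sketch of the per-round count, assuming the result is rounded up so any non-zero percentage preempts at least one task. Both the rounding rule and the PreemptionRound helper are hypothetical.

```java
/** Hypothetical sketch: how many eligible tasks one preemption round would take. */
final class PreemptionRound {
    private PreemptionRound() {}

    static int tasksToPreempt(int eligibleTasks, int percentage) {
        if (percentage < 0 || percentage > 100) {
            throw new IllegalArgumentException("percentage must be in 0-100");
        }
        if (percentage == 0 || eligibleTasks <= 0) {
            return 0; // 0 turns preemption off
        }
        // Ceiling of eligibleTasks * percentage / 100 (rounding up is an assumption).
        return (int) ((eligibleTasks * (long) percentage + 99) / 100);
    }
}
```

With the default of 10 and 25 eligible tasks, this sketch preempts 3 tasks in a round; a higher percentage reclaims resources faster at the cost of losing more work.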
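The placeholders in tez.am.tez-ui.history-url.template are plain string substitutions. The sketch below fills __HISTORY_URL_BASE__ (the value of tez.tez-ui.history-url.base) and __APPLICATION_ID__. HistoryUrl is a hypothetical helper.

```java
/** Hypothetical sketch of filling the tez.am.tez-ui.history-url.template placeholders. */
final class HistoryUrl {
    private HistoryUrl() {}

    static String render(String template, String historyUrlBase, String applicationId) {
        // String.replace substitutes every literal occurrence; no regex is involved.
        return template
                .replace("__HISTORY_URL_BASE__", historyUrlBase)
                .replace("__APPLICATION_ID__", applicationId);
    }
}
```

Rendering the default template with base http://uihost:9001 and application ID application_1421880306565_0001 produces http://uihost:9001/#/tez-app/application_1421880306565_0001.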
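tez.container.max.java.heap.fraction scales the container memory down to a heap size, and only applies when the launch opts contain no Xmx or Xms. A minimal sketch of that rule, assuming the heap is simply container memory times the fraction, truncated to whole megabytes. HeapSizing is hypothetical, and Tez's own calculation may apply further adjustments.

```java
/** Hypothetical sketch of deriving -Xmx from container memory and a heap fraction. */
final class HeapSizing {
    private HeapSizing() {}

    static String xmxOpt(int containerMemoryMb, double heapFraction) {
        if (heapFraction <= 0 || heapFraction >= 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1)");
        }
        long heapMb = (long) (containerMemoryMb * heapFraction); // truncates toward zero
        return "-Xmx" + heapMb + "m";
    }

    /** Adds an automatic -Xmx only when the user has not set -Xmx or -Xms. */
    static String withAutoXmx(String userOpts, int containerMemoryMb, double heapFraction) {
        if (userOpts.contains("-Xmx") || userOpts.contains("-Xms")) {
            return userOpts;
        }
        return (userOpts + " " + xmxOpt(containerMemoryMb, heapFraction)).trim();
    }
}
```

With the defaults tez.am.resource.memory.mb=1024 and a fraction of 0.8, this sketch yields -Xmx819m.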
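tez.task-specific.launch.cmd-opts.list selects tasks with vertexName[ids] entries, and tez.task-specific.launch.cmd-opts supplies options that may contain __VERTEX_NAME__ and __TASK_INDEX__. The sketch below parses the list format, rejecting partial ranges such as :5 or 1:, and expands the placeholders for a single task. TaskSpecificList is hypothetical; in it, an empty task set stands for v[] (all tasks of the vertex).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical parser for values such as "v[1:5,20,30];v2[2:5,60,7]". */
final class TaskSpecificList {
    private TaskSpecificList() {}

    static Map<String, Set<Integer>> parse(String spec) {
        Map<String, Set<Integer>> result = new HashMap<>();
        for (String entry : spec.split(";")) {
            String e = entry.trim();
            if (e.isEmpty()) {
                continue;
            }
            int open = e.indexOf('[');
            if (open <= 0 || !e.endsWith("]")) {
                throw new IllegalArgumentException("Expected vertexName[ids]: " + e);
            }
            String vertex = e.substring(0, open).trim();
            Set<Integer> tasks = result.computeIfAbsent(vertex, k -> new TreeSet<>());
            String body = e.substring(open + 1, e.length() - 1).trim();
            if (body.isEmpty()) {
                continue; // v[] selects every task of the vertex
            }
            for (String item : body.split(",")) {
                // Limit -1 keeps empty halves, so ":5" and "1:" are caught below.
                String[] bounds = item.trim().split(":", -1);
                if (bounds.length > 2 || bounds[0].isEmpty() || (bounds.length == 2 && bounds[1].isEmpty())) {
                    throw new IllegalArgumentException("Unsupported task range: " + item);
                }
                int lo = Integer.parseInt(bounds[0].trim());
                int hi = bounds.length == 2 ? Integer.parseInt(bounds[1].trim()) : lo;
                for (int t = lo; t <= hi; t++) {
                    tasks.add(t);
                }
            }
        }
        return result;
    }

    /** Substitutes the __VERTEX_NAME__ and __TASK_INDEX__ placeholders for one task. */
    static String expandOpts(String opts, String vertexName, int taskIndex) {
        return opts.replace("__VERTEX_NAME__", vertexName)
                   .replace("__TASK_INDEX__", Integer.toString(taskIndex));
    }
}
```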