A Tez-specific shuffle handler allows Tez DAGs to shuffle data in a way that takes advantage of the new features in Tez. In particular, the Tez shuffle handler allows DAGs to shuffle data more efficiently for Tez's new data movement types and runtime optimizations, such as auto-reduce parallelism. Long-running Tez sessions will be able to clean up intermediate data for completed queries, and Tez applications can decide to clean up completed intermediate data while the application is still running.
Requires: Apache Tez 0.9.0 or above
Set the following property in the client configuration to select the Tez shuffle handler:
    tez-site.xml
    -------------
    ...
    <property>
      <name>tez.am.shuffle.auxiliary-service.id</name>
      <value>tez_shuffle</value>
    </property>
    ...
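To confirm that a client configuration actually carries this setting, the file can be parsed and the property looked up. The following is a minimal sketch, assuming the usual Hadoop-style layout of a `<configuration>` root with `<property>` children; the `TEZ_SITE` string and the `get_property` helper are hypothetical names used only for illustration and are not part of Tez.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of a client-side tez-site.xml.
# Assumes the usual Hadoop-style <configuration>/<property> layout.
TEZ_SITE = """\
<configuration>
  <property>
    <name>tez.am.shuffle.auxiliary-service.id</name>
    <value>tez_shuffle</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


service_id = get_property(TEZ_SITE, "tez.am.shuffle.auxiliary-service.id")
print(service_id)
```

In practice the XML text would be read from the tez-site.xml file on the client's configuration path rather than from an inline string.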
The Tez Shuffle Handler jar artifact org.apache.tez:tez-aux-services needs to be placed on the Node Manager classpath, and the Node Manager must then be restarted.
Requires: Apache Hadoop 2.6.0 or above
The following configuration must be set in the Node Manager's yarn-site.xml to enable the Tez Shuffle Handler:
    yarn-site.xml
    -------------
    ...
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>tez_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.tez_shuffle.class</name>
      <value>org.apache.tez.auxservices.ShuffleHandler</value>
    </property>
    ...
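The two Node Manager properties have to agree with each other: the service id listed in `yarn.nodemanager.aux-services` is the same id that appears in the name of the `.class` property. A small consistency check can be sketched as below. It assumes that `yarn.nodemanager.aux-services` is a comma-separated list that may also name other services (the sample value `mapreduce_shuffle,tez_shuffle` is an illustration, not taken from the text above), and the helper names `load_properties` and `check_aux_service` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml contents. The extra mapreduce_shuffle entry is
# an assumption, included to show handling of a comma-separated list.
YARN_SITE = """\
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,tez_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.tez_shuffle.class</name>
    <value>org.apache.tez.auxservices.ShuffleHandler</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Map each property name to its value."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", default="").strip():
            prop.findtext("value", default="").strip()
        for prop in root.iter("property")
    }


def check_aux_service(props, service_id, expected_class):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    raw = props.get("yarn.nodemanager.aux-services", "")
    services = [s.strip() for s in raw.split(",") if s.strip()]
    if service_id not in services:
        problems.append(f"{service_id} is not listed in yarn.nodemanager.aux-services")
    class_key = f"yarn.nodemanager.aux-services.{service_id}.class"
    if props.get(class_key) != expected_class:
        problems.append(f"{class_key} is not set to {expected_class}")
    return problems


problems = check_aux_service(
    load_properties(YARN_SITE),
    "tez_shuffle",
    "org.apache.tez.auxservices.ShuffleHandler",
)
print(problems or "aux-service configuration looks consistent")
```

A check like this only inspects the configuration file; it does not verify that the jar is on the Node Manager classpath or that the Node Manager has been restarted.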