Page ''spark'': revision of 2019/06/24 16:45 by mantis (previous revision 2019/02/27 10:54 by mantis, section "Providing spark jars").
Unless noted otherwise, code is tested with Spark **2.2**.

====== Non-committal test drive ======

The minimum-effort way to test-drive Spark is the [[https://databricks.com/spark/getting-started-with-apache-spark/quick-start#overview|Databricks quick-start tutorial]], which requires no local setup.
<code bash>
# Package the jars shipped with Spark into one archive; running zip from inside jars/ keeps them at the archive root
cd /opt/spark-2.2.0-bin-hadoop2.7/jars
zip /opt/spark-2.2.0-bin-hadoop2.7/spark220-jars.zip ./*
# and then copy the archive to your HDFS
hdfs dfs -put /opt/spark-2.2.0-bin-hadoop2.7/spark220-jars.zip /user/hdfs/</code>
You can then use the uploaded archive by adding the following option to ''spark-submit''. This spares ''spark-submit'' from zipping and uploading the contents of ''$SPARK_HOME/jars'' on every submission:

<code>--conf spark.yarn.archive=hdfs:///user/hdfs/spark220-jars.zip</code>
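For a complete invocation, here is a minimal sketch of a YARN cluster-mode submission that reuses the uploaded archive. It assumes the install path used above, a ''HADOOP_CONF_DIR'' that points at your cluster configuration, and the ''SparkPi'' example bundled with the Spark 2.2.0 distribution as a stand-in application. The example jar name and the argument ''100'' are illustrative. If ''spark-submit'' is not on the ''PATH'', the script only prints the command it would run.

<code bash>
#!/usr/bin/env bash
# Sketch: submit a job to YARN while reusing the jar archive already uploaded to HDFS.
# Assumptions: SPARK_HOME defaults to the install path used above; SparkPi is only a stand-in app.
set -euo pipefail

SPARK_HOME="${SPARK_HOME:-/opt/spark-2.2.0-bin-hadoop2.7}"
YARN_ARCHIVE="hdfs:///user/hdfs/spark220-jars.zip"

# Collect the arguments in an array so each one reaches spark-submit unchanged.
# The trailing 100 is the number of slices SparkPi splits its computation into.
submit_args=(
  --master yarn
  --deploy-mode cluster
  --conf "spark.yarn.archive=${YARN_ARCHIVE}"
  --class org.apache.spark.examples.SparkPi
  "${SPARK_HOME}/examples/jars/spark-examples_2.11-2.2.0.jar"
  100
)

if command -v spark-submit >/dev/null 2>&1; then
  spark-submit "${submit_args[@]}"
else
  # spark-submit not found: print the command instead of running it
  echo "spark-submit ${submit_args[*]}"
fi
</code>

Passing the archive on the command line scopes the setting to one job. To apply it to every submission, the same ''spark.yarn.archive'' property can go into ''conf/spark-defaults.conf'' instead.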
====== Testing ======
====== Tuning ======