This shows you the differences between two versions of the page.
spark [2018/12/03 10:09] mantis [statsd] |
spark [2020/05/12 14:26] (current) mantis [Providing spark jars] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | Unless noted otherwise, code is tested with Spark 2.2 | + | Unless noted otherwise, code is tested with Spark **2.2** |
+ | |||
+ | ====== Non-committal testdrive ====== | ||
+ | |||
+ | The minimum-effort way to test-drive Spark is this | ||
+ | [[https://databricks.com/spark/getting-started-with-apache-spark/quick-start#overview|Databricks tutorial]] (no local setup required). | ||
Line 20: | Line 25: | ||
===== Installation ===== | ===== Installation ===== | ||
+ | The components are listed in the order in which they are used later on. | ||
==== influxdb ==== | ==== influxdb ==== | ||
Line 38: | Line 44: | ||
I tested successfully with version 2.1.0. | I tested successfully with version 2.1.0. | ||
| | ||
+ | |||
+ | ==== stacktrace export utility ==== | ||
+ | |||
+ | Download this [[https://github.com/aviemzur/statsd-jvm-profiler/blob/master/visualization/influxdb_dump.py|Python script]]. | ||
+ | |||
==== flamegraph ==== | ==== flamegraph ==== | ||
Line 44: | Line 55: | ||
- | ==== stacktrace export utility ==== | ||
- | |||
- | Download this [[https://github.com/aviemzur/statsd-jvm-profiler/blob/master/visualization/influxdb_dump.py|Python script]]. | ||
Line 94: | Line 102: | ||
- | python2.7 $flaminggraph_installation/influxdb_dump.py -o $local_ip -r $port -u profiler -p profiler -d profiler -t spark -e sparkapp -x stack_traces | + | <code>python2.7 $flaminggraph_installation/influxdb_dump.py -o $local_ip -r $port -u profiler -p profiler -d profiler -t spark -e sparkapp -x stack_traces </code> |
- | You can **filter**/exclude specific classes by adding | + | You can **filter**/exclude specific classes by adding an option |
<code> -f /path/to/filterfile</code> | <code> -f /path/to/filterfile</code> | ||
Line 103: | Line 111: | ||
<code>sun.nio</code> | <code>sun.nio</code> | ||
- | Now you can create your flamegraph: | + | Now you can create your flamegraph |
- | perl $flaminggraph_installation/flamegraph.pl --title "$MAINCLASS" stack_traces/all_*.txt > flamegraph.svg | + | <code>perl $flaminggraph_installation/flamegraph.pl --title "$MAINCLASS" stack_traces/all_*.txt > flamegraph.svg</code> |
+ | and open it e.g. in Firefox. | ||
The flamegraph is interactive; you can click into a cell to investigate. | The flamegraph is interactive; you can click into a cell to investigate. | ||
+ | |||
+ | |||
Read more [[http://www.brendangregg.com/flamegraphs.html|here]]. | Read more [[http://www.brendangregg.com/flamegraphs.html|here]]. | ||
{{:fg.png|}} | {{:fg.png|}} | ||
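To get a quick text summary before opening the SVG, the exported stacks can also be aggregated on the command line. The following is a minimal sketch, assuming the files in stack_traces/ use the folded-stack format that flamegraph.pl reads (frames separated by `;`, followed by a space and a sample count) and that frame names contain no spaces. The sample data and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample in folded-stack format; in practice you would read
# stack_traces/all_*.txt instead of this inline string.
folded_sample='main;run;compute 7
main;run;sun.nio.ch.read 3
main;init 1'

# Sum the sample counts per leaf frame (the last frame of each stack)
# and list the most frequently sampled frames first.
leaf_totals="$(printf '%s\n' "$folded_sample" |
  awk '{
    count = $NF                   # last field: number of samples
    n = split($1, frames, ";")    # first field: the call stack
    totals[frames[n]] += count
  }
  END { for (f in totals) print totals[f], f }' |
  sort -rn)"

echo "$leaf_totals"
```

Frames that dominate this list are the widest cells at the top of the flamegraph, so it is a reasonable place to start clicking.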
+ | ===== Submitting jobs ===== | ||
+ | |||
+ | ==== Providing spark jars ==== | ||
+ | https://spark.apache.org/docs/latest/running-on-yarn.html#preparations | ||
+ | |||
+ | Download the required version [[https://spark.apache.org/downloads.html|here]]. | ||
+ | |||
+ | How to set up the provided jars (found [[https://mapr.com/docs/60/Spark/ConfigureSparkJARLocation_2.0.1.html|here]]): | ||
+ | |||
+ | <code bash> | ||
+ | cd /opt/spark-2.2.0-bin-hadoop2.7/jars | ||
+ | zip /opt/spark-2.2.0-bin-hadoop2.7/spark220-jars.zip ./* | ||
+ | # and then copy the archive to your HDFS | ||
+ | hdfs dfs -put /opt/spark-2.2.0-bin-hadoop2.7/spark220-jars.zip /user/hdfs/</code> | ||
+ | |||
+ | |||
+ | Then you can make use of the provided archive by adding the following option to spark-submit: | ||
+ | |||
+ | <code> --conf spark.yarn.archive=hdfs:///user/hdfs/spark220-jars.zip </code> | ||
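For context, this is roughly how the option fits into a complete call. It is a dry-run sketch only: the main class and the application jar path are hypothetical placeholders, and `--master yarn --deploy-mode cluster` is an assumption about how the job is submitted. The script prints the command instead of running it, so it works without a Spark installation.

```shell
#!/usr/bin/env bash
set -euo pipefail

SPARK_ARCHIVE="hdfs:///user/hdfs/spark220-jars.zip"  # archive uploaded in the previous step
MAIN_CLASS="com.example.SparkApp"                    # hypothetical main class
APP_JAR="/path/to/sparkapp.jar"                      # hypothetical application jar

# Build the command as an array so that every argument stays one word.
submit_cmd=(
  spark-submit
  --master yarn
  --deploy-mode cluster
  --class "$MAIN_CLASS"
  --conf "spark.yarn.archive=$SPARK_ARCHIVE"
  "$APP_JAR"
)

# Dry run: print the command. To actually submit, run: "${submit_cmd[@]}"
echo "${submit_cmd[*]}"
```

Keeping the archive location in a variable makes it easy to switch to a different Spark build later without touching the rest of the command.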
+ | |||
+ | ====== Testing ====== | ||
+ | |||
+ | <todo>look into | ||
+ | https://github.com/holdenk/spark-testing-base | ||
+ | |||
+ | https://github.com/MrPowers/spark-fast-tests </todo> | ||
+ | |||
+ | |||
+ | ====== Tuning ====== | ||