Differences

This shows you the differences between two versions of the page.

--- spark [2018/12/03 10:09]
mantis [statsd]
+++ spark [2020/05/12 14:26] (current)
mantis [Providing spark jars]
@@ Line 1: / Line 1: @@
-Unless noted otherwise, code is tested with Spark 2.2
+Unless noted otherwise, code is tested with Spark **2.2**.
+====== Non-committal test drive ======
+The minimum-effort way to test-drive Spark is this
+[[https://databricks.com/spark/getting-started-with-apache-spark/quick-start#overview|Databricks tutorial]] (no local setup required).
@@ Line 20: / Line 25: @@
 ===== Installation =====
+Listed in the order in which the tools are used later on.
 ==== influxdb ====
@@ Line 38: / Line 44: @@
 I tested successfully with version 2.1.0.
+==== stacktrace export utility ====
+Download this [[https://github.com/aviemzur/statsd-jvm-profiler/blob/master/visualization/influxdb_dump.py|Python script]].
 ==== flamegraph ====
@@ Line 44: / Line 55: @@
-==== stacktrace export utility ====
-Download this [[https://github.com/aviemzur/statsd-jvm-profiler/blob/master/visualization/influxdb_dump.py|Python script]].
@@ Line 94: / Line 102: @@
-python2.7 $flaminggraph_installation/influxdb_dump.py -o $local_ip -r $port -u profiler -p profiler -d profiler -t spark -e sparkapp -x stack_traces
+<code>python2.7 $flaminggraph_installation/influxdb_dump.py -o $local_ip -r $port -u profiler -p profiler -d profiler -t spark -e sparkapp -x stack_traces </code>
-You can **filter**/exclude specific classes by adding
+You can **filter** out (exclude) specific classes by adding the option
 <code> -f /path/to/filterfile</code>
@@ Line 103: / Line 111: @@
 <code>sun.nio</code>
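If you prefer to filter after the export, a similar effect can be approximated with grep. This is a swapped-in technique, not how influxdb_dump.py handles -f: it assumes the exported files use the collapsed-stack format that flamegraph.pl reads (frames separated by ';', sample count as the last field) and a filter file with one class or package name per line, and it drops every stack whose text contains one of those names. drop_filtered_stacks is a hypothetical helper.

<code bash>
#!/usr/bin/env bash
# Sketch: drop every collapsed stack that contains a name listed in the filter file.
# Assumes lines like "frame1;frame2;frame3 count" (the input format of flamegraph.pl)
# and a filter file with one class or package name per line, e.g. "sun.nio".
# drop_filtered_stacks is a hypothetical helper, not part of influxdb_dump.py.
drop_filtered_stacks() {
  local filter_file=$1
  shift
  # -F: patterns are fixed strings, so the dots in "sun.nio" are not regex wildcards
  # -v: keep only lines that match none of the patterns
  # -h: no file-name prefixes when several input files are given
  # Caution: a blank line in the filter file matches every stack and drops all of them.
  grep -v -h -F -f "$filter_file" "$@"
}
</code>

For example, drop_filtered_stacks /path/to/filterfile stack_traces/all_*.txt > filtered.txt writes the remaining stacks to filtered.txt, which can then be passed to flamegraph.pl in place of stack_traces/all_*.txt.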
-Now you can create your flamegraph:
+Now you can create your flamegraph
-perl $flaminggraph_installation/flamegraph.pl --title "$MAINCLASS" stack_traces/all_*.txt > flamegraph.svg
+<code>perl $flaminggraph_installation/flamegraph.pl --title "$MAINCLASS" stack_traces/all_*.txt > flamegraph.svg</code>
+and open it in a browser, e.g. Firefox.
 The flamegraph is interactive: click on a cell to zoom into that part of the stack.
 Read more [[http://www.brendangregg.com/flamegraphs.html|here]].
 {{:fg.png|}}
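For a quick look without a browser, the same input files can also be summarized in the terminal. The sketch below is an assumption-based helper, not part of the FlameGraph tools: it expects the collapsed-stack input of flamegraph.pl (one stack per line, frames separated by ';', sample count as the last field) and sums the samples of each leaf frame, i.e. the innermost method of each sampled stack. top_leaf_frames is a hypothetical name.

<code bash>
#!/usr/bin/env bash
# Sketch: print the N leaf frames with the most samples from collapsed-stack files.
# Assumes each input line looks like "outer;middle;leaf count".
# top_leaf_frames is a hypothetical helper name.
top_leaf_frames() {
  local limit=$1
  shift
  awk '
    NF > 0 {
      count = $NF                          # sample count is the last field
      stack = $0
      sub(/[ \t]+[^ \t]+$/, "", stack)     # strip the count, keep the stack
      n = split(stack, frames, ";")
      samples[frames[n]] += count          # the last frame is the innermost (leaf) one
    }
    END { for (f in samples) print samples[f], f }
  ' "$@" | sort -rn | head -n "$limit"
}
</code>

For example, top_leaf_frames 10 stack_traces/all_*.txt lists the ten leaf frames with the most samples. Leaf totals only show where time was spent directly; the flamegraph additionally shows the call paths leading there.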
+===== Submitting jobs =====
+==== Providing spark jars ====
+See the [[https://spark.apache.org/docs/latest/running-on-yarn.html#preparations|preparations section of the Spark on YARN documentation]].
+Download the required Spark version [[https://spark.apache.org/downloads.html|here]].
+How to set up the provided jars (described [[https://mapr.com/docs/60/Spark/ConfigureSparkJARLocation_2.0.1.html|here]]):
+<code bash>
+# spark.yarn.archive expects the jars at the root of the archive, so zip from inside the jars directory
+cd /opt/spark-2.2.0-bin-hadoop2.7/jars
+zip /tmp/spark220-jars.zip ./*
+# then copy the archive to HDFS
+hdfs dfs -put /tmp/spark220-jars.zip  /user/hdfs/</code>
+You can then use the uploaded archive by adding this option to spark-submit:
+<code>    --conf spark.yarn.archive=hdfs:///user/hdfs/spark220-jars.zip </code>
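A complete call could look roughly like the sketch below. --master yarn and --deploy-mode cluster are assumptions about your cluster, and the main class and application jar are placeholders; setting DRY_RUN prints the command instead of running it. submit_with_archive, SPARK_JARS_ARCHIVE and DRY_RUN are hypothetical names.

<code bash>
#!/usr/bin/env bash
# Sketch: run (or just print) a spark-submit call that uses the jar archive on HDFS.
# submit_with_archive, SPARK_JARS_ARCHIVE and DRY_RUN are hypothetical names.
submit_with_archive() {
  local main_class=$1 app_jar=$2
  shift 2
  # Default to the archive uploaded above; override via SPARK_JARS_ARCHIVE.
  local archive=${SPARK_JARS_ARCHIVE:-hdfs:///user/hdfs/spark220-jars.zip}
  # Remaining arguments are extra spark-submit options; they must come before the
  # application jar, since spark-submit passes anything after it to the application.
  set -- spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf "spark.yarn.archive=$archive" \
    --class "$main_class" \
    "$@" \
    "$app_jar"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
</code>

For example, DRY_RUN=1 submit_with_archive com.example.Main my-app.jar --num-executors 2 only prints the command; without DRY_RUN the same call runs spark-submit directly.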
+====== Testing ======
+<todo>Look into https://github.com/holdenk/spark-testing-base and https://github.com/MrPowers/spark-fast-tests</todo>
+====== Tuning ======

Small heaps of code
