Spark Job Visualizer
Cluster diagram: Driver, Cluster Manager, and three Workers (Worker 1, Worker 2, Worker 3), each running an Executor.
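The driver / cluster manager / executor split in the diagram is reflected directly in how a SparkSession is configured. The sketch below is illustrative only: the master URL and resource settings are assumptions not shown in the visualizer, and which settings apply depends on the cluster manager in use (standalone, YARN, Kubernetes).

from pyspark.sql import SparkSession

# Minimal sketch, not taken from the visualizer: point the driver at a
# hypothetical standalone cluster manager and request executor resources.
spark = (
    SparkSession.builder
    .appName("WordCount")
    .master("spark://cluster-manager:7077")   # hypothetical master URL (assumption)
    .config("spark.executor.instances", "3")  # roughly one executor per worker node
    .config("spark.executor.cores", "2")      # cores per executor (assumption)
    .config("spark.executor.memory", "2g")    # memory per executor (assumption)
    .getOrCreate()
)
sc = spark.sparkContext  # the driver now holds the SparkContext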
PySpark Code
from pyspark.sql import SparkSession

# 1. Initialize Spark Session
spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

# 2. Load Data from source
sample_data = [
    "spark makes big data simple",
    "big data means huge opportunities",
    "spark runs jobs in parallel",
    "data processing with spark is fast",
    "parallel computing makes tasks faster",
    "big opportunities come with big challenges"
]
lines = sc.parallelize(sample_data)

# 3. Define Transformations
words = lines.flatMap(lambda line: line.split(" "))
word_counts = words.map(lambda word: (word, 1))
final_counts = word_counts.reduceByKey(lambda a, b: a + b)

# 4. Trigger Action to start execution
output = final_counts.collect()

# 5. Process results in Driver
for (word, count) in output:
    print(f"{word}: {count}")

spark.stop()

Execution Analysis
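The flatMap, map, and reduceByKey calls above are lazy transformations: they only record a lineage, and no work is sent to the executors until collect() is called. One way to see that lineage from the driver, sketched here under the assumption that the final_counts RDD from the code above exists, is RDD.toDebugString():

# Sketch only: print the recorded lineage before triggering the action.
# In PySpark, toDebugString() returns bytes, hence the decode.
print(final_counts.toDebugString().decode("utf-8"))
# The exact output varies by Spark version, but it shows the shuffle introduced
# by reduceByKey, which splits the collect() job into two stages.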
Step 1 of 8
Start: Application Initialization
The PySpark application starts. The driver program requests resources from the Cluster Manager to launch executors on worker nodes. The SparkContext is created.
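Once the session exists, the driver can confirm what it was given. The snippet below is a small sketch, assuming the spark and sc objects created above and a recent PySpark version where these SparkContext properties are available:

# Hedged sketch: inspect the freshly created SparkContext from the driver.
print(sc.applicationId)        # ID assigned to this application by the cluster manager
print(sc.defaultParallelism)   # default number of partitions used when none is specified
print(sc.uiWebUrl)             # Spark UI URL for this driver, or None if the UI is disabled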