Spark Job Visualizer
Cluster diagram: Driver, Cluster Manager, and three Workers (Worker 1, Worker 2, Worker 3), each running an Executor.
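The driver / cluster manager / executor split in the diagram is reflected directly in how a SparkSession is configured. The sketch below is illustrative only: the master URL and resource settings are assumptions not shown in the visualizer, and which settings apply depends on the cluster manager in use (standalone, YARN, Kubernetes).

from pyspark.sql import SparkSession

# Minimal sketch, not taken from the visualizer: point the driver at a
# hypothetical standalone cluster manager and request executor resources.
spark = (
    SparkSession.builder
    .appName("WordCount")
    .master("spark://cluster-manager:7077")   # hypothetical master URL (assumption)
    .config("spark.executor.instances", "3")  # roughly one executor per worker node
    .config("spark.executor.cores", "2")      # cores per executor (assumption)
    .config("spark.executor.memory", "2g")    # memory per executor (assumption)
    .getOrCreate()
)
sc = spark.sparkContext  # the driver now holds the SparkContext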
PySpark Code
from pyspark.sql import SparkSession

# 1. Initialize Spark Session
spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

# 2. Load Data from source
sample_data = [
    "spark makes big data simple",
    "big data means huge opportunities",
    "spark runs jobs in parallel",
    "data processing with spark is fast",
    "parallel computing makes tasks faster",
    "big opportunities come with big challenges"
]
lines = sc.parallelize(sample_data)

# 3. Define Transformations
words = lines.flatMap(lambda line: line.split(" "))
word_counts = words.map(lambda word: (word, 1))
final_counts = word_counts.reduceByKey(lambda a, b: a + b)

# 4. Trigger Action to start execution
output = final_counts.collect()

# 5. Process results in Driver
for (word, count) in output:
    print(f"{word}: {count}")

spark.stop()

Execution Analysis
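The flatMap, map, and reduceByKey calls above are lazy transformations: they only record a lineage, and no work is sent to the executors until collect() is called. One way to see that lineage from the driver, sketched here under the assumption that the final_counts RDD from the code above exists, is RDD.toDebugString():

# Sketch only: print the recorded lineage before triggering the action.
# In PySpark, toDebugString() returns bytes, hence the decode.
print(final_counts.toDebugString().decode("utf-8"))
# The exact output varies by Spark version, but it shows the shuffle introduced
# by reduceByKey, which splits the collect() job into two stages.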
Step 1 of 8
Start: Application Initialization
The PySpark application starts. The driver program requests resources from the Cluster Manager to launch executors on worker nodes. The SparkContext is created.
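Once the session exists, the driver can confirm what it was given. The snippet below is a small sketch, assuming the spark and sc objects created above and a recent PySpark version where these SparkContext properties are available:

# Hedged sketch: inspect the freshly created SparkContext from the driver.
print(sc.applicationId)        # ID assigned to this application by the cluster manager
print(sc.defaultParallelism)   # default number of partitions used when none is specified
print(sc.uiWebUrl)             # Spark UI URL for this driver, or None if the UI is disabled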