Data · Advanced

Spark Batch Processing

Optimize partitioned jobs, shuffle stages, and cost-aware cluster settings.

Hands-on PySpark labs from local mode to small EMR-style clusters. Focus on partitioning, caching decisions, and explaining physical plans in mentor reviews.

₩1,890,000 · 10 weeks · Blended

[Image: Cluster status panel showing job stages and executor metrics]

Included in this cohort

  • PySpark 3.5 lab images
  • Explain plan reading drills
  • Skew mitigation strategies
  • Delta Lake intro module
  • Cost estimation worksheet
  • Performance regression lab
  • Capstone on clickstream aggregation

Outcomes you can show

  1. Cut a sample job's runtime by a measurable percentage
  2. Document partition strategy trade-offs
  3. Present Spark UI screenshots in your portfolio

Mentor

Min-jun Park

Spark practitioner working on ad-tech batch reconciliation jobs.

Common questions

Labs are in PySpark only. Scala snippets appear in the reading lists.

Learner notes