732A54 and TDDE31 Big Data Analytics
BDA2 - Spark SQL - Assignments
Note: please make sure you have gone through the BDA Lab Introduction.
PySpark 3.5.1 API reference
For BDA2 - Spark SQL, refer to the PySpark 3.5.1 API documentation.
BDA2 - Spark SQL - Assignments
Redo the assignments from BDA1 using Spark SQL wherever possible. The initial processing of the CSV files (such as splitting each line on ;) can be done with Spark's map.
There are two ways to write queries in Spark SQL: using the built-in API functions or running SQL-like queries. To pass this lab, you must use the built-in API functions for all 5 assignments.
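The difference between the two query styles can be sketched as follows. This is a minimal illustration, not a reference solution: the helper names parse_reading and yearly_max are hypothetical, and the column layout (station;date;time;value;quality) is an assumption about the BDA1 temperature files that this page does not state.

```python
def parse_reading(line):
    """Split one ';'-separated CSV line into (station, year, month, value).

    Assumed layout (not stated on this page): station;YYYY-MM-DD;time;value;quality
    """
    fields = line.split(";")
    date = fields[1]
    return (fields[0], int(date[0:4]), int(date[5:7]), float(fields[3]))


def yearly_max(spark, path):
    """Return a DataFrame (year, maxValue) sorted by maxValue descending."""
    # Imported inside the function so parse_reading stays usable without Spark.
    from pyspark.sql import functions as F

    # Initial processing of the CSV lines with Spark's map.
    rows = spark.sparkContext.textFile(path).map(parse_reading)
    df = spark.createDataFrame(rows, ["station", "year", "month", "value"])

    # SQL-like equivalent, shown only for contrast (not accepted for this lab):
    #   df.createOrReplaceTempView("readings")
    #   spark.sql("SELECT year, MAX(value) AS maxValue FROM readings "
    #             "GROUP BY year ORDER BY maxValue DESC")

    # Built-in API functions (the style required to pass the lab).
    return (df.groupBy("year")
              .agg(F.max("value").alias("maxValue"))
              .orderBy(F.desc("maxValue")))
```

Given a session obtained with SparkSession.builder.getOrCreate(), a call such as yearly_max(spark, "temperature-readings.csv").show() would print the result; the file name here is a placeholder for whatever path the lab environment provides.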
For each assignment, include the following data in your report, sorted as shown below.
Important: You may use either the local or the yarn option to submit your jobs. However, to pass BDA1 and BDA2, you must demonstrate that you have used both options and submit the updated run_yarn*.q file (for instance, pick one question and show that you used a different option to submit the job).
Assignment 1
year, station with the max, maxValue ORDER BY maxValue DESC
year, station with the min, minValue ORDER BY minValue DESC
Assignment 2
year, month, value ORDER BY value DESC
year, month, value ORDER BY value DESC
Assignment 3
year, month, station, avgMonthlyTemperature ORDER BY avgMonthlyTemperature DESC
Assignment 4
station, maxTemp, maxDailyPrecipitation ORDER BY station DESC
Assignment 5
year, month, avgMonthlyPrecipitation ORDER BY year DESC, month DESC
Page responsible: Huanyu Li
Last updated: 2026-04-08
