I recently had the pleasure of exploring Apache Spark for a client engagement, and what I discovered was a whole lot of awesomeness. You could say that it sparked my curiosity (pun absolutely intended). I also got to play around a bit with Apache Parquet, a cool columnar data format designed for big data processing. As is customary, I spun up a repository to share my learnings. Both code examples are written in Python and use Spark's Python API, otherwise known as PySpark. There is much, much more to explore with Spark, so stay tuned for new blog posts. But you already knew that.
So What's Going on in the Script?
Figure 2 shows the script used to generate the above results. Note: Lines 1–31 have been excluded for brevity; they contain some setup information and a helper function used to format the script's output.
Figure 3 shows the Spark UI after running our script.
Line(s) 32-42 Purpose: Import the PySpark classes SparkContext and SQLContext, then create a SQLContext object by passing in the SparkContext. When running in the Zeppelin environment, the SparkContext is supplied by Zeppelin.
Line(s) 50-59 Purpose: Run a simple query and use a lambda to output results.
Line(s) 61-71 Purpose: Run a slightly more complex query and use a lambda to output results.
Running Script Against Local Spark
I have a script configured to run against a local instance of Spark. It is almost identical to the EMR script, with a few notable changes. Please refer to the repository README for instructions on running the script locally.
Through this exploration, I feel like I really got a decent grasp of the following pieces of tech.
There is so much more to experiment with, so keep your eyes peeled for new posts. Not actually peeled — that sounds painful. In the meantime, dive into the repository and start playing around yourself … ignite the awesomeness!