Scala for Data Scientists — Part 1

Mohan Dorairaj
Nov 13, 2019

This is part 1 of a 3-part series on learning Scala. We will focus on what Scala is, which libraries come along with it, why it is better, and what typical processing looks like.

Functional + OO = Scala
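Spark is to Scala what Pandas is to Python
A typical Spark process: your code is sent to the nodes where the data lives, whereas in a typical Python (Pandas) workflow the data is loaded into the process that runs the code. Moving the code instead of the data is a large part of what makes Spark fast on huge data sets
Spark operations are categorized into transformations, which lazily describe a new data set, and actions, which trigger the computation and return a result
Some transformations happen within a node (e.g., filter); others require shuffling data across nodes (e.g., group by)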
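
The points above can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: plain Scala collections stand in for a distributed Spark data set so the code runs without a cluster, and the `Sale` class, its fields, and the `PipelineSketch` object are hypothetical names that do not come from Spark.

```scala
// Plain Scala collections stand in for a distributed data set here,
// so the sketch runs without a Spark cluster. Sale and PipelineSketch
// are hypothetical names chosen for illustration.
final case class Sale(region: String, amount: Double) {
  // OO side: data and behaviour live together on the class.
  def isLarge: Boolean = amount >= 100.0
}

object PipelineSketch {
  val sales: List[Sale] = List(
    Sale("east", 120.0),
    Sale("west", 80.0),
    Sale("east", 250.0),
    Sale("west", 300.0)
  )

  // Functional side: behaviour is passed around as a function value.
  // filter inspects one element at a time, which is why Spark can run
  // the equivalent step inside each node without moving data.
  val largeSales: List[Sale] = sales.filter(_.isLarge)

  // groupBy has to bring all rows with the same key together, which is
  // the kind of step that forces a shuffle across nodes in Spark.
  val totalByRegion: Map[String, Double] =
    largeSales.groupBy(_.region).map { case (region, rows) =>
      region -> rows.map(_.amount).sum
    }

  def main(args: Array[String]): Unit =
    totalByRegion.toSeq.sortBy(_._1).foreach { case (region, total) =>
      println(s"$region: $total")
    }
}
```

In Spark terms, the `filter` and `groupBy` steps correspond to transformations, and printing or collecting the totals corresponds to an action. Plain Scala lists evaluate each step eagerly, so this sketch does not reproduce Spark's deferred execution.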

Now that you understand what Scala is and how a typical Spark process works, we will learn functional programming concepts in Part 2.
