Scala for Data Scientists — Part 1

Nov 13, 2019

This is part 1 of a 3-part series on learning Scala. We will cover what Scala is, which libraries come with it, why it works well for large data sets, and what typical processing looks like.

**Spark is to Scala** what Pandas is to Python: the library you reach for first when working with data.

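A typical Spark process: **your code goes to where the data is**. In a typical Python workflow, by contrast, the data is loaded into the process that runs the code. Shipping small pieces of code to the nodes that hold the data, instead of moving the data itself, is a big part of what makes Spark fast on huge data sets.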

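Spark operations fall into two categories: **Transformations and Actions**. A transformation (such as a filter or a map) describes a new data set derived from an existing one and is evaluated lazily; an action (such as a count or a collect) triggers the actual computation and returns a result.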
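Here is a minimal sketch of the difference, assuming the `spark-sql` library is on the classpath and that a local session is good enough for experimenting. The object name, the `localSession` helper and the sample numbers are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object TransformationsVsActions {
  // Hypothetical helper: a local session for experimentation only.
  def localSession(): SparkSession =
    SparkSession.builder()
      .appName("transformations-vs-actions")
      .master("local[*]")
      .getOrCreate()

  def evenSquares(spark: SparkSession): Array[Int] = {
    import spark.implicits._ // encoders needed by toDS() and map()

    val numbers = Seq(1, 2, 3, 4, 5, 6).toDS()

    // Transformations: Spark only records these steps, nothing runs yet.
    val evens   = numbers.filter(n => n % 2 == 0)
    val squares = evens.map(n => n * n)

    // Action: this call triggers the computation and brings the
    // result back to the driver program.
    squares.collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = localSession()
    println(evenSquares(spark).mkString(", "))
    spark.stop()
  }
}
```

Until `collect()` is called, `evens` and `squares` are only descriptions of work to be done, which gives Spark the chance to plan the whole chain before running it.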

Some transformations happen entirely within a node (e.g. filter); others require shuffling data across nodes (e.g. group by), because rows that belong to the same group may start out on different nodes.
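The two kinds of transformation can be sketched as follows, again assuming `spark-sql` is available and using a local session. The object name, the helper, the column names and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object NarrowVsWide {
  // Hypothetical helper: a local session for experimentation only.
  def localSession(): SparkSession =
    SparkSession.builder()
      .appName("narrow-vs-wide")
      .master("local[*]")
      .getOrCreate()

  def rowsPerCity(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._ // enables toDF() and the $"column" syntax

    val visits = Seq(
      ("Paris", 3), ("Oslo", 1), ("Paris", 5), ("Lima", 2), ("Oslo", 7)
    ).toDF("city", "visits")

    // Within a node: each partition can be filtered on its own.
    val busy = visits.filter($"visits" >= 2)

    // Across nodes: rows with the same city must be brought together
    // before they can be counted, which requires a shuffle.
    val perCity = busy.groupBy("city").count()

    // Action: run the job and collect the small result on the driver.
    perCity.collect().map(row => row.getString(0) -> row.getLong(1)).toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = localSession()
    println(rowsPerCity(spark))
    spark.stop()
  }
}
```

Calling `explain()` on the grouped DataFrame prints the physical plan, where the shuffle typically appears as an exchange step; the filter adds no such step.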

Now that you have a picture of where Scala fits and how a typical Spark process works, we will learn Functional Programming concepts in Part 2.


Written by Mohan Dorairaj