Impact Sizing or Impact Estimation is what product data scientists rigorously work on during the planning season. It helps answer key questions like:

Here are some key skills you should hone to become a good analyst:

Git is a version control system used in software engineering. For data scientists working in teams, I recommend using it, especially for your Python code. Though you can check in Jupyter notebooks, Git is not very helpful for collaborating on them; in fact, I am not aware of any better collaborative environment for Jupyter notebooks. Please leave a comment if something works for you.

With Git, it is easy to find the common commands for a given action. But when mistakes happen (and they do), it is hard to find out how to undo what you just did. This page serves as a reference on how to undo common tasks in Git.

For a SQL expert, selecting, filtering, aggregating and even advanced OLAP operations come naturally. However, it is hard to wrap our minds around the Pandas way of accomplishing the very same tasks.

This is a learning-by-replicating exercise. I have picked 3 SQL examples and converted them into Pandas.

Toy Data:

Here is the toy data I used for this exercise:

and here is the code to create this toy data:

import pandas as pd

Marks_data = [
    [101, 80, 99, 100, 'A'],
    [102, 87, 76, 79, 'B'],
    [103, 80, 80, 81, 'B'],
    [104, 65, 60, 70, 'C'],
]
Marks = pd.DataFrame(data=Marks_data, columns=['StudentID', 'Mark1', 'Mark2', 'Mark3', 'FinalGrade'])

Student_data = [[101, 1], [102, 1], [103, 2], [104, 1]]
Student = pd.DataFrame(data=Student_data, columns=['StudentID', 'Class'])
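To get a feel for how the SQL operations mentioned earlier (selecting, filtering, aggregating, joining) map onto this toy data, here is a minimal sketch. The three queries are my own illustrative picks and are not necessarily the examples from the original exercise. The lower-case names marks, student, grade_b, avg_by_grade and per_class are assumptions introduced only for this snippet.

```python
import pandas as pd

# Same toy data as above, rebuilt here so the snippet runs on its own.
marks = pd.DataFrame(
    [[101, 80, 99, 100, "A"], [102, 87, 76, 79, "B"],
     [103, 80, 80, 81, "B"], [104, 65, 60, 70, "C"]],
    columns=["StudentID", "Mark1", "Mark2", "Mark3", "FinalGrade"],
)
student = pd.DataFrame(
    [[101, 1], [102, 1], [103, 2], [104, 1]],
    columns=["StudentID", "Class"],
)

# SQL: SELECT StudentID, Mark1 FROM Marks WHERE FinalGrade = 'B'
grade_b = marks.loc[marks["FinalGrade"] == "B", ["StudentID", "Mark1"]]

# SQL: SELECT FinalGrade, AVG(Mark1) AS AvgMark1
#      FROM Marks GROUP BY FinalGrade
avg_by_grade = (
    marks.groupby("FinalGrade", as_index=False)
         .agg(AvgMark1=("Mark1", "mean"))
)

# SQL: SELECT s.Class, COUNT(*) AS Students
#      FROM Marks m JOIN Student s ON m.StudentID = s.StudentID
#      GROUP BY s.Class
per_class = (
    marks.merge(student, on="StudentID", how="inner")
         .groupby("Class", as_index=False)
         .agg(Students=("StudentID", "count"))
)

print(grade_b)
print(avg_by_grade)
print(per_class)
```

The rough pattern: WHERE becomes a boolean mask inside .loc, GROUP BY with an aggregate becomes groupby followed by agg, and JOIN becomes merge.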

3 Examples:

Mohan Dorairaj

Lead Data Scientist @ Facebook
