Here are some key skills you should hone to become a good analyst:
Below are some tips to improve the performance of your Pandas operations. These tips are helpful when you have a lot of data, a lot of processing, or both. I have linked to some source articles where you can learn more about each topic.
For memory optimization
Git is a version control system used in software engineering. For data scientists working in teams, I recommend using it, especially for your Python code. Though you can check in Jupyter notebooks, Git is not very helpful for collaborating on them. In fact, I am not aware of any better collaborative environment for Jupyter notebooks. Please leave a comment if something works for you.
With Git, it is easy to find the common commands for a given action. But when mistakes happen (and they do), it is hard to find out how to undo what you just did. This page serves as a reference on how to undo common tasks in Git.
This is part 2 of a 3-part series on learning Scala. We will learn what functional programming is and introduce a few functional programming concepts.
So you know of functions in a programming language. But what is a functional programming language? …
This is a very practical course on A/B testing by Udacity & Google, and I have recommended it to many folks interested in A/B testing. Here are my original notes from the course. The course has 5 sections:
As a SQL expert, selecting, filtering, aggregating and even advanced OLAP operations come naturally. However, it is hard to wrap our minds around the Pandas way of accomplishing the very same tasks.
This is a learning-by-replicating exercise. I have picked 3 SQL examples and converted them into Pandas.
Here is the toy data I used for this exercise:
and here is the code to create this toy data:
import pandas as pd

Marks_data = [
    [101, 80, 99, 100, 'A'],
    [102, 87, 76, 79, 'B'],
    [103, 80, 80, 81, 'B'],
    [104, 65, 60, 70, 'C'],
]
Marks = pd.DataFrame(data=Marks_data, columns=['StudentID', 'Mark1', 'Mark2', 'Mark3', 'FinalGrade'])

Student_data = [[101, 1], [102, 1], [103, 2], [104, 1]]
Student = pd.DataFrame(data=Student_data, columns=['StudentID', 'Class'])
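Before getting to the three examples, here is a minimal sketch of how the SQL operations mentioned earlier (selecting, filtering, joining, aggregating) might map onto this toy data. The two queries in the comments are my own illustrations and are not the three examples from this exercise. The lowercase names marks, student, grade_b and avg_by_class are hypothetical, and the data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Same toy data as above, rebuilt so this sketch is self-contained.
marks = pd.DataFrame(
    [[101, 80, 99, 100, "A"], [102, 87, 76, 79, "B"],
     [103, 80, 80, 81, "B"], [104, 65, 60, 70, "C"]],
    columns=["StudentID", "Mark1", "Mark2", "Mark3", "FinalGrade"],
)
student = pd.DataFrame(
    [[101, 1], [102, 1], [103, 2], [104, 1]],
    columns=["StudentID", "Class"],
)

# Illustrative query (an assumption, not from the original exercise):
#   SELECT StudentID, Mark1 FROM Marks WHERE FinalGrade = 'B'
# A boolean mask plays the role of WHERE; the column list plays SELECT.
grade_b = marks.loc[marks["FinalGrade"] == "B", ["StudentID", "Mark1"]]

# Illustrative query (also an assumption):
#   SELECT s.Class, AVG(m.Mark1) AS AvgMark1
#   FROM Marks m JOIN Student s ON m.StudentID = s.StudentID
#   GROUP BY s.Class
# merge() is the JOIN, groupby() is GROUP BY, agg() computes the average.
avg_by_class = (
    marks.merge(student, on="StudentID", how="inner")
    .groupby("Class", as_index=False)
    .agg(AvgMark1=("Mark1", "mean"))
)

print(grade_b)
print(avg_by_class)
```

Reading the chained call from top to bottom roughly follows the order in which a SQL engine evaluates the query (join, then group, then aggregate), which can make the translation easier to reason about.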