Course Overview
Data scientists use algorithms and frameworks that enable computers to solve problems of higher complexity than traditional algorithms can handle. One of the most successful of these, widely adopted both in academia and by major businesses, is the open-source Python language. First released in 1991, Python is an object-oriented, portable, general-purpose language used for scientific computing, enterprise systems, and back-end and front-end application development. With its focus on readability and fast deployment, it is well suited to the modern data scientist.
In this course we will introduce the main building blocks of the language that are relevant to the data scientist, its most important libraries, such as NumPy, Pandas and Scikit-learn, and its newest additions around data presentation and parallelism.
We will review various use cases and implement mini-labs in Python.
Prerequisite:
One to two years of programming experience in any other language, and completion of the basic Introduction to Machine Learning course.
Expectations and Goals
Understand the different tools available to the data scientist in Python, along with best practices and design patterns.
Course Outline:
Day 1:
1. Introduction to Python
• Development environment
• Basic constructs, functions, scopes, classes and objects, main collections
• NumPy and Pandas
• Developing machine learning algorithms in Python
• Validation in Python
• Time series analysis using Python
2. Scikit-learn library and tools
• Preprocessing
• Correlation, feature selection and reduction
• Model selection
• Linear models
• Basic trees
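As a rough preview of the Scikit-learn topics listed for Day 1 (preprocessing, feature selection, model selection and linear models), the sketch below chains them into one pipeline. It is only an illustration, not the course's lab material: the synthetic dataset, the choice of logistic regression as the linear model, and the parameter grid values are all assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing -> feature selection -> linear model, as one estimator.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Model selection: grid search with 5-fold cross-validation on the training set.
# The grid values are arbitrary choices for this sketch.
param_grid = {"select__k": [5, 10, 20], "model__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Keeping the scaler and the feature selector inside the pipeline means they are refit on each training fold, so the held-out fold never influences preprocessing.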
Day 2:
3. Algorithms and Estimators
• Clustering and classification
• Trees and SVM
• Validation strategies
• Plotting results
4. Advanced Topics
• ANN and deep learning
• Parallel distribution
• Cloud services
• Lab presentation – recommendation system
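To give a feel for the Day 2 topics on trees, SVM and validation strategies, here is a minimal sketch that compares two classifiers with stratified k-fold cross-validation. The Iris dataset bundled with Scikit-learn, the hyperparameters, and the number of folds are assumptions chosen for brevity, not part of the course material.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small bundled dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    # SVMs are sensitive to feature scale, so scaling is part of the model.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)  # accuracy per fold
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reusing the same `cv` object for both models means they are scored on identical splits, which makes the comparison fairer than drawing new random folds for each.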