Data Science / Machine Learning
Duration: 40-50 Hours
Prerequisites
- Basic knowledge of any programming language.
- Basic knowledge of databases (SQL) and files (MS Excel, CSV, etc.)
- Basic high school Algebra and Geometry
Course Content
- Fundamentals of Data Science and Machine Learning
- Introduction to Data Science
- What is Data Analytics?
- Data, Data Types
- Data Distribution
- Need of Business Analytics
- Big Data and Data Science
- Descriptive Analytics
- Predictive Analytics
- Prescriptive Analytics
- Data Science Life Cycle
- Different tools available for Data Science
- Introduction to Statistics
- Types of data
- Measures of central tendency and dispersion
- Statistical Graphics
- Probability and Probability Distributions
- Probability Theory
- Binomial Distribution
- Poisson Distribution
- Normal Distribution
- R Programming Basics
- Introduction to R
- How to install and use R
- Operators in R
- Data Types
- Conditional Statements
- Control Structures
- Loops in R
- Functions in R
- Data Structures in R Programming
- Data Exploration
- Data Harmonization
- Creating Objects in R
- Data Structures in R
- Factors and Lists
- Import Functions in R
- Import an Excel File
- How to Import a Minitab File
- Importing Data from a Database
- Table and CSV Files
- Reading data, Subsetting Data
- Visualizing the Data
- Input, Output, and Subsetting
Data Science with Python
- An Introduction to Python
- Why Python? Its unique features and where to use it
- Python environment Setup
- Discussion of IDEs such as IDLE, PyCharm, and Enthought Canopy
- Start programming in the interactive shell
- Python Identifiers, Keywords
- Discussion of installed modules and packages
- Accessing command-line arguments within programs
- Installing Anaconda
- Understanding the Spyder IDE
- Conditional Statement, Loops and File Handling
- Python Data Types and Variables
- Conditions and Loops in Python
- Decorators
- Python Modules & Packages
- Python File and Directory Manipulation
- Using various file and directory functions for OS operations
- Python Core Objects and Functions
- Built-in Modules (Library Functions)
- Numeric and Math Modules
- String/List/Dictionaries/Tuple
- Complex Data structures in Python
- Arbitrary Data Types and Their Data Structures
- Python built-in functions
- Python user-defined functions
- Python packages and functions
- Anonymous Functions: Lambda Functions
- Object Oriented Python
- OOP Concepts
- Objects, Classes, and Destroying Objects
- Accessing Attributes, Built-In Class Attributes
- Inheritance and Polymorphism
- Overriding Methods, Data Hiding
- Overloading Operators
- Introduction to NumPy
- Understanding Data Types in Python
- The Basics of NumPy Arrays
- Computation on NumPy Arrays
- Aggregations: Min, Max, and Everything in Between
- Introducing Broadcasting
- Rules of Broadcasting
- Broadcasting in practice
- Sorting Arrays
- Structured Data
- Data Manipulation with Pandas
- Installing and Using Pandas
- Introducing Pandas Objects
- Data Indexing and Selection
- Operating on Data in Pandas
- Handling Missing Data
- Hierarchical Indexing
- Descriptive & Inferential Statistics
- Estimation Theory
- Sampling Distribution
- Point Estimation
- Interval Estimation
- Test of Hypothesis
- Inference about one population mean
- Inference about two population means
- Analysis of Variance Concept
- Inference about one and two populations (Means and Proportions)
- Analysis of Variance (One-Way and Two-Way)
- Machine Learning: Supervised Classification Algorithms
- Introduction to Machine Learning
- Naïve Bayes Algorithm
- K-Nearest Neighbor Algorithm
- Decision Trees (Single Tree)
- Support Vector Machines
- Model Ensembling
- Bagging
- Random Forest
- Boosting
- Gradient Boosted Trees
- Stacking
- Stacking Classifiers
- Optimization Algorithm
- Gradient Descent/Ascent
- Stochastic Gradient Descent
- Grid Search & Random Search
- Cross Validation and Model performance
- F1 & KS statistics
- ROC, AUC, etc.
- Machine Learning: Regression
- Simple Linear Regression
- Multiple Linear Regression
- Count Regression
- Logistic Regression
- Decision Tree and Random Forest Regression
- Machine Learning: Unsupervised Learning Algorithms
- Similarity Measures
- Principal Components Analysis
- Cluster Analysis and Similarity Measures
- Hierarchical Clustering
- K-means Clustering
- Soft Clustering
- Clustering with Mixed Data Types: Prototype Clustering
- Association Rules Mining & Market Basket Analysis
- Neural Network
- Text Mining
- Term Document Matrix
- TF-IDF
- Word Cloud
- Time Series Analysis
- Moving Average, Simple Exponential Smoothing
- Holt-Winters Method
- ARIMA Models
- Data Visualization and Web Scraping
- Data Visualization and Matplotlib
- Python Libraries
- Features of Matplotlib
- Line Properties Plot with (x, y)
- Set Axis, Labels, and Legend Properties
- Alpha and Annotation
- Multiple Plots and Subplots
- Python Web Scraping and Data Science
- The Parser
- Searching & Modifying the Tree
- Printing, Formatting, Encoding
- Visualization with Matplotlib
- General Matplotlib
- Importing matplotlib
- Setting Styles
- show() or No show()? How to Display Your Plots
- Simple Line Plots
- Simple Scatter Plots
- Visualizing Errors
- Density and Contour Plots
- Visualizing a Three-Dimensional Function
- Histograms, Binnings, and Density
Projects Covered
- Financial Analytics
- Logistics Analytics
- Text Analytics
Take Away
- Comprehensive Case Studies for each Topic.
- Various Approaches to Solving Data Science Problems.
- Pros and Cons of Various Algorithms and approaches.