# Data Science for Beginners

##### Introduction to Data Science
• What is data science?
• Importance of data science in the real world
• Difference between data science and reporting
• Who is a data scientist?
• Prerequisites for data science

##### R Programming
1. Introduction to R
2. Installation of R
3. Data Types in R
    1. Vectors
    2. Matrices
    3. Data Frames
    4. Lists
    5. Arrays
    6. Factors
    7. Strings
4. Operators
    1. Arithmetic Operators
    2. Relational Operators
    3. Logical Operators
    4. Assignment Operators
    5. Miscellaneous Operators
5. Variables
    1. Variable creation and rules
    2. Data Type of a Variable
    3. Variable Assignment
    4. Finding Variables
    5. Deleting Variables
6. Conditional statements
    1. If statement
    2. If-else statement
    3. Switch statement
7. Loops
    1. For loop
    2. While loop
    3. Repeat loop
8. Functions
    1. Function Definition
    2. Function Components
    3. Built-in Functions
    4. User-defined Functions
    5. Calling a Function
    6. Calling a Function without Arguments
    7. Calling a Function with Arguments
    8. Calling a Function with Default Arguments
    9. Lazy Evaluation of Functions
9. Data Frame Creation and Manipulation
    1. Data frame creation using various data types
    3. Update
    4. Delete
    5. Retrieval
    6. Aggregation using dplyr
10. Join/Merge Data Frames
    1. Inner join
    2. Left outer join
    3. Right outer join
    4. Outer join
11. Packages
    1. What is a package?
    2. Installing packages
    3. Uninstalling packages
12. Date-time data manipulation
    1. Date conversion
    2. Day, week, month, year extraction
13. Visualization
    1. Pie chart
    2. Bar plot
    3. Histogram
    4. Scatter plot
14. Reading data from files and databases
    1. CSV file
    2. Text file
    3. XML file
    4. SQL Server
    5. Oracle Server
15. Practice Example
##### Machine Learning
1. Introduction
2. Importance of machine learning
3. Real-world uses
4. Future
5. Supervised and Unsupervised Learning
    1. What is supervised learning, with an example
    2. What is unsupervised learning, with an example
6. Classification and Regression
    1. Classification
    2. Example of classification
    3. Regression
    4. Example of regression
7. Supervised Learning Algorithms
    1. Linear Regression
        • Understanding linear regression
        • Assumptions in linear regression
        • Model development and interpretation
        • Model validation
        • Measure of error
        • R-squared value, and interpretation of R-squared and adjusted R-squared
        • Model optimization
        • Prediction on unseen data
        • Case study using R
8. Decision Tree
    1. Introduction
    2. Types of decision tree (C5.0, CART)
    3. Step-by-step understanding of decision trees
    4. What is a node?
    5. Splitting of nodes
    6. Information Gain Theory and Gini Index
    7. Model creation
    8. Model validation using test data
    9. Error measure
    10. Overfitting check
    11. Pruning the tree
    12. Case study using R
9. Random Forest
    1. Introduction to random forest
    2. Importance
    3. Understanding the algorithm
    4. Advantages over decision trees
    5. Model development
    6. Model validation
    7. Model optimization
    8. Prediction on unseen data
    9. Finding the best features
    10. Case study using R
10. K Nearest Neighbors (KNN)
    1. Introduction
    2. Distance measures
    3. Understanding the algorithm
    4. Model development
    5. Model validation
    6. Model optimization
    7. Finding the best K
    8. Example using R
11. Unsupervised Learning Algorithms (Cluster Analysis)
    1. What is a cluster?
    2. Importance of clustering
    3. Different types of clustering
        • K-means clustering
##### Project
• Retail weekly sales prediction
• Identifying churned telecom customers
• Identifying churned employees
• Retail customer segmentation