Savitribai Phule Pune University, Maharashtra (SPPU), Computer Engineering Semester 7, Data Mining and Warehousing Syllabus

Elective I

410244(D): Data Mining and Warehousing

Credit 03

Unit I Introduction 08 Hours
Data Mining, Data Mining Task Primitives; Data: Data, Information and Knowledge; Attribute Types: Nominal, Binary, Ordinal and Numeric Attributes, Discrete versus Continuous Attributes; Introduction to Data Preprocessing; Data Cleaning: Missing Values, Noisy Data; Data Integration: Correlation Analysis; Data Transformation: Min-max Normalization, Z-score Normalization and Decimal Scaling; Data Reduction: Data Cube Aggregation, Attribute Subset Selection, Sampling; Data Discretization: Binning, Histogram Analysis.
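The transformation and discretization topics in Unit I reduce to short formulas. The sketch below is one possible plain-Python rendering, assuming z-score normalization uses the population standard deviation and that binning is equal-frequency with smoothing by bin means; the function names and sample values are illustrative, not taken from the syllabus.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Z-score normalization with the mean and population standard deviation (assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer making every |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning, then replace each value by the mean of its bin."""
    values = sorted(values)
    smoothed = []
    for start in range(0, len(values), bin_size):
        bin_ = values[start:start + bin_size]
        smoothed.extend([sum(bin_) / len(bin_)] * len(bin_))
    return smoothed

# Illustrative values only
print(min_max([12000, 73600, 98000]))
print(z_score([1, 2, 3, 4]))
print(decimal_scaling([-986, 917]))
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
```

Min-max normalization preserves the relationships among the original values but is sensitive to outliers, which is one reason z-score normalization is also covered.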

Unit II Data Warehouse 08 Hours
Data Warehouse, Operational Database Systems and Data Warehouses (OLTP vs. OLAP), A Multidimensional Data Model: Data Cubes; Star, Snowflake and Fact Constellation Schemas; OLAP Operations in the Multidimensional Data Model, Concept Hierarchies, Data Warehouse Architecture, The Process of Data Warehouse Design, A Three-Tier Data Warehousing Architecture, Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP.
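The roll-up and slice operations, together with concept hierarchies, can be illustrated on a tiny in-memory fact table. This is only a sketch: the sales rows, the city-to-country hierarchy and the function names are hypothetical, and actual OLAP servers (ROLAP, MOLAP, HOLAP) perform these operations over relational tables or multidimensional arrays rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, quarter, item, dollars_sold)
sales = [
    ("Vancouver", "Q1", "computer", 100),
    ("Toronto", "Q1", "computer", 200),
    ("Chicago", "Q1", "phone", 150),
    ("New York", "Q2", "computer", 300),
]

# Hypothetical concept hierarchy for the location dimension: city < country
city_to_country = {"Vancouver": "Canada", "Toronto": "Canada",
                   "Chicago": "USA", "New York": "USA"}

def roll_up_location(rows, hierarchy):
    """Roll-up: climb the location hierarchy (city -> country) and re-aggregate the measure."""
    cube = defaultdict(int)
    for city, quarter, item, dollars in rows:
        cube[(hierarchy[city], quarter, item)] += dollars
    return dict(cube)

def slice_by_quarter(cube, quarter):
    """Slice: fix the time dimension to one value, keeping the other dimensions."""
    return {key: value for key, value in cube.items() if key[1] == quarter}

by_country = roll_up_location(sales, city_to_country)
print(by_country)
print(slice_by_quarter(by_country, "Q1"))
```

Drill-down is the inverse of the roll-up shown here: it would move from country back to city, which requires the finer-grained fact rows.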

Unit III Measuring Data Similarity and Dissimilarity 08 Hours

Measuring Data Similarity and Dissimilarity, Proximity Measures for Nominal Attributes and Binary Attributes, interval scaled; Dissimilarity of Numeric Data: Minskowski Distance, Euclidean distance and Manhattan distance; Proximity Measures for Categorical, Ordinal Attributes, Ratio scaled variables; Dissimilarity for Attributes of Mixed Types, Cosine Similarity.
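The proximity measures in Unit III can be written directly from their definitions. The sketch below assumes numeric vectors of equal length for the distance and cosine functions, and uses simple matching, d = (p - m) / p, for nominal attributes; the vectors are illustrative.

```python
import math

def minkowski(x, y, h):
    """Minkowski (L_h) distance between two numeric vectors, h >= 1."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

def manhattan(x, y):
    return minkowski(x, y, 1)  # h = 1: city-block distance

def euclidean(x, y):
    return minkowski(x, y, 2)  # h = 2: straight-line distance

def nominal_dissimilarity(x, y):
    """Simple matching: p attributes, m of them matching, d = (p - m) / p."""
    p = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)
    return (p - m) / p

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||); 1 means same direction."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

x1, x2 = (1, 2), (3, 5)
print(manhattan(x1, x2), euclidean(x1, x2))

# Illustrative term-frequency vectors for two documents
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
print(cosine_similarity(d1, d2))
print(nominal_dissimilarity(("red", "code A", "excellent"), ("red", "code B", "excellent")))
```

Cosine similarity ignores vector length, which is why it suits sparse term-frequency vectors where shared zeros should not count as similarity.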

Unit IV Association Rules Mining 08 Hours
Market Basket Analysis, Frequent Itemsets, Closed Itemsets, Association Rules, the Apriori Algorithm, Generating Association Rules from Frequent Itemsets, Improving the Efficiency of Apriori; Mining Frequent Itemsets without Candidate Generation: FP-Growth Algorithm; Mining Various Kinds of Association Rules: Mining Multilevel Association Rules, Constraint-Based Association Rule Mining, Metarule-Guided Mining of Association Rules.
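A compact sketch of Apriori's level-wise search and of rule generation from the frequent itemsets follows. It assumes an absolute minimum support count and uses a simplified join (union of two frequent (k-1)-itemsets that yields a k-itemset) followed by the usual Apriori prune step; after pruning this produces the same candidates as the textbook prefix join. The transaction list is illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.
    min_support is an absolute count (assumption)."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # scan for frequent 1-itemsets
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]  # join step
                # prune step: every (k-1)-subset of a candidate must be frequent
                if len(union) == k and all(
                        frozenset(sub) in current for sub in combinations(union, k - 1)):
                    candidates.add(union)
        counts = {c: 0 for c in candidates}
        for t in transactions:  # one database scan per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def generate_rules(frequent, min_confidence):
    """Rules A => B with confidence = support(A u B) / support(A)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = count / frequent[antecedent]  # subsets are frequent too
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
frequent = apriori(transactions, min_support=2)
rules = generate_rules(frequent, min_confidence=0.7)
for a, c, conf in rules:
    print(sorted(a), "=>", sorted(c), round(conf, 2))
```

The repeated database scans and the candidate sets built at each level are the costs that the efficiency improvements and FP-Growth listed in this unit aim to reduce.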

Unit V Classification 08 Hours
Introduction to Classification and Regression for Predictive Analysis; Decision Tree Induction; Rule-Based Classification: Using IF-THEN Rules for Classification, Rule Induction Using a Sequential Covering Algorithm; Bayesian Belief Networks, Training Bayesian Belief Networks; Classification Using Frequent Patterns, Associative Classification; Lazy Learners: k-Nearest-Neighbor Classifiers, Case-Based Reasoning.
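A k-nearest-neighbor classifier is "lazy" because it stores the training tuples and does all its work at query time. The sketch below is one minimal version, assuming numeric features that are already normalized, Euclidean distance via `math.dist` (Python 3.8+) and a plain majority vote; the training tuples are illustrative.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training tuples.

    training: list of (feature_tuple, class_label) pairs.
    Ties in the vote go to the label seen first among the neighbours,
    because Counter.most_common keeps insertion order for equal counts.
    """
    neighbours = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Illustrative, already-normalized training tuples
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(training, (1.1, 1.0)))
print(knn_predict(training, (5.0, 5.2), k=5))
```

Normalizing attributes first (for example with min-max normalization) keeps attributes with large ranges from dominating the distance.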

Unit VI Multiclass Classification 08 Hours
Multiclass Classification, Semi-Supervised Classification, Reinforcement Learning, Systematic Learning, Holistic Learning and Multi-Perspective Learning; Metrics for Evaluating Classifier Performance: Accuracy, Error Rate, Precision, Recall, Sensitivity, Specificity; Evaluating the Accuracy of a Classifier: Holdout Method, Random Subsampling and Cross-Validation.
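The evaluation metrics for a binary classifier follow from the four confusion-matrix counts, and the holdout and cross-validation methods differ only in how tuple indices are split. The sketch below assumes a positive/negative class setting, unshuffled k-fold splits and a seeded shuffle for the holdout split; the counts used are illustrative. Random subsampling would simply repeat `holdout_split` with different seeds and average the accuracies.

```python
import random

def classifier_metrics(tp, fn, fp, tn):
    """Metrics from confusion-matrix counts (positive class = 'yes')."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # identical to sensitivity (true positive rate)
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),  # true negative rate
    }

def holdout_split(n, test_fraction=1 / 3, seed=0):
    """Holdout: shuffle indices once, reserve a fraction for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)

def k_fold_indices(n, k):
    """k-fold cross-validation: yield (train, test) for k near-equal, unshuffled folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

print(classifier_metrics(tp=90, fn=210, fp=140, tn=9560))
print(holdout_split(9))
for train, test in k_fold_indices(10, 3):
    print(len(train), test)
```

With class imbalance, as in these counts, accuracy stays high while recall is low, which is why sensitivity and specificity are listed alongside it.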

Books:
Text:

1. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Elsevier Publishers, ISBN: 9780123814791, 9780123814807.
2. Parag Kulkarni, “Reinforcement and Systemic Machine Learning for Decision Making”, Wiley-IEEE Press, ISBN: 978-0-470-91999-6.

References:
1. Matthew A. Russell, “Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More”, Shroff Publishers, 2nd Edition, ISBN: 9780596006068.
2. Maksim Tsvetovat and Alexander Kouznetsov, “Social Network Analysis for Startups: Finding Connections on the Social Web”, Shroff Publishers, ISBN-10: 1449306462.