410243: Data Analytics
Credit 03
Unit I Introduction and Life Cycle 08 Hours
Introduction: Big Data overview, state of the practice in Analytics- BI vs. Data Science, Current Analytical Architecture, drivers of Big Data, Emerging Big Data Ecosystem and new approach. Data Analytic Life Cycle: Overview, Phase 1- Discovery, Phase 2- Data Preparation, Phase 3- Model Planning, Phase 4- Model Building, Phase 5- Communicate Results, Phase 6- Operationalize. Case Study: GINA
Unit II Basic Data Analytic Methods 08 Hours
Statistical Methods for Evaluation- Hypothesis testing, difference of means, Wilcoxon rank-sum test, Type I and Type II errors, power and sample size, ANOVA. Advanced Analytical Theory and Methods: Clustering- Overview, K-means- Use cases, Overview of methods, determining number of clusters, diagnostics, reasons to choose and cautions.
Unit III Association Rules and Regression 08 Hours
Advanced Analytical Theory and Methods: Association Rules- Overview, Apriori algorithm, evaluation of candidate rules, case study- transactions in a grocery store, validation and testing, diagnostics. Regression- linear, logistic, reasons to choose and cautions, additional regression models.
Unit IV Classification 08 Hours
Decision trees- Overview, general algorithm, decision tree algorithm, evaluating a decision tree. Naïve Bayes- Bayes' Algorithm, Naïve Bayes Classifier, smoothing, diagnostics. Diagnostics of classifiers, additional classification methods.
Unit V Big Data Visualization 08 Hours
Introduction to Data Visualization, Challenges to Big Data visualization, Conventional data visualization tools, Techniques for visual data representations, Types of data visualization, Visualizing Big Data, Tools used in data visualization, Analytical techniques used in Big Data visualization.
Unit VI Advanced Analytics-Technology and Tools 08 Hours
Analytics for unstructured data- Use cases, MapReduce, Apache Hadoop. The Hadoop Ecosystem- Pig, Hive, HBase, Mahout, NoSQL. An Analytics Project- Communicating, operationalizing, creating final deliverables.
Books:
Text:
1. David Dietrich, Barry Heller, “Data Science and Big Data Analytics”, EMC Education Services, Wiley Publications, 2012, ISBN 0-07-120413-X
2. Ashutosh Nandeshwar, “Tableau Data Visualization Cookbook”, Packt Publishing, ISBN 978-1-84968-978-6
References:
1. Maheshwari Anil, Rakshit, Acharya, “Data Analytics”, McGraw Hill, ISBN: 9789353160258.
2. Mark Gardener, “Beginning R: The Statistical Programming Language”, Wrox Publication, ISBN: 978-1-118-16430-3
3. Luís Torgo, “Data Mining with R: Learning with Case Studies”, CRC Press, Taylor and Francis Group, ISBN: 9781482234893
4. Carlo Vercellis, “Business Intelligence - Data Mining and Optimization for Decision Making”, Wiley Publications, ISBN: 9780470753866.