314454 : DATA SCIENCE AND BIG DATA ANALYTICS
CREDITS - 04
UNIT – I INTRODUCTION: DATA SCIENCE AND BIG DATA 08 Hours
Introduction to Data science and Big Data, Defining Data science and Big Data, Big Data examples, Data explosion, Data volume, Data Velocity, Big data infrastructure and challenges, Big Data Processing Architectures, Data Warehouse, Re-Engineering the Data Warehouse, Shared everything and shared nothing architecture, Big data learning approaches.
UNIT – II MATHEMATICAL FOUNDATION OF BIG DATA 08 Hours
Probability theory, Tail bounds with applications, Markov chains and random walks, Pairwise independence and universal hashing, Approximate counting, Approximate median, The streaming models, Flajolet-Martin Distance sampling, Bloom filters, Local search and testing connectivity, Enforce and test techniques, Random walks and testing, Boolean functions, BLR test for linearity.
UNIT – III BIG DATA PROCESSING 08 Hours
Big Data technologies, Introduction to Google File System, Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop Shell commands, Anatomy of File Write and Read, NameNode, Secondary NameNode, and DataNode, Hadoop MapReduce paradigm, MapReduce tasks, Job and Task trackers, Cluster Setup: SSH and Hadoop Configuration, Introduction to NoSQL, Textual ETL processing.
UNIT – IV BIG DATA ANALYTICS 08 Hours
Data analytics life cycle, Data cleaning, Data transformation, Comparing reporting and analysis, Types of analysis, Analytical approaches, Data analytics using R, Exploring basic features of R, Exploring R GUI, Reading data sets, Manipulating and processing data in R, Functions and packages in R, Performing graphical analysis in R, Integrating R and Hadoop, Hive, Data analytics.
UNIT – V BIG DATA VISUALIZATION 08 Hours
Introduction to Data visualization, Challenges to Big data visualization, Conventional data visualization tools, Techniques for visual data representations, Types of data visualization, Visualizing Big Data, Tools used in data visualization, Proprietary data visualization tools, Open-source data visualization tools, Analytical techniques used in Big data visualization, Data visualization with Tableau, Introduction to: Pentaho, Flare, Jasper Reports, Dygraphs, Datameer Analytics Solution and Cloudera, Platfora, NodeBox, Gephi, Google Chart API, Flot, D3, and Visually.
UNIT – VI BIG DATA TECHNOLOGIES APPLICATION AND IMPACT 08 Hours
Social media analytics, Text mining, Mobile analytics, Roles and responsibilities of Big data person, Organizational impact, Data analytics life cycle, Data Scientist roles and responsibility, Understanding decision theory, Creating big data strategy, Big data value creation drivers, Michael Porter’s value creation models, Big data user experience ramifications, Identifying big data use cases.
Text Books
1. Krish Krishnan, Data warehousing in the age of Big Data, Elsevier, ISBN: 9780124058910, 1st Edition.
2. DT Editorial Services, Big Data, Black Book, DT Editorial Services, ISBN: 9789351197577, 2016 Edition.
Reference Books
1. Mitzenmacher and Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, ISBN: 0521835402 (hardback).
2. Dana Ron, Algorithmic and Analysis Techniques in Property Testing, School of EE.
3. Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine, Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches, Foundations and Trends in Databases, DOI: 10.1561/1900000004.
4. A. Ohri, R for Business Analytics, Springer, ISBN: 978-1-4614-4343-8.
5. Alex Holmes, Hadoop in Practice, Dreamtech Press, ISBN: 9781617292224.
6. Ambiga Dhiraj, Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Business, Wiley CIO Series.
7. Arvind Sathi, Big Data Analytics: Disruptive Technologies for Changing the Game, IBM Corporation, ISBN: 978-1-58347-380-1.
8. EMC Education Services, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data.
9. Li Chen, Zhixun Su, Bo Jiang, Mathematical Problems in Data Science, Springer, ISBN: 978-3-319-25127-1.
10. Philip Kromer and Russell Jurney, Big Data for Chimps, O’Reilly, ISBN: 9789352132447.
11. EMC Education Services, Data Science and Big Data Analytics, EMC2 Wiley, ISBN: 9788126556533.
12. Mueller Massaron, Python for Data Science, Wiley, ISBN: 9788126557394.
13. EMC Education Services, Data Science and Big Data Analytics, Wiley India, ISBN: 9788126556533.
14. Benoy Antony, Konstantin Boudnik, Cheryl Adams, Professional Hadoop, Wiley India, ISBN: 9788126563029.
15. Mark Gardener, Beginning R: The Statistical Programming Language, Wiley India, ISBN: 9788126541201.
16. Mark Gardener, The Essential R Reference, Wiley India, ISBN: 9788126546015.
17. Judith Hurwitz, Alan Nugent, Big Data For Dummies, Wiley India, ISBN: 9788126543281.