Table of Contents
Dedication; Preface; Motivation; Origins of the Class; Origins of the Book; What to Expect from This Book; How This Book Is Organized; How to Read This Book; How Code Is Used in This Book; Who This Book Is For; Prerequisites; Supplemental Reading; About the Contributors; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments
Chapter 1: Introduction: What Is Data Science?; 1.1 Big Data and Data Science Hype; 1.2 Getting Past the Hype; 1.3 Why Now?; 1.4 The Current Landscape (with a Little History); 1.5 A Data Science Profile; 1.6 Thought Experiment: Meta-Definition; 1.7 OK, So What Is a Data Scientist, Really?
Chapter 2: Statistical Inference, Exploratory Data Analysis, and the Data Science Process; 2.1 Statistical Thinking in the Age of Big Data; 2.2 Exploratory Data Analysis; 2.3 The Data Science Process; 2.4 Thought Experiment: How Would You Simulate Chaos?; 2.5 Case Study: RealDirect
Chapter 3: Algorithms; 3.1 Machine Learning Algorithms; 3.2 Three Basic Algorithms; 3.3 Exercise: Basic Machine Learning Algorithms; 3.4 Summing It All Up; 3.5 Thought Experiment: Automated Statistician
Chapter 4: Spam Filters, Naive Bayes, and Wrangling; 4.1 Thought Experiment: Learning by Example; 4.2 Naive Bayes; 4.3 Fancy It Up: Laplace Smoothing; 4.4 Comparing Naive Bayes to k-NN; 4.5 Sample Code in bash; 4.6 Scraping the Web: APIs and Other Tools; 4.7 Jake’s Exercise: Naive Bayes for Article Classification
Chapter 5: Logistic Regression; 5.1 Thought Experiments; 5.2 Classifiers; 5.3 M6D Logistic Regression Case Study; 5.4 Media 6 Degrees Exercise
Chapter 6: Time Stamps and Financial Modeling; 6.1 Kyle Teague and GetGlue; 6.2 Timestamps; 6.3 Cathy O’Neil; 6.4 Thought Experiment; 6.5 Financial Modeling; 6.6 Exercise: GetGlue and Timestamped Event Data
Chapter 7: Extracting Meaning from Data; 7.1 William Cukierski; 7.2 The Kaggle Model; 7.3 Thought Experiment: What Are the Ethical Implications of a Robo-Grader?; 7.4 Feature Selection; 7.5 David Huffaker: Google’s Hybrid Approach to Social Research
Chapter 8: Recommendation Engines: Building a User-Facing Data Product at Scale; 8.1 A Real-World Recommendation Engine; 8.2 Thought Experiment: Filter Bubbles; 8.3 Exercise: Build Your Own Recommendation System
Chapter 9: Data Visualization and Fraud Detection; 9.1 Data Visualization History; 9.2 What Is Data Science, Redux?; 9.3 A Sample of Data Visualization Projects; 9.4 Mark’s Data Visualization Projects; 9.5 Data Science and Risk; 9.6 Data Visualization at Square; 9.7 Ian’s Thought Experiment; 9.8 Data Visualization for the Rest of Us
Chapter 10: Social Networks and Data Journalism; 10.1 Social Network Analysis at Morningside Analytics; 10.2 Social Network Analysis; 10.3 Terminology from Social Networks; 10.4 Thought Experiment; 10.5 Morningside Analytics; 10.6 More Background on Social Network Analysis from a Statistical Point of View; 10.7 Data Journalism
Chapter 11: Causality; 11.1 Correlation Doesn’t Imply Causation; 11.2 OK Cupid’s Attempt; 11.3 The Gold Standard: Randomized Clinical Trials; 11.4 A/B Tests; 11.5 Second Best: Observational Studies; 11.6 Three Pieces of Advice
Chapter 12: Epidemiology; 12.1 Madigan’s Background; 12.2 Thought Experiment; 12.3 Modern Academic Statistics; 12.4 Medical Literature and Observational Studies; 12.5 Stratification Does Not Solve the Confounder Problem; 12.6 Is There a Better Way?; 12.7 Research Experiment (Observational Medical Outcomes Partnership); 12.8 Closing Thought Experiment
Chapter 13: Lessons Learned from Data Competitions: Data Leakage and Model Evaluation; 13.1 Claudia’s Data Scientist Profile; 13.2 Data Mining Competitions; 13.3 How to Be a Good Modeler; 13.4 Data Leakage; 13.5 How to Avoid Leakage; 13.6 Evaluating Models; 13.7 Choosing an Algorithm; 13.8 A Final Example; 13.9 Parting Thoughts
Chapter 14: Data Engineering: MapReduce, Pregel, and Hadoop; 14.1 About David Crawshaw; 14.2 Thought Experiment; 14.3 MapReduce; 14.4 Word Frequency Problem; 14.5 Other Examples of MapReduce; 14.6 Pregel; 14.7 About Josh Wills; 14.8 Thought Experiment; 14.9 On Being a Data Scientist; 14.10 Economic Interlude: Hadoop; 14.11 Back to Josh: Workflow; 14.12 So How to Get Started with Hadoop?
Chapter 15: The Students Speak; 15.1 Process Thinking; 15.2 Naive No Longer; 15.3 Helping Hands; 15.4 Your Mileage May Vary; 15.5 Bridging Tunnels; 15.6 Some of Our Work
Chapter 16: Next-Generation Data Scientists, Hubris, and Ethics; 16.1 What Just Happened?; 16.2 What Is Data Science (Again)?; 16.3 What Are Next-Gen Data Scientists?; 16.4 Being an Ethical Data Scientist; 16.5 Career Advice
Index; Colophon