Core Curriculum

FOUNDATIONS

  • The social science of measurement
  • Formulating research questions
  • Basics of program evaluation
  • Differentiating data sources
  • "Big Data": definitions and technical issues
  • Quality frameworks and varying needs
  • Introduction to the data used in this class
  • Case studies
  • Introduction to Python
  • Working with Jupyter Notebooks
  • Web scraping exercises
  • Exploring your data visually

DATA CURATION

  • Introduction to APIs
  • Database concepts
  • Database taxonomies
  • Introduction to characteristics of large databases
  • Building a data schema
  • ETL in different databases
  • Building datasets to be linked
  • Linkage in the context of big data
  • Example of record linkage with MapReduce
  • Creating a big data workflow
  • Data hygiene: curation and documentation
  • Working in the Data Hub

DATA ANALYSIS

  • What is machine learning?
  • Examples, process and methods
  • Fundamentals of network analysis
  • Directed and undirected graphs
  • Relational analysis on graphs
  • Value of text data
  • Different text analytics paradigms
  • Discovering topics and themes in large quantities of text data
  • The importance of geographic information
  • Basics of spatial data analysis
  • Geographic information systems
  • Mapping your data

PRESENTATION, INFERENCE AND ETHICS

  • Using graphics packages for data visualization, including network geolocation and GIS software to display shapefiles
  • Review of total survey error
  • Error sources specific to found (big) data
  • Examples of big data analysis and erroneous inferences caused by ignoring data errors
  • Inference in the big data context
  • Methods to correct for data errors
  • Big data and privacy
  • Legal framework
  • Statistical framework
  • Disclosure control techniques
  • Ethical issues
  • Practical approaches