StatPREP Workshops for 2018

homepage :: schedule :: tutorials :: Little Apps :: lessons :: resources :: HELP!!

StatPREP Pre-Workshop Homework

In the instructions for getting ready for the StatPREP workshop, we asked participants to think about their statistics course and what they would like to change. Here is a summary of the responses.

Data Handling

Question: What do you teach about handling data? (That is, not about the analysis of data but about how data is stored, accessed, transformed.)

Given that statistics is sometimes described as “the science of data,” it’s remarkable that ways of storing, accessing, and manipulating/wrangling data are not in the standard curriculum. The participants’ answers reflected this.

  • Nothing
  • Nothing formal but embedded in class activities
  • Missing data, how to make a data sheet
  • Sources such as McCabe text, DASL, JMP data sets
  • Difficulty of big data.
  • Data may not be accurate (entered incorrectly)
  • Data tells a story; you need to find that story
  • Data ethics
  • Copy and paste from web to Excel
  • Presenting data graphically
  • read.csv() & write.csv()

Lesson ideas:

  • Calculating grade-point-averages for departments or professors by joining tables in a registrar’s data base. (grades.csv, courses.csv, grade2number.csv from MOSAIC site.)
  • Cleaning inconsistently entered data (e.g. bird species encountered)

Unsatisfying topics

Question: Which topics do you find unsatisfying?

  • Descriptive stats (generally they already know this)
    • “rules” for histograms
  • Probability:
  • Distributions: binomial, uniform, t
  • Hypothesis testing
    • Writing H_0 and H_a correctly.
  • t, z, … table mechanics
  • Experimental design.

Lesson Idea: Sensitivity and specificity of, e.g., medical screening tests, or any classifier. More generally, do conditional probability by actually counting cases from a data set.

Lessons to improve

Question: Which lesson have you been aching to improve?

  • Would love to get to ANOVA and chi-square.
  • Comparing distributions
  • Inferential statistics
  • Sampling distributions of mean and proportion
  • Medical tests: sensitivity, specificity, pos. predictivity, …
  • Projects, reports, presentations

Lesson Ideas:

  • Use distribution Little App to look at NHANES variables, which have a variety of distributions. Talk about why the different variables have the distributions they do.
  • Use sampling, resampling, shuffling

Topics to remove

Question: Suppose you were tasked to develop a version of your statistics course that is only two-thirds as long as your current course. What topics would you drop or consolidate?

  • drop intro material in favor of actual data analysis
  • counting, “unhelpful” probability (Bayes Theorem)
  • binomial theorem
  • normal approx. to \(\hat{p}\) distribution
  • consolidate two-sample and multi=sample for quantitative variables. We are “doing the same thing” when we construct CI for \(\mu_1\), \(\mu_x - \mu_y\), \(p_1\), \(p_1 - p_2\)
  • regression (it gets covered in other courses)
  • Detailed explanation of Type I and II errors
  • Urn and playing card examples.

Lesson Idea: Rather than the “one-way” and “two-way” formulas, ANOVA as a general (and simple) method for studying models, based on the idea of nested models.

Data examples

Question: What data sets do you currently use that you would like to give a higher profile? Are there data sets that you’d like to incorporate but haven’t had a chance to do so?

  • GAPMINDER (esp. “Dollar Street”)
  • Consumer complaints, e.g. at web site.
  • CDC data sets
  • Datasets for success rates of students in a 4-year college.
  • Medical charting
  • Kaiser Health (State Health Facts) … “But I puzzle over using state-level data. I would prefer the unit of observation to be people or fish or something easy for students to understand.”
  • US Census / American Community Survey
  • World Bank data
  • Genome data
  • Death dates by day of week and day of year.

Lesson Idea: - 311 calls for Seattle - Dept. of Education scorecard data - Voter registration data (e.g. Wake County) - Road race data - Weather events - National Cancer Institute NCI-60 data.