homepage :: schedule :: tutorials :: Little Apps :: lessons :: resources :: HELP!!

In the instructions for getting ready for the StatPREP workshop, we asked participants to think about their statistics course and what they would like to change. Here is a summary of the responses.

Question: *What do you teach about handling data? (That is, not about the analysis of data but about how data is stored, accessed, transformed.)*

Given that statistics is sometimes described as “the science of data,” it’s remarkable that ways of storing, accessing, and manipulating/wrangling data are not in the standard curriculum. The participants’ answers reflected this.

- Nothing
- Nothing formal but embedded in class activities
- Missing data, how to make a data sheet
- Sources such as McCabe text, DASL, JMP data sets
- Difficulty of big data.
- Data may not be accurate (entered incorrectly)
- Data tells a story; you need to find that story
- Data ethics
- Copy and paste from web to Excel
- Presenting data graphically
`read.csv()`

&`write.csv()`

**Lesson ideas**:

- Calculating grade-point-averages for departments or professors by joining tables in a registrar’s data base. (
`grades.csv`

,`courses.csv`

,`grade2number.csv`

from MOSAIC site.) - Cleaning inconsistently entered data (e.g. bird species encountered)

Question: *Which topics do you find unsatisfying?*

- Descriptive stats (generally they already know this)
- “rules” for histograms

- Probability:
- Distributions: binomial, uniform, t
- Hypothesis testing
- Writing H_0 and H_a correctly.

- t, z, … table mechanics
- Experimental design.

**Lesson Idea**: Sensitivity and specificity of, e.g., medical screening tests, or any classifier. More generally, do conditional probability by actually counting cases from a data set.

Question: *Which lesson have you been aching to improve?*

- Would love to get to ANOVA and chi-square.
- Comparing distributions
- Inferential statistics
- Sampling distributions of mean and proportion
- Medical tests: sensitivity, specificity, pos. predictivity, …
- Projects, reports, presentations

**Lesson Ideas**:

- Use distribution Little App to look at NHANES variables, which have a variety of distributions. Talk about
*why*the different variables have the distributions they do. - Use sampling, resampling, shuffling

Question: *Suppose you were tasked to develop a version of your statistics course that is only two-thirds as long as your current course. What topics would you drop or consolidate?*

- drop intro material in favor of actual data analysis
- counting, “unhelpful” probability (Bayes Theorem)
- binomial theorem
- normal approx. to \(\hat{p}\) distribution
- consolidate two-sample and multi=sample for quantitative variables.
*We are “doing the same thing” when we construct CI for \(\mu_1\), \(\mu_x - \mu_y\), \(p_1\), \(p_1 - p_2\)* - regression (it gets covered in other courses)
- Detailed explanation of Type I and II errors
- Urn and playing card examples.

**Lesson Idea**: Rather than the “one-way” and “two-way” formulas, ANOVA as a general (and simple) method for studying models, based on the idea of nested models.

Question: *What data sets do you currently use that you would like to give a higher profile? Are there data sets that you’d like to incorporate but haven’t had a chance to do so?*

- GAPMINDER (esp. “Dollar Street”)
- Consumer complaints, e.g. at
`finance.gov`

web site. - CDC data sets
- Datasets for success rates of students in a 4-year college.
- Medical charting
- Kaiser Health (State Health Facts) … “But I puzzle over using state-level data. I would prefer the unit of observation to be people or fish or something easy for students to understand.”
- US Census / American Community Survey
- World Bank data
- Genome data
- Death dates by day of week and day of year.

**Lesson Idea**: - 311 calls for Seattle - Dept. of Education scorecard data - Voter registration data (e.g. Wake County) - Road race data - Weather events - National Cancer Institute NCI-60 data.