Probability & Probability Distributions

MPH/MHSR 7103: Please find the slides for my class on:
1) Probability,
2) Probability distributions (including binomial, Poisson & normal)
3) Probability distributions (including Student's t, chi-square & F) (Updated: Sept 12 2011)
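For readers who want to check values from the distributions covered in these slides, here is a short R sketch. It uses base R's density (d*), cumulative probability (p*) and quantile (q*) functions. The arguments are illustrative choices and are not taken from the slides. The last two pairs illustrate two standard relationships: a chi-square with 1 df is the square of a standard normal, and an F with (1, df) degrees of freedom is the square of a t with df degrees of freedom.

```r
# Binomial: P(X = 2) when X ~ Binomial(n = 4, p = 0.5); exactly 6/16
p_binom <- dbinom(2, size = 4, prob = 0.5)

# Poisson: P(X = 0) when X ~ Poisson(mean 2); equals exp(-2)
p_pois <- dpois(0, lambda = 2)

# Normal: P(Z <= 1.96) for a standard normal, about 0.975
p_norm <- pnorm(1.96)

# Student's t: two-sided 5% critical value with 10 degrees of freedom
t_crit <- qt(0.975, df = 10)

# Chi-square with 1 df is the square of a standard normal,
# so its 95th percentile equals the squared 97.5th normal percentile
chi_crit <- qchisq(0.95, df = 1)
z_sq <- qnorm(0.975)^2

# F(1, 10) is the square of t(10): P(F <= 4) = P(-2 <= T <= 2)
p_f <- pf(4, df1 = 1, df2 = 10)
p_t <- 2 * pt(2, df = 10) - 1

print(c(p_binom = p_binom, p_pois = p_pois, p_norm = p_norm, t_crit = t_crit,
        chi_crit = chi_crit, z_sq = z_sq, p_f = p_f, p_t = p_t))
```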

Please also find additional FREE statistics textbook resources for your classes at this link.


The assignment is attached here. Due date: 26th September 2011. Please email your completed assignment to the email address distributed in class.


Non-Parametric Statistics

Welcome, Biostat 2 8202 class:

1) Please find the class notes for today's class here.
2) Please find an example non-parametric Stata analysis do-file here.
3) Please find the assignment for today's class here.

More resources are coming under this category.


Fisher Exact Test

Here are slides for the Fisher Exact Test Class (Click here).

Go to “Read More” for the Stata material...
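The course material for this test is in Stata. For readers working in R, here is a sketch that runs Fisher's exact test with base R's fisher.test(). It is an illustration, not the method shown in the slides. The data are the classic tea-tasting 2x2 table used on R's own fisher.test help page. The sketch then reproduces the one-sided p-value directly from the hypergeometric distribution that the test conditions on.

```r
# Tea-tasting table: 8 cups, 4 with milk poured first, 4 with tea first.
# Rows are the taster's guesses, columns are the truth.
tea <- matrix(c(3, 1, 1, 3), nrow = 2,
              dimnames = list(Guess = c("Milk", "Tea"),
                              Truth = c("Milk", "Tea")))

# One-sided test: does she pick milk-first cups better than chance?
ft <- fisher.test(tea, alternative = "greater")

# With all margins fixed, the top-left count is hypergeometric:
# draw k = 4 cups (her "milk" guesses) from m = 4 milk-first and n = 4 tea-first.
# The one-sided p-value is P(X >= 3), i.e. P(X > 2).
p_manual <- phyper(3 - 1, m = 4, n = 4, k = 4, lower.tail = FALSE)

print(ft$p.value)   # 17/70, about 0.243
print(p_manual)
```

Conditioning on the margins is what makes the test exact: the p-value is a sum of hypergeometric probabilities. No large-sample approximation is involved.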

The Kruskal-Wallis Test

My Kruskal-Wallis test slide notes for the MPH 2 class are right here (click).

Also attached is a do-file showing how to run the Kruskal-Wallis test in Stata (click).

There is an exercise on the last page of the slide notes.

You can come back here for additional resources and links on the non-parametric Kruskal-Wallis test; otherwise, go to “Read More” below.
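The do-file covers Stata. As a cross-check in R, here is a sketch that computes the Kruskal-Wallis H statistic by hand: rank all observations together, sum the ranks within each group, then apply the usual correction for ties. It compares the result with base R's kruskal.test(). The data are R's built-in airquality set (ozone by month), chosen only for illustration. The helper kw_h() is a hypothetical name.

```r
# Hypothetical helper: Kruskal-Wallis H by hand, with the usual tie correction.
kw_h <- function(x, g) {
  g <- factor(g)
  r <- rank(x)                   # tied values receive their average (mid-)rank
  N <- length(x)
  R_i <- tapply(r, g, sum)       # rank sum in each group
  n_i <- tapply(r, g, length)    # number of observations in each group
  H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)
  ties <- table(x)               # size of each set of tied values
  H / (1 - sum(ties^3 - ties) / (N^3 - N))
}

# Built-in data: does ozone concentration differ between months (May-September)?
aq <- na.omit(airquality[, c("Ozone", "Month")])
h_manual <- kw_h(aq$Ozone, aq$Month)
kt <- kruskal.test(Ozone ~ Month, data = aq)

print(h_manual)
print(kt$statistic)
print(kt$parameter)   # degrees of freedom = number of groups - 1
print(kt$p.value)     # upper tail of a chi-square with that many df
```

H is referred to a chi-square distribution with (number of groups - 1) degrees of freedom. That is why kruskal.test() reports its statistic as "Kruskal-Wallis chi-squared".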


Does Anything Happen at Random?

A brilliant talk by Persi Diaconis, “Does Anything Happen at Random?” Is this random? Card shuffling, coins and their friends…


Age-Adjusted Rates...

I am supervising a student who is looking at trends in different cancers in Uganda (at least as reported at the main referral hospital, Mulago).

Here is a link to a site that I thought would help him understand the rationale (and method) of age-adjusting rates, as done in the Big Apple... (courtesy of the New York State Department of Health).


Why do we do age-adjustment?

Almost all diseases or health outcomes occur at different rates in different age groups. Most chronic diseases, including most cancers, occur more often among older people. Other outcomes, such as many types of injuries, occur more often among younger people. The age distribution determines what the most common health problems in a community will be. One way of examining the pattern of health outcomes in communities of different sizes is to calculate an incidence or mortality rate, which is the number of new cases or deaths divided by the size of the population. In chronic diseases and injuries, rates are usually expressed in terms of the number of cases/deaths per 100,000 people per year.
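The rate described above can be sketched in R as events divided by person-years at risk, scaled to a base of 100,000. The helper crude_rate() and the community figures are hypothetical and chosen only to show the arithmetic.

```r
# Hypothetical helper: crude incidence or mortality rate.
# events / (population * years at risk), expressed per `per` people per year.
crude_rate <- function(events, population, years = 1, per = 100000) {
  events / (population * years) * per
}

# Hypothetical community: 45 new cases in one year among 150,000 residents
r1 <- crude_rate(events = 45, population = 150000)
print(r1)   # 30 per 100,000 per year
```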

A community made up of more families with young children will have a higher rate of bicycle injuries than a community with fewer young children. A community with more older individuals will have higher rates of cancer than one with younger individuals. This is true even if the individuals in the two communities have the same risk of developing cancer or being injured. Epidemiologists refer to this as confounding. Confounding happens when the measurement of the association between the exposure and the disease is mixed up with the effects of some extraneous factor (a confounding variable). See the attached example of age confounding. Age-adjustment is a statistical way to remove confounding caused by age.
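A small numeric sketch of age confounding follows. Two hypothetical communities have exactly the same age-specific rates (10 per 100,000 among the young, 200 per 100,000 among the old) but different age structures. All numbers are made up for illustration.

```r
# Hypothetical age-specific rates, identical in both communities (per person per year)
age_rate <- c(young = 10, old = 200) / 100000

pop_A <- c(young = 90000, old = 10000)   # younger community
pop_B <- c(young = 50000, old = 50000)   # older community

cases_A <- pop_A * age_rate   # 9 young cases, 20 old cases
cases_B <- pop_B * age_rate   # 5 young cases, 100 old cases

# Crude rates per 100,000: total cases / total population
crude_A <- sum(cases_A) / sum(pop_A) * 100000
crude_B <- sum(cases_B) / sum(pop_B) * 100000
print(c(A = crude_A, B = crude_B))   # 29 vs 105
```

The older community's crude rate is more than three times higher, even though a person of a given age faces the same risk in both places. The difference comes entirely from the age mix.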

How is age-adjustment done?

Age confounding occurs when the two populations being compared have different age distributions and the risk of the disease or outcome varies across the age groups. The process of age-adjustment by the direct method changes the amount that each age group contributes to the overall rate in each community, so that the overall rates are based on the same age structure. Rates that are based on the same age distribution can be compared to each other without the presence of confounding by age. Adjustment is accomplished by first multiplying the age-specific rates of disease by age-specific weights. The weights used in the age-adjustment of cancer data are the proportion of the 1970 US population within each age group. The weighted rates are then summed across the age groups to give the age-adjusted rate. Age-adjustment is demonstrated here using the cancer mortality rates for all sites of cancer among men in New York State in 1994. Age confounding is demonstrated using the prostate cancer mortality rates among white and black men in 1994.
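The direct method described above can be sketched in R as follows. Compute each age-specific rate, weight it by that age group's share of a standard population, and sum. The helper direct_adjusted_rate() is hypothetical. The two-group standard population below is a made-up example, NOT the 1970 US standard. The two communities are also hypothetical: they share the same age-specific rates but have different age structures.

```r
# Hypothetical helper: directly age-adjusted rate per `per` people.
direct_adjusted_rate <- function(cases, pop, std_pop, per = 100000) {
  stopifnot(length(cases) == length(pop), length(pop) == length(std_pop))
  age_specific <- cases / pop          # age-specific rates
  weights <- std_pop / sum(std_pop)    # standard-population proportions (sum to 1)
  sum(age_specific * weights) * per    # weighted sum across age groups
}

# Made-up standard population (NOT the 1970 US standard): 70% young, 30% old
std <- c(young = 70000, old = 30000)

# Hypothetical communities with identical age-specific rates
# (10 and 200 per 100,000) but different age structures
adj_A <- direct_adjusted_rate(cases = c(9, 20),  pop = c(90000, 10000), std_pop = std)
adj_B <- direct_adjusted_rate(cases = c(5, 100), pop = c(50000, 50000), std_pop = std)
print(c(A = adj_A, B = adj_B))   # both 67 per 100,000
```

The crude rates of these two communities differ (29 versus 105 per 100,000). Once both are placed on the same standard age structure, their adjusted rates are identical, which is exactly the confounding that adjustment is meant to remove. The adjusted value itself (67) depends on the chosen standard, so it is only meaningful for comparison with rates adjusted to the same standard.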

Statistical Concepts of LQAS

Here are my slides for the LQAS (Lot Quality Assurance Sampling) session that I will be giving as an orientation to the LQAS methodology for the Uganda MSH-STRIDES project group. The group is helping districts improve the delivery of family planning services.
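As a rough sketch of the binomial reasoning behind LQAS, here is an R example. A supervision area "passes" if at least d of the n sampled respondents have the indicator. The two misclassification risks follow from the binomial distribution. Alpha is the chance of failing an area whose true coverage is at the upper threshold. Beta is the chance of passing an area whose true coverage is at the lower threshold. The sample size of 19, the 80% and 50% thresholds, the 10% risk limit and the helper lqas_risks() are all assumptions for illustration, not values taken from the slides.

```r
# Hypothetical helper: misclassification risks of LQAS decision rule d.
lqas_risks <- function(n, d, p_upper, p_lower) {
  alpha <- pbinom(d - 1, n, p_upper)       # P(fewer than d successes | coverage = p_upper)
  beta  <- 1 - pbinom(d - 1, n, p_lower)   # P(at least d successes | coverage = p_lower)
  c(alpha = alpha, beta = beta)
}

# Assumed design: 19 respondents per area, upper threshold 80%, lower threshold 50%
n <- 19
rules <- 0:n
risks <- t(sapply(rules, function(d) lqas_risks(n, d, p_upper = 0.8, p_lower = 0.5)))
rownames(risks) <- rules
print(round(risks, 3))

# Decision rules that keep both risks at or below an assumed 10% limit
ok <- rules[risks[, "alpha"] <= 0.10 & risks[, "beta"] <= 0.10]
print(ok)
```

Raising d makes it harder to pass, so alpha rises while beta falls. The table shows this trade-off, and the search picks the rules where both risks are acceptable.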

Experts vs Regressions

In the book “Super Crunchers”, Yale economist Ian Ayres notes the predictive superiority of analytics over experts in many disciplines:

Unlike self-involved experts, statistical regressions don’t have egos or feelings.


Tricks in read.table()

Many people do not realize that read.table() can convert the data types of columns while reading, and so they always do this kind of post hoc conversion:

    ## replace "" with the path or URL of the soup data file
    soup = read.table("", header = TRUE)
    soup$taster = factor(soup$taster)
    soup$batch = factor(soup$batch)
    soup$recipe = factor(soup$recipe)
    soup$tasteorder = factor(soup$tasteorder)

But in fact, we can specify the types of the columns while reading the data:

    ## we know the first 4 columns are factors and the last one is numeric:
    soup = read.table("", header = TRUE,
                      colClasses = c(rep("factor", 4), "numeric"))
    ## conversion already done!
    > str(soup)
    'data.frame': 72 obs. of 5 variables:
     $ recipe    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 2 2 2 2 ...
     $ batch     : Factor w/ 12 levels "1","10","11",..: 1 1 1 1 1 1 5 5 5 5 ...
     $ taster    : Factor w/ 24 levels "1","10","11",..: 1 12 18 19 20 21 1 12 18 19 ...
     $ tasteorder: Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3 2 3 1 3 ...
     $ y         : num 3 5 6 4 4 3 6 9 6 7 ...

There are other tricks in read.table(), but I find this one the most useful. Check the 22 arguments in ?read.table if you want to know more magic (e.g. how to use the first column in the data file as the row names).
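Here is a self-contained sketch of both tricks that you can run without the soup file. It uses read.table()'s text= argument to read from a string. The tiny tables are made-up toy data in the same layout as the soup file, not the real values. Note that colClasses has one entry per column in the file, and that row.names = 1 turns the first column into row names.

```r
# Made-up toy data laid out like the soup file
soup_txt <- "recipe batch taster tasteorder y
1 1 1 1 3
1 1 10 1 5
2 5 1 2 6
2 5 10 3 9"

# Types are set while reading: four factors, then one numeric column
soup <- read.table(text = soup_txt, header = TRUE,
                   colClasses = c(rep("factor", 4), "numeric"))
str(soup)

# row.names = 1 uses the first column of the file as the row names
people_txt <- "id height weight
a 170 65
b 182 80"
people <- read.table(text = people_txt, header = TRUE, row.names = 1)
print(rownames(people))   # "a" "b"
print(names(people))      # "height" "weight"
```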

Only God does not need data...

“In God we trust, all others bring data” - William Edwards Deming (1900-1993)

NOTE: Apparently (according to Hastie & Tibshirani), this quote has been widely attributed to both Deming and Robert W. Hayden; however, Prof. Hayden claims no credit for this quote, and, ironically, there is no data confirming that Deming actually said this...