Propensity scores

For SAS code, see the next section. And before you read further: no, propensity scores do not solve causal inference problems, but they are helpful for diagnosing lack of overlap and for clarifying what one gets when using regression adjustment. For causal inference, propensity scores are as useful as regression adjustment: you still need ignorability, conditional independence, or selection on observables -- or at least mean independence. Regression adjustment, as in adjusting for all confounders, is the oldest causal inference method we have. See this short paper about propensity scores vs regression adjustment. Propensity scores and matching estimators do help in dealing with lack of overlap.

Stata has a command, teffects, that implements propensity-score-type estimators. See my note on regression adjustment with teffects.

Lecture notes

Lecture (intro) to lack of overlap problems and the propensity score. (Includes code using teffects command.) For more, go to class notes.

More advanced lecture notes on matching estimators, IPW: Lecture notes

Lecture notes (matching estimators, Mahalanobis, propensity scores, IPW)
Code

Older notes and code from lectures at the University of Chicago (still worth keeping but the above notes are more up-to-date).

Slides, part I - Slides, part II
Stata code to reproduce examples
Data
matching.do: A basic 1 to 1 matching in Stata. I just wanted students to do a simple matching without any "black box" code that would do the matching for them. Use Stata's teffects nnmatch for more efficient methods.

Matching with SAS

Note: I wrote the code below a long time ago. I barely use SAS these days. Below are notes based on questions I received over the years. Happy to help if you have problems, but I may not remember much. Google Scholar tells me the code has become popular based on citations here and here.

Matching is straightforward in most statistical packages. Both R and Stata have many tools to do many flavors of matching (Stata 13 added more). In SAS, simple matching is complicated because the data are not loaded into memory. Not loading data into memory allows SAS to work with very large datasets; datasets that couldn't be loaded into memory because they are too large. SAS reads data line by line and can work with more than one dataset at the same time. When writing algorithms for matching, reading line by line is problematic because matching requires iterating over a vector of data many times. This is a trivial indexing problem that is not so trivial if all data elements cannot be easily accessed.

One solution is to use pointers in the data step. The pointer syntax, however, is messy and not very flexible. My solution was to use "hashes." SAS hashes are a way to create data vectors that can be easily indexed (here is an intro to SAS hashes). You don't need to store the whole dataset in a hash; just two variables: an observation id and the propensity score or its logit. The price you pay is that the hash syntax is unlike the rest of SAS (it seems to have been borrowed from C). In any case, if you're a SAS user, inconsistent syntax is business as usual. The advantage of hashes is that you can program matching algorithms that are straightforward, easy to read and, most importantly, easy to understand.
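To make the idea concrete, here is a minimal sketch in Python rather than SAS, using a dict in the role of the hash: it stores only an observation id and a score, as described above. This is not the SAS code itself. The function name greedy_match, the caliper argument, and the toy scores are assumptions made for illustration only.

```python
def greedy_match(treated, controls, caliper=None):
    """Greedy 1-to-1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping observation id -> propensity score
    (or its logit). Returns a dict mapping treated id -> control id.
    """
    # The "hash": id -> score. Entries are deleted once a control is used.
    available = dict(controls)
    matches = {}
    for t_id, t_score in treated.items():
        if not available:
            break  # no controls left to match
        # Scan every remaining control and keep the closest one.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        # Hypothetical caliper: skip the match if the closest control is too far.
        if caliper is None or abs(available[c_id] - t_score) <= caliper:
            matches[t_id] = c_id
            del available[c_id]
    return matches


# Toy data (made up for illustration).
treated = {"t1": 0.62, "t2": 0.35, "t3": 0.90}
controls = {"c1": 0.30, "c2": 0.60, "c3": 0.58, "c4": 0.10}

matches = greedy_match(treated, controls, caliper=0.1)
print(matches)  # t3 stays unmatched: no remaining control is within the caliper
```

The point of the sketch is the access pattern: each treated unit needs a pass over all remaining controls, which is easy when the controls sit in an indexable structure and awkward when data are read line by line.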

Resources
PSmatching.sas macro based on my code written by William Thomas at the University of Minnesota.
Steven Utke implemented a matching with replacement macro
NESUG 2006 paper
Global Forum 2007 paper -- this paper includes code for global optimal matching. It minimizes the total distance among observations. Global matching doesn't work well with large datasets because it needs a matrix of distances between all pairs. Don't confuse optimal matching with full matching. Full matching is yet another way to do 1 to n matching (or n to 1) where n is not set a priori.
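A small sketch of the difference between greedy and global optimal matching, in Python. It is not the algorithm from the paper: the optimal match here is found by brute force over all assignments, which only works for toy examples, and the function names and scores are made up for illustration.

```python
from itertools import permutations


def total_distance(treated, controls, assignment):
    """Sum of |treated score - matched control score| over all pairs."""
    return sum(abs(t - controls[j]) for t, j in zip(treated, assignment))


def optimal_match(treated, controls):
    """Try every assignment of distinct controls and keep the cheapest one."""
    candidates = permutations(range(len(controls)), len(treated))
    return min(candidates, key=lambda a: total_distance(treated, controls, a))


def greedy_match(treated, controls):
    """Match each treated unit, in order, to its closest unused control."""
    available = list(range(len(controls)))
    assignment = []
    for t in treated:
        j = min(available, key=lambda c: abs(t - controls[c]))
        assignment.append(j)
        available.remove(j)
    return tuple(assignment)


# Toy propensity scores (made up for illustration).
treated = [0.50, 0.40]
controls = [0.42, 0.70]

greedy = greedy_match(treated, controls)
optimal = optimal_match(treated, controls)
print(greedy, total_distance(treated, controls, greedy))
print(optimal, total_distance(treated, controls, optimal))
```

In this example the greedy pass gives the first treated unit the control that the second one needed more, so its total distance is larger than the optimal one. The brute-force search grows factorially, which echoes the point above that global matching does not scale to large datasets.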

FAQs about the code

Q: What about standard errors?
A: You could bootstrap, but there are issues with bootstrapping with matching estimators (Abadie and Imbens, 2018). Stata's teffects matching commands would give you better SEs.

Q: How do I do a 1 to N match?
A: Use the PSmatching.sas macro. There is an option called "numberofcontrols." The default is 1. The trick is to create k duplicates of the treated observations. The result is that a treated observation is then matched to k controls using the same code as 1 to 1 matching.
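The duplication trick can be sketched in a few lines of Python. This illustrates the idea only and is not the macro's code: the function name match_one_to_k, the order in which duplicates are processed, and the toy scores are all assumptions.

```python
def match_one_to_k(treated, controls, k=2):
    """1-to-k greedy matching by duplicating each treated observation k times.

    treated, controls: dicts mapping observation id -> propensity score.
    Returns a dict mapping treated id -> list of matched control ids.
    """
    # Each treated unit appears k times; every copy goes through the same
    # 1-to-1 loop and takes one control out of the pool.
    expanded = [(t_id, s) for t_id, s in treated.items() for _ in range(k)]
    available = dict(controls)
    matches = {t_id: [] for t_id in treated}
    for t_id, t_score in expanded:
        if not available:
            break  # ran out of controls
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        matches[t_id].append(c_id)
        del available[c_id]
    return matches


# Toy data (made up for illustration).
treated = {"t1": 0.60, "t2": 0.30}
controls = {"c1": 0.61, "c2": 0.58, "c3": 0.31, "c4": 0.25, "c5": 0.90}

result = match_one_to_k(treated, controls, k=2)
print(result)
```

Because matching is without replacement, each duplicate picks a different control, so a treated unit ends up with up to k distinct controls.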

Q: It takes a while to do the matching... [more a complaint than a question].
A: Yes, it does if you have large datasets. Make sure you test your code with a smaller sample. Chances are that you made a mistake in the code if it takes too long (say, more than 60 minutes). You could also use the "put" statement in the data step to print the observation id being read--that way you know how long it takes to iterate over the control vector for each treated unit. But try this trick with a small sample.

Q: Should I consider BIC, AIC, or likelihood ratio tests to choose the logistic model? Do I care about the significance of the coefficients?
A: No. You care about the balance that your model is achieving, not the fit of the logistic regression or the significance of the coefficients. Sometimes the model that best fits the data doesn't achieve good balance. Check standardized mean differences and variance ratios, although once overlap is better, variance ratios are not as important.
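For reference, a minimal Python sketch of the two balance checks for a single covariate. It assumes one common definition of the standardized mean difference (difference in means divided by the square root of the average of the two group variances); other definitions exist. The function names and the toy ages are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def variance_ratio(x_treated, x_control):
    """Ratio of sample variances; values near 1 suggest similar spread."""
    return variance(x_treated) / variance(x_control)


# Toy covariate values in the matched sample (made up for illustration).
age_treated = [50, 52, 54, 56]
age_control = [48, 52, 56, 60]

smd = standardized_mean_difference(age_treated, age_control)
vr = variance_ratio(age_treated, age_control)
print(round(smd, 3), round(vr, 3))
```

Computing both quantities before and after matching, for every covariate in the propensity score model, shows whether the match improved balance.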