class: title-slide, left, bottom

<font size="50"> Monte Carlo Simulation: Some Fundamentals </font>
<br><br><br><br><br><br>
**Ashley I Naimi, PhD** <br>
Associate Professor <br>
Emory University <br>
<svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"><path d="M464 64H48C21.49 64 0 85.49 0 112v288c0 26.51 21.49 48 48 48h416c26.51 0 48-21.49 48-48V112c0-26.51-21.49-48-48-48zm0 48v40.805c-22.422 18.259-58.168 46.651-134.587 106.49-16.841 13.247-50.201 45.072-73.413 44.701-23.208.375-56.579-31.459-73.413-44.701C106.18 199.465 70.425 171.067 48 152.805V112h416zM48 400V214.398c22.914 18.251 55.409 43.862 104.938 82.646 21.857 17.205 60.134 55.186 103.062 54.955 42.717.231 80.509-37.199 103.053-54.947 49.528-38.783 82.032-64.401 104.947-82.653V400H48z"></path></svg> ashley.naimi@emory.edu <br>
<svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @ashley_naimi <br>
<svg viewBox="0 0 496 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> ainaimi <br>

---
# Simulations

.font150[
* General Principles
* Types of Simulation
* Let's Simulate
* Analyzing and Reporting
* Considerations
* Further Exploration
]
---
# General Principles behind Data Simulation

.font150[
* Why Simulate?
  * Exploring
  * Learning
  * Teaching
  * Research
]

---
# General Principles behind Simulation

.font150[
* Every sample has an underlying cohort
* Clean code
* Efficient code
* Reproducible code
]

---
# Types of Simulation

.font150[
* <span style="color: red"> Systems Dynamics and Agent Based Models </span>
* <span style="color: red"> The Parametric G Formula (aka G Computation) </span>
* Monte Carlo Simulation
  * Regular
  * Plasmode
]

---
# Let's Simulate Data

.pull-left-a-little[

<img src="./images/F1.png" width="400px" style="display: block; margin: auto;" />

]

--

.pull-right-a-lot[


```r
set.seed(123)

expit <- function(x){1/(1+exp(-x))}

n <- 500
id <- 1:n

# two binary confounders
c1 <- rbinom(n, 1, .5)
c2 <- rbinom(n, 1, .5)

# binary exposure, with probability depending on c1 and c2
x <- rbinom(n, 1, expit(-2 + log(2)*c1 + log(2)*c2))

# continuous outcome; the true effect of x is 3
y <- 30 + 3*x + 3*c1 + 3*c2 + rnorm(n, 0, 1)

sim_dat <- data.frame(id, y, x, c1, c2)

head(sim_dat)
```

```
##   id    y x c1 c2
## 1  1 31.5 0  0  0
## 2  2 32.9 0  1  0
## 3  3 30.5 0  0  0
## 4  4 36.2 1  1  0
## 5  5 35.8 1  1  0
## 6  6 29.9 0  0  0
```

]

---
# Let's Simulate Data


```r
# coeftest() is from the lmtest package; vcovCL() is from sandwich
library(lmtest)
library(sandwich)

# outcome regression, adjusting for c1 and c2
mod1 <- glm(y ~ x + c1 + c2, data = sim_dat, family = gaussian("identity"))
coeftest(mod1, vcov = vcov(mod1))[2,]
```

```
##    Estimate  Std. Error     z value    Pr(>|z|) 
##    3.04e+00    1.17e-01    2.60e+01   2.39e-149
```

```r
# propensity score model and stabilized inverse probability weights
ps <- glm(x ~ c1 + c2, data = sim_dat, family = binomial("logit"))$fitted.values

sim_dat$sw <- x*(mean(x)/ps) + (1 - x)*((1 - mean(x))/(1 - ps))
summary(sim_dat$sw)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.602   0.887   0.985   1.000   1.006   2.069
```

```r
# IP weighted outcome model with a robust (sandwich) variance
mod2 <- glm(y ~ x, data = sim_dat, family = gaussian("identity"), weights = sw)
coeftest(mod2, vcov = vcovCL(mod2, cluster = sim_dat$id, type = "HC1"))[2,]
```

```
##   Estimate Std. Error    z value   Pr(>|z|) 
##   3.06e+00   2.91e-01   1.05e+01   8.90e-26
```

--

.font150[
<span style="color: red"> How different are the estimates of `\(\beta_X\)` when using outcome regression versus IP weighting? </span>
]
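
--

A quick way to eyeball the answer in this single sample is shown below (a minimal sketch; it assumes `mod1` and `mod2` from the chunks above are still in memory):


```r
# difference between the two point estimates of beta_X in this one sample
coef(mod1)["x"] - coef(mod2)["x"]
```

A single sample can only hint at the answer; the simulation study on the next slides repeats the comparison many times.
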
---
# Simulation Study


```r
set.seed(123)

expit <- function(x){1/(1+exp(-x))}

# z indexes the Monte Carlo replicate (not used inside the function)
# coeftest() and vcovCL() require lmtest and sandwich (loaded earlier)
simulation_function <- function(z){
  
  n <- 500
  id <- 1:n
  
  c1 <- rbinom(n, 1, .5)
  c2 <- rbinom(n, 1, .5)
  
  x <- rbinom(n, 1, expit(-2 + log(2)*c1 + log(2)*c2))
  
  y <- 30 + 3*x + 3*c1 + 3*c2 + rnorm(n, 0, 1)
  
  sim_dat <- data.frame(id, y, x, c1, c2)
  
  # outcome regression
  mod1 <- glm(y ~ x + c1 + c2, data = sim_dat, family = gaussian("identity"))
  coef1 <- coeftest(mod1, vcov = vcov(mod1))[2, 1:2]
  
  # IP weighting with stabilized weights
  ps <- glm(x ~ c1 + c2, data = sim_dat, family = binomial("logit"))$fitted.values
  sim_dat$sw <- x*(mean(x)/ps) + (1 - x)*((1 - mean(x))/(1 - ps))
  
  mod2 <- glm(y ~ x, data = sim_dat, family = gaussian("identity"), weights = sw)
  coef2 <- coeftest(mod2, vcov = vcovCL(mod2, cluster = sim_dat$id, type = "HC1"))[2, 1:2]
  
  res <- rbind(cbind(1, t(coef1)),
               cbind(2, t(coef2)))
  
  return(res)
}
```

---
# Simulation Study


```r
set.seed(123)

res <- lapply(1:10000, function(ii) simulation_function(ii))

res <- do.call(rbind, res)
res <- data.frame(res)

names(res) <- c("Model", "Estimate", "SE")

head(res)
```

```
##   Model Estimate     SE
## 1     1     3.04 0.1169
## 2     2     3.06 0.2911
## 3     1     2.86 0.0997
## 4     2     2.82 0.2727
## 5     1     3.16 0.1103
## 6     2     3.21 0.2598
```

```r
# group_by(), summarize(), and %>% come from the tidyverse (dplyr/magrittr)
library(tidyverse)

res %>% 
  group_by(Model) %>% 
  summarize(bias = mean(3 - Estimate),
            sd_beta = sd(Estimate),
            se_beta = mean(SE))
```

```
## # A tibble: 2 x 4
##   Model      bias sd_beta se_beta
## * <dbl>     <dbl>   <dbl>   <dbl>
## 1     1  0.000248   0.109   0.110
## 2     2 -0.000681   0.136   0.277
```

---
# Simulation Study

<img src="images/sim_res.png" width="1000px" style="display: block; margin: auto;" />

---
# Setting up and Analyzing Simulation Data

.font150[
* Monte Carlo simulations are experiments
* For a Monte Carlo sample size `\(M\)` and an acceptable Monte Carlo error `\(e\)` (see the worked sketch at the end of these slides):
  * for error in bias, `\(M = \frac{SD\{\hat{\psi}\}^2}{e^2}\)`, where `\(SD\{\hat{\psi}\}\)` is the standard deviation of the estimator across simulated samples
  * for error in proportions (e.g., coverage), `\(M = \frac{4p(1-p)}{e^2}\)`
]

---
# Simulation Considerations

.font150[
* Output and save more than you might need
* Keep MC samples small at first (e.g., while developing and debugging code)
* Be aware of how you parameterized the simulations
  * Leaving something out (e.g., an interaction term) is assuming it is zero!
]

---
# For Further Exploration

.font150[
* R functions
  * lapply, mapply, and mclapply
* parallel processing (see the mclapply sketch at the end of these slides)
]

---
# Some Papers for Further Reading

.font150[
* Morris et al (2019) Using simulation studies to evaluate statistical methods. Stat Med. 38(11): 2074-2102.
* Rudolph et al (2020) Simulation as a Tool for Teaching and Learning Epidemiologic Methods. Am J Epidemiol. https://doi.org/10.1093/aje/kwaa232
* Rudolph et al (2021) Simulation in Practice: The Balancing Intercept. Am J Epidemiol. https://doi.org/10.1093/aje/kwab039
]
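
---
# Worked Sketch: Choosing the Monte Carlo Sample Size

A minimal back-of-the-envelope sketch of the two formulas from the "Setting up and Analyzing Simulation Data" slide. The inputs (the assumed SD of the estimator and the acceptable errors) are illustrative, not prescriptive; `sd_psi` is taken to be roughly the empirical SD of the estimator seen in the simulation results above.


```r
sd_psi <- 0.11   # assumed SD of the estimator across simulated samples
e_bias <- 0.005  # acceptable Monte Carlo error for the bias

# M needed to estimate bias with error e_bias
M_bias <- (sd_psi / e_bias)^2
M_bias  # 484

p      <- 0.95   # e.g., targeted coverage probability
e_prop <- 0.005  # acceptable Monte Carlo error for a proportion

# M needed to estimate a proportion (e.g., coverage) with error e_prop
M_prop <- 4 * p * (1 - p) / e_prop^2
M_prop  # 7600
```
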
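
---
# Sketch: Parallelizing the Simulation

A minimal sketch of how the `lapply()` call above might be swapped for `parallel::mclapply()`. This assumes a unix-alike machine (on Windows, `mc.cores` must be 1), a core count of 4 chosen only for illustration, and that `simulation_function()` from the earlier slide is already defined.


```r
library(parallel)

# for reproducible random number streams across forked workers
RNGkind("L'Ecuyer-CMRG")
set.seed(123)

# drop-in replacement for lapply(), run across mc.cores forked processes
res_par <- mclapply(1:10000,
                    function(ii) simulation_function(ii),
                    mc.cores = 4)

res_par <- data.frame(do.call(rbind, res_par))
names(res_par) <- c("Model", "Estimate", "SE")

head(res_par)
```
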