Monday February 24th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Michelle Daya, PhD
Colorado Center for Personalized Medicine

A whirlwind tour of tools for genome-wide genetic analyses in diverse populations (with some “real life” examples)

Genome-wide genetic analyses in populations with substructure due to ancestry require special considerations. In this talk I will present a number of statistical techniques that I have found useful in my research to both account for and leverage substructure. I will summarize a number of methods: 1) Calculating principal components that reflect ancestry and not recent relatedness, and in turn estimating recent relatedness that is not biased by ancestry. I will show how these results can be used to ensure proper calibration of test statistics in genome-wide association studies (GWAS). 2) A meta-regression approach for trans-ethnic meta-analysis of GWAS that quantifies heterogeneity due to differences in ancestry. 3) Leveraging local ancestry in association testing. 4) Polygenic risk scoring in underrepresented populations. 5) Special considerations when estimating heritability from GWAS summary statistics in admixed populations.


Friday February 28th 2020, 10:30am - 11:30am, Student Commons Building room 4017

Dr. Alvin C. Bronstein, PhD
Hawaii State Department of Health

Understanding the COVID-19 Emerging Infectious Disease

Background: 
On 31 December 2019, China reported a cluster of pneumonia cases in people associated with Huanan Seafood Wholesale Market exposure in Wuhan, Hubei Province. On 7 January 2020, Chinese health authorities confirmed a case associated with a novel coronavirus, 2019-nCoV. As the outbreak developed, cases appeared with no history of being in the Wuhan seafood market. Epidemiologic data began to indicate that person-to-person transmission of the virus was occurring. Person-to-person spread has been reported outside China, including in the United States (30 January 2020) and other countries. Chinese officials report that sustained person-to-person spread in the community is occurring in China. The virus was originally called novel coronavirus; on 11 February, the World Health Organization (WHO) named the disease it causes COVID-19.

Objectives:
1.    Describe the known coronaviruses
2.    Understand the anatomy of the COVID-19 virus
3.    Describe the COVID-19 clinical syndrome
4.    List clinical COVID-19 risk factors
5.    Describe how the virus is spread
6.    List common infection control practices
7.    Understand treatment options
8.    Discuss tracking COVID-19

Methods:
We will review the known history of COVID-19 and discuss the current global outbreak.  A basic understanding of the clinical picture of disease will be presented. Prevention best practices will be discussed. We will look at a dashboard developed to track the virus.

Question:
Is there a role for adaptive systems in describing and monitoring the outbreak?

Conclusion:
COVID-19 is an emerging infectious disease with medical, societal and economic impacts. A basic understanding of the virus and prevention practices is essential for everyone. 
 


Monday March 2nd 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Alex Kaizer
Biostatistics and Informatics
University of Colorado Anschutz Medical Campus


Monday March 9th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Jan Mandel
Department of Mathematical and Statistical Sciences
CU Denver

Unlimited computing with the Open Science Grid


Wednesday March 11th 2020, 9:00am - 11:00am, Student Commons Building room 4017

Stephan Patterson
PhD candidate, CU Denver

PhD Defense: Algorithms for Discrete Barycenters

Committee: Stephen Billups (Chair), Steffen Borgwardt (Advisor), Yaning Liu, Burt Simon and Ethan Anderes

Discrete barycenters are solutions to a class of optimal transport problems in which the inputs are probability measures with finite support sets. Exact solutions can be computed through exponential-sized linear programs, a prohibitively expensive approach. Efficient computations are highly desirable, as applications arise in a variety of fields including economics, physics, statistics, manufacturing, facility location, and more. To illustrate the extremes of problem difficulty, we focus on two examples: a best-case setting based on the MNIST digits data set and a worst-case setting based on Denver crime locations. We describe improved linear programming models and solving strategies for each setting, supported with implementations that demonstrate significant improvements in total running time and memory requirements. We conclude with a brief examination of our proof that a decision variant of the problem is computationally hard; that is, through a reduction from planar three-dimensional matching, we show it is NP-hard to decide if there exists a solution with non-mass-splitting transport cost and support set size below prescribed bounds.


Wednesday March 11th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Ethan Anderes
Statistics, UC Davis


Monday March 16th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Tan Bui Thanh
Department of Aerospace Engineering and Engineering Mechanics
The Oden Institute for Computational Engineering and Sciences
The University of Texas at Austin

Model-aware learning approaches to data-driven inverse/UQ problems

In the first part of the talk, we present a subspace model-aware regularization technique (SMART) that combines the advantages of the classical truncated SVD (TSVD) and Tikhonov regularization. In particular, the SMART approach does not pollute the data-informed modes and regularizes only the less data-informed ones. As a direct consequence, the approach is at least as good as the Tikhonov method for any value of the regularization parameter, and it is more accurate than the TSVD (for a reasonable regularization parameter). Because it blends these two classical methods, SMART is robust with respect to the regularization parameter. We show that the SMART approach has an interesting statistical interpretation: it transforms both the data distribution (i.e., the likelihood) and the prior distribution (induced by Tikhonov regularization) to the same Gaussian distribution, whose covariance matrix is diagonal and whose diagonal elements are exactly the singular values of a composition of the prior covariance matrix, the forward map, and the noise covariance matrix. In other words, SMART finds the modes that are most equally data-informed and prior-informed and leaves these modes untouched, so that the inverse solution receives the best possible (balanced) information from both the prior and the data. We will show that SMART is a regularization strategy and an admissible regularization. To demonstrate and support our findings, we present various results for popular computer vision and imaging problems, including deblurring, denoising, and X-ray tomography. We also present the theoretical aspects of SMART methods on infinite-dimensional spaces.
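The two classical methods named above can be compared through their filter factors on the singular values of the forward map. The sketch below is only a toy illustration under stated assumptions: the Tikhonov penalty is taken to be alpha times the squared norm of the solution, and `hybrid_filter` is a hypothetical blend (no damping on the leading modes, Tikhonov damping on the rest) suggested by the abstract's wording. It is not the SMART method itself, whose construction is not given here.

```python
def tsvd_filter(singular_values, rank):
    """Truncated SVD: keep the first `rank` modes unchanged, discard the rest."""
    return [1.0 if i < rank else 0.0 for i in range(len(singular_values))]


def tikhonov_filter(singular_values, alpha):
    """Tikhonov with penalty alpha * ||x||^2: every mode is damped by s^2 / (s^2 + alpha)."""
    return [s * s / (s * s + alpha) for s in singular_values]


def hybrid_filter(singular_values, rank, alpha):
    """Hypothetical blend: leading modes untouched, remaining modes Tikhonov-damped."""
    tik = tikhonov_filter(singular_values, alpha)
    return [1.0 if i < rank else tik[i] for i in range(len(singular_values))]


def filtered_solution(singular_values, data_coeffs, filters):
    """Solution coefficients in the right singular basis: f_i * (u_i . d) / s_i."""
    return [f * d / s for s, d, f in zip(singular_values, data_coeffs, filters)]


if __name__ == "__main__":
    s = [10.0, 1.0, 0.1]   # singular values, largest first
    d = [10.0, 1.0, 0.1]   # data coefficients u_i . d (noise-free example)
    for name, f in (("tsvd", tsvd_filter(s, 1)),
                    ("tikhonov", tikhonov_filter(s, 1.0)),
                    ("hybrid", hybrid_filter(s, 1, 1.0))):
        print(name, filtered_solution(s, d, f))
```

In this view, Tikhonov slightly damps even the best-determined mode, while TSVD discards the weaker modes entirely; the hypothetical blend avoids both effects for the modes it treats.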

The second part of the talk presents our recent work on developing model-aware deep learning approaches for inverse problems. The first approach combines the traditional ROM method and deep learning to learn the parameter-to-observable map, and the second develops an Auto-Inversion (AI) approach using a model-aware autoencoder method to learn the inverse parameter-to-observable map. Various numerical inversion results will be presented to verify the proposed approach.


Wednesday March 18th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Darius Baer


Monday March 30th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Jan van Leeuwen
Colorado State University and University of Reading, UK

Particle Flow-based Bayesian Inference for high-dimensional geophysical problems


Monday April 6th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Kathleen Gatliffe
Department of Mathematical and Statistical Sciences, CU Denver


Monday April 30th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Kannan Premnath
Department of Mechanical Engineering, CU Denver


Monday April 20th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Greg Kinney
Department of Epidemiology
University of Colorado Anschutz Medical Campus


Wednesday April 22nd 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Albert Berahas
Lehigh University




Past Seminars




Monday February 17th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Megan Null, PhD candidate, CU Denver

RAREsim: Simulating Rare Variant Genetic Data

Simulating realistic rare variant genetic data is vital for accurate evaluation of new statistical methods. Research suggests that large sample sizes and functional information are necessary for sufficiently powered rare variant association tests. Further, the distribution of simulated variants should be similar to that observed in sequencing data. Currently there is no simulation software that produces large sample sizes with realistic functional annotation and the expected allele frequency spectrum (AFS) across all, including very rare, variants.
We developed RAREsim, flexible software that simulates large sample sizes of genetic data with an AFS similar to that observed in sequencing data. Because RAREsim simulates from a sample of real haplotypes, existing functional and other genetic annotation can be used, capturing known and unknown complexities of real data. RAREsim is a two-step algorithm. First, RAREsim simulates haplotypes using HAPGEN2 (Su 2011), allowing for mutations to occur at most sites across the region. Second, RAREsim prunes the rare variants using the expected number of variants at each minor allele count. The expected number of variants is calculated from an estimate of the total number of variants in the region and the AFS. Since the AFS and total number of variants have been shown to vary by ancestry and variant type (e.g., synonymous, intron), we provide tuning parameters to enable user flexibility while maintaining the general relationship between the number of variants, AFS, and sample size. While we derive default parameters from the Genome Aggregation Database (gnomAD), the user has the ability to vary the number and distribution of variants to reflect their desired distribution for the region.
RAREsim is available as an R package and provides the ability to simulate large samples of rare variant data with functional annotation and the expected AFS for all, including very rare, variants. Realistic rare variant simulations are critical for rare variant method development. In turn, advances in these methods will allow for a greater understanding of the role rare variants play within health and disease.
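The pruning step described above can be pictured with a minimal sketch. This is not RAREsim's implementation or API: the function name, the minor allele count (MAC) bins, and the expected counts are all hypothetical, and the rule used here (randomly keep at most the expected number of variants in each bin) is a simplifying assumption.

```python
import random
from collections import defaultdict


def prune_to_expected(variant_macs, bins, expected, seed=0):
    """Randomly thin simulated variants so each MAC bin holds at most its expected count.

    variant_macs: dict mapping variant id -> minor allele count in the simulated sample
    bins:         list of inclusive (low, high) MAC ranges (hypothetical binning)
    expected:     expected number of variants per bin, same order as `bins`
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    kept = []
    for vid, mac in variant_macs.items():
        for i, (low, high) in enumerate(bins):
            if low <= mac <= high:
                by_bin[i].append(vid)
                break
        else:
            kept.append(vid)  # outside every rare-variant bin: left untouched
    for i, vids in by_bin.items():
        target = min(len(vids), round(expected[i]))
        kept.extend(rng.sample(vids, target))  # keep a random subset of the bin
    return sorted(kept)


if __name__ == "__main__":
    simulated = {"v1": 1, "v2": 1, "v3": 1, "v4": 2, "v5": 2, "v6": 50}
    print(prune_to_expected(simulated, bins=[(1, 1), (2, 2)], expected=[2, 5]))
```

In this toy run one of the three singletons is dropped, both doubletons are kept because fewer were simulated than expected, and the common variant is left alone.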


Monday February 10th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Yaning Liu
Department of Mathematical and Statistical Sciences
CU Denver

From HDMR to FAST-HDMR: Surrogate Modeling for Uncertainty Quantification

Surrogate modeling is a popular and practical method to meet the needs of a large number of queries of computationally demanding models in the analysis of uncertainty, sensitivity and system reliability. We first explore various methods that can improve the accuracy of a particular class of surrogate models, the high dimensional model representation (HDMR), and their performance in uncertainty quantification and variance-based global sensitivity analysis. The efficiency of our proposed methods is demonstrated by a few analytical examples that are commonly studied for uncertainty and sensitivity analysis algorithms. HDMR techniques are also applied to an operational wildland fire model that is widely employed in fire prevention and safety control, and a chemical kinetics H2/air combustion model predicting the ignition delay time, which plays an important role in studying fuel and combustion system reliability and safety. We then show how the traditional Fourier Amplitude Sensitivity Testing (FAST), heavily used for variance-based global sensitivity analysis, can be treated in the framework of HDMR. The resulting surrogate model, named FAST-HDMR, is shown to be computationally more efficient than the original FAST. Various improvements that further enhance the accuracy of FAST-HDMR are discussed and illustrated by examples.
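As a rough illustration of variance-based global sensitivity analysis with a first-order HDMR, the sketch below estimates each first-order component f_i(x_i) = E[f | x_i] - f_0 by averaging model outputs within bins of x_i, then reports Var(f_i) / Var(f). It assumes independent inputs uniform on [0, 1]; the binning estimator, the function name, and the test model are illustrative choices, not the methods or the FAST-HDMR algorithm from the talk.

```python
import random


def first_order_indices(model, dim, n_samples=20000, n_bins=20, seed=0):
    """Estimate first-order sensitivity indices from random samples on [0, 1]^dim."""
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    ys = [model(x) for x in xs]
    mean = sum(ys) / n_samples                      # f_0, the zeroth-order term
    total_var = sum((y - mean) ** 2 for y in ys) / n_samples
    indices = []
    for i in range(dim):
        sums = [0.0] * n_bins
        counts = [0] * n_bins
        for x, y in zip(xs, ys):
            b = min(int(x[i] * n_bins), n_bins - 1)
            sums[b] += y
            counts[b] += 1
        # Piecewise-constant estimate of f_i; its variance is the partial variance.
        var_i = sum(c * (s / c - mean) ** 2
                    for s, c in zip(sums, counts) if c) / n_samples
        indices.append(var_i / total_var)
    return indices


if __name__ == "__main__":
    # Additive test model: the exact indices are 1/5 and 4/5.
    print(first_order_indices(lambda x: x[0] + 2.0 * x[1], dim=2))
```

For the additive model x1 + 2*x2 the exact first-order indices are 0.2 and 0.8, which gives a simple check on the estimator.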


Wednesday February 5th 2020, 4:00pm - 5:30pm, Student Commons Building room 4113

James T. Campbell
University of Memphis

Lightning Strikes (see flyer)

A pair of undergraduate students in an honors seminar proposed the following discrete model for the formulation of lightning. Place randomly generated numbers (levels) in each cell of an mxn grid, creating a configuration. Choose a starting cell along the top row, examine the neighboring cells, and (i) draw an edge to any neighbor whose level is less than or equal to our current level (such a cell has become visited), (ii) list the visited cells in a queue, and (iii) start the process over at the beginning of the queue, proceeding until the queue is empty.

The pictures in Figure 1 (see flyer) were computer generated from this model, with a 50x50 grid, the cell values chosen uniformly from the set {0,1,2}, and the center cell in the top row as the starting point. Each picture corresponds to a different initial distribution of the integers in the cells.

We are interested in the fate of the resulting path, and would especially like to be able to compute the probability that some portion of the path reaches the bottom of the grid. We think of this case as success, or more colloquially, a lightning strike. Besides being fun to think about, it turns out that in its proper generality, the question is highly non-trivial. There are tons of related open questions, most of which are accessible to undergraduates.

Early results were obtained in collaboration with Lauren Sobral, who was an undergraduate at the time.
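The model in this abstract can be sketched in a few lines. Two details are assumptions rather than part of the abstract: neighbors are taken to be the four orthogonally adjacent cells, and a cell that has already been visited is never queued again. The function names and the Monte Carlo estimate of the strike probability are illustrative only.

```python
import random
from collections import deque


def lightning(levels, start_col):
    """Spread from the top-row start cell; return the set of visited (row, col) cells."""
    rows, cols = len(levels), len(levels[0])
    start = (0, start_col)
    visited = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        # Assumption: neighbors are the four orthogonally adjacent cells.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                if levels[nr][nc] <= levels[r][c]:   # level no higher than current
                    visited.add((nr, nc))
                    queue.append((nr, nc))
    return visited


def is_strike(levels, start_col):
    """A strike means some visited cell lies in the bottom row."""
    bottom = len(levels) - 1
    return any(r == bottom for r, _ in lightning(levels, start_col))


def estimate_strike_probability(rows=50, cols=50, values=(0, 1, 2), trials=200, seed=0):
    """Monte Carlo estimate with i.i.d. uniform levels and the center top cell as start."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        levels = [[rng.choice(values) for _ in range(cols)] for _ in range(rows)]
        hits += is_strike(levels, cols // 2)
    return hits / trials


if __name__ == "__main__":
    print(estimate_strike_probability())
```

The defaults mirror the 50x50 grid with values from {0,1,2} described in the abstract; changing the neighbor rule or the level distribution gives the kind of variants the open questions refer to.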

 


Monday February 3rd 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Matt Strand
Professor and Head of Biostatistics
National Jewish Health

Building a practical survival model for the COPDGene study

There are different types of survival models, ranging from nonparametric (e.g., Kaplan-Meier), to semi-parametric (e.g., Cox proportional hazards) to parametric (e.g., accelerated failure time [AFT]) models.  An AFT survival model using the Weibull distribution was built for subjects in the COPDGene study to quantify mortality risk as a function of several predictors spanning behavioral, physiologic, demographic and imaging categories.  The risk model can be used to motivate patients to modify behaviors to decrease risk with the help of clinician input, and to identify subjects who can be targeted for therapeutic intervention or clinical trials.  A point system was developed based on the fitted models (one for men, one for women) for ease of use and interpretation of predictors.
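For readers unfamiliar with the Weibull AFT form, a minimal sketch of its survival function follows. It uses the common parameterization S(t | x) = exp(-(t / exp(b0 + x.b))^k) with shape k; the function name, covariates, and coefficient values are hypothetical and are not estimates from the COPDGene model.

```python
import math


def weibull_aft_survival(t, covariates, coefs, intercept, shape):
    """Survival probability at time t under a Weibull accelerated failure time model."""
    # The linear predictor sets the log of the Weibull scale (a characteristic survival time).
    scale = math.exp(intercept + sum(b * x for b, x in zip(coefs, covariates)))
    return math.exp(-((t / scale) ** shape))


if __name__ == "__main__":
    # Hypothetical covariates: one binary risk factor, one standardized measurement.
    coefs = [-0.4, 0.3]
    for x in ([0, 0.0], [1, 0.0]):
        print(x, weibull_aft_survival(5.0, x, coefs, intercept=3.0, shape=1.5))
```

The "accelerated" interpretation is visible in the formula: a covariate with coefficient b rescales time by exp(b), so a negative coefficient shortens survival times by a constant factor.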


Wednesday January 29th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Aakash Sahai
CU Denver Department of Electrical Engineering

Nonlinear Cylindrical Ion Soliton-driven cKdV equation

This talk will explain why effectively modeling complex nonlinear collective phenomena using large-scale computations is critical for future scientific discovery. Collective motion of particles forms the basis of physical processes ranging from the astrophysical to the atomic scale. Lab-based nonlinear collective modes strongly driven as wakefields in gases have now paved the way to controllably access electric fields exceeding 100 GV/m. These fields can effect dramatic advances in particle acceleration technology by offering at least two orders of magnitude reduction in the size of future discovery machines that will succeed the 27 km long LHC at CERN.

However, a major challenge lies in understanding how the electron modes interact with ions, especially in the region where the particle beam gets accelerated. My work shows that a cylindrical ion-soliton can be driven by the steepened nonlinear electron modes excited as wakefields. A hollow region naturally excited in the plasma solves the critical problem of collisions and related undesirable effects. Proof of principle of the theoretical model of a driven cKdV equation is established using a computational model. Experiments have recently confirmed the existence of such long-lived soliton modes, which paves the way for transformative directions in accelerator technology.


Monday January 27th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Megan Sorenson, CU Denver PhD Student

Empirical simulation of very rare variant genetic data

Simulating realistic rare variant genetic data is vital for accurate evaluation of new statistical methods. Research suggests that large sample sizes and functional information are necessary for sufficiently powered rare variant association tests. Further, the distribution of simulated variants should be similar to that observed in sequencing data. HAPGEN2 (Su 2011) accurately simulates common genetic variants, but is unable to simulate data that reflects the observed allele frequency spectrum (AFS) for very rare variants, such as singletons and doubletons. Currently there is no simulation software that produces large sample sizes with realistic functional annotation and the expected AFS across all, including very rare, variants.

We developed RAREsim, flexible software that simulates large sample sizes of genetic data with an AFS similar to that observed in sequencing data. Because RAREsim simulates from a sample of real haplotypes, existing functional and other genetic annotation can be used, capturing known and unknown complexities of real data. RAREsim is a two-step algorithm. First, RAREsim simulates haplotypes using HAPGEN2 (Su 2011), allowing for mutations to occur at most sites across the region. Second, RAREsim prunes the rare variants using the expected number of variants at each minor allele count. The expected number of variants is calculated from an estimate of the total number of variants in the region and the AFS. RAREsim is available as an R package, with a Shiny App for the user to select and visualize the allele distribution parameters. RAREsim provides the ability to simulate large samples of rare variant data with functional annotation and the expected AFS for all, including very rare, variants.

Jessica Murphy, CU Denver student

Accessible Analysis of Longitudinal Data with Linear Mixed Effects Models: There’s an App for That

Longitudinal mouse models are commonly used to study possible causal factors associated with human health and disease. However, the statistical models applied in these studies are often incorrect. If correlated observations in longitudinal data are not modeled correctly, they can lead to biased and imprecise results. Therefore, we provide an interactive Shiny App to enable appropriate analysis of correlated data using linear mixed effects (LME) models. Using the app, we re-analyze a dataset published by Blanton et al. (Science 2016) that modeled mouse growth trajectories after microbiome implantation from nourished or malnourished children. We then compare the fit and stability of LME models with different parameterizations. While the model with the best fit and zero convergence warnings differed substantially from the two-way ANOVA model chosen by Blanton et al., both models found significantly different growth trajectories for microbiota from nourished vs. malnourished children. We also show through simulation that the results from the two-way ANOVA and LME models will not always be consistent, supporting the need to model correlated data correctly. Hence, our app provides easy implementation of LME models for accessible and appropriate analysis of studies with longitudinal data.


Monday January 15th 2020, 11:00am - 12:15pm, Student Commons Building room 4017

Joanne B. Cole, Ph.D.
Instructor, Harvard Medical School
Postdoctoral Research Fellow, Medical and Population Genetics Program, Broad Institute of MIT and Harvard

Genetics of dietary intake in UK Biobank: You eat what you are

Unhealthful diet is a leading risk factor for several life-altering metabolic diseases such as obesity, type 2 diabetes (T2D), and coronary artery disease (CAD), all of which substantially increase mortality and decrease quality of life. The recent advent of large biobanks with both genetic data and deep phenotyping enables us to study the genetics of modestly heritable traits, such as diet. We derived 170 data-driven dietary habits in UK Biobank (UKB), including single food quantitative traits and principal component (PC) analysis dietary patterns, and found most (84%) had a significant proportion of phenotypic variance that could be explained by common genetic markers (‘SNP heritability’), with milk type, butter consumption, dietary pattern PC1, alcohol intake, water intake, and adding salt to food having the largest genetic contributions. Genome-wide association studies (GWAS) testing for an association between genetic variants throughout the genome and 143 heritable dietary habits using linear mixed models in ~450K European individuals identified 814 independent genetic loci, of which 205 are novel and 136 were uniquely associated with dietary patterns and not single foods. We conducted genetic instrumental variable analysis (‘Mendelian randomization’) to identify causal relationships involving our lead “healthy” vs. “unhealthy” PC1 dietary pattern. Though we find little evidence that PC1, largely driven by type of bread consumed, has a causal effect on cardio-metabolic disease, we do find a significant bidirectional causal relationship with educational attainment, where the relative strengths of the causal estimates suggest that higher educational attainment and/or correlated traits, such as socioeconomic status, shift individuals towards healthier eating habits.
Overall, this work uses comprehensive genetic analysis in a well-powered sample to expand our understanding of the genetic contributors to dietary intake, and uses these findings as tools to dissect relationships with human health and disease.