The BEEHIVE develops tailored models and methods using approaches from statistics and machine learning in order to study, deconstruct, visualize, predict, modify, and engineer biomedical systems.

Genomics

Measurements of biological systems have both noise and systematic bias, and often the analytical goal is to identify biologically meaningful low-dimensional substructure within a high-dimensional space with structured samples, such as time-series data, single-cell RNA-sequencing, spatial transcriptomics samples, or Perturb-seq CRISPR screens. We build model-based approaches to gain access to interesting biological phenomena that would otherwise be missed.

Electronic healthcare data

Electronic healthcare records (EHRs) hold promise for studying the processes of hospital patient care and for automating and improving this care using machine learning approaches. Our group has worked on time-series models of EHR data, off-policy reinforcement learning methods for these data, and a number of applications of these approaches to patient scenarios, including ventilator weaning, electrolyte repletion, and administration of lab tests.

Longitudinal studies

Longitudinal cohort studies, including the Fragile Families and Child Wellbeing Study (FFCWS), the UK Biobank, and others, offer a glimpse into how life events or other factors may affect disease risk later in life. For example, using an ordinal quantile regression model in the FFCWS, we showed for the first time that, in boys, early puberty is associated with high BMI only at the most extreme decile of BMI. Using Bayesian time-series models and tests for association, we are identifying meaningful longitudinal phenotypes and their associations with donor demographics, environmental factors, and genomic markers.

The impact of trauma

Physical and emotional trauma affect mammalian systems in complex ways. In collaboration with sociologists, including those with the Fragile Families and Child Wellbeing Study, and biologists, including Dr. Jenny Tung and Prof Cate Pena, we are developing models to study the impact of trauma on cellular systems and the effects of those cellular modifications later in life. We are also searching for trauma buffers, or cellular modifications that protect humans and other mammals from the effects of trauma later in life, in order to identify possible treatments for specific types of trauma.

Live-cell imaging

In collaboration with Prof Jared Toettcher and Prof Alex Marson, we have several projects focused on developing statistical machine learning models for studying live-cell imaging data. With the Toettcher Lab, we developed the Cellular Point Process (CPP), a modified Hawkes process that separates three contributions to protein pulsing: spontaneous pulsing, pulsing rates within a single cell, and intercellular pulsing rates. We are currently developing methods to phenotype CAR T cells for cancer therapy from live-cell imaging data, with the goal of using machine learning to accelerate the discovery of the precise CRISPR mutations that enable a patient’s T cells to kill their tumor cells most effectively.

Protein engineering

We have recently initiated a handful of collaborations in protein engineering aimed at mitigating climate change. Bacterial and plant species play an enormous role in climate change, and we are working with local groups to give these species additional abilities through engineered proteins, improving their ability to remove carbon from the air, repair cracked cement, or fix nitrogen more efficiently, with the long-term goal of reducing our impact on the earth’s climate.