Genome-wide Association Studies of Substance Use and Use Disorder: Where to find them, and what to do with them

Abstract: This presentation will show attendees how to access genome-wide association study (GWAS) summary statistics and will showcase resources for annotating these summary data for follow-up analyses, including gene-based analyses, eQTL and epigenetic annotation, and causal variable analysis. We will walk through the components of a GWAS summary dataset and two excellent resources, FUMA and MASSIVE, that take these summary files as input and generate extensive annotations that can be carried forward to answer translational research questions.
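
As a rough illustration of what a GWAS summary dataset contains, the sketch below parses a small tab-delimited table and keeps the variants that pass the conventional genome-wide significance threshold (p < 5e-8). The column names (SNP, CHR, BP, A1, A2, BETA, SE, P), the function names, and the example rows are assumptions made for illustration; real files differ between studies and should be checked against their accompanying documentation.

```python
import csv
import io

# Hypothetical summary statistics; column names and values are made up.
EXAMPLE = (
    "SNP\tCHR\tBP\tA1\tA2\tBETA\tSE\tP\n"
    "rs0000001\t1\t1000\tA\tG\t0.050\t0.008\t4.1e-10\n"
    "rs0000002\t1\t2000\tC\tT\t0.010\t0.009\t2.7e-01\n"
    "rs0000003\t2\t1500\tG\tA\t-0.045\t0.008\t1.9e-08\n"
)

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def read_sumstats(handle):
    """Parse a tab-delimited summary statistics table into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["CHR"] = int(row["CHR"])
        row["BP"] = int(row["BP"])
        for key in ("BETA", "SE", "P"):
            row[key] = float(row[key])
        # Z-score implied by the effect estimate and its standard error.
        row["Z"] = row["BETA"] / row["SE"]
        rows.append(row)
    return rows


def significant(rows, threshold=GENOME_WIDE_P):
    """Return the variants whose p-value is below the threshold."""
    return [r for r in rows if r["P"] < threshold]


sumstats = read_sumstats(io.StringIO(EXAMPLE))
hits = significant(sumstats)
print([r["SNP"] for r in hits])
```

Annotation tools generally expect a specific set of columns, so a first step with any downloaded file is to confirm which column holds the effect allele, the effect size, and the p-value.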

Supported by: NIH/NIDA R01DA054869; T32DA007261; K02DA032573

Presented by:

Dr. Alexander S. Hatoum
Department of Psychiatry
Washington University School of Medicine

Mouse Phenome Database: Resources and analysis tools for curated and integrated primary mouse phenotype and genotype data

Abstract: The Mouse Phenome Database (MPD; https://phenome.jax.org) is a widely used resource that provides access to primary experimental data, protocols and analysis tools for mouse phenotyping studies. Data are contributed by investigators around the world and represent a broad scope of phenotyping endpoints and disease-related characteristics in naïve mice and those exposed to drugs, environmental agents or other treatments. MPD is engineered to facilitate interactive data exploration and quantitative analysis. It encompasses data from inbred strains and other reproducible panels, including HMDP, KOMP, Collaborative Cross (CC), CC-RIX, and founder strains, along with primary data from mapping populations, including historic mapping crosses and advanced high-diversity mouse populations such as Diversity Outbred mice. A new Study Intake Platform (SIP) for data contributors allows domain experts to submit and annotate their own data with relevant ontology terms. Data contributors also provide detailed information for protocols and animal environmental conditions to fulfill ARRIVE guidelines. Data are exposed to analysis tools within MPD and are available through APIs to other systems. We will demonstrate selected MPD tools, including GenomeMUSter (https://muster.jax.org), a new imputed SNP grid on 650+ strains of mice at 106+M locations, and a new GWAS meta-analysis tool.

Bogue MA, Ball RL, Philip VM, Walton DO, Dunn MH, Kolishovski G, Lamoureux A, Gerring M, Liang H, Emerson J, Stearns T, He H, Mukherjee G, Bluis J, Desai S, Sundberg B, Kadakkuzha B, Kunde-Ramamoorthy G, Chesler EJ. Mouse Phenome Database: towards a more FAIR-compliant and TRUST-worthy data repository and tool suite for phenotypes and genotypes. Nucleic Acids Res. 2023 Jan 6;51(D1):D1067-D1074. doi: 10.1093/nar/gkac1007. PMID: 36330959; PMCID: PMC9825561.

Ball RL, Bogue MA, Liang H, Srivastava A, Ashbrook DG, Lamoureux A, Gerring MW, Hatoum AS, Kim MJ, He H, Emerson J, Berger AK, Walton DO, Sheppard K, El Kassaby B, Castellanos F, Kunde-Ramamoorthy G, Lu L, Bluis J, Desai S, Sundberg BA, Peltz G, Fang Z, Churchill GA, Williams RW, Agrawal A, Bult CJ, Philip VM, Chesler EJ. GenomeMUSter mouse genetic variation service enables multitrait, multipopulation data integration and analysis. Genome Res. 2024 Feb 7;34(1):145-159. doi: 10.1101/gr.278157.123. PMID: 38290977; PMCID: PMC10903950.

Funding provided by NIH DA028420, DA045401, AG066346.

Presented by:

Molly Bogue and Robyn Ball
Other senior members of the MPD team: Elissa Chesler, Vivek Philip, Dave Walton
The Jackson Laboratory

HiDiver: A Suite of Methods to Merge Magnetic Resonance Histology, Light Sheet Microscopy, and Complete Brain Delineations

Abstract: We have developed new imaging and computational workflows to produce accurately aligned multimodal 3D images of the mouse brain that exploit high-resolution magnetic resonance histology (MRH) and light sheet microscopy (LSM) with fully rendered 3D reference delineations of brain structures. The suite of methods starts with the acquisition of geometrically accurate (in-skull) brain MRIs using multi-gradient echo (MGRE) and new diffusion tensor imaging (DTI) at an isotropic spatial resolution of 15 μm. Whole-brain connectomes are generated using over 100 diffusion-weighted images acquired with gradients at uniformly spaced angles. Track density images are generated at a super-resolution of 5 μm. Brains are dissected from the cranium, cleared with SHIELD, stained by immunohistochemistry, and imaged by LSM at 1.8 μm/pixel. LSM channels are registered into the reference MRH space along with the Allen Brain Atlas (ABA) Common Coordinate Framework version 3 (CCFv3). The result is a high-dimensional integrated volume with registration (HiDiver) that has a global alignment accuracy of 10–50 μm. HiDiver enables 3D quantitative and global analyses of cells, circuits, connectomes, and CNS regions of interest (ROIs). Throughput is sufficiently high that HiDiver is now being used in comprehensive quantitative studies of the impact of gene variants and aging on rodent brain cytoarchitecture.

Presented by:

Dr. G Allan Johnson
Charles E Putman Professor of Radiology, Physics, and Biomedical Engineering
Duke University
Durham, North Carolina

Julia: a fast, friendly, and powerful language for data science

Julia is a high-level dynamic programming language that is gaining popularity. The Julia language is designed for scientific computing and offers several attractive features for data science applications. In this webinar, we will make a case for why a data scientist might consider taking a serious look at Julia. We will show code examples and point the audience to further resources.

Goals of this webinar:

  • To articulate why Julia is attractive for data scientists

  • To provide an overview of Julia language syntax and design

  • To provide additional resources about the Julia language and ecosystem

Presented by:

Gregory Farage and Saunak Sen
gfarage@uthsc.edu / sen@uthsc.edu
Division of Biostatistics
Department of Preventive Medicine
University of Tennessee Health Science Center
Memphis, TN

Resources

Tutorial

Extra

Useful links

A Comprehensive Tutorial to Learn Data Science with Julia from Scratch

10 Reasons Why You Should Learn Julia

Noteworthy Differences from other Languages

Julia Cheat Sheet

Guide to evaluating the application of machine learning methods in genetics literature

Goals of this webinar:

  • To describe the relationship between artificial intelligence (AI), machine learning (ML), and deep learning (DL).

  • To describe general scenarios when ML is appropriate.

  • To understand methods for comparing the performance of different ML algorithms

  • To lay out general criteria to examine when evaluating literature that includes machine learning algorithms
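
As one small illustration of why the choice of performance measure matters when comparing algorithms, the sketch below scores two hypothetical classifiers on the same held-out labels. The labels, predictions, and function names are made up for this example and do not come from the webinar.

```python
def confusion(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for one classifier."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Imbalanced held-out set: 2 cases and 8 controls (made-up data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # always predicts "control"
model_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # finds both cases, 2 false alarms

result_a = metrics(y_true, model_a)
result_b = metrics(y_true, model_b)
print(result_a)
print(result_b)
```

Both classifiers have the same accuracy here, yet the first never identifies a case. When reading a paper, it is worth checking whether the reported metric suits the class balance of the data.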

Presented by:

Laura Saba, PhD
Associate Professor
Department of Pharmaceutical Sciences
Skaggs School of Pharmacy and Pharmaceutical Sciences
University of Colorado Anschutz Medical Campus
Aurora, CO

References

Liu Y, Chen PC, Krause J, Peng L. How to Read Articles That Use Machine Learning: Users' Guides to the Medical Literature. JAMA. 2019 Nov 12;322(18):1806-1816. doi: 10.1001/jama.2019.16489. PMID: 31714992.

Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019 Apr 4;380(14):1347-1358. doi: 10.1056/NEJMra1814259. PMID: 30943338.

Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet. 2015 Jun;16(6):321-32. doi: 10.1038/nrg3920. Epub 2015 May 7. PMID: 25948244; PMCID: PMC5204302.

Other Links Referenced During Discussion

Rob's Salmon fMRI study - https://www.wired.com/2009/09/fmrisalmon/

Hao Chen's MLOps link - https://www.deeplearning.ai/wp-content/uploads/2021/06/MLOps-From-Model-centric-to-Data-centric-AI.pdf

A Primer on Brain Proteomics and protein-QTL Analysis for Substance Use Disorders

Goals of this webinar:

  • To give a general introduction to proteomics technologies and data processing/normalization

  • To present a pipeline for correcting sample mix-ups in proteomic data.

  • To discuss rat brain proteome and protein QTL analysis for Substance Use Disorders.

Presented by:

Xusheng Wang, PhD
Assistant Professor
Department of Biology
University of North Dakota

Robert W. Williams, PhD
Professor and Chair
Department of Genetics, Genomics, and Informatics
University of Tennessee Health Science Center

Organizing data in spreadsheets

Abstract: Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this presentation will offer practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, do not leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, do not include calculations in the raw data files, do not use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.
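
Some of these principles can be checked mechanically once the data are saved as plain text. The sketch below assumes a CSV file with a single header row and a column named "date"; the column names, the example rows, and the check_table function are hypothetical.

```python
import csv
import io
import re
from datetime import datetime

# Made-up example: one row has a non-ISO date, one has an empty cell.
EXAMPLE = (
    "subject_id,date,weight_g\n"
    "R001,2023-04-01,312\n"
    "R002,4/2/2023,298\n"
    "R003,2023-04-03,\n"
)

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(text):
    """True if text looks like YYYY-MM-DD and is a real calendar date."""
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_table(handle, date_columns=("date",)):
    """Return a list of (line_number, column, problem) tuples."""
    problems = []
    reader = csv.DictReader(handle)
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column, value in row.items():
            if column is None:
                # More cells than header names: the data are not one rectangle.
                problems.append((line_number, None, "extra cells"))
            elif value is None or value.strip() == "":
                problems.append((line_number, column, "empty cell"))
            elif column in date_columns and not is_iso_date(value):
                problems.append((line_number, column, "date not YYYY-MM-DD"))
    return problems


problems = check_table(io.StringIO(EXAMPLE))
for problem in problems:
    print(problem)
```

Running a check like this before analysis catches entry errors while the person who entered the data still remembers the context.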

Presented by:

Karl Broman, PhD
Professor
Department of Biostatistics & Medical Informatics
University of Wisconsin-Madison

A Rube Goldbergian Approach to Scheduling Rodent Behavior Experiments and Data Collection

Abstract: Large-scale rodent behavioral experiments with complicated testing procedures conducted over several years (e.g., genetic mapping of operant drug taking) need rigorous control over data quality. This webinar will discuss methods used in my lab, where we generate ready-to-use MedPC macros from a spreadsheet for new test sessions, send cell phone notifications when behavioral tests are completed, assemble data automatically every night, and send daily notifications of procedural changes for individual animals. Potential errors are checked automatically at several points, with messages sent to the users. The system is put together using a relational database (SQLite), several ad hoc computer programs (Perl, Python, or shell), a cloud storage service (Dropbox), and a messaging system (Slack). By turning much of the experiment planning and error checking into computer code, we improve experimental efficiency and data quality.
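
The general idea of automated error checking against a relational database can be sketched as follows. This is not the lab's actual system: the table names, columns, example sessions, and the find_problems function are hypothetical, and the messaging step is replaced by printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for illustration only
conn.executescript(
    """
    CREATE TABLE schedule (rat_id TEXT, session_date TEXT, program TEXT);
    CREATE TABLE results  (rat_id TEXT, session_date TEXT, program TEXT,
                           presses INTEGER);
    """
)

# Hypothetical planned sessions and the data that actually arrived.
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?, ?)",
    [
        ("R01", "2024-01-15", "FR1"),
        ("R02", "2024-01-15", "FR1"),
        ("R03", "2024-01-15", "PR"),
    ],
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        ("R01", "2024-01-15", "FR1", 42),
        ("R03", "2024-01-15", "FR1", 17),  # a different program was run
    ],
)


def find_problems(connection):
    """Compare planned and recorded sessions; return readable messages."""
    query = """
        SELECT s.rat_id, s.session_date, s.program, r.program
        FROM schedule AS s
        LEFT JOIN results AS r
          ON r.rat_id = s.rat_id AND r.session_date = s.session_date
        ORDER BY s.rat_id
    """
    messages = []
    for rat_id, date, planned, recorded in connection.execute(query):
        if recorded is None:
            messages.append(f"{rat_id} {date}: no data received")
        elif recorded != planned:
            messages.append(f"{rat_id} {date}: ran {recorded}, expected {planned}")
    return messages


messages = find_problems(conn)
for message in messages:
    print(message)  # a real system would forward these to a messaging service
```

Keeping the plan and the results in the same database makes this kind of comparison a single query, which is what allows it to run unattended every night.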

Presented by:

Hao Chen, PhD
Associate Professor
Department of Pharmacology, Addiction Science, and Toxicology
University of Tennessee Health Science Center

Introduction to DNA Methylation Platforms and Data Analysis

Goals of this webinar:

Studying DNA methylation is widespread in biomedical research. The goals of this webinar are:

  • To describe research questions that can be explored by profiling the methylome

  • To give a general overview of DNA methylation profiling technologies

  • To outline steps in DNA methylation analysis pipeline

  • To provide information on common resources and databases

Presented by:

Katerina Kechris, PhD
Professor
Department of Biostatistics and Informatics
Colorado School of Public Health
University of Colorado Anschutz Medical Campus

Identifying sample mix-ups in eQTL data

Goals of this webinar:

Sample mix-ups interfere with our ability to detect genotype-phenotype associations. However, the presence of numerous eQTL with strong effects provides the opportunity to not just identify sample mix-ups, but also to correct them.

  • To illustrate methods for identifying sample duplicates and errors in sex annotations

  • To illustrate methods for identifying sample mix-ups in DNA and RNA samples from experimental cross data 
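
A much simplified sketch of the matching idea: if genotypes at strong eQTL can be inferred from expression, each RNA sample can be compared against every DNA sample, and a sample whose best match is not its own label is a candidate mix-up. The data, sample names, and functions below are made up, and the step of inferring genotypes from expression is assumed to have been done already.

```python
# Observed genotypes at five strong local eQTL for each DNA sample (made up).
dna = {
    "S1": [0, 1, 2, 0, 1],
    "S2": [2, 2, 0, 1, 0],
    "S3": [1, 0, 1, 2, 2],
}

# Genotypes inferred from expression at the same eQTL for each RNA sample.
# The S2 and S3 labels are swapped here to mimic a mix-up, and one call
# for the sample labeled S2 is wrong to mimic imperfect inference.
rna_inferred = {
    "S1": [0, 1, 2, 0, 1],
    "S2": [1, 0, 1, 2, 1],
    "S3": [2, 2, 0, 1, 0],
}


def mismatch(a, b):
    """Proportion of eQTL at which two genotype vectors disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_matches(rna, dna_genotypes):
    """For each RNA sample, return the DNA sample with the fewest mismatches."""
    result = {}
    for rna_id, inferred in rna.items():
        distances = {d: mismatch(inferred, g) for d, g in dna_genotypes.items()}
        result[rna_id] = min(distances, key=distances.get)
    return result


matches = best_matches(rna_inferred, dna)
flagged = {r: d for r, d in matches.items() if r != d}
print(flagged)
```

A pair of samples that each match the other's label suggests a swap that can be corrected rather than simply discarded.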

Presented by:

Karl Broman, PhD
Professor
Department of Biostatistics and Medical Informatics
University of Wisconsin–Madison

Introduction to the Hybrid Rat Diversity Panel: A renewable rat panel for genetic studies of addiction-related traits

Goals of this webinar:

  • Inbred model organisms

  • Recombinant inbred panels

  • Why rats?

  • Hybrid Rat Diversity Panel

  • Current resources

  • Data integration demo

  • Where to now?

Presented by:

Hao Chen, PhD
Associate Professor
Department of Pharmacology, Addiction Science, and Toxicology
University of Tennessee Health Science Center

Laura Saba, PhD
Associate Professor
Department of Pharmaceutical Sciences
Skaggs School of Pharmacy and Pharmaceutical Sciences
University of Colorado Anschutz Medical Campus

Introduction to Metabolomics Platforms and Data Analysis

Goals of this webinar:

The use of metabolomics to profile small molecules is now widespread in biomedical research. The goals of this webinar are:

  • To describe research questions that can be addressed using metabolomics

  • To give a general overview of metabolomics technologies

  • To outline steps in a metabolomics data analysis pipeline

  • To provide information on common resources and databases

Presented by:

Katerina Kechris, PhD
Professor
Department of Biostatistics and Informatics
Colorado School of Public Health
University of Colorado Anschutz Medical Campus

Landing on Jupyter: A guided tour of interactive notebooks

Goals of this webinar:

Jupyter is an interactive interface to data science and scientific computing across a variety of programming languages. We will present the Jupyter notebook, and explain some key concepts (e.g., kernel, cells). We will show how to create a new notebook; modify an existing notebook; save, export, and publish a notebook. We will discuss several possible use cases: developing code, writing reports, taking notes, and teaching/presenting.

Objectives:
- Learn what Jupyter notebooks are
- Learn how to install, configure, and use Jupyter notebooks
- Learn how to use Jupyter notebooks for research, teaching, or code development 

Presented by:
Dr. Gregory Farage & Dr. Saunak Sen
Department of Preventive Medicine
University of Tennessee Health Science Center 

Become a UseR: A brief tour of R

Goals of this webinar:
We will introduce the R programming language and outline the benefits of learning R. We will give a brief tour of basic concepts and tasks: variables, objects, functions, basic statistics, visualization, and data import/export. We will showcase a practical example demonstrating statistical analysis.

  • Why should one use/learn R?

  • How to install R/RStudio

  • Learn about R basics: variables, programming, functions

  • Learn about the R package ecosystem that extends its capabilities

  • See a basic statistical analysis example

  • Learn about additional resources

Presented by:
Dr. Gregory Farage & Dr. Saunak Sen

Department of Preventive Medicine
University of Tennessee Health Science Center

From GWAS to gene: what are the essential analyses and how do we bring them together using heterogeneous stock rats?

Goals of this webinar:
Heterogeneous stock (HS) rats are an outbred population that was created in 1984 by intercrossing 8 inbred strains. The Center for GWAS in Outbred Rats (http://www.ratgenes.org) has developed a suite of tools for analyzing genome-wide association studies (GWAS) in HS rats. This webinar will:

  • explain the HS rat population and their history

  • describe the automated pipeline that performs GWAS in HS rats

  • explore the fine mapping of associated regions and explain the various secondary analyses that we use to prioritize genes within associated intervals

Presented by:
Dr. Abraham Palmer
Professor and Vice Chair for Basic Research
Department of Psychiatry
University of California San Diego

Beginner’s guide to bulk RNA-Seq Analysis

Goals of this webinar:
High-throughput short-read RNA sequencing has become commonplace in many scientific laboratories. The analysis tools for quantitating a transcriptome have matured and become relatively simple to use. The goals of this webinar are:

  • To give a general overview of the popular Illumina technology for sequencing RNA.

  • To outline several of the key aspects to consider when designing an RNA-Seq study

  • To provide guidance on methods and tools for transforming reads to quantitative expression measurements.

  • To describe statistical models that are typically used for differential expression and why these specialized models are needed.
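
One of the simplest steps in turning read counts into comparable expression measurements is scaling by library size. The sketch below computes counts per million (CPM) on a made-up table; the gene names, counts, and function name are hypothetical, and this is only one of several normalizations in use.

```python
# Made-up raw read counts: gene -> counts in each of three samples.
counts = {
    "GeneA": [100, 200, 150],
    "GeneB": [900, 1800, 1350],
    "GeneC": [0, 0, 500],
}


def counts_per_million(table):
    """Scale each sample's counts by its library size (total reads)."""
    n_samples = len(next(iter(table.values())))
    library_sizes = [
        sum(row[j] for row in table.values()) for j in range(n_samples)
    ]
    return {
        gene: [row[j] * 1e6 / library_sizes[j] for j in range(n_samples)]
        for gene, row in table.items()
    }


cpm = counts_per_million(counts)
for gene, values in cpm.items():
    print(gene, values)
```

In this example, samples 1 and 2 differ twofold in raw counts only because of sequencing depth, and CPM removes that difference. In sample 3, the reads taken up by GeneC lower the CPM of the other genes, which shows that library-size scaling alone can be distorted by a few highly expressed genes.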

Presented by:
Dr. Laura Saba
Associate Professor
Department of Pharmaceutical Sciences
Skaggs School of Pharmacy and Pharmaceutical Sciences
University of Colorado Anschutz Medical Campus

Sketching alternate realities: An introduction to causal inference in genetic studies

Goals of this webinar:
Determination of cause is an important goal of biological studies, and genetic studies provide unique opportunities.  In this introductory lecture we will frame causal inference as a missing data problem to clarify challenges, assumptions, and strategies necessary for assigning cause.  We will survey the use of directed acyclic graphs (DAGs) to express causal information and to guide analytic strategies.

  • Express causal inference as a missing data problem (counterfactual framework)

  • Outline assumptions needed for causal inference

  • Express causal information as (directed acyclic) graphs

  • Outline how to use graphs to guide analytic strategy
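
The missing data view of causal inference can be illustrated with a toy table of potential outcomes. In the sketch below every value is made up: each unit has an outcome without treatment (y0) and with treatment (y1), but only the one matching the treatment actually received would be observed in a real study.

```python
# Made-up potential outcomes for six units. Units with higher baseline
# outcomes were the ones that received treatment (confounding).
units = [
    {"y0": 10, "y1": 12, "treated": True},
    {"y0": 11, "y1": 13, "treated": True},
    {"y0": 12, "y1": 14, "treated": True},
    {"y0": 4, "y1": 6, "treated": False},
    {"y0": 5, "y1": 7, "treated": False},
    {"y0": 6, "y1": 8, "treated": False},
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# True average effect: needs both potential outcomes, so it is never
# directly computable from real data.
true_effect = mean(u["y1"] - u["y0"] for u in units)

# Observed data: the counterfactual outcome is missing for every unit.
observed = [(u["treated"], u["y1"] if u["treated"] else u["y0"]) for u in units]

# Naive comparison of treated and untreated group means.
naive_effect = mean(y for t, y in observed if t) - mean(
    y for t, y in observed if not t
)

print(true_effect, naive_effect)
```

The naive comparison overstates the effect because treatment assignment depends on the baseline outcome. Assumptions such as random assignment are what justify filling in the missing counterfactuals from the other group.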

Presented by:
Dr. Saunak Sen
Professor and Chief of Biostatistics
Department of Preventive Medicine
University of Tennessee Health Science Center

Introduction to GeneWeaver: Integrating and analyzing heterogeneous functional genomics data

Goals of this webinar:

  • Compare a user's gene list with multiple functional genomics data sets

  • Compare and contrast gene lists with data currently available and integrated in GeneWeaver

  • Explore functional relationships among genes and disease across species
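
One common way to compare a gene list against stored gene sets is the Jaccard similarity, the size of the overlap divided by the size of the union. The sketch below is a generic illustration, not GeneWeaver's implementation; the gene identifiers and set names are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of gene identifiers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Placeholder identifiers; these are not real gene sets.
user_list = ["G1", "G2", "G3", "G4"]
reference_sets = {
    "set_A": ["G2", "G3", "G5"],
    "set_B": ["G6", "G7"],
    "set_C": ["G1", "G2", "G3", "G4", "G8"],
}

scores = {name: jaccard(user_list, genes) for name, genes in reference_sets.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
print(scores)
```

Comparing lists across species additionally requires mapping genes to their homologs first, which this sketch does not attempt.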

Presented by:
Dr. Elissa Chesler
Professor
The Jackson Laboratory
AND
Dr. Erich Baker
Professor and Chair
Department of Computer Science
Baylor University