DBP6: Constructing a dynamic, spatial map of transcription and chromatin structure

A. Collaborating Investigator(s): Daniel Larson,1 Carl Kingsford,2 Robert F. Murphy,2 Ivet Bahar3

B. Institutions: 1NIH, 2Carnegie Mellon U, 3Pitt

C. Funding Status of Project: NIH, NCI 1ZIABC011383-05 (Larson) (2011- )

D. Driving relationship between DBP6 and TR&D1 and 4: This DBP drives Aim 1 of TR&D4 and Aim 2 of TR&D1. The project stems from funded work in the Larson laboratory to study the dynamics and heterogeneity of genome structure as it relates to gene expression by systematically measuring the position, mobility, and transcription of genes in the human nucleus using multi-color imaging of nascent RNA in living cells. A key innovation of this DBP is the ability to image the history of transcriptional activity at two loci (Fig VII.5A-B). This two-gene assay is likely to become a crucial tool in imaging gene expression. These measurements will drive the development of new image-analysis (TR&D4) and structural modeling (TR&D1) tools. The Larson lab (Fig VII.5A) will produce 3D movies in which fluorescent spots reveal the location and intensity of transcription for pairs of genes. Analysis of these movies requires the development of new segmentation, registration, and modeling techniques to create models of the spatiotemporal relationships among labeled loci and between labeled loci and cellular landmarks (TR&D4). No existing packages can automatically process and model 3D movies of RNA transcripts and chromosomal domains in living cells. We will generate images of 10,000 individual MCF7 cell lines, each carrying a pair of gene loci whose transcription is monitored with different fluorescent proteins. This scale requires the development of fully automated image analysis techniques to create maps of the relationships between transcribing loci, which will drive TR&D4 Aim 1 and be incorporated into CellOrganizer.

Analysis to identify loci of coordinated and bursty transcription from these maps will require the development of new computational and statistical techniques. No existing packages can identify such events from 4D traces of transcription, which drives Subaim 2.2 of TR&D1. We will develop applications that take point traces derived from the image analysis and produce models of transcription activity conditioned on the position and transcription activity of nearby loci. From these models, significant transcription burst events will be detected. Construction of these models will require new methods for distance imputation, detection of high-confidence, reproducible distances, and clustering of measurements into heterogeneous structural classes. We have developed an open-source suite of analysis tools (Armatus-3C)93 for genome structure measurements from chromosome capture data that will be significantly extended and adapted into a suite suitable for the dynamic point traces collected here (TR&D1). Finally, the dynamic imaging data produced by this DBP will provide a validation set that will drive Subaim 2.1 of TR&D1 for the application and extension of elastic network models (e.g., GNM) and the ProDy API94,95 (originally developed for protein dynamics) to evaluate chromatin motion from more widely available, but static, chromosome conformation capture measurements (e.g., Hi-C).
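As a rough illustration of how an elastic network model can turn a static contact map into predicted motions, the sketch below builds a GNM Kirchhoff matrix from a small contact matrix and computes its pseudo-inverse, whose diagonal is proportional to the predicted mean-square fluctuation of each locus and whose off-diagonal entries are proportional to cross-correlations. It is a minimal sketch under stated assumptions, not the ProDy implementation: the function names (`kirchhoff`, `invert`, `gnm_covariance`) are hypothetical, contact counts are used directly as spring weights, and the contact graph is assumed to be connected.

```python
from typing import List

Matrix = List[List[float]]


def kirchhoff(contacts: Matrix) -> Matrix:
    """Build the GNM Kirchhoff (graph Laplacian) matrix.

    Assumption: contacts[i][j] is a symmetric, non-negative contact
    weight between loci i and j (for example a normalized Hi-C count).
    """
    n = len(contacts)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                gamma[i][j] = -contacts[i][j]
                gamma[i][i] += contacts[i][j]
    return gamma


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: is the contact graph connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gnm_covariance(contacts: Matrix) -> Matrix:
    """Pseudo-inverse of the Kirchhoff matrix for a connected contact graph.

    Uses the identity pinv(G) = inv(G + J/n) - J/n, where J is the
    all-ones matrix; this removes the zero mode (rigid translation).
    """
    n = len(contacts)
    gamma = kirchhoff(contacts)
    shifted = [[gamma[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [[inv[i][j] - 1.0 / n for j in range(n)] for i in range(n)]


# Toy example: three loci in a chain (0-1 and 1-2 in contact).
chain = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0]]
cov = gnm_covariance(chain)
fluctuations = [cov[i][i] for i in range(3)]
```

In this toy chain the middle locus, which has two neighbors, is predicted to fluctuate less than the two ends. Predicted fluctuations of this kind are what would be compared against mobility measured by imaging.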

Fig VII.5 Nascent RNA visualization in living cells. (A) The technique is based on orthogonal high affinity RNA binding proteins from MS2 and PP7 bacteriophages. (B) Simultaneous observation of two reporter genes (red and green) with identical promoters in single cells. (C) Mobility and transcriptional activity of two genes in a single nucleus. The inter-gene distance is shown as a function of time, and can be used to extract a diffusion coefficient and radius of confinement of the individual genes.

 

E. Resulting Innovations: This DBP will drive the following technical advancements and deliverables: (1) Pipeline for processing of 3D movies including a machine learning system for optimizing segmentation to a new cell/probe system. (2) Software package for fully automated construction of probabilistic point traces and probabilistic location maps from tens of thousands of 4D movies containing fluorescently labeled transcribing loci. (3) New analysis techniques, based on point process statistics and conditional models, to identify burst events consisting of co-localized co-transcribing genes, and a software package for the reporting of these events. (4) Transfer, extension and validation of GNM analysis tools to predict the motions of gene loci and their correlations (Fig III.9 in the TR&D1 section).

F. Methods and Procedures: We will generate live-cell 3D movies for 10,000 gene pairs in estrogen-responsive MCF7 cells to better understand the position, mobility, and transcription of genes. This approach is based on insertion of a DNA cassette that codes for RNA hairpins.96 When transcribed, the RNA hairpins are specifically bound by a fluorescent phage coat protein, resulting in a fluorescent ‘spot’ that represents the active gene. The original MS2 system for RNA visualization has been in use for nearly two decades, but a second stem-loop system based on the PP7 phage was recently developed and shown to be orthogonal to the MS2 system.97-99 Using this concept, Larson has made successive advances that enabled observation of the transcription of single genes in yeast and human cells,100,101 simultaneous visualization of two segments of an individual RNA by labeling an intron and an exon of a single transcript,98 and direct insertion of stem loops into an endogenous locus in human cells using gene-trap technology.

Specific Aim 1: We will develop and apply fully automated image processing approaches (TR&D4) to create models of the spatiotemporal relationships between labeled loci and cellular landmarks using movies of 10,000 individual MCF7 cell lines, each with a pair of transcribing gene loci monitored with different fluorescent proteins. Although many of the analytical tools needed to analyze time-series images of this type have been implemented in some fashion in previous studies,98,102 these implementations were low throughput or relied on user intervention. This project will drive the development in TR&D4 of high-throughput, fully automated, and adaptable tools that can systematically integrate the spatial, temporal, and functional components of gene regulation. These tools will be developed in the context of live-cell imaging, but will be useful to the community for other data types as well.

Specific Aim 2: We will identify bursting and co-localized transcriptional events from the location maps derived from Aim 1 via the development of a framework for identifying confident, significant spatial relationships between loci. We will impute missing distances and construct a conditional random field (CRF) that models expression as a function of neighboring measured genes and their relative distances. Bursty events will be those sets of genes with high joint probability of being expressed when their spatial distances are small. Point process statistics will also be used and extended to identify spatial clustering.
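To make the notion of distance-conditioned co-expression concrete, the sketch below compares how often two loci are active in the same frame when they are closer than a chosen radius versus when they are farther apart. This is not a CRF; it is a much simpler empirical stand-in meant only to illustrate the quantity a conditional model would capture. The function name `coactivity_by_distance`, the per-frame on/off encoding, and the `radius` parameter are all assumptions introduced for illustration.

```python
from typing import Sequence, Tuple


def coactivity_by_distance(
    on_a: Sequence[bool],
    on_b: Sequence[bool],
    dist: Sequence[float],
    radius: float,
) -> Tuple[float, float]:
    """Return (P(both on | d < radius), P(both on | d >= radius)).

    Assumptions: on_a[t] and on_b[t] flag whether each locus is
    transcribing in frame t, and dist[t] is their separation in the
    same frame. A condition with no frames yields NaN.
    """
    near_both = near_total = far_both = far_total = 0
    for a, b, d in zip(on_a, on_b, dist):
        both = bool(a and b)
        if d < radius:
            near_total += 1
            near_both += both
        else:
            far_total += 1
            far_both += both
    p_near = near_both / near_total if near_total else float("nan")
    p_far = far_both / far_total if far_total else float("nan")
    return p_near, p_far


# Toy trace: both loci fire together only while they are close.
on_a = [True, True, False, True]
on_b = [True, True, False, False]
dist = [0.2, 0.3, 1.5, 2.0]
p_near, p_far = coactivity_by_distance(on_a, on_b, dist, radius=0.5)
```

A large gap between the two probabilities across many cells would flag a gene pair as a candidate for the more careful conditional modeling described in this aim.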

Specific Aim 3: We will relate the motion predicted from MCF7 Hi-C measurements, using extensions of GNM modeling techniques (TR&D1), to the motion observed via live-cell imaging. Mobility measures such as mean square displacement (MSD), distance fluctuations between pairs of genes, step-size distributions, and angular displacements103,104 will be computed from the segmented transcription sites. The two measurement modalities (Hi-C vs. imaging) will be compared to cross-check conclusions.
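The mobility measures named in this aim can be sketched as follows: a time-averaged MSD over a single 3D point trace, the per-frame distance between two loci, and a diffusion coefficient estimated from the first MSD lag. This is a minimal sketch under stated assumptions rather than the project's pipeline: the function names are hypothetical, frames are assumed to be evenly spaced, and the diffusion estimate assumes free (unconfined) diffusion, for which MSD(t) = 2 d D t in d dimensions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mean_square_displacement(track: Sequence[Point], max_lag: int) -> List[float]:
    """Time-averaged MSD for lags 1..max_lag, measured in frames."""
    if max_lag >= len(track):
        raise ValueError("max_lag must be smaller than the track length")
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            sum((b - a) ** 2 for a, b in zip(track[i], track[i + lag]))
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def pair_distances(track_a: Sequence[Point], track_b: Sequence[Point]) -> List[float]:
    """Euclidean distance between two loci in each frame."""
    return [math.dist(a, b) for a, b in zip(track_a, track_b)]


def diffusion_coefficient(msd_lag1: float, dt: float, dims: int = 3) -> float:
    """Free-diffusion estimate D = MSD(dt) / (2 * dims * dt)."""
    return msd_lag1 / (2 * dims * dt)


# Toy trace: a locus moving one unit per frame along x.
track = [(float(i), 0.0, 0.0) for i in range(5)]
msd = mean_square_displacement(track, max_lag=3)
```

For confined loci the MSD curve flattens at long lags, so the first-lag estimate above is only a starting point. A radius of confinement would need a model fitted to the plateau.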

 

Copyright © 2018 National Center for Multiscale Modeling of Biological Systems. All Rights Reserved.