Genomic Architecture
In the last article, we learned about the history of DNA and how it holds the genetic code, providing instructions for every cell in your body to do its business. We also addressed the importance of faithfully maintaining the DNA sequence, which is copied into each new cell. This might've made some of you pause - wait a second, if every cell has the same genetic code, then how in the world do they do different things? What tells a cell in your heart to do heart things, but a cell in your tongue to sense "taste"? This process is called differentiation, and although the main idea is the selective expression of the same set of genes, it isn't entirely clear how this unfolds. There are, however, a few key strategies. One of those strategies, which we'll learn about in this article, is precisely and functionally organizing the genome in space.
Figure 1: Flemming's drawings of mitosis. The dark stuff is what he called "chromatin", and contains the DNA but also RNA and proteins involved in maintenance of the genome.
Not just DNA
As we discussed last time, the genome is composed of DNA which encodes genes and consequently everything else in a cell. But I didn't explain the full story. Ten years after Miescher coined "nuclein", Walther Flemming was studying cellular division under a microscope. In 1879, Flemming was particularly interested in what was going on in the nucleus. Using a dye, he stained the nuclear material and watched it under a microscope as the cell divided (Figure 1). Flemming then made a quite prescient, if not facetious, comment: "The word chromatin may serve until its chemical nature is known, and meanwhile stands for that substance in the cell nucleus which is readily stained." Flemming coined "chromatin" for whatever that stainable nuclear stuff was - 140 years later, the terminology has stuck. Nowadays, chromatin refers to the complex of DNA, RNA, and proteins that organizes in the nucleus and maintains and regulates the genome. We won't get too focused on the composition of chromatin, but it is an important term and highlights how DNA in the cell nucleus is not the naked alphabet we sometimes like to think about. It exists in three-dimensional space in a crowded environment in the form of chromatin. If you were to take all the DNA from one human cell and stretch it out into a straight line, it would be about 2 meters long! But the cell packs this into a nucleus around 10 micrometers (0.00001 meters) in diameter, neatly (or not) folding and organizing the genome. This organization occurs across spatial and temporal scales. In this article I will give an overview of these scales, then dive a bit into the particular scales that I am interested in and discuss some of the interesting methodologies and findings that researchers have developed over the past couple of decades.
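To make the packing problem concrete, here is a rough back-of-the-envelope estimate in Python. The genome size (~3.1 billion bp per haploid copy) and the rise per base pair (~0.34 nanometers) are assumed textbook values, not numbers from this article; the nucleus diameter is the ~10 micrometers quoted above. Treat it as a sketch of the arithmetic, not a precise measurement.

```python
# Back-of-the-envelope estimate of how much DNA is in one human cell.
# Assumed values (textbook approximations, not from this article):
HAPLOID_GENOME_BP = 3.1e9    # base pairs in one copy of the human genome
NM_PER_BP = 0.34             # length of one base pair along the double helix
COPIES_PER_CELL = 2          # most human cells carry two copies (diploid)

NUCLEUS_DIAMETER_M = 10e-6   # ~10 micrometers, as quoted in the text


def dna_length_m(n_bp, nm_per_bp=NM_PER_BP):
    """Stretched-out length in meters of n_bp base pairs of DNA."""
    return n_bp * nm_per_bp * 1e-9


total_length_m = dna_length_m(HAPLOID_GENOME_BP * COPIES_PER_CELL)
fold_longer = total_length_m / NUCLEUS_DIAMETER_M

print(f"Stretched-out DNA per cell: {total_length_m:.2f} m")
print(f"That is about {fold_longer:,.0f} times the nucleus diameter")
```

Under these assumptions the stretched-out DNA comes to roughly two meters, a couple of hundred thousand times longer than the nucleus is wide.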
From the double helix to the chromosome
At the base level, DNA is organized into a double helix, as we previously discussed. The strand of DNA is wrapped around nucleosomes - composed of proteins called histones - such that ~150 base pairs (bp) of DNA are wound on each nucleosome, with around 60 bp of DNA between each nucleosome. You can conceptually visualize this as beads on a string (Figure 2). These beads facilitate the compaction of the DNA, and the nucleosome-wrapped DNA further organizes into what is referred to as the chromatin fiber. The chromatin fiber was reported in 1976 as the compaction of nucleosomal DNA into a ~30 nanometer (0.00000003 meter) thick fiber. Recent evidence has cast doubt on whether this "chromatin fiber" actually exists in living cells, as chromatin is surely dynamic and has been shown to be quite variable in size, but the idea of the chromatin fiber is still a useful concept for thinking about chromatin organization. The fiber, while perhaps not a literal physical structure, can be thought of as the last "linear" scale of organization, in that we can imagine everything beyond this scale in three dimensions, as a bowl of spaghetti, and not worry too much about the fine-scale details of the fiber itself. This might get me into hot water with some genome biologists, but I think it is helpful for our purposes here, and is visualized in Figure 2.
Figure 2: Organization of the genome across spatial scales. This image is taken from source.
The chromatin fiber is then flopping around in the nucleus, folding back on itself, clumping here and there, and mixing about. As we move into larger spatial scales, we see "chromatin loops", where genomically distant regions (i.e., far apart on the linear piece of DNA) are physically quite close (Figure 2). These loops often bring together regulatory elements, parts of the DNA that regulate gene expression, such as enhancers and promoters. The looping process is helped along by proteins called CTCF and Cohesin, where Cohesin loads onto the chromatin fiber and pulls it through, kind of like a loop in your shoestring, until it gets stalled by CTCF. Of course not all loops are formed this way, and not all loops bring together enhancers and promoters either, but these are the popular ones in most research.
So we have our chromatin loops - what next? If we zoom out a bit further, we see the chromatin grouping up into what are called topologically associating domains (TADs). The exact definition of a TAD is a bit elusive, in my opinion, but it is essentially a bundle of chromatin that is more likely to be near other chromatin in the TAD than chromatin not included in the TAD. TADs are visualized in Figure 2, but experimentally are most clearly seen in a Hi-C map, which I will get into a bit further down. This is one level of compartmentalization, and chromatin is compartmentalized at larger scales as well. The physical genome generally segregates into euchromatin, referring to open, transcriptionally active chromatin, and heterochromatin, which is compact and houses weakly expressed genes. Of course, this segregation is not complete; instead, local regions of euchromatin and heterochromatin form compartments called "A" and "B" compartments (the naming is dumb, I know), which are also seen clearly on Hi-C maps.
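The "more likely to be near chromatin inside the TAD than outside it" idea can be sketched with a toy contact matrix. Everything here is made up for illustration: the numbers, the two hypothetical domains (bins 0-2 and bins 3-5), and the helper `mean_contact`. It is a minimal sketch of the intuition, not how TADs are actually called in practice.

```python
# Toy contact matrix for 6 genomic bins; entry [i][j] is how often bin i
# and bin j were found together. The values are invented for illustration.
contacts = [
    [9, 6, 5, 1, 1, 0],
    [6, 9, 6, 2, 1, 1],
    [5, 6, 9, 2, 2, 1],
    [1, 2, 2, 9, 6, 5],
    [1, 1, 2, 6, 9, 6],
    [0, 1, 1, 5, 6, 9],
]


def mean_contact(matrix, rows, cols):
    """Average contact frequency between two sets of bins, skipping the diagonal."""
    values = [matrix[i][j] for i in rows for j in cols if i != j]
    return sum(values) / len(values)


# Two hypothetical domains (assumed boundaries, not detected from the data)
domain_a = range(0, 3)
domain_b = range(3, 6)

within_a = mean_contact(contacts, domain_a, domain_a)
between = mean_contact(contacts, domain_a, domain_b)

print(f"Mean contact within domain A:   {within_a:.2f}")
print(f"Mean contact between A and B:   {between:.2f}")
print(f"Enrichment inside the domain:   {within_a / between:.1f}x")
```

One caveat: regions that are close on the linear genome contact each other more often anyway, so real analyses typically correct for genomic distance before comparing inside versus outside a domain. This sketch skips that step.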
Finally, individual chromosomes - of which humans have 46 in (basically) every cell, and which are single pieces of DNA (think of one noodle in the bowl of many) - tend to occupy more or less distinct regions, with some intermingling. These are called chromosome territories, shown by the differently colored chromatin in Figure 2. All of these "structures", I should note, are described as they exist during interphase, when a cell is expressing genes and doing its thing, and are described statically but are in fact quite dynamic. Most clearly, go take a look back at Figure 1: as a cell goes through mitosis and divides, an intense reorganization of the chromatin occurs, and many of these structural elements are rearranged. This dynamic chromatin organization is important for gene regulation, and the dynamics occur across temporal scales as well, as I will discuss below.
Figure 3: A cartoon of Cohesin-mediated loop extrusion, stolen from source. The blue line is the DNA, and the orientation of the CTCF triangles shows how Cohesin can bypass CTCF oriented the other way.
The dynamic genome
As I mentioned, the chromatin in the nucleus is constantly moving about. As stem cells differentiate into a specific cell type, studies have shown large reorganization at the level of A/B compartments, TADs, and chromatin loops. For an illustration, look at Figure 4. When looking at a mouse embryonic stem cell (mESC), certain parts of the genome are either in A (green) or B (red) compartments. As the cells differentiate into certain cell types, certain genomic regions transition between compartments as genes are turned on or silenced. In the fully differentiated state, the compartmentalization might be strong as there is no need to access certain genes in a given cell type (exemplified by the dark red in the right of the figure).
Figure 4: A/B compartment reorganization through cell differentiation. Stolen from source.
One thing you might have noticed in the image is the label of TADs, and how they apparently stay the same throughout the process. I'm not sure if this was the intent of the figure, but it does reveal an interesting phenomenon. Multiple studies have shown that TAD boundaries - that is, where on the genome TADs stop and start - don't actually change much between cell types. TADs can be formed through loop extrusion, so this kind of makes sense because the DNA sequence doesn't change between cell types and thus where the loop anchors are might be constant. However, I find this interesting because for all the importance given to TADs for the regulation of gene expression, their relative stability could suggest otherwise.
Although I just called TADs "stable", they are also dynamic when it comes to an individual cell, which, in another way, casts some doubt on their importance. By this I mean that a TAD with defined boundaries does not really exist in a single cell at any given time, and the boundaries (and thus the TADs) are a result of averaging many cells in a given population. Unless the unique structure in each cell is static, which is extremely unlikely, this means that the genomic region referred to as a TAD will fluctuate through various structures over time, such that its temporal average looks like the TAD from the figures. Cool studies using super-resolution imaging have illustrated this, showing snapshots of "TAD" structures in different cells and revealing that no individual structure takes on the globule-like form that a true TAD would suggest (Figure 5). Researchers are still unable to track the shape of a TAD in a single living cell to watch it go through these movements, but I expect progress soon, and my own research aims to help (in some small way) towards this goal.
Figure 5: Renderings of the chromatin fiber as reconstructed by super-resolution imaging. The color shows the region of the linear piece of DNA, and we see in 4 different examples that the shape the fiber is in is quite variable, and never really like the chromatin "blob" seen in the previous cartoons. Stolen from source.
At a smaller scale, we can look at individual genomic elements. These are segments of DNA that have some role in the regulation of gene expression, whether they are promoters, enhancers, genes themselves, or otherwise. In another article maybe I will explain noncoding regions of the genome and what they mean, but for now just know that they are relatively small (a few thousand bp or less) and help tell the cell what genes to express. The regions diffuse around the nucleus like dust in the air, bumping into other parts of the chromatin. Interestingly, some segments bump into other parts much more than random chance would suggest, as we discussed above about chromatin loops. These "loops" are also transient, and we know this because we have watched these genomic regions bumbling about using live-cell microscopy. Take a look at Figure 6, which shows a video I took in the lab of a gene in a living cell, played back in real time. I probably shouldn't publish this but oh well. The gene, the bright green spot, isn't exploring the whole nucleus, but locally wiggling around. And this is just over a few seconds! Other studies have done similar things while labeling two genomic regions and tracking the distance between them, and have found that even though they sometimes come into close proximity - forming what is often called a contact - they are for the most part apart, diffusing and doing their own thing. This class of chromatin motion is interesting because it relates to the regulation of individual transcriptional events. How do genes, in a single cell in real time, move about, and what does that mean for their expression? This question is an exciting one, and an area where I expect to see a lot of progress in the upcoming decade.
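One common way to put a number on this kind of "local wiggling" is the mean squared displacement (MSD): how far, on average, the spot has moved after a given number of frames. I am naming MSD here as a standard tracking analysis, not as the specific analysis behind the video. The track below is a handful of hypothetical (x, y) positions in micrometers, and the function name is my own.

```python
def mean_squared_displacement(track, lag):
    """Time-averaged MSD of one track of (x, y) positions at a given frame lag."""
    if not 0 < lag < len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    # Pair every position with the position `lag` frames later
    squared = [
        (x2 - x1) ** 2 + (y2 - y1) ** 2
        for (x1, y1), (x2, y2) in zip(track, track[lag:])
    ]
    return sum(squared) / len(squared)


# Hypothetical spot positions in micrometers, one per frame (made-up data)
track = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (0.0, 0.1), (0.0, 0.0)]

msd_1 = mean_squared_displacement(track, 1)
msd_2 = mean_squared_displacement(track, 2)

print(f"MSD at lag 1 frame:  {msd_1:.3f} um^2")
print(f"MSD at lag 2 frames: {msd_2:.3f} um^2")
```

How the MSD grows with lag is what carries the information: a spot that is free to roam keeps spreading out, while a spot that is locally confined levels off.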
Figure 6: Video of gene motion in a living cell, in real time. Stolen from ME.
Methods to study chromatin structure
How do we know all of this? I've given some hints along the way, but in general the methods fall into three categories: (1) chromosome conformation capture, (2) fluorescence in situ hybridization, and (3) live-cell imaging. Each of these has a unique perspective to add and produces visually pleasing data so we will look at each one in a little detail below.
Chromosome conformation capture
Chromosome conformation capture (3C) works by chemically fixing a cell in time, cutting up its DNA, gluing back together the pieces that were close to each other, and then sequencing them (basically). A high-throughput, whole-genome version of this is called Hi-C, and when performed on a group of cells you end up with a matrix where each entry shows the frequency of interaction between two regions of the genome. The "resolution" of this matrix is the size of each segment of DNA; for example, each pixel in the matrix might correspond to 10 kilobases of DNA. An example matrix is shown above, for chromosome 14 in a human cell line. As we expect, the diagonal line is very red, meaning a high frequency of interaction, because these are parts of the chromatin fiber that are linearly very close. As we move away from the diagonal, the signal on average gets weaker, but there are some interesting features, like squares blocking off a region (a TAD), or spots away from the diagonal showing a strong contact between two distant regions (a loop). Thus, with Hi-C (and other similar methods) you can look at which genomic regions are more likely to "interact" across a population of cells than other regions. Notably, this is a static method, and the interaction frequency doesn't necessarily imply physical proximity.
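The step from sequenced pairs to a matrix at a given "resolution" is just binning, which can be sketched in a few lines of Python. The read pairs, the 40 kb region, and the function `build_contact_matrix` are all hypothetical, and real Hi-C pipelines involve far more (mapping reads, filtering artifacts, normalizing). This only shows the counting step.

```python
def build_contact_matrix(read_pairs, region_length_bp, bin_size_bp):
    """Count contacts between fixed-size genomic bins.

    read_pairs: (position_a, position_b) pairs in bp along one region,
    each representing two pieces of DNA that were glued together.
    """
    n_bins = -(-region_length_bp // bin_size_bp)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size_bp, pos_b // bin_size_bp
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Hypothetical read pairs on a 40 kb region (positions in bp, made-up data)
read_pairs = [
    (1_000, 4_000),    # both ends in bin 0 -> lands on the diagonal
    (2_500, 12_000),   # bin 0 with bin 1
    (15_000, 18_000),  # both ends in bin 1
    (3_000, 38_000),   # bin 0 with bin 3 -> a long-range contact
    (36_000, 39_500),  # both ends in bin 3
]

# 10 kb resolution: each "pixel" covers 10,000 bp
matrix = build_contact_matrix(read_pairs, region_length_bp=40_000, bin_size_bp=10_000)
for row in matrix:
    print(row)
```

Choosing a smaller bin size gives a sharper picture but fewer counts per pixel, which is why resolution in practice depends heavily on how deeply the sample was sequenced.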
Fluorescence in situ hybridization
DNA fluorescence in situ hybridization (DNA FISH) is a classic method that enables imaging the location of a genomic region in a cell. This works by fluorescently tagging a piece of single-stranded DNA that is complementary to your region of interest, physically or chemically unraveling the double helix in the cell, and attaching your tagged piece (hybridizing it) to the cellular DNA. Then you can take these cells (which are fixed and dead) to a fluorescence microscope and take a picture to see where that genomic region is in a single cell, and/or its distribution across many cells. Researchers can do this with multiple regions using multiple colors (as shown in the image) to understand how different regions localize with respect to each other. Furthermore, there are now methods that combine super-resolution imaging with sequential labeling of regions to "trace" out chromatin structure at very fine resolution. This was shown in Figure 5 above. FISH is a great method because it is relatively easy, robust, and provides single-cell resolution. However, it is still performed on fixed/dead cells and thus only provides a snapshot in time of chromatin structure.
Live-cell imaging and tracking
Live-cell imaging, the method closest to my heart. This is really just what it sounds like. Methods have been developed that enable fluorescently tagging a genomic region of interest without killing the cell, and then by taking movies under the microscope you can track how the region moves over time. Think of it like a live-cell compatible FISH, although the technical implementation is of course different. There are multiple ways to achieve this, and a full review is outside the scope of this article. I'll just say my personal favorite is using CRISPR/Cas9... In any case, live-cell imaging provides worse sequence and spatial resolution than 3C- or FISH-based methods, but it provides essentially infinitely better temporal resolution. In the above movie, researchers have labeled the two anchors of a chromatin loop, one in pink and one in green, and are thus able to track the distance between them over time. This revealed that the loop is quite dynamic, and is only occasionally in a truly "looped" state where the two anchors are physically close.
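The kind of measurement described here, tracking the distance between two labeled spots and asking how often they are "in contact", can be sketched as follows. The positions are made up, the 0.15 micrometer contact threshold is an arbitrary assumption chosen for illustration, and `fraction_in_contact` is a hypothetical helper, not the analysis the researchers actually used.

```python
import math


def fraction_in_contact(track_a, track_b, threshold_um):
    """Per-frame distance between two spots, and the fraction of frames
    in which they are within threshold_um of each other."""
    distances = [math.dist(a, b) for a, b in zip(track_a, track_b)]
    frames_in_contact = sum(d <= threshold_um for d in distances)
    return frames_in_contact / len(distances), distances


# Hypothetical (x, y) positions in micrometers, one per frame (made-up data)
green_anchor = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.2, 0.2), (0.1, 0.2)]
pink_anchor = [(0.5, 0.0), (0.4, 0.0), (0.3, 0.1), (0.6, 0.2), (0.7, 0.2)]

# The contact threshold is an assumption; any real cutoff depends on the
# microscope's resolution and on what one decides counts as "looped".
fraction, distances = fraction_in_contact(green_anchor, pink_anchor, threshold_um=0.15)

print("Distance per frame (um):", [round(d, 2) for d in distances])
print(f"Fraction of frames in contact: {fraction:.0%}")
```

In this toy track the two anchors are close in only one frame out of five, which mirrors the qualitative finding above: the looped state is the exception rather than the rule.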
So now you know all about genome architecture (or chromatin structure, or whatever else you want to call it)! I find it a fascinating field because it seemingly emerged just as biologists were really feeling confident about their genomic prowess. Synthetic biology - the ability to build biological systems from the "ground up" through a thorough understanding of how DNA works - has proved much more difficult to implement than I think many initial enthusiasts expected. Chromatin dynamics is one example of how complex biological systems are: even if we know perfectly how to interpret the DNA sequence, it's not always clear what other factors might get involved and throw our understanding askew. Another reason I like this topic is that, in hindsight, some of the findings are obvious, in an elegant way. And to end on a perhaps contradictory note - we still don't know how much all of this matters, and whether structure is a driver of function or the other way around. I'm sure the answer is somewhere in the middle.