How do transcription factors bind to DNA?

Transcription factors




I Introduction

There are three different types of RNA polymerase (RNA Pol) in eukaryotic cells, and each is responsible for a different class of transcription: PolI transcribes rRNA (ribosomal RNA), PolII mRNA (messenger RNA), and PolIII tRNA (transfer RNA) and other short RNAs. Any protein that is needed to initiate transcription but is not itself part of RNA polymerase is called a transcription factor. Many transcription factors act by recognizing cis-acting sites in DNA that are part of a promoter or enhancer. However, binding to DNA is not the only way a transcription factor can act: it may instead recognize another transcription factor or RNA polymerase itself. In eukaryotes, promoters are recognized mainly by the transcription factors rather than by the polymerases themselves.

Transcription factors bind specific patterns of short, conserved sequence elements that make up promoters. Some of these elements, and the factors that bind them, are common: they are found in many promoters and are used constitutively. Others are specific to particular promoters, and their use is regulated.

The factors that assist RNA PolII can be divided into 3 groups:

  • General factors are needed to initiate RNA synthesis at all class II promoters (protein-coding genes). Together with RNA PolII, they form a complex surrounding the transcription start point, thereby determining the site of initiation. This complex is called the basal transcription apparatus.
  • Upstream factors are DNA-binding proteins that recognize specific short consensus elements located upstream (5′) of the transcription start point (e.g. Sp1, which binds the GC box). These factors are ubiquitous and act on any promoter that contains a matching binding site. They increase the efficiency of initiation.
  • Inducible factors work like upstream factors but have a regulatory role: they are synthesized or activated at particular times or in particular tissues. The sequences they bind are called response elements.

II Initiation of transcription

RNA Pol II cannot initiate transcription on its own; it depends on auxiliary transcription factors named TFIIX, where "X" is a letter identifying the individual factor. Together with these factors, the enzyme forms the basal (or minimal) transcription apparatus needed to transcribe a class II promoter.

The efficiency and specificity with which a promoter is recognized depend on short sequences located upstream (5′) of the TATA box, which are recognized by upstream factors and inducible factors. One example is the CAAT box, which plays a major role in determining the efficiency of a promoter and is recognized by different factors at different promoters, for example members of the CTF family, CP1 and CP2, and C/EBP and ACF. Another is the GC box, which is recognized by Sp1. These factors can interact with each other through protein-protein interactions. The main task of these elements is to bring the factors they bind close to the initiation complex, where protein-protein interactions determine the efficiency of the initiation reaction.
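Recognition of these elements can be pictured as a search over the promoter sequence. The sketch below is a minimal illustration, not how binding sites are actually identified: it scans both strands of a DNA string for exact copies of the GC box (`GGGCGG`) and of the CCAAT core of the CAAT box. The exact-match consensus strings, the function names, and the test sequence are assumptions for illustration; real sites are degenerate and are usually modelled with position weight matrices.

```python
# Simplified, exact-match consensus strings (assumption: real sites are degenerate).
ELEMENTS = {
    "GC box (Sp1)": "GGGCGG",
    "CAAT box core": "CCAAT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_elements(promoter: str) -> list[tuple[str, int, str]]:
    """Return (element, start, strand) for every exact match, sorted by start.

    `start` is a 0-based position on the top strand. A '-' hit means the
    reverse complement of the element starts there, i.e. the element is
    read on the bottom strand.
    """
    promoter = promoter.upper()
    hits = []
    for name, motif in ELEMENTS.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            if strand == "-" and pattern == motif:
                continue  # palindromic motif: the '+' scan already found it
            start = promoter.find(pattern)
            while start != -1:  # overlapping matches are allowed
                hits.append((name, start, strand))
                start = promoter.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


print(find_elements("AAGGGCGGAACCAATAAATTGGA"))
# → [('GC box (Sp1)', 2, '+'), ('CAAT box core', 10, '+'), ('CAAT box core', 17, '-')]
```

Scanning both strands matters because a site on the bottom strand appears on the top strand only as its reverse complement (here `ATTGG` at position 17).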


Figure 1: Schematic model for the assembly of the basal transcription apparatus.


III Families of transcription factors

Common DNA-binding motifs recur in many different transcription factors, so the proteins that regulate transcription by binding DNA can be grouped into several families according to the motif they use.


III.1 Helix-Turn-Helix Proteins

The helix-turn-helix motif was originally identified as the DNA-binding domain of phage repressors. One α-helix lies in the major groove of the DNA; the other lies at an angle across the DNA. Another form of this motif is found in the homeodomain, a sequence first characterized in proteins encoded by genes involved in the embryonic development of Drosophila. Homeoboxes are also present in genes coding for mammalian transcription factors. The homeobox is a DNA sequence that codes for a domain of 60 amino acids, the homeodomain. The homeodomain is responsible for binding DNA and determines the specificity of DNA recognition. Its C-terminal region is homologous to the helix-turn-helix motif of prokaryotic repressors.


III.2 Zinc finger proteins

Zinc fingers are DNA-binding motifs. The motif was first found in factor TFIIIA, which RNA PolIII requires to transcribe 5S rRNA genes. These proteins are named for their structure, in which a small number of conserved amino acids bind a zinc ion. Two types of DNA-binding protein have structures of this kind: the classic "zinc finger" proteins and the steroid receptors.

A "finger protein" typically has several zinc fingers. The consensus sequence of a single finger is:


The motif takes its name from the loop of amino acids that protrudes from the zinc-binding site, and it is described as a Cys2/His2 finger.

These fingers are usually arranged as a single series of tandem repeats. The number of fingers ranges from 9 repeats occupying almost the entire protein (as in TFIIIA) to a single small domain of 2 fingers. The ubiquitous upstream factor Sp1 has a DNA-binding domain consisting of 3 zinc fingers. The C-terminal part of each finger forms an α-helix that binds DNA; the N-terminal part forms a β-sheet. The non-conserved amino acids on the C-terminal side of each finger are responsible for recognizing specific target sites.
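Because the fingers follow a consensus and occur in tandem, they lend themselves to a pattern search over a protein sequence. The sketch below assumes the consensus commonly quoted for a single Cys2/His2 finger, Cys-X2-4-Cys-X3-Phe-X5-Leu-X2-His-X3-His (X = any amino acid), writes it as a regular expression over one-letter amino-acid codes, and counts the fingers it finds. The idealized pattern, the function name, and the synthetic sequence are assumptions; natural fingers often deviate from the consensus, so motif databases use more permissive patterns.

```python
import re

# Assumed consensus Cys-X(2-4)-Cys-X3-Phe-X5-Leu-X2-His-X3-His in one-letter code.
C2H2_FINGER = re.compile(r"C.{2,4}C.{3}F.{5}L.{2}H.{3}H")


def count_c2h2_fingers(protein: str) -> int:
    """Count non-overlapping matches of the idealized Cys2/His2 finger pattern."""
    return len(C2H2_FINGER.findall(protein.upper()))


# Synthetic finger (not a real protein) repeated in tandem with a short linker.
finger = "CAACAAAFAAAAALAAHAAAH"
print(count_c2h2_fingers("M" + (finger + "GSG") * 3))  # → 3
```

A strict pattern like this one trades sensitivity for readability: it finds only fingers that match the consensus at every conserved position.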

Steroid receptors, which are activated by binding steroids and related ligands (e.g. glucocorticoids, thyroid hormone, retinoic acid), and some other proteins have a different type of finger. Its structure is based on the following zinc-binding consensus sequence:


These are called Cys2/Cys2 fingers. Proteins with Cys2/Cys2 fingers often have non-repeating fingers, in contrast to the tandem repeats of the Cys2/His2 type. Their binding sites on DNA are usually short and palindromic. The glucocorticoid and estrogen receptors each have 2 fingers; the α-helices these fingers form fold together into a large globular domain.
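"Palindromic" here means that the site reads the same on both strands in the 5′→3′ direction, so it equals its own reverse complement; equivalently, it consists of two half-sites in inverted orientation. The sketch below tests that property for a candidate site and, as an assumed generalization, allows a central spacer whose bases are not checked. The function names and example sites are hypothetical, and the check assumes a string of A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(site: str, spacer: int = 0) -> bool:
    """True if `site` consists of two half-sites in inverted orientation.

    The half-sites flank a central spacer of `spacer` bases, which are not
    checked. With spacer=0 this is a perfect palindrome: the whole site
    equals its own reverse complement.
    """
    site = site.upper()
    if spacer < 0 or len(site) <= spacer or (len(site) - spacer) % 2:
        return False
    half = (len(site) - spacer) // 2
    left, right = site[:half], site[half + spacer:]
    return right == reverse_complement(left)


print(is_inverted_repeat("GAATTC"))                     # → True
print(is_inverted_repeat("AGAACA"))                     # → False
print(is_inverted_repeat("AGAACAGCATGTTCT", spacer=3))  # → True
```

Comparing the right half-site with the reverse complement of the left one is the same test a dimeric protein "performs" when each subunit reads one half-site on opposite strands.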


III.3 Leucine zipper proteins

A leucine zipper is a stretch of amino acids rich in leucine residues that provides a dimerization motif. Dimerization brings the DNA-binding regions of the two subunits side by side. A leucine zipper forms an amphipathic helix in which the leucines of the zipper on one protein protrude from the α-helix and interdigitate with the leucines of the zipper on another protein, forming a coiled-coil domain. In both zipper proteins, the region adjacent to the leucine repeats is highly basic and may form a DNA-binding site. The two zippers form a Y-shaped structure in which the zippers make up the stem and the two basic regions are symmetrically bifurcating branches that bind the DNA. This structure is known as the bZIP motif. It explains why the target sequences for such proteins are inverted repeats with no separation. Zippers can mediate the formation of homo- or heterodimers. There are 4 leucine repeats in C/EBP (a factor that binds as a dimer to both the CAAT box and the SV40 core enhancer) and 5 in the factors Jun and Fos, which form the heterodimeric transcription factor AP1.
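The zipper repeats mentioned above are leucines that recur at every seventh residue (a heptad repeat), which places them along one face of the helix; the seven-residue spacing is standard background rather than something stated in this section. The sketch below finds the longest chain of leucines spaced exactly seven residues apart in a protein sequence, as a rough proxy for the number of zipper repeats. The function name and the test sequence are hypothetical, and real zippers sometimes carry another hydrophobic residue in place of a leucine, which this exact-match count would miss.

```python
def longest_leucine_heptad_run(protein: str, period: int = 7) -> int:
    """Return the most leucines in any chain spaced exactly `period` residues apart."""
    protein = protein.upper()
    # chain[i] = length of the leucine chain ending at position i (0 if residue i is not L)
    chain = [0] * len(protein)
    for i, residue in enumerate(protein):
        if residue == "L":
            chain[i] = (chain[i - period] if i >= period else 0) + 1
    return max(chain, default=0)


# Synthetic sequence: leucines at positions 0, 4, 7, 11, 14, 18 and 21.
print(longest_leucine_heptad_run("LKEDLQR" * 3 + "L"))  # → 4
```

Each leucine extends the chain that ended exactly one period earlier, so a single left-to-right pass is enough.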


III.4 Helix-Loop-Helix Proteins

The amphipathic helix-loop-helix (HLH) motif was identified in some regulators of embryonic development and in genes coding for eukaryotic DNA-binding proteins. Proteins with this motif can both bind DNA and form dimers. They share a common type of sequence motif: a stretch of 40-50 amino acids containing two amphipathic α-helices separated by a linker region (the loop) of variable length. Proteins in this group form homo- and heterodimers through interactions between hydrophobic residues on the corresponding faces of the two helices. This ability to dimerize comes from the amphipathic helices and is common to all HLH proteins.

Most HLH proteins have a highly basic region adjacent to the HLH motif itself that is needed for DNA binding. Members of the group that have such a region are called bHLH (basic HLH) proteins. A dimer in which both subunits have such a basic region can bind DNA. HLH proteins can be divided into 2 general groups. Class A consists of ubiquitously expressed proteins, including the mammalian E12/E47 proteins. Class B consists of proteins expressed in a tissue-specific manner, including the mammalian MyoD, Myf5, Myogenin, and MRF4, a group of transcription factors involved in myogenesis (muscle development) known as myogenic regulatory factors (MRFs). A common modus operandi of the tissue-specific bHLH proteins appears to be forming heterodimers with ubiquitous partners. A similar arrangement is found in a group of gene products that specify nervous system development in Drosophila melanogaster, in which the achaete-scute complex (AS-C) provides the tissue-specific component and daughterless (da) the general component. The Myc proteins form a separate class of bHLH proteins.
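The pairing logic in this paragraph can be expressed as two small predicates. In the sketch below, the basic-region rule (a dimer binds DNA when both subunits carry a basic region) and the class A/class B labels come from the text; the `HLHProtein` type, the function names, and the hypothetical partner lacking a basic region are illustrative assumptions. The predicates encode only these rules and say nothing about which dimers actually form in a cell or how tightly they bind.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class HLHProtein:
    name: str
    hlh_class: str          # "A": ubiquitously expressed; "B": tissue-specific
    has_basic_region: bool  # True for bHLH proteins


def dimer_can_bind_dna(a: HLHProtein, b: HLHProtein) -> bool:
    """Basic-region rule: both subunits must carry a basic region."""
    return a.has_basic_region and b.has_basic_region


def is_class_a_b_heterodimer(a: HLHProtein, b: HLHProtein) -> bool:
    """A tissue-specific (class B) subunit paired with a ubiquitous (class A) one."""
    return {a.hlh_class, b.hlh_class} == {"A", "B"}


def binding_dimers(proteins: list[HLHProtein], require_a_b_pair: bool = False) -> list[tuple[str, str]]:
    """Every unordered homo- or heterodimer that passes the basic-region rule."""
    return [
        (a.name, b.name)
        for a, b in combinations_with_replacement(proteins, 2)
        if dimer_can_bind_dna(a, b)
        and (not require_a_b_pair or is_class_a_b_heterodimer(a, b))
    ]


proteins = [
    HLHProtein("E47", "A", True),
    HLHProtein("MyoD", "B", True),
    HLHProtein("hypothetical_HLH_only", "A", False),  # assumed: HLH motif, no basic region
]

print(binding_dimers(proteins))
# → [('E47', 'E47'), ('E47', 'MyoD'), ('MyoD', 'MyoD')]
print(binding_dimers(proteins, require_a_b_pair=True))
# → [('E47', 'MyoD')]
```

Any dimer that includes the partner without a basic region drops out, which is the combinatorial consequence of the rule stated in the text.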