^{1,a)}, Adam Liwo^{2,b)}, Antti J. Niemi^{1,3,c)} and Harold A. Scheraga^{4,d)}

### Abstract

A heterodimer consisting of two or more different kinds of proteins can display an enormous number of distinct molecular architectures. The conformational entropy is an essential ingredient in the Helmholtz free energy and, consequently, these heterodimers can have a very complex phase structure. Here, it is proposed that there is a state of proteins in which the different components of a heterodimer exist in different phases. For this purpose, the structures in the Protein Data Bank (PDB) have been analyzed, with the radius of gyration as the order parameter. Two major classes of heterodimers with their protein components coexisting in different phases have been identified. An example is the PDB structure 3DXC, a transcriptionally active dimer. One of its components is an isoform of the intracellular domain of the Alzheimer-disease-related amyloid precursor protein (AICD), and the other is a nuclear multidomain adaptor protein in the Fe65 family. It is concluded from the radius of gyration that neither of the two components in this dimer is in its own collapsed phase, corresponding to a biologically active protein. The UNRES energy function has been utilized to confirm that, if the two components are separated from each other, each of them collapses. The results presented in this work show that heterodimers whose protein components coexist in different phases can have intriguing physical properties with potentially important biological consequences.

This work was supported by grants from the National Institutes of Health (Grant No. GM-14312), the National Science Foundation (Grant No. MCB10-19767), the Polish Ministry of Science and Higher Education (Grant No. DS 8372-4-0138-12), and a CNRS PEPS collaboration grant. Computational resources were provided by (a) the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357; (b) the National Science Foundation (http://www.nics.tennessee.edu/), and the National Science Foundation through TeraGrid resources provided by the Pittsburgh Supercomputing Center; (c) the Informatics Center of the Metropolitan Academic Network (IC MAN) in Gdańsk; (d) our 624-processor Beowulf cluster at Baker Laboratory of Chemistry, Cornell University; and (e) our 184-processor Beowulf cluster at the Faculty of Chemistry, University of Gdańsk.

I. INTRODUCTION

II. METHODS

A. Radius of gyration as an order parameter

B. Protein backbone geometry

C. Soliton description of protein-backbone geometry

D. UNRES model of polypeptide chains

III. RESULTS AND DISCUSSION

A. Phase co-existence in protein oligomers

B. Structure of AICD in the AICD/Fe65 dimer

C. Collapse simulations of isolated AICD and Fe65

IV. SUMMARY

### Key Topics

- Proteins (118.0)
- Polymers (14.0)
- Protein folding (13.0)
- Mean field theory (9.0)
- Solvents (8.0)

## Figures

Definitions of the variables of the UNRES model. The virtual-bond angle θ_{i} is determined by the three C^{α} carbons at sites i, i + 1, i + 2 and is defined as the angle between the two consecutive virtual-bond vectors that join these sites [Eq. (11)]. It should be noted that, for consistency with the notation of Sec. II B, the angles θ used in this work are complements of the original angles θ, i.e., π − θ (see, e.g., Ref. 39). The C^{α} carbon atoms are represented by small open circles. The virtual-bond-dihedral angle γ_{i} is the angle between the two planes determined by the C^{α} atoms at sites (i, i + 1, i + 2) and (i + 1, i + 2, i + 3) [Eqs. (12) and (13)]. In addition, the UNRES energy function [Eqs. (32)–(35)] involves the following structure, shown in the figure: the interaction sites are peptide-bond centers (p) and side-chain ellipsoids of different sizes (SC), attached to the corresponding α-carbons with different virtual-bond lengths b_{SC}. The UNRES energy is a function of the coordinates of the SC and p sites, which are functions of (θ, γ, α, β), and it also contains terms that depend explicitly on these angles.
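The angle definitions in this caption can be illustrated with a minimal sketch that computes θ and γ from a list of C^{α} positions. The function name, the array layout, and the sign convention chosen for γ (right-handed about the middle virtual bond) are assumptions made for illustration; they are not taken from Eqs. (11)–(13) or from the UNRES code.

```python
import numpy as np

def virtual_bond_and_dihedral_angles(ca):
    """Return (theta, gamma) in radians for an (N, 3) array of C-alpha positions.

    theta[i]: angle between virtual bonds i and i + 1 (N - 2 values).
    gamma[i]: signed dihedral for sites i .. i + 3 (N - 3 values).
    The sign convention for gamma is an assumption of this sketch.
    """
    ca = np.asarray(ca, dtype=float)
    bonds = ca[1:] - ca[:-1]                      # virtual-bond vectors
    unit = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)

    # Angle between consecutive virtual-bond vectors.
    cos_theta = np.einsum("ij,ij->i", unit[:-1], unit[1:])
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

    # Normals to the planes (i, i+1, i+2) and (i+1, i+2, i+3).
    normals = np.cross(unit[:-1], unit[1:])
    n1, n2 = normals[:-1], normals[1:]

    # Signed angle between the two planes, measured about the middle bond.
    x = np.einsum("ij,ij->i", n1, n2)
    y = np.einsum("ij,ij->i", np.cross(n1, n2), unit[1:-1])
    gamma = np.arctan2(y, x)
    return theta, gamma

# Four hypothetical sites whose bonds point along x, then y, then z.
sites = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
theta, gamma = virtual_bond_and_dihedral_angles(sites)
```

With this convention θ is the angle between successive bond vectors, which matches the π − θ remark in the caption; a code that uses the interior angle at the middle C^{α} would return π minus these values.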

(a) The (N, R_{g}) distribution of all individual single-chain PDB proteins with resolution better than 2.0 Å and with less than 30% homology. The lower line (ν = 0.37) describes mostly α-helical proteins, and the top line is the ν = 3/5 Flory line. There are practically no single-chain proteins above the Flory line. (b) The (N, R_{g}) distribution for all multi-chain proteins from the current PDB that are above the Flory line. The two clusters given by Eq. (38) are clearly visible. The values of R_{0} and ν shown in the graphs were determined by linear regression.
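The lines in these panels have the scaling form R_{g} = R_{0} N^{ν}, so a linear regression on logarithmic axes yields both parameters. The following is a minimal sketch under the assumption that the fit is an ordinary least-squares fit of log R_{g} against log N; the function name and the synthetic data are hypothetical and are not the PDB data of the figure.

```python
import numpy as np

def fit_scaling_law(n_residues, rg):
    """Fit R_g = R_0 * N**nu by least squares on log-log axes; return (R_0, nu)."""
    log_n = np.log(np.asarray(n_residues, dtype=float))
    log_rg = np.log(np.asarray(rg, dtype=float))
    nu, log_r0 = np.polyfit(log_n, log_rg, 1)     # slope, intercept
    return float(np.exp(log_r0)), float(nu)

# Synthetic, noise-free data (hypothetical) generated from a known law.
n = np.array([50, 100, 200, 400, 800])
rg = 2.0 * n ** 0.6
r0, nu = fit_scaling_law(n, rg)
print(round(r0, 3), round(nu, 3))
```

The slope of the fitted line is ν and the exponential of its intercept is R_{0}. On real data the result depends on which chains are included in each cluster.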

The spectrum of the virtual-bond and virtual-torsion angles for 1AIK (top) and 2CUO (bottom), using PDB indexing. The black lines and symbols correspond to the virtual-bond angles θ and the red lines and symbols correspond to the virtual-bond dihedral angles γ, respectively.

The distribution of individual chains on the (N, R_{g}) plane in the second class of phase-coexistent hetero-oligomers found here. The data clearly accumulate around the top line that describes the cluster and the bottom line, the latter describing a Θ-point cluster with best-fit values R_{0} = 1.234 and ν = 0.508.

Cartoon representation of the experimental structure of the AICD/Fe65 complex (PDB: 3DXC). Green: The Fe65 (longer) chain. Red: The AICD (shorter) chain. The first and the last residues of each chain are marked. Residue numbers have been taken from the 3DXC structure.

(a) The spectrum of backbone virtual-bond angles θ_{i} (black line) and virtual-torsion angles γ_{i} (red line) for the AICD component of 3DXC (chain B). (b) The same spectrum after application of the transformation of Eq. (22) to reveal the soliton structure. The sites are indexed with residue numbers from the PDB.

The two solitons of 3DXC. Residues are indexed with the numbers from the PDB structure. The black line denotes the residue-wise difference between the coordinates computed from the soliton and those computed from the PDB conformation. The red line denotes the Debye-Waller (one standard deviation) fluctuation distance, computed from the B-factors in the PDB. The grey area describes the estimated 0.15 Å zero-point fluctuation distance around the solitons.
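The red curve in this figure converts crystallographic B-factors into a fluctuation distance. A minimal sketch of such a conversion is given below, assuming the standard isotropic Debye-Waller relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement along one direction. Whether the figure uses exactly this one-dimensional convention is an assumption of the sketch, and the function name is hypothetical.

```python
import math

def debye_waller_rms(b_factor):
    """RMS displacement along one direction (in Å) from an isotropic B-factor (in Å^2).

    Assumes the isotropic relation B = 8 * pi**2 * <u^2>.
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

# A hypothetical B-factor of 20 Å^2.
print(debye_waller_rms(20.0))
```

Comparing this distance residue by residue with the deviation between the soliton coordinates and the PDB coordinates gives the kind of test the caption describes.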

The evolution of the radius of gyration for AICD in 3DXC (chain B) during the propagation of the first soliton from site 680 towards the proline at site 669; see also Figure 6(b).

The (θ_{i}, γ_{i}) profile of 3DXC (chain B) after propagation of the first soliton onto the proline at site 669. See Figure 6 for the initial spectrum. The black lines and symbols correspond to the virtual-bond angles θ and the red lines and symbols correspond to the virtual-bond dihedral angles γ, respectively.

Cartoon pictures of AICD. On the left, the PDB conformation corresponding to the (θ, γ) spectrum in Figure 6(b) and, on the right, the conformation corresponding to the spectrum in Figure 9.

Plots of the time evolution of the radius of gyration of isolated AICD (top) and isolated Fe65 (bottom) during Langevin molecular dynamics simulation with UNRES.
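The quantity plotted here can be computed frame by frame from the site coordinates of a trajectory. The sketch below assumes an unweighted radius of gyration over the sites (for example, the C^{α} atoms) and a trajectory stored as an array of shape (frames, sites, 3); both are illustrative assumptions, and the function is not part of the UNRES package.

```python
import numpy as np

def radius_of_gyration(frames):
    """Unweighted radius of gyration for each frame.

    frames: array of shape (T, N, 3), or (N, 3) for a single conformation.
    Returns an array of T values, in the same length unit as the input.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim == 2:                          # single conformation
        frames = frames[None]
    centered = frames - frames.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))

# Two hypothetical frames of a two-site chain: extended, then compact.
trajectory = [
    [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)],
]
rg_series = radius_of_gyration(trajectory)
```

A collapse of the chain appears as a decrease of this series with simulation time.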

## Tables

First class of phase co-existent heterodimers. Single proteins but with multiple subchains that are in different phases.

Second class of phase co-existent heterodimers; more than one protein with subchains in different phases.

Parameter values for the two solitons implied in Figure 6(b). For virtual-bond angles, Eq. (31) is used. For virtual-torsion angles, Eq. (28) is used. It should be noted that the values of both θ and γ in these two equations are defined mod 2π. The large values of M enable us to describe the irregular structures in Figure 6(b). These irregularities are due entirely to the multi-valuedness of the angular variables.
