
^{1,a)}, Trinh Xuan Hoang^{2}, and Amos Maritan^{3}

### Abstract

Proteins, which are chain molecules of amino acids, behave in ways that are similar to one another yet quite distinct from standard compact polymers. We demonstrate that the Flory theorem, derived for polymer melts, holds for compact protein native state structures and is not incompatible with the existence of structured building blocks such as helices and strands. We discuss how the notion of the thickness of a polymer chain, besides being useful in describing a chain molecule in the continuum limit, plays a vital role in interpolating between conventional polymer physics and the phase of matter associated with protein structures.

This work was supported by PRIN 2003, INFN, NASA, NSF IGERT Grant No. DGE-9987589, NSF MRSEC at Pennsylvania State, and the NSC of Vietnam (Grant No. 410704).

### Key Topics

- Proteins (28.0)
- Polymers (23.0)
- Polymer structure (8.0)
- Acids (5.0)
- Secondary structure (5.0)

## Figures

Log-log plot of the radius of gyration of a set of 700 proteins obtained from the Protein Data Bank (PDB) (Ref. 11) versus their length, i.e., the number of constituent amino acids.
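
As a rough illustration of the quantity plotted here, the following sketch computes a radius of gyration from a list of 3D coordinates and estimates the slope of a log-log plot by least squares. It assumes one representative point per amino acid with equal weights; the function names `radius_of_gyration` and `loglog_slope` are hypothetical and are not taken from the paper.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid.

    Assumption: one equally weighted (x, y, z) point per amino acid.
    """
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den
```

Applied to the lengths and radii of gyration of a set of structures, `loglog_slope` would give an estimate of the scaling exponent of such a plot.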

Log-log plot of the end-to-end distance versus segment length for protein segments. The plot was obtained by averaging over all segments of a given length selected from the data set depicted in Fig. 1. For a given segment length, the end-to-end distance was determined as an average over all segments of that length in proteins whose lengths are greater than , in order to avoid finite-size effects (Ref. 9). The error bars are of the order of the size of the symbols. Note the plateau, which indicates that the end-to-end distance increases only slowly with segment length around 24. For segment lengths larger than 48, we find that .
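
The averaging procedure described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: each protein is a list of (x, y, z) points, a segment of a given length is taken to run from residue `i` to residue `i + seg_len`, and a plain arithmetic mean of the distances is used. The cutoff on protein length is not recoverable from the caption, so `min_chain_len` is a hypothetical parameter left to the caller.

```python
import math

def mean_end_to_end(chains, seg_len, min_chain_len):
    """Average distance between residues i and i + seg_len.

    Only chains with more than min_chain_len residues contribute
    (min_chain_len is a hypothetical cutoff, not the paper's value).
    Returns NaN when no segment qualifies.
    """
    total = 0.0
    count = 0
    for chain in chains:
        if len(chain) <= min_chain_len:
            continue  # skip short proteins to limit finite-size effects
        for i in range(len(chain) - seg_len):
            total += math.dist(chain[i], chain[i + seg_len])
            count += 1
    return total / count if count else float("nan")
```

Calling this for a range of `seg_len` values gives one point per segment length on a plot of this kind.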

Statistics of the end-to-end distance of segments of proteins of length . For , 64, and 80, the distributions show a nice collapse to the form expected for Gaussian statistics: the solid line denotes the function , where . For , where the presence of secondary motifs plays a major role, the distribution is qualitatively different from the other sizes and exhibits a peak arising from the presence of helices.
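
The function drawn as the solid line is not recoverable from this caption. For orientation only, the sketch below uses the standard textbook distribution of the end-to-end distance of an ideal (Gaussian) chain in three dimensions, written in the rescaled variable x = R / sqrt(<R^2>), which is one common way to collapse data for different segment lengths. The choice of rescaling and the names `gaussian_chain_pdf` and `rescale_by_rms` are assumptions and may differ from the form used in the paper.

```python
import math

def gaussian_chain_pdf(x):
    """Ideal-chain density of x = R / sqrt(<R^2>) in three dimensions.

    Textbook form: 4*pi*x^2 * (3/(2*pi))^(3/2) * exp(-3*x^2/2).
    Assumption: this rescaling may not match the paper's own variable.
    """
    prefactor = 4.0 * math.pi * (3.0 / (2.0 * math.pi)) ** 1.5
    return prefactor * x * x * math.exp(-1.5 * x * x)

def rescale_by_rms(distances):
    """Divide each distance by the root-mean-square of the sample."""
    rms = math.sqrt(sum(d * d for d in distances) / len(distances))
    return [d / rms for d in distances]
```

A histogram of `rescale_by_rms(distances)` for segments of one length can then be compared with `gaussian_chain_pdf` on the same axis.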

Plot of the tangent-tangent and binormal-binormal correlation functions along the protein sequence derived from our data set. The tangent vector at location is defined as a unit vector pointing along the line joining the positions of the and the amino acids. The normal vector is defined by joining the location to the center of the circle drawn through three amino acid locations. The binormal is perpendicular to the plane defined by the tangent and the normal. Note that: (a) the negative tangent-tangent correlation at sequence separation around 13 corresponds to a turning back, on average, of the chain direction and is related to the crossover shown in Fig. 2; (b) the binormal-binormal correlation remains nonzero for large separations.
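
The local frame described in this caption can be sketched for three consecutive points as follows. The residue indices that define the tangent are missing from the caption, so this sketch assumes the tangent points from the previous residue to the next one; that choice, and the names `circumcenter` and `local_frame`, are hypothetical. The normal points from the middle residue to the center of the circle through the three points, and the binormal is the normalized cross product of the two.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(x / n for x in a)

def circumcenter(p_prev, p, p_next):
    """Center of the circle through three non-collinear 3D points."""
    a = _sub(p_prev, p_next)
    b = _sub(p, p_next)
    axb = _cross(a, b)
    denom = 2.0 * _dot(axb, axb)
    if denom == 0.0:
        raise ValueError("collinear points have no finite circumcircle")
    w = tuple(_dot(a, a) * bk - _dot(b, b) * ak for ak, bk in zip(a, b))
    offset = _cross(w, axb)
    return tuple(c + o / denom for c, o in zip(p_next, offset))

def local_frame(p_prev, p, p_next):
    """Return (tangent, normal, binormal) unit vectors at the middle point.

    Assumption: the tangent runs from p_prev to p_next.
    """
    tangent = _unit(_sub(p_next, p_prev))
    normal = _unit(_sub(circumcenter(p_prev, p, p_next), p))
    binormal = _unit(_cross(tangent, normal))
    return tangent, normal, binormal
```

A correlation function at a given sequence separation would then be the average dot product of the corresponding vectors at pairs of locations that far apart.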

Histogram of the magnitudes of the average tangent and binormal vectors for each protein in our data set. For each protein, we measured the magnitude as , where is either the unit tangent or the unit binormal vector at location and is the number of such vectors for a given protein. For comparison, a histogram of the magnitudes of the average of randomly oriented vectors is shown as the shaded histogram. (Here was selected to be a randomly oriented unit vector.) Note that several proteins have a significant nonzero mean binormal vector due to the presence of helices.
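
The expression for the magnitude is missing from this caption. A natural reading, used here as an assumption, is the length of the arithmetic mean of the unit vectors of one protein. The sketch below computes that quantity and also generates randomly oriented unit vectors for a comparison of the kind the caption describes; `mean_vector_magnitude` and `random_unit_vector` are hypothetical names.

```python
import math
import random

def mean_vector_magnitude(vectors):
    """Length of the arithmetic mean of a list of 3D vectors.

    Assumption: this is the magnitude plotted in the histogram.
    """
    n = len(vectors)
    mean = [sum(v[k] for v in vectors) / n for k in range(3)]
    return math.sqrt(sum(c * c for c in mean))

def random_unit_vector(rng):
    """Uniformly oriented unit vector from three Gaussian components."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return tuple(c / norm for c in v)
```

For example, `mean_vector_magnitude([random_unit_vector(rng) for _ in range(n)])` with `rng = random.Random()` gives one sample for a random reference histogram at chain length `n`.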

(a) Statistics of the end-to-end distance of segments of length taken from model protein structures (Ref. 18) and from PDB structures. The peak in the distributions arises from the presence of helices. (b) Same as Fig. 3 but for segments of the model structures of lengths and 12. The fits to the Gaussian form given in the caption of Fig. 3 yield for and for .
