^{1,a)}, Pallav Kosuri^{2}, Vicente Parot^{3}, and Julio M. Fernandez^{4}

### Abstract

Atomic force microscopy (AFM) force spectroscopy has become a powerful biophysical technique for probing the dynamics of proteins at the single-molecule level. Extending a polyprotein at constant velocity produces the now-familiar sawtooth pattern in the force-length relationship. Customarily, manual fits of the wormlike chain (WLC) model of polymer elasticity to sawtooth pattern data have been used to measure the contour length of the protein as it unfolds one module at a time. The change in the contour length measures the number of amino acids released by an unfolding protein and can be used as a precise locator of the unfolding transition state. However, manual WLC fits are slow and introduce inevitable operator-driven errors, which reduce the accuracy of the estimates. Here we demonstrate an extended Kalman filter (EKF) that provides operator-free, real-time estimates of the contour length from sawtooth pattern data. The filter design is based on a cantilever-protein arrangement modeled by a simple linear time-invariant cantilever model and by a nonlinear force-length relationship function for the protein. Applied to sawtooth pattern data, the resulting filter measures the contour length accurately, in real time, and without operator intervention. These results are a marked improvement over the earlier techniques, and the procedure is easily extended or modified to accommodate further quantities of interest in force spectroscopy.

I. INTRODUCTION

II. THEORY AND EXPERIMENTS

A. Single-molecule force spectroscopy

B. Mechanical model of the cantilever-protein system

C. EKF estimates of the contour length of an unfolding protein

D. EKF recovers the contour length from synthetic data

E. EKF measurements of contour length increments in an unfolding polyprotein

III. DISCUSSION

## Figures

Single-molecule AFM implementation of a Kalman filter (KF) for the measurement of the contour length of a protein. (a) Diagram of the experimental arrangement. The deflection of a laser beam by a cantilever measures the force generated by a single protein being stretched by the motion of a piezoelectric actuator. (b) Diagram of the KF implementation. A physical model of the cantilever-protein arrangement (left) is used to generate an *a priori* estimate of the state variables of the measuring system. The *a priori* state variables are used to generate an estimate of the cantilever deflection, which is compared with the measured value. The resulting error is multiplied by the optimized Kalman gain to produce a refined *a posteriori* estimate of the state variable of interest, the contour length of the protein. The KF optimizes the state variable estimates so as to minimize the variance of the error.
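The predict-and-correct cycle described in panel (b) can be written compactly in the standard extended Kalman filter form. The notation here is an assumption made for illustration and is not taken from the article: $\hat{x}_k^-$ and $\hat{x}_k^+$ are the *a priori* and *a posteriori* state estimates, $f$ is the system model, $h$ maps the state to the predicted deflection, $y_k$ is the measured deflection, $P_k^-$ is the *a priori* error covariance, $H_k$ is the Jacobian of $h$ at $\hat{x}_k^-$, and $R$ is the measurement noise covariance.

$$
\begin{aligned}
\hat{x}_k^- &= f\left(\hat{x}_{k-1}^+\right), \\
K_k &= P_k^- H_k^{T}\left(H_k P_k^- H_k^{T} + R\right)^{-1}, \\
\hat{x}_k^+ &= \hat{x}_k^- + K_k\left[y_k - h\left(\hat{x}_k^-\right)\right].
\end{aligned}
$$

The bracketed term is the error between the measured and predicted deflection, and $K_k$ is the gain that minimizes the variance of the *a posteriori* error under the linearized model.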

Mechanical properties of the AFM cantilever and of an extending polyprotein. (a) Under thermal equilibrium and free from any molecules attached to it, the typical AFM cantilever used in our experiments shows a principal resonant peak at . It is much more pronounced with higher modes when the cantilever is free (black line) than when it is near the surface (blue line). A fit of the frequency response function of the simple cantilever model described in Eq. (3) (dotted red line), produces a reasonable description of the data derived from and . (b) Force-length relationship of a single ubiquitin polyprotein as it extends in a single molecule AFM experiment, as depicted in Fig. 1(a). The characteristic sawtooth pattern shape of these traces results from a series of sequential protein unfolding events which increase the contour length of the polyprotein at each peak, by an amount that is protein specific. The contour length increases are obtained by fitting the force-length relationship up to each force peak, using the WLC model of polymer elasticity [WLC, Eq. (4), dashed red lines]. Consecutive least-squares fits to the data show that the contour length increases from and up to in discrete increases of .
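For reference, the WLC force-length relationship is commonly written using the Marko-Siggia interpolation formula below. Treating this as the form meant by the caption's Eq. (4) is an assumption, as are the symbols: $F$ is the force, $x$ the end-to-end extension, $p$ the persistence length, $L_c$ the contour length, and $k_B T$ the thermal energy.

$$
F(x) = \frac{k_B T}{p}\left[\frac{1}{4}\left(1 - \frac{x}{L_c}\right)^{-2} - \frac{1}{4} + \frac{x}{L_c}\right]
$$

The force diverges as $x \to L_c$, which is why each segment of the sawtooth rises steeply before the next unfolding event increases $L_c$ and relaxes the chain.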

Analysis of the extended KF implementation. (a) Simulated data created to examine the behavior of the observer, generated by stepwise changes in the contour length using the same system model described in the text. Independent Gaussian noise with a variance of 15 pN was used to simulate the measurement process. (b) Contour length estimates from extended Kalman filters based on different values of the persistence length, compared against the true contour length (dotted line). The convergence behavior of the estimate depends strongly on the value of the persistence length. For the true value of the persistence length, the estimate quickly converges to the true contour length. If the persistence length is wrong, convergence is slower, and the manner of convergence identifies whether the assumed persistence length was too large or too small.
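The experiment in this caption can be imitated with a much-reduced sketch. It is not the authors' filter: it ignores the cantilever dynamics entirely, treats the force as measured directly, and tracks a single state (the contour length) with a random-walk model. The WLC form is the Marko-Siggia interpolation, and every number below (thermal energy, persistence length, staircase values, noise level treated as a standard deviation, `q`, `r_meas`) is an assumption chosen only for illustration.

```python
import random

KT = 4.11         # thermal energy in pN*nm near room temperature (assumed)
MAX_RATIO = 0.98  # keep extension / contour length below the WLC singularity


def wlc_force(x, lc, p):
    """Marko-Siggia WLC force (pN) at extension x (nm), contour length lc (nm)."""
    r = x / lc
    return (KT / p) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)


def wlc_dforce_dlc(x, lc, p):
    """Partial derivative of the WLC force with respect to contour length."""
    r = x / lc
    return -(KT / p) * (r / lc) * (0.5 / (1.0 - r) ** 3 + 1.0)


def true_contour_length(x):
    """Hypothetical staircase: Lc steps 50 -> 75 -> 100 nm as extension grows."""
    if x < 40.0:
        return 50.0
    if x < 62.0:
        return 75.0
    return 100.0


def simulate(n=3000, x0=20.0, x1=85.0, p=0.4, noise_sd=15.0, seed=0):
    """Synthetic sawtooth: constant-velocity ramp plus Gaussian force noise."""
    rng = random.Random(seed)
    xs = [x0 + (x1 - x0) * i / (n - 1) for i in range(n)]
    ys = [wlc_force(x, true_contour_length(x), p) + rng.gauss(0.0, noise_sd)
          for x in xs]
    return xs, ys


def run_ekf(xs, ys, p, lc0=50.0, var0=25.0, q=0.01, r_meas=225.0):
    """Scalar EKF for the contour length; returns one estimate per sample."""
    lc, var = lc0, var0
    estimates = []
    for x, y in zip(xs, ys):
        # Predict: random-walk model, so the mean carries over and the
        # variance grows by q. The clamp keeps the WLC model finite.
        lc_prior = max(lc, x / MAX_RATIO)
        var_prior = var + q
        # Correct: linearize the WLC force around the a priori estimate.
        h = wlc_dforce_dlc(x, lc_prior, p)
        gain = var_prior * h / (h * h * var_prior + r_meas)
        lc = lc_prior + gain * (y - wlc_force(x, lc_prior, p))
        var = (1.0 - gain * h) * var_prior
        estimates.append(lc)
    return estimates


if __name__ == "__main__":
    xs, ys = simulate()
    for p_guess in (0.2, 0.4, 0.8):
        est = run_ekf(xs, ys, p=p_guess)
        print(f"assumed p = {p_guess} nm -> final Lc estimate {est[-1]:.1f} nm")
```

Running the filter with the persistence length used to generate the data and with deliberately wrong values gives a rough feel for the sensitivity described in panel (b), although this one-state sketch should not be expected to reproduce the published convergence curves.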

Experiments with ubiquitin polyproteins at an extension rate of 400 nm/s produced 190 sawtooth traces for analysis. (a) A sample trace from the data set. The final peak results from the dissociation of the protein from the cantilever tip. Such peaks were excluded from the step-size analysis. (b) The resulting estimate of the contour length of the protein, corresponding to the data in part (a). The inset enlarges one of the steps in the contour length estimate, showing that the convergence behavior does not match any of those from Fig. 3(b). The overshoot that occurs immediately after the step most likely indicates an error in the protein model and is consistent with the expected failure of the WLC model at low forces. (c) The distribution and statistics of the estimated changes in contour length during unfolding, compared with earlier results fit by hand with the WLC model, as in Fig. 2(b). The data for the hand-fitted steps are from Carrion-Vazquez *et al.* (Ref. 17). A noticeable skew is observed in the EKF step-size histogram. The fit plotted is therefore not a Gaussian distribution, as in the case of the hand-fitted data, but a generalized extreme value distribution. The parameters of the distribution are: and . , , and are the shape, scale, and location parameters, respectively. The maximum likelihood estimates and variances are listed in the figure for each approach. In the case of the Gaussian fit, the maximum likelihood estimate is equivalent to the mean.
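For readers unfamiliar with the generalized extreme value distribution, its probability density in one common parameterization is given below. The symbols are an assumption, since the ones used in the caption did not survive: $\xi$ is the shape, $\sigma > 0$ the scale, and $\mu$ the location parameter.

$$
f(x) = \frac{1}{\sigma}\, t(x)^{\xi + 1} e^{-t(x)}, \qquad
t(x) = \left[1 + \xi\,\frac{x - \mu}{\sigma}\right]^{-1/\xi} \quad (\xi \neq 0),
$$

defined where $1 + \xi (x - \mu)/\sigma > 0$. Unlike a Gaussian, this density is asymmetric about its mode, which is what allows it to follow a skewed step-size histogram.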
