Curve Fitting Toolbox

Determining the Best Fit

To determine the best fit, you should examine both the graphical and numerical fit results.

Examining the Graphical Fit Results

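Your initial approach in determining the best fit should be a graphical examination of the fits and residuals. The graphical fit results indicate that the single-term exponential fit, exp1, is a poor fit overall, while the polynomial fits cannot be told apart by eye.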

Use the Plotting GUI to remove exp1 from the scatter plot display.

Because the goal of fitting the census data is to extrapolate the best fit to predict future population values, you should explore the behavior of the fits up to the year 2050. You can change the axes limits of the Curve Fitting Tool by selecting the menu item Tools->Axes Limit Control.

The census data and fits are shown below for an upper abscissa limit of 2050. The behavior of the sixth degree polynomial fit beyond the data range makes it a poor choice for extrapolation.

As you can see, you should exercise caution when extrapolating with polynomial fits because they can diverge wildly outside the data range.
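The following is a minimal sketch of that caution, not the toolbox's own fitting algorithm. It fits a quadratic and a sixth-degree polynomial by ordinary least squares and compares each with the underlying trend just inside and well beyond the data range. The data are synthetic (a known quadratic trend plus a small deterministic wiggle), not the census data, and the helper names fit_polynomial and evaluate are hypothetical.

```python
import math


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients in ascending powers: c[0] + c[1]*x + c[2]*x**2 + ...
    """
    n_coef = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = x[i]**j.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(n_coef)]
         for r in range(n_coef)]
    b = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(n_coef)]
    # Gaussian elimination with partial pivoting.
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n_coef):
            factor = a[row][col] / a[col][col]
            for k in range(col, n_coef):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n_coef
    for row in range(n_coef - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n_coef))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def trend(x):
    """Assumed 'true' trend used to generate the synthetic data."""
    return 60.0 + 120.0 * x + 70.0 * x * x


# Synthetic data on a centered, scaled abscissa in [-1, 1] (NOT the census data).
xs = [-1.0 + 0.1 * i for i in range(21)]
ys = [trend(x) + 0.5 * math.sin(7 * i) for i, x in enumerate(xs)]

for degree in (2, 6):
    coeffs = fit_polynomial(xs, ys, degree)
    inside = evaluate(coeffs, 0.95) - trend(0.95)   # within the data range
    outside = evaluate(coeffs, 1.6) - trend(1.6)    # well beyond the data range
    print(f"degree {degree}: error inside {inside:+.3f}, beyond range {outside:+.3f}")
```

Centering and scaling the abscissa before fitting, as assumed here, keeps the normal equations reasonably well conditioned for the higher-degree fit. With data like these, the higher-degree fit will typically stay close to the trend inside the range and depart from it much more once you evaluate beyond the last data point.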

Examining the Numerical Fit Results

Because you can no longer eliminate fits by examining them graphically, you should examine the numerical fit results. There are two types of numerical fit results displayed in the Fitting GUI: goodness of fit statistics and confidence intervals on the fitted coefficients. The goodness of fit statistics help you determine how well the curve fits the data. The confidence intervals on the coefficients indicate how accurately the coefficients are known.

Some goodness of fit statistics are displayed in the Results area of the Fit Editor for a single fit. All goodness of fit statistics are displayed in the Table of Fits for all fits, which allows for easy comparison.

In this example, the sum of squares due to error (SSE) and the adjusted R-square statistics are used to help determine the best fit. As described in Goodness of Fit Statistics, the SSE statistic is the least squares error of the fit, with a value closer to zero indicating a better fit. The adjusted R-square statistic is generally the best indicator of the fit quality when you add coefficients to your model, because it accounts for the number of coefficients as well as the size of the error.
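To make the two statistics concrete, here is a small sketch that computes them from observed and fitted values. It assumes the usual definitions (SSE as the sum of squared residuals, and adjusted R-square as R-square corrected for the residual degrees of freedom, n minus the number of fitted coefficients). The function name goodness_of_fit and the sample numbers are hypothetical and are not taken from the census example.

```python
def goodness_of_fit(y, y_fit, num_coeffs):
    """Return SSE, R-square, and adjusted R-square for a fit.

    y          -- observed response values
    y_fit      -- fitted values at the same points
    num_coeffs -- number of coefficients estimated by the fit
    """
    n = len(y)
    mean_y = sum(y) / n
    # Sum of squares due to error: squared residuals about the fit.
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))
    # Total sum of squares: squared deviations about the mean.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1.0 - sse / sst
    # Residual degrees of freedom penalize extra coefficients.
    dof = n - num_coeffs
    adj_r_square = 1.0 - (sse / dof) / (sst / (n - 1))
    return {"sse": sse, "r_square": r_square, "adj_r_square": adj_r_square}


# Made-up values for illustration only.
stats = goodness_of_fit([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], num_coeffs=2)
print(stats)
```

Because the adjustment divides the SSE by the residual degrees of freedom, adding a coefficient that barely lowers the SSE can lower the adjusted R-square, whereas plain R-square never decreases when coefficients are added.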

You can modify the information displayed in the Table of Fits with the Table Options GUI. You open this GUI by clicking the Table options button on the Fitting GUI. As shown below, select the adjusted R-square statistic and clear the R-square statistic.

The numerical fit results are shown below. You can click the Table of Fits column headings to sort by statistics results.

The SSE for exp1 indicates it is a poor fit, which was already determined by examining the fit and residuals. The lowest SSE value is associated with poly6. However, the behavior of this fit beyond the data range makes it a poor choice for extrapolation. The next best SSE value is associated with the fifth degree polynomial fit, poly5, suggesting it may be the best fit. However, the SSE and adjusted R-square values for the remaining polynomial fits are all very close to each other. Which one should you choose?

To resolve this issue, examine the confidence bounds for the remaining fits. By default, 95% confidence bounds are calculated. You can change this level by selecting the menu item View->Confidence Level from the Curve Fitting Tool.

The confidence bounds on the p1, p2, and p3 coefficients for the fifth degree polynomial suggest that it overfits the census data: when the bounds on a coefficient are wide relative to its value, or cross zero, the data do not determine that coefficient well. In contrast, the confidence bounds for the quadratic fit, poly2, indicate that the fitted coefficients are known fairly accurately. Therefore, after examining both the graphical and numerical fit results, it appears that you should use poly2 to extrapolate the census data.
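One common way to read coefficient confidence bounds, assumed here rather than taken from the toolbox, is to flag any coefficient whose interval contains zero, since such a coefficient cannot be distinguished from zero at the chosen confidence level. The sketch below applies that check to a table of lower and upper bounds. The function name and the bounds are hypothetical, made-up numbers, not the results of the census fits.

```python
def coefficients_crossing_zero(bounds):
    """Return the names of coefficients whose confidence interval contains zero.

    bounds -- mapping of coefficient name to a (lower, upper) tuple
    """
    return [name for name, (lower, upper) in bounds.items()
            if lower <= 0.0 <= upper]


# Hypothetical 95% bounds for illustration only.
example_bounds = {
    "p1": (-0.002, 0.005),   # interval spans zero
    "p2": (0.8, 1.2),        # clearly positive
    "p3": (-3.1, -0.4),      # clearly negative
}
print(coefficients_crossing_zero(example_bounds))
```

A fit in which several of the higher-order coefficients are flagged this way is a candidate for replacement by a lower-degree model.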

For more information about confidence bounds, refer to Confidence and Prediction Bounds.



© 1994-2004 The MathWorks, Inc.