Home page for Tod Landis
 
 News flash 1-16-08:  Sun has acquired MySQL
 
  
Fitting Lines and Curves to Data
January 15, 2008
 
In my last post I wrote about the new Entrance chart types, and before that about some of the gems in the editor written by Shane Duan.  This time I will focus on three new tools we have provided for fitting lines, quadratic curves, and exponential curves to data.  We will be demonstrating these tools at the MySQL Conference and Expo in April.  
 
 
 
How to do it
The red curve above is a quadratic function fit to the yearly average consumer price index as a function of year.  Blue dots correspond to the data values themselves.  To reproduce this chart:   open the CPI sample data table and bring up Tools | Fit a quardratic...  Change the dialog settings to match these:
 
 
 
Click OK and you have the plot. You also have a new table column containing the function values used to make the plot.:
 
 
 
Interpolating/extrapolating values
The column of values created by a "fit" tool can be used to replace NULL values by interpolation.  For example, here is a table of "squares" missing most values:
 
 
 
 
The "Fit a quadratic..." tool uses the non-null rows to calculate a fit, then calculates the missing values.  Here is the result:
 
 
 
Some details
Behind the scenes, Entrance calculates function parameters by solving a least squares problem using the QR Algorithm provided by JAMA.  The JAMA matrix package is now bundled with Entrance.  
 
Details about how the equations are set up are in the LineFit, QuadraticFit, and ExponentialFit class sources.  In the case of LineFit, a and b are found such that the line
     a * x + b
fits the data.  QuadraticFit finds a,b, and c fitting
     a * (x*x) + b * x + c
to the data, and ExponentialFit finds a and b fitting:
     a * exp(b * x)
 
It took just a few seconds to calculate parameters for fitting functions to values in 100,000 rows on my iMac.  The computing time for finding a solution should grow linearly with the number of rows.  (QR costs 2mn^^2, flops, and in our case n = 2 or 3)
 
There are a few cases that will break the  code.  The line and exponential fit calculations need at least two points, and the the quadratic calculation needs three.   They also need to be "reasonablly"  spread out  (IZZ™ll be providing examples as time goes on).
 
There are more details about JAMA at http://math.nist.gov/javanumerics/jama/    
 
 
Copyright (c) 2008 Tod Landis, All Rights Reserved
 
MySQL is a registered trademark of MySQL AB in the United States, the European Union and other countries.  
 
 
 
Click here for more info about Entrance:  http://dbentrance.com/