Software for regression modelling

Here you can download some regression modelling software. Newer versions will be released from time to time (anyone is welcome to suggest a modification or report a bug).

ARESLab: Adaptive Regression Splines toolbox for Matlab/Octave

Version 1.5.1 (June 2, 2011) - download

[Picture: Two surfaces] ARESLab is a Matlab/Octave toolbox for building piecewise-linear and piecewise-cubic regression models using Jerome Friedman's Multivariate Adaptive Regression Splines technique (also known as MARS). The toolbox allows building models using different settings, testing them on a separate test set or using k-fold Cross-Validation, using them for prediction, outputting equations for deployment, plotting the models, etc. The built models can also be used as metamodels (also known as surrogate models) for design optimization tasks.

Multivariate Adaptive Regression Splines have the ability to model very complex and high-dimensional data dependencies. The model takes the form of an expansion in product spline basis functions, where the number of basis functions as well as the parameters associated with each one (product degree and knot locations) are automatically determined by the data through a forward/backward iterative approach.
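The model form described above can be illustrated with a minimal Python sketch. It only evaluates an already-built piecewise-linear model as a sum of products of hinge functions; it does not perform the forward/backward construction, and it is not ARESLab code or its API. The function names, the term layout, and the example coefficients and knots are all hypothetical.

```python
def hinge(x, knot, sign):
    """Hinge function: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return max(0.0, sign * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum(coefficient * product of hinge factors).

    x is a sequence of input values. Each term is (coefficient, factors),
    and each factor is (variable_index, knot, sign).
    """
    y = intercept
    for coefficient, factors in terms:
        basis = 1.0
        for var, knot, sign in factors:
            basis *= hinge(x[var], knot, sign)
        y += coefficient * basis
    return y


# Hypothetical hand-written model with two inputs (not fitted to any data):
# y = 1.0 + 2.0*max(0, x0 - 0.5) - 3.0*max(0, 0.5 - x0)*max(0, x1 - 1.0)
example_terms = [
    (2.0, [(0, 0.5, +1)]),                 # product degree 1
    (-3.0, [(0, 0.5, -1), (1, 1.0, +1)]),  # product degree 2 (interaction)
]

print(mars_predict([1.0, 2.0], 1.0, example_terms))  # prints 2.0
print(mars_predict([0.0, 2.0], 1.0, example_terms))  # prints -0.5
```

The number of entries in each term's factor list is the product degree mentioned above, and the knot values are the locations where the slope of the model changes.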

The toolbox code is licensed under the GNU GPL.

The reference manual can be downloaded here (it is also included in the ARESLab archive file).

M5PrimeLab: M5' regression tree and model tree toolbox for Matlab/Octave

Version 1.0.1 (September 3, 2010) - download

[Picture: Bonzai] M5PrimeLab is a Matlab/Octave toolbox for building regression trees and model trees using the M5' method. The toolbox allows building the trees using different settings, testing them on a separate test set or using k-fold Cross-Validation, using them for prediction, outputting them in a user-readable way, etc. M5PrimeLab accepts continuous, binary, and categorical input variables and also handles missing values.

Model trees combine a conventional regression tree with the possibility of linear regression functions at the leaves. This representation usually provides higher accuracy than regression trees but preserves the advantage of clear and easy-to-interpret structure.
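The difference between the two tree types can be shown with a small Python sketch. It only demonstrates prediction with a hand-built tree whose leaves hold linear functions; it does not implement the M5' growing or pruning procedure, and it is not M5PrimeLab code. The dictionary layout, the split values, and the coefficients are hypothetical.

```python
def predict(node, x):
    """Route x down the tree, then apply the linear model stored at the leaf."""
    while "leaf" not in node:
        branch = "left" if x[node["var"]] <= node["threshold"] else "right"
        node = node[branch]
    intercept, coefs = node["leaf"]
    return intercept + sum(c * xi for c, xi in zip(coefs, x))


# Hypothetical model tree: one split on x[0] at 2.0,
# with a different linear function on each side.
model_tree = {
    "var": 0,
    "threshold": 2.0,
    "left": {"leaf": (1.0, [0.5, 0.0])},    # y = 1.0 + 0.5*x0
    "right": {"leaf": (4.0, [-1.0, 2.0])},  # y = 4.0 - 1.0*x0 + 2.0*x1
}

# A conventional regression tree is the special case in which every leaf
# is a constant, i.e. all leaf coefficients are zero.
regression_tree = {
    "var": 0,
    "threshold": 2.0,
    "left": {"leaf": (1.5, [0.0, 0.0])},
    "right": {"leaf": (5.0, [0.0, 0.0])},
}

print(predict(model_tree, [1.0, 3.0]))       # prints 1.5
print(predict(model_tree, [3.0, 1.0]))       # prints 3.0
print(predict(regression_tree, [3.0, 1.0]))  # prints 5.0
```

In both cases the path from the root to a leaf stays readable as a short chain of threshold tests, which is the interpretability advantage mentioned above.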

The toolbox code is licensed under the GNU GPL.

The reference manual can be downloaded here (it is also included in the M5PrimeLab archive file).

VariReg

Version 0.10.2 (May 11, 2010) - download

[Picture: VariReg screenshot] This software tool (a Windows executable) is developed for working with numerical datasets: it performs regression modelling using different kinds of regression techniques, evaluates the models using Hold-Out or Cross-Validation (CV), and visualizes the model surfaces in 3D. The main emphasis is on methods used in metamodelling / surrogate modelling. In this context, the software can be employed for building metamodels / surrogate models, both for evaluation and comparison of the different techniques and for further use in what-if analysis, design optimization, design space exploration, etc. The full and sparse polynomial models can be represented in a "spreadsheet-friendly" way. All the regression modelling methods implemented in VariReg can also be run from the Matlab environment and from the command line. (Note that the implementations in VariReg can be considerably faster, even by orders of magnitude, than the same methods scripted in Matlab.)

VariReg also provides means for optimizing the values of output variables using the built regression models as objective functions (i.e., as surrogates). In the current version of VariReg the following optimization algorithms are implemented: Particle Swarm Optimization and a simple Grid Search.
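As an illustration of using a regression model as the objective function, the following Python sketch runs a simple grid search over a surrogate. It is not VariReg's implementation: the function names are hypothetical, the surrogate is a hand-written quadratic standing in for a fitted model, and minimization (rather than maximization) is an assumption made for the example.

```python
import itertools


def grid_search(objective, bounds, points_per_axis):
    """Evaluate the objective on a regular grid and return (best_x, best_value).

    bounds is a list of (low, high) pairs, one per input variable.
    points_per_axis must be at least 2. The search minimizes the objective.
    """
    axes = []
    for low, high in bounds:
        step = (high - low) / (points_per_axis - 1)
        axes.append([low + i * step for i in range(points_per_axis)])
    best_x, best_value = None, float("inf")
    for x in itertools.product(*axes):
        value = objective(x)
        if value < best_value:
            best_x, best_value = x, value
    return best_x, best_value


def surrogate(x):
    """Hypothetical surrogate: a quadratic polynomial in two inputs."""
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2


best_x, best_value = grid_search(surrogate, [(-2.0, 2.0), (-2.0, 2.0)], 9)
print(best_x, best_value)  # prints (1.0, -0.5) 0.0
```

The cost of a grid search grows exponentially with the number of inputs, which is why it is only practical when the surrogate is cheap to evaluate and the input dimension is small.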

The software is developed for non-commercial research and educational use.

The user's manual can be downloaded here (it is also included in the VariReg archive file).

Currently the regression modelling techniques implemented in VariReg (and available from Matlab) are the following:

  • "Full" polynomials of any user-predefined degree;
  • Sparse polynomials created using Sequential Forward Selection (SFS) with F-test, Corrected Akaike's Information Criterion (AICC), Bayesian Information Criterion (BIC) (which is equal to the "two-part" Minimum Description Length criterion, MDL), Generalized Cross-Validation criterion (GCV), or CV;
  • Sparse polynomials created using Steepest Descent Hill Climbing or Random Restart Hill Climbing with AICC, BIC, GCV, or CV;
  • Sparse polynomials created using Sequential Floating Forward Selection (SFFS) with AICC, BIC, GCV, or CV;
  • Sparse polynomials built by Floating Adaptive Basis Function Construction (F-ABFC) with AICC, BIC, or GCV;
  • Ensembles of models built by F-ABFC - EF-ABFC;
  • Locally Weighted Polynomials (LWP), also called Locally Weighted Regression or Moving Least Squares, of any degree with Gaussian weight function of bandwidth found using Leave-One-Out Cross-Validation (LOOCV) (the best degree can also be found using LOOCV);
  • k-Nearest Neighbours (k-NN);
  • Radial Basis Function (RBF) interpolation using multiquadric, thin plate spline or Gaussian basis functions with LOOCV-ed smoothing parameter;
  • Shepard interpolation;
  • Kriging interpolation using different correlation functions including exponential, Gaussian, linear, spherical, and cubic spline;
  • Multivariate Adaptive Regression Splines (MARS);
  • Polynomial Neural Networks (PNN) induced by Group Method of Data Handling (GMDH) using AICC, BIC, GCV, CV, or Hold-Out with different kinds of options for neuron inputs, layer sizes, as well as subset selection algorithms for individual neurons;
  • Model averaging/ensembling/combining techniques (combining together the predictions of the modelling methods above):
    • a simple unweighted average;
    • averaging weighted by LOOCV error;
    • averaging weighted by LOOCV error variance;
    • averaging weighted by LOOCV error correlation;
    • averaging by Stacking.

Source code of ABFC for Matlab/Octave

Version 1.3 (March 21, 2010) - download

The zip file includes the full Matlab/Octave source code implementing F-ABFC and EF-ABFC for adaptive regression modelling. The implemented versions of the methods are those described in the Machine Learning book chapter.

Unfortunately, the implementations in Matlab/Octave can be orders of magnitude slower than those in the VariReg tool.

Source code of F-ABFC for Wireless Sensor Networks in Java

Version 1.0 (October 28, 2008) - download

The zip file includes the full source code of software called Adaptive Regression for Wireless Sensor Networks (ARWSN), with a simple and easy-to-understand implementation of F-ABFC in Java. The implemented version of F-ABFC is the one described in the IADIS2008 paper. The source code has been tested on the Intel Berkeley data. While the code was originally written for Wireless Sensor Networks, it can also be adapted to any other regression modelling problem. The author of the ARWSN software is Parisa Jalili Marandi.

Source code of Locally Weighted Polynomials for Matlab/Octave

Version 1.3 (February 10, 2010) - download

The zip file includes the full Matlab/Octave source code of Locally Weighted Polynomials (also called Locally Weighted Regression or Moving Least Squares) with a Gaussian weight function. The bandwidth parameter of the weight function can also be found using Leave-One-Out Cross-Validation.

LWP approximation is designed to address situations in which models of global behaviour do not perform well or cannot be effectively applied without undue effort. The LWP approximation is carried out by pointwise fitting of low-degree polynomials to localized subsets of the data. The advantage of this method is that the analyst is not required to specify a global function. However, the method requires relatively high computational resources at prediction time, because a separate local polynomial is fitted for each query point.
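The pointwise fitting can be sketched in Python for the simplest case: one input variable and a local polynomial of degree 1. This is not the code from the zip file. The function names are hypothetical, and the exact form of the Gaussian weight, exp(-(d/h)^2) with distance d and bandwidth h, is an assumption.

```python
import math


def lwp_predict(xs, ys, query, bandwidth):
    """Fit a weighted straight line around `query` and evaluate it there."""
    # Gaussian weights: data points near the query dominate the local fit.
    w = [math.exp(-((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)


def loocv_error(xs, ys, bandwidth):
    """Mean squared Leave-One-Out Cross-Validation error for one bandwidth."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (lwp_predict(rest_x, rest_y, xs[i], bandwidth) - ys[i]) ** 2
    return total / len(xs)


# Illustrative data (not from any real dataset): samples of y = x^2.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [x * x for x in xs]

# Pick the bandwidth with the smallest LOOCV error from a few candidates.
candidates = [0.3, 0.6, 1.2, 2.4]
best_bandwidth = min(candidates, key=lambda h: loocv_error(xs, ys, h))
print(best_bandwidth, lwp_predict(xs, ys, 1.25, best_bandwidth))
```

Every call to `lwp_predict` solves a new weighted least-squares problem over the whole dataset, which is where the prediction-time cost comes from.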

Source code of Radial Basis Function interpolation for Matlab/Octave

Version 1.1 (August 12, 2009) - download

The zip file includes full Matlab/Octave source code of Radial Basis Function interpolation with biharmonic, multiquadric, inverse multiquadric, thin plate spline, or Gaussian basis functions with or without the polynomial term.

RBF interpolation uses a series of basis functions that are symmetric and centred at each sampling point. Radial basis functions are a special class of functions whose main feature is that their response decreases (or increases) monotonically with distance from a central point. The centre, the distance scale, and the precise shape are parameters of the model.
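A minimal Python sketch of this idea, assuming Gaussian basis functions, no polynomial term, and a fixed distance scale, is given below. It is not the code from the zip file, and all function names are hypothetical. One basis function is centred at each sampling point, and the weights are found by solving a linear system so that the model passes through the data exactly.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gaussian(r, scale):
    """Gaussian basis function: the response decreases with distance r."""
    return math.exp(-(r / scale) ** 2)


def rbf_fit(points, values, scale):
    """Return one weight per sampling point so that the model interpolates the data."""
    a = [[gaussian(math.dist(p, q), scale) for q in points] for p in points]
    return solve(a, values)


def rbf_predict(points, weights, scale, x):
    """Weighted sum of basis functions centred at the sampling points."""
    return sum(w * gaussian(math.dist(x, p), scale) for w, p in zip(weights, points))


# Illustrative data: four sampling points in two dimensions.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.0, 1.0, 1.0, 2.0]
weights = rbf_fit(points, values, scale=1.0)
print(rbf_predict(points, weights, 1.0, (0.5, 0.5)))
```

The linear system has one unknown per sampling point, so the fitting cost grows quickly with the size of the dataset, while each prediction is a single weighted sum.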

Source code of GMDH-type Polynomial Neural Networks for Matlab/Octave

Version 1.5 (June 2, 2011) - download

[Picture: GMDH network examples] The zip file includes the full Matlab/Octave source code of a simple GMDH-type Polynomial Neural Network building algorithm. The implemented algorithm iteratively builds a polynomial neural network layer by layer using training data, while the exact structure (connectivity) and size (number of layers) of the network are controlled by an evaluation criterion: either measuring performance on additional validation data (i.e. using the so-called "regularity criterion") or explicitly taking the network's complexity into account (using the Corrected Akaike's Information Criterion or Minimum Description Length). The algorithm's parameters also include the maximal number of inputs for individual neurons, the degree of polynomials in the neurons, whether to allow the neurons to have inputs not only from the immediately preceding layer but also from the original input variables, the number of neurons in a layer, whether to decrease the number of neurons in each subsequent layer, etc.

Source code of Shepard interpolation for Matlab/Octave

The zip file includes full source code of a simple Shepard interpolation written in Matlab/Octave - download.

Shepard interpolation belongs to the family of inverse distance weighting methods and may be viewed as a special case of RBF interpolation. While it is rarely as accurate as other RBF interpolations, it is simple and computationally very efficient (it does not require solving a system of linear equations).
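The basic inverse distance weighting scheme can be sketched in a few lines of Python. This is not the code from the zip file: the function name is hypothetical, and the distance exponent of 2 is a common default assumed here, not a value taken from the source.

```python
import math


def shepard(points, values, x, power=2.0):
    """Predict at x as an average of the data values weighted by 1 / distance**power."""
    numerator = 0.0
    denominator = 0.0
    for p, v in zip(points, values):
        d = math.dist(x, p)
        if d == 0.0:
            return v  # the query coincides with a data point
        w = 1.0 / d ** power
        numerator += w * v
        denominator += w
    return numerator / denominator


# Illustrative one-dimensional data: two sampling points.
points = [(0.0,), (2.0,)]
values = [0.0, 2.0]
print(shepard(points, values, (1.0,)))  # prints 1.0 (midpoint, equal weights)
print(shepard(points, values, (2.0,)))  # prints 2.0 (exact at a data point)
```

No fitting step is needed: the prediction is computed directly from the stored data, which is why no linear equations have to be solved.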



Gints Jekabsons, Dr.sc.ing.

Riga Technical University

Faculty of Computer Science and Information Technology

Institute of Applied Computer Systems

Meza str. 1/3, LV-1048, Riga, Latvia