Resistant Regression

Usage

lqs(x, ...)
lqs.formula(formula, data = NULL, ...,
            method = c("lts", "lqs", "lms", "S", "model.frame"),
            subset, na.action = na.fail, model = TRUE, 
            x = FALSE, y = FALSE, contrasts = NULL)
lqs.default(x, y, intercept, method = c("lts", "lqs", "lms", "S"),
            quantile, control = lqs.control(...), k0 = 1.548, seed, ...)
lmsreg(...)
ltsreg(...)
print.lqs(x, digits, ...)
residuals.lqs(x)

Arguments

formula a formula of the form y ~ x1 + x2 + ...
data data frame from which variables specified in formula are preferentially to be taken.
subset An index vector specifying the cases to be used in fitting. (NOTE: If given, this argument must be named exactly.)
na.action A function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to omission of cases with missing values on any required variable. (NOTE: If given, this argument must be named exactly.)
x a matrix or data frame containing the explanatory variables.
y the response: a vector of length the number of rows of x.
intercept should the model include an intercept?
method the method to be used. model.frame returns the model frame: for the others see the Details section. Using lmsreg or ltsreg forces "lms" and "lts" respectively.
quantile the quantile to be used: see Details. This is overridden if method = "lms".
control additional control items: see Details.
seed the seed to be used for random sampling: see .Random.seed. The current value of .Random.seed will be preserved if it is set.
... arguments to be passed to lqs.default or lqs.control.

Description

Fit a regression to the good points in the dataset, thereby achieving a regression estimator with a high breakdown point. lmsreg and ltsreg are compatibility wrappers.

Details

Suppose there are n data points and p regressors, including any intercept.

The first three methods minimize some function of the sorted squared residuals. For methods "lqs" and "lms" it is the quantile squared residual, and for "lts" it is the sum of the quantile smallest squared residuals. "lqs" and "lms" differ in the defaults for quantile, which are floor((n+p+1)/2) and floor((n+1)/2) respectively. For "lts" the default is floor(n/2) + floor((p+1)/2).
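
As a small illustration (the values assume the stackloss data used in the Examples, with n = 21 observations and an intercept, so p = 4):

n <- 21; p <- 4
floor((n + p + 1)/2)             # default quantile for "lqs": 13
floor((n + 1)/2)                 # default quantile for "lms": 11
floor(n/2) + floor((p + 1)/2)    # default quantile for "lts": 12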

The "S" estimation method solves for the scale s such that the average of a function chi of the residuals divided by s is equal to a given constant.

The control argument is a list with components:

psamp the size of each sample. Defaults to p.

nsamp the number of samples, or "best", "exact" or "sample". If "sample" the number chosen is min(5*p, 3000), taken from Rousseeuw and Hubert (1997). If "best" exhaustive enumeration is done up to 5000 samples; if "exact" exhaustive enumeration will be attempted however many samples are needed.

adjust should the intercept be optimized for each sample?
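
Control items may also be passed through the ... argument rather than as an explicit list; for example:

lqs(stack.loss ~ ., data = stackloss, nsamp = "best", adjust = TRUE)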

Value

An object of class "lqs".
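
For example, the fit can be inspected with the usual extractors:

fit <- lqs(stack.loss ~ ., data = stackloss)
coef(fit)
residuals(fit)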

Note

There seems no reason other than historical to use the lms and lqs options. LMS estimation is of low efficiency (converging at rate n^{-1/3}) whereas LTS has the same asymptotic efficiency as an M estimator with trimming at the quartiles (Marazzi, 1993, p.201). LQS and LTS have the same maximal breakdown value of (floor((n-p)/2) + 1)/n attained if floor((n+p)/2) <= quantile <= floor((n+p+1)/2). The only drawback mentioned of LTS is greater computation, as a sort was thought to be required (Marazzi, 1993, p.201) but this is not true as a partial sort can be used (and is used in this implementation).
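
For example, with n = 21 and p = 4 the maximal breakdown value is (floor((21 - 4)/2) + 1)/21 = 9/21, attained for any quantile from floor((n + p)/2) = 12 to floor((n + p + 1)/2) = 13:

lqs(stack.loss ~ ., data = stackloss, method = "lts", quantile = 12)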

Adjusting the intercept for each trial fit does need the residuals to be sorted, and may be significant extra computation if n is large and p small.

Opinions differ over the choice of psamp. Rousseeuw and Hubert (1997) only consider p; Marazzi (1993) recommends p+1 and suggests that more samples are better than adjustment for a given computational limit.
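
Marazzi's recommendation of p + 1 can be tried via the psamp control item (here p = 4, so psamp = 5):

lqs(stack.loss ~ ., data = stackloss, psamp = 5)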

The computations are exact for a model with just an intercept and adjustment, and for LQS for a model with an intercept plus one regressor and exhaustive search with adjustment. For all other cases the minimization is only known to be approximate.
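
For instance, an intercept-only fit (a resistant location estimate) is one of the exact cases:

lqs(stack.loss ~ 1, data = stackloss, adjust = TRUE)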

Author(s)

B.D. Ripley

References

P. J. Rousseeuw and A. M. Leroy (1987) Robust Regression and Outlier Detection. Wiley.

A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth and Brooks/Cole.

P. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS. In L1-Statistical Procedures and Related Topics ed Y. Dodge, IMS Lecture Notes volume 31, pp. 201-214.

See Also

predict.lqs

Examples

data(stackloss)
set.seed(123)  # make the random sampling reproducible
lqs(stack.loss ~ ., data=stackloss)
lqs(stack.loss ~ ., data=stackloss, method="S", nsamp="exact")
