formula: Model Formulae (2024)

formula

R Documentation

Description

The generic function formula and its specific methods provide a way of extracting formulae which have been included in other objects.

as.formula is almost identical, additionally preserving attributes when object already inherits from "formula".

Usage

formula(x, ...)
DF2formula(x, env = parent.frame())
as.formula(object, env = parent.frame())

## S3 method for class 'formula'
print(x, showEnv = !identical(e, .GlobalEnv), ...)

Arguments

`x, object`	R object, for `DF2formula()` a `data.frame`.
`...`	further arguments passed to or from other methods.
`env`	the environment to associate with the result, if not already a formula.
`showEnv`	logical indicating if the environment should be printed as well.

Details

The models fitted by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.

In addition to + and :, a number of other operators are useful in model formulae.

The * operator denotes factor crossing: a*b is interpreted as a + b + a:b.
The ^ operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions.
The %in% operator indicates that the terms on its left are nested within those on the right. For example a + b %in% a expands to the formula a + a:b.
The / operator provides a shorthand, so that a / b is equivalent to a + b %in% a.
The - operator removes the specified terms, hence (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: when fitting a linear model y ~ x - 1 specifies a line through the origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
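
One way to check how these operators expand is to inspect the "term.labels" attribute of terms(). The following is a minimal sketch; the helper labels_of() and the variable names a, b, c, x and y are illustrative only, and no data are needed because terms() works symbolically.

```r
# Hypothetical helper: the term labels R derives from a formula.
labels_of <- function(f) attr(terms(f), "term.labels")

crossing   <- labels_of(y ~ a * b)                # "a" "b" "a:b"
degree_two <- labels_of(y ~ (a + b + c)^2)        # main effects plus two-way interactions
nested     <- labels_of(y ~ a / b)                # "a" "a:b"
removed    <- labels_of(y ~ (a + b + c)^2 - a:b)  # a:b no longer present

# The "intercept" attribute is 1 when an intercept is present, 0 otherwise.
with_intercept <- attr(terms(y ~ x), "intercept")
no_intercept   <- attr(terms(y ~ x - 1), "intercept")

print(crossing)
print(degree_two)
print(nested)
print(removed)
print(c(with_intercept, no_intercept))
```

Note that terms() orders labels by interaction degree, so the printed order can differ from the order in which terms were written.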

While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such arithmetic expressions involve operators which are also used symbolically in model formulae, there can be confusion between arithmetic and symbolic operator use.

To avoid this confusion, the function I() can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as the sum of b and c.
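
The difference between the symbolic and arithmetic readings of + can be seen by comparing term labels and a model matrix. This is a small sketch with made-up data; the data frame d and its values are assumptions for illustration.

```r
# Made-up data for illustration only.
d <- data.frame(y = c(1, 3, 2, 5),
                a = c(1, 2, 3, 4),
                b = c(0, 1, 0, 1),
                c = c(2, 2, 3, 5))

# Without I(), b and c are separate terms; with I(), b + c is a single term.
symbolic   <- attr(terms(y ~ a + b + c), "term.labels")
arithmetic <- attr(terms(y ~ a + I(b + c)), "term.labels")

# The I(b + c) column of the model matrix holds the elementwise sum b + c.
mm <- model.matrix(y ~ a + I(b + c), data = d)
sum_column <- unname(mm[, "I(b + c)"])

print(symbolic)
print(arithmetic)
print(sum_column)
```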

Variable names can be quoted by backticks `like this` in formulae, although there is no guarantee that all code using formulae will accept such non-syntactic names.

Most model-fitting functions accept formulae with right-hand-side including the function offset to indicate terms with a fixed coefficient of one. Some functions accept other ‘specials’ such as strata or cluster (see the specials argument of terms.formula).

There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula. In the context of update.formula, only, it means ‘what was previously in this part of the formula’.
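
The two meanings of . can be sketched as follows. The data frame d and the variable names are illustrative assumptions; terms() is given a data argument so that . can be expanded, and update() is used for the second meaning.

```r
# Made-up data for illustration only.
d <- data.frame(y = 1:3, a = 4:6, b = 7:9)

# With a data argument, . stands for all columns not otherwise in the formula.
dot_labels <- attr(terms(y ~ ., data = d), "term.labels")

# In update(), . stands for the corresponding part of the old formula.
added_term  <- update(y ~ x, . ~ . + z)
changed_lhs <- update(y ~ x, log(.) ~ .)

print(dot_labels)
print(added_term)
print(changed_lhs)
```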

When formula is called on a fitted model object, either a specific method is used (such as that for class "nls") or the default method. The default first looks for a "formula" component of the object (and evaluates it), then a "terms" component, then a formula parameter of the call (and evaluates its value) and finally a "formula" attribute.
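
As a rough illustration, formula() can be called on a fitted lm object, and on a plain list that merely carries a "formula" component. The list toy_fit is a hypothetical stand-in for a fitted-model object, used only to exercise the first step of the default method's lookup; mtcars is a dataset shipped with R.

```r
# A fitted model: formula() recovers the formula used in the fit.
fit <- lm(mpg ~ wt + hp, data = mtcars)
fit_formula <- formula(fit)

# Hypothetical object with a "formula" component; the default method
# finds and evaluates that component.
toy_fit <- list(formula = y ~ x1 + x2, coefficients = c(1, 2, 3))
toy_formula <- formula(toy_fit)

print(fit_formula)
print(toy_formula)
```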

There is a formula method for data frames. When there is a "terms" attribute with a formula, e.g., for a model.frame(), that formula is returned. If you'd like the previous (R <= 3.5.x) behavior, use the auxiliary DF2formula() which does not consider a "terms" attribute. Otherwise, if there is only one column this forms the RHS with an empty LHS. For more columns, the first column is the LHS of the formula and the remaining columns separated by + form the RHS.
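
The data frame behaviour described above can be sketched like this. The data frame d is made up for illustration, and DF2formula() is assumed to be available (R >= 3.6.0).

```r
# Made-up data for illustration only.
d <- data.frame(y = c(1, 2, 3), x1 = c(4, 5, 6), x2 = c(7, 8, 9))

# Several columns: the first is the LHS, the rest are joined by + on the RHS.
multi <- formula(d)

# A single column gives a one-sided formula.
single <- formula(d["x1"])

# A model frame carries a "terms" attribute, which formula() prefers;
# DF2formula() ignores it and builds the formula from the column names.
mf <- model.frame(y ~ log(x1), data = d)
from_terms <- formula(mf)
from_cols  <- DF2formula(mf)

print(multi)
print(single)
print(from_terms)
print(from_cols)
```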

Value

All the functions above produce an object of class "formula" which contains a symbolic model formula.

Environments

A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.

Formulas created with the ~ operator use the environment in which they were created. Formulas created with as.formula will use the env argument for their environment.
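
A small sketch of how the formula environment is used: the function make_fo() and its local variables are hypothetical, chosen so that x and y exist only inside the function, yet model.frame() still finds them through the formula's environment.

```r
# Hypothetical constructor: x and y are local to the function call.
make_fo <- function() {
  x <- c(1, 2, 3)
  y <- c(2, 4, 6)
  y ~ x  # the formula remembers the environment of this call
}

fo <- make_fo()

# No data argument: variables are looked up in environment(fo).
mf <- model.frame(fo)

# as.formula() attaches the environment passed as env.
e <- new.env()
fo2 <- as.formula("y ~ x", env = e)
same_env <- identical(environment(fo2), e)

print(mf)
print(same_env)
```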

Note

In R versions up to 3.6.0, character x of length more than one were parsed as separate lines of R code and the first complete expression was evaluated into a formula when possible. This silently truncates such vectors of characters inefficiently and to some extent inconsistently as this behaviour had been undocumented. For this reason, such use has been deprecated. If you must work via character x, do use a string, i.e., a character vector of length one.

E.g., eval(call("~", quote(foo + bar))) has been an order of magnitude more efficient than formula(c("~", "foo + bar")).

Further, character “expressions” needing an eval() to return a formula are now deprecated.
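
The two recommended routes mentioned in this note, a single string or a language call, might look as follows. This is only a sketch; foo and bar are placeholder variable names and are never evaluated.

```r
# From a character vector of length one.
from_string <- as.formula("~ foo + bar")

# From a call built directly, avoiding any parsing of character data.
from_call <- eval(call("~", quote(foo + bar)))

# Both routes yield the same one-sided formula.
same_text <- identical(deparse(from_string), deparse(from_call))

print(from_string)
print(from_call)
print(same_text)
```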

References

Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

Examples

class(fo <- y ~ x1*x2) # "formula"
fo
typeof(fo) # R internal : "language"
terms(fo)

environment(fo)
environment(as.formula("y ~ x"))
environment(as.formula("y ~ x", env = new.env()))

## Create a formula for a model with a large number of variables:
xnam <- paste0("x", 1:25)
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))