formula | R Documentation |
Description
The generic function formula and its specific methods provide a way of extracting formulae which have been included in other objects.
as.formula is almost identical, additionally preserving attributes when object already inherits from "formula".
Usage
formula(x, ...)
DF2formula(x, env = parent.frame())
as.formula(object, env = parent.frame())

## S3 method for class 'formula'
print(x, showEnv = !identical(e, .GlobalEnv), ...)
Arguments
x, object | R object, for |
... | further arguments passed to or from other methods. |
env | the environment to associate with the result, if not already a formula. |
showEnv | logical indicating if the environment should be printed as well. |
Details
The models fitted by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.
In addition to + and :, a number of other operators are useful in model formulae.
The * operator denotes factor crossing: a*b is interpreted as a + b + a:b. The ^ operator indicates crossing to the specified degree. For example (a+b+c)^2 is identical to (a+b+c)*(a+b+c) which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The %in% operator indicates that the terms on its left are nested within those on the right. For example a + b %in% a expands to the formula a + a:b. The / operator provides a shorthand, so that a / b is equivalent to a + b %in% a. The - operator removes the specified terms, hence (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
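The expansions described above can be inspected with terms(), whose "term.labels" and "intercept" attributes show what a formula expands to. The following is a minimal sketch; the helper name term_labels is hypothetical and only introduced for illustration.

```r
# Hypothetical helper: the expanded term labels of a formula
term_labels <- function(f) attr(terms(f), "term.labels")

crossed  <- term_labels(y ~ a * b)                # "a" "b" "a:b"
squared  <- term_labels(y ~ (a + b + c)^2)        # main effects plus two-way interactions
nested   <- term_labels(y ~ a / b)                # "a" "a:b"
dropped  <- term_labels(y ~ (a + b + c)^2 - a:b)  # the a:b interaction is removed

# The "intercept" attribute is 1 when an intercept is present, 0 otherwise
with_int    <- attr(terms(y ~ x), "intercept")
minus_one   <- attr(terms(y ~ x - 1), "intercept")
zero_first  <- attr(terms(y ~ 0 + x), "intercept")
```

No data are needed here: terms() works on the symbolic formula alone, so the variables a, b, c, x and y do not have to exist.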
While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such arithmetic expressions involve operators which are also used symbolically in model formulae, there can be confusion between arithmetic and symbolic operator use.
To avoid this confusion, the function I() can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as the sum of b and c.
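As an illustration of the difference between symbolic and arithmetic use of +, the sketch below compares the terms of two formulae and checks the corresponding column of the design matrix. The data frame d and its values are made up for this example.

```r
# Made-up data, for illustration only
d <- data.frame(y = c(1, 3, 2, 5),
                a = c(0, 1, 0, 1),
                b = c(1, 2, 3, 4),
                c = c(10, 20, 30, 40))

# Symbolic use: three separate terms a, b and c
symbolic <- attr(terms(y ~ a + b + c), "term.labels")

# Arithmetic use: I() protects b + c, giving a single term
arithmetic <- attr(terms(y ~ a + I(b + c)), "term.labels")

# The design matrix then holds one column containing the sum b + c
mm <- model.matrix(y ~ a + I(b + c), data = d)
sum_column <- unname(mm[, "I(b + c)"])
```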
Variable names can be quoted by backticks `like this` in formulae, although there is no guarantee that all code using formulae will accept such non-syntactic names.
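A small sketch of a backtick-quoted name, assuming a made-up data frame with a column called "body mass" (created with check.names = FALSE so the space is kept). lm is used here only as one function that accepts such names.

```r
# Made-up data with a non-syntactic column name
d <- data.frame(y = c(1.0, 2.1, 2.9, 4.2),
                `body mass` = c(1, 2, 3, 4),
                check.names = FALSE)

fo <- y ~ `body mass`

# all.vars() returns the names without the backticks
vars <- all.vars(fo)

# lm() accepts the quoted name and fits an intercept and one slope
fit <- lm(fo, data = d)
n_coef <- length(coef(fit))
```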
Most model-fitting functions accept formulae with right-hand-side including the function offset to indicate terms with a fixed coefficient of one. Some functions accept other ‘specials’ such as strata or cluster (see the specials argument of terms.formula).
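To illustrate offset, the sketch below fits a Poisson model with a log-exposure offset. The data frame and the variable names events, x and exposure are invented for the example; the point is only that the offset term receives no estimated coefficient.

```r
# Made-up count data with an exposure variable
d <- data.frame(events   = c(2, 3, 6, 7, 8, 9),
                x        = c(1, 2, 3, 4, 5, 6),
                exposure = c(10, 10, 20, 20, 30, 30))

fo <- events ~ x + offset(log(exposure))

# The offset is recorded in the terms object but is not an ordinary term
tt <- terms(fo)
labels <- attr(tt, "term.labels")
has_offset <- !is.null(attr(tt, "offset"))

# Only the intercept and x get estimated coefficients
fit <- glm(fo, family = poisson, data = d)
coef_names <- names(coef(fit))
```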
There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula. In the context of update.formula, only, it means ‘what was previously in this part of the formula’.
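The two meanings of . can be seen side by side in the following sketch. The data frame d is made up; terms() is given the data argument directly, which is roughly what model-fitting functions do internally.

```r
# Made-up data: y plus two candidate predictors
d <- data.frame(y = 1:3, x1 = 4:6, x2 = 7:9)

# With a data argument, "." stands for all columns not otherwise in the formula
dot_in_data <- attr(terms(y ~ ., data = d), "term.labels")

# In update(), "." stands for the corresponding part of the old formula
added_term   <- update(y ~ x, . ~ . + z)    # keeps y and x, adds z
new_response <- update(y ~ x, log(.) ~ .)   # wraps the old response in log()
```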
When formula is called on a fitted model object, either a specific method is used (such as that for class "nls") or the default method. The default first looks for a "formula" component of the object (and evaluates it), then a "terms" component, then a formula parameter of the call (and evaluates its value) and finally a "formula" attribute.
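A minimal sketch of extracting a formula from a fitted model, using the built-in cars data set. Which method is dispatched for a given class is an implementation detail; the example only relies on the result being the formula the model was fitted with.

```r
# Fit a simple linear model on the built-in 'cars' data
fit <- lm(dist ~ speed, data = cars)

# Recover the model formula from the fitted object
fo <- formula(fit)
fo_text <- deparse(fo)
```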
There is a formula method for data frames. When there is a "terms" attribute with a formula, e.g., for a model.frame(), that formula is returned. If you'd like the previous (R <= 3.5.x) behavior, use the auxiliary DF2formula() which does not consider a "terms" attribute. Otherwise, if there is only one column this forms the RHS with an empty LHS. For more columns, the first column is the LHS of the formula and the remaining columns separated by + form the RHS.
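The data frame cases above can be sketched as follows, assuming R >= 3.6.0 (where DF2formula() is available). The small data frames are made up; cars is the built-in data set.

```r
# Several columns: the first is the LHS, the rest form the RHS
d_multi <- data.frame(y = 1:3, a = 4:6, b = 7:9)
f_multi <- formula(d_multi)

# A single column: RHS only, with an empty LHS
d_single <- data.frame(x = 1:3)
f_single <- formula(d_single)

# A model frame carries a "terms" attribute, so its formula is returned
mf <- model.frame(log(dist) ~ speed, data = cars)
f_terms <- formula(mf)

# DF2formula() ignores "terms" and builds the formula from the column names
f_cols <- DF2formula(mf)
```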
Value
All the functions above produce an object of class "formula" which contains a symbolic model formula.
Environments
A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.
Formulas created with the ~ operator use the environment in which they were created. Formulas created with as.formula will use the env argument for their environment.
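The following sketch illustrates these environment rules. The function names make_formula and frame_with_local and all variable names are hypothetical, chosen only for this example.

```r
# A formula created with ~ remembers the environment where it was created
make_formula <- function() {
  z <- 1          # local variable, visible through the formula's environment
  y ~ x
}
f1 <- make_formula()
z_visible <- exists("z", envir = environment(f1), inherits = FALSE)

# as.formula() on a string uses the supplied env argument
e <- new.env()
f2 <- as.formula("y ~ x", env = e)
same_env <- identical(environment(f2), e)

# model.frame() finds 'w' in the formula's environment, because
# 'w' is not a column of the supplied data
frame_with_local <- function() {
  w <- c(1, 2, 3)
  d <- data.frame(y = c(2, 4, 6))
  model.frame(y ~ w, data = d)
}
mf <- frame_with_local()
```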
Note
In R versions up to 3.6.0, a character x of length more than one was parsed as separate lines of R code and the first complete expression was evaluated into a formula when possible. This silently truncates such vectors of characters inefficiently and to some extent inconsistently as this behaviour had been undocumented. For this reason, such use has been deprecated. If you must work via character x, do use a string, i.e., a character vector of length one.
E.g., eval(call("~", quote(foo + bar))) has been an order of magnitude more efficient than formula(c("~", "foo + bar")).
Further, character “expressions” needing an eval() to return a formula are now deprecated.
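As a sketch of the recommended alternatives, the same one-sided formula can be built from a single string or directly from a call. reformulate() is not mentioned in the text above; it is a further function in the stats package that builds a formula from a character vector of term labels, and is shown here only as one more option.

```r
# From a single string (a character vector of length one)
f_string <- as.formula("~ foo + bar")

# From a call, avoiding character parsing altogether
f_call <- eval(call("~", quote(foo + bar)))

# From a character vector of term labels via reformulate()
f_terms <- reformulate(c("foo", "bar"))
```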
References
Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
See Also
~, I, offset.
For formula manipulation: terms, and all.vars; for typical use: lm, glm, and coplot.
Examples
class(fo <- y ~ x1*x2) # "formula"
fo
typeof(fo)  # R internal : "language"
terms(fo)

environment(fo)
environment(as.formula("y ~ x"))
environment(as.formula("y ~ x", env = new.env()))

## Create a formula for a model with a large number of variables:
xnam <- paste0("x", 1:25)
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))