Conditional Inference Trees {party} | R Documentation |
Conditional Inference Trees
Description
Recursive partitioning for continuous, censored, ordered, nominal and multivariate response variables in a conditional inference framework.
Usage
ctree(formula, data, subset = NULL, weights = NULL, controls = ctree_control(), xtrafo = ptrafo, ytrafo = ptrafo, scores = NULL)
Arguments
formula | a symbolic description of the model to be fit. Note that symbols like |
data | a data frame containing the variables in the model. |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
weights | an optional vector of weights to be used in the fitting process. Only non-negative integer valued weights are allowed. |
controls | an object of class |
xtrafo | a function to be applied to all input variables. By default, the |
ytrafo | a function to be applied to all response variables. By default, the |
scores | an optional named list of scores to be attached to ordered factors. |
Details
Conditional inference trees estimate a regression relationship by binary recursive partitioning in a conditional inference framework. Roughly, the algorithm works as follows: 1) Test the global null hypothesis of independence between any of the input variables and the response (which may be multivariate as well). Stop if this hypothesis cannot be rejected. Otherwise select the input variable with the strongest association to the response. This association is measured by a p-value corresponding to a test for the partial null hypothesis of independence between a single input variable and the response. 2) Implement a binary split in the selected input variable. 3) Recursively repeat steps 1) and 2).
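Step 1) can be illustrated with a minimal base R sketch. It is not the test used inside party: cor.test and p.adjust serve as stand-ins for the conditional inference tests and the multiplicity adjustment, and select_split_variable is a hypothetical helper name.

```r
# Stand-in for step 1): NOT party's internal test, only an illustration.
# Returns the name of the input to split on, or NULL when the node stays terminal.
select_split_variable <- function(response, inputs, mincriterion = 0.95) {
  # univariate p-value for the association of each input with the response
  pvals <- vapply(inputs, function(x) cor.test(x, response)$p.value, numeric(1))
  # multiplicity adjustment for the global null hypothesis (assumed: Bonferroni)
  adjusted <- p.adjust(pvals, method = "bonferroni")
  criterion <- 1 - min(adjusted)
  if (criterion <= mincriterion) {
    return(NULL)  # global null hypothesis not rejected: stop
  }
  # otherwise pick the input with the smallest univariate p-value
  names(which.min(pvals))
}

set.seed(1)
n <- 200
inputs <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
response <- 2 * inputs$x2 + rnorm(n)  # only x2 carries signal
select_split_variable(response, inputs)
```

Steps 2) and 3) would then search for a cutpoint in the selected input and apply the same function to each of the two resulting subsets.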
The implementation utilizes a unified framework for conditional inference, or permutation tests, developed by Strasser and Weber (1999). The stop criterion in step 1) is either based on multiplicity adjusted p-values (testtype == "Bonferroni" or testtype == "MonteCarlo" in ctree_control), on the univariate p-values (testtype == "Univariate"), or on values of the test statistic (testtype == "Teststatistic"). In both cases (p-values or test statistics), the criterion is maximized; for p-values, 1 - p-value is used. A split is implemented when the criterion exceeds the value given by mincriterion as specified in ctree_control. For example, when mincriterion = 0.95, the p-value must be smaller than 0.05 in order to split this node. This statistical approach ensures that the right-sized tree is grown and that no form of pruning or cross-validation is needed. The selection of the input variable to split on is based on the univariate p-values, avoiding a variable selection bias towards input variables with many possible cutpoints.
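The effect of mincriterion can be checked by fitting the same model under two thresholds. The following is a small sketch on the iris data; n_terminal is a hypothetical helper, and the value 0.9999 is an arbitrary choice for a stricter threshold.

```r
library(party)

# hypothetical helper: number of terminal nodes reached by the learning sample
n_terminal <- function(tree) length(unique(where(tree)))

# default threshold: split while the adjusted p-value is below 0.05
loose <- ctree(Species ~ ., data = iris,
               controls = ctree_control(mincriterion = 0.95))

# stricter threshold: split only while the adjusted p-value is below 0.0001
strict <- ctree(Species ~ ., data = iris,
                controls = ctree_control(mincriterion = 0.9999))

c(loose = n_terminal(loose), strict = n_terminal(strict))
```

A stricter threshold can only stop the recursion earlier, so the second tree should never have more terminal nodes than the first.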
Multiplicity-adjusted Monte-Carlo p-values are computed following a "min-p" approach. The univariate p-values based on the limiting distribution (chi-square or normal) are computed for each of the random permutations of the data. This means that one should use a quadratic test statistic when factors are in play (because the evaluation of the corresponding multivariate normal distribution is time-consuming).
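The min-p idea can be sketched in base R as follows. This is an illustration under assumptions, not party's implementation: cor.test supplies the asymptotic univariate p-values, univariate_p is a hypothetical helper, and the number of permutations B is chosen arbitrarily.

```r
set.seed(42)
n <- 100
inputs <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
response <- 0.5 * inputs$x1 + rnorm(n)

# asymptotic univariate p-value for every input (stand-in test)
univariate_p <- function(y, X) {
  vapply(X, function(x) cor.test(x, y)$p.value, numeric(1))
}

observed <- univariate_p(response, inputs)

# smallest univariate p-value under each random permutation of the response
B <- 999
min_p_perm <- replicate(B, min(univariate_p(sample(response), inputs)))

# adjusted p-value: share of permutations whose smallest p-value
# is at or below the observed p-value of the respective input
adjusted <- vapply(observed, function(p) mean(min_p_perm <= p), numeric(1))
rbind(observed, adjusted)
```

Each permutation requires one asymptotic p-value per input, which is why the cost of evaluating the limiting distribution matters.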
By default, the scores for each ordinal factor x are 1:length(x); this may be changed using scores = list(x = c(1,5,6)), for example.
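A short sketch of passing custom scores, using simulated data. The variable names dose and y and the data-generating process are assumptions made for illustration only.

```r
library(party)

set.seed(7)
n <- 300
# ordered input factor with three levels
dose <- factor(sample(c("low", "mid", "high"), n, replace = TRUE),
               levels = c("low", "mid", "high"), ordered = TRUE)
# simulated response: shifts between "low" and the two higher levels
y <- ifelse(dose == "low", 0, 2) + rnorm(n)
d <- data.frame(y = y, dose = dose)

# default scores for the ordered factor
default_fit <- ctree(y ~ dose, data = d)

# custom scores: one value per level, named after the variable
scored_fit <- ctree(y ~ dose, data = d, scores = list(dose = c(1, 5, 6)))
scored_fit
```

The list element must carry the name of the ordered factor, and the scores are given in the order of its levels.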
Predictions can be computed using predict or treeresponse. The first function accepts arguments type = c("response", "node", "prob"), where type = "response" returns predicted means, predicted classes or median predicted survival times, type = "node" returns terminal node IDs (identical to where) and type = "prob" gives more information about the conditional distribution of the response, i.e., class probabilities or predicted Kaplan-Meier curves, and is identical to treeresponse. For observations with zero weights, predictions are computed from the fitted tree when newdata = NULL.
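The zero-weight behaviour can be used for a simple hold-out check, sketched below on the iris data. The size of the hold-out set and the object names are arbitrary assumptions.

```r
library(party)

set.seed(11)
# integer weights: 0 excludes an observation from fitting
w <- rep(1L, nrow(iris))
holdout <- sample(nrow(iris), 30)
w[holdout] <- 0L

fit <- ctree(Species ~ ., data = iris, weights = w)

# with newdata = NULL, predictions cover all rows, including zero-weight ones
pred_class <- predict(fit)                 # type = "response" is the default
pred_node  <- predict(fit, type = "node")  # terminal node IDs
pred_prob  <- predict(fit, type = "prob")  # list of class probabilities

# performance on the observations that did not enter the fit
table(predicted = pred_class[holdout], observed = iris$Species[holdout])
```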
For a general description of the methodology see Hothorn, Hornik and Zeileis (2006) and Hothorn, Hornik, van de Wiel and Zeileis (2006). Introductions for novices can be found in Strobl et al. (2009) and at https://github.com/christophM/overview-ctrees.
Value
An object of class BinaryTree-class.
References
Helmut Strasser and Christian Weber (1999). On the asymptotic theory of permutation statistics. Mathematical Methods of Statistics, 8, 220–250.
Torsten Hothorn, Kurt Hornik, Mark A. van de Wiel and Achim Zeileis (2006). A Lego System for Conditional Inference. The American Statistician, 60(3), 257–263.
Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. Preprint available from https://www.zeileis.org/papers/Hothorn+Hornik+Zeileis-2006.pdf
Carolin Strobl, James Malley and Gerhard Tutz (2009). An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychological Methods, 14(4), 323–348.
Examples
set.seed(290875)

### regression
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq,
               controls = ctree_control(maxsurrogate = 3))
airct
plot(airct)
mean((airq$Ozone - predict(airct))^2)

### extract terminal node ID, two ways
all.equal(predict(airct, type = "node"), where(airct))

### classification
irisct <- ctree(Species ~ ., data = iris)
irisct
plot(irisct)
table(predict(irisct), iris$Species)

### estimated class probabilities, a list
tr <- treeresponse(irisct, newdata = iris[1:10,])

### ordinal regression
data("mammoexp", package = "TH.data")
mammoct <- ctree(ME ~ ., data = mammoexp)
plot(mammoct)

### estimated class probabilities
treeresponse(mammoct, newdata = mammoexp[1:10,])

### survival analysis
if (require("TH.data") && require("survival")) {
    data("GBSG2", package = "TH.data")
    GBSG2ct <- ctree(Surv(time, cens) ~ ., data = GBSG2)
    plot(GBSG2ct)
    treeresponse(GBSG2ct, newdata = GBSG2[1:2,])
}

### if you are interested in the internals:
### generate doxygen documentation
## Not run:

    ### download src package into temp dir
    tmpdir <- tempdir()
    tgz <- download.packages("party", destdir = tmpdir)[2]
    ### extract
    untar(tgz, exdir = tmpdir)
    wd <- setwd(file.path(tmpdir, "party"))
    ### run doxygen (assuming it is there)
    system("doxygen inst/doxygen.cfg")
    setwd(wd)
    ### have fun
    browseURL(file.path(tmpdir, "party", "inst",
                        "documentation", "html", "index.html"))

## End(Not run)
[Package party version 1.3-15 Index]