crossv_kfold splits the data into k exclusive partitions and uses each partition in turn as the test set of a test-training split. crossv_mc generates n random partitions, holding out a proportion test of the data for testing and using the rest for training. crossv_loo performs leave-one-out cross-validation, i.e., n = nrow(data) training partitions containing n - 1 rows each.
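To make "exclusive partitions" concrete, the sketch below checks that the test sets produced by crossv_kfold() do not overlap and together cover every row. It assumes the modelr package is installed and that as.integer() on a resample object returns its row indices; the seed value and the variable names are arbitrary choices for illustration.

```r
library(modelr)

set.seed(42)  # fold assignment is random; the seed only makes the run repeatable
cv <- crossv_kfold(mtcars, k = 5)

# Row indices held out for testing in each fold
test_rows <- lapply(cv$test, as.integer)
all_test  <- unlist(test_rows)

# No row appears in more than one test set
no_overlap <- anyDuplicated(all_test) == 0

# Taken together, the test sets cover every row of the data
full_cover <- setequal(all_test, seq_len(nrow(mtcars)))

# Within a single fold, training and test rows are disjoint
shared <- intersect(as.integer(cv$train[[1]]), as.integer(cv$test[[1]]))

no_overlap
full_cover
length(shared)
```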
Usage
crossv_mc(data, n, test = 0.2, id = ".id")
crossv_kfold(data, k = 5, id = ".id")
crossv_loo(data, id = ".id")
Arguments
- data
A data frame
- n
Number of test-training pairs to generate (an integer).
- test
Proportion of observations that should be held out for testing (a double).
- id
Name of variable that gives each model a unique integer id.
- k
Number of folds (an integer).
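As a rough illustration of how the test and id arguments interact, the following sketch holds out about half of the rows in each of three random splits and stores the split identifier in a column named "split". The column name "split", the seed, and the helper n_rows() are hypothetical choices made for this example only.

```r
library(modelr)

set.seed(123)  # arbitrary seed so the random splits are repeatable
cv <- crossv_mc(mtcars, n = 3, test = 0.5, id = "split")

# The identifier column takes the name passed via `id`
names(cv)

# Hypothetical helper: number of rows referenced by a resample object
n_rows <- function(x) length(as.integer(x))

n_test  <- vapply(cv$test,  n_rows, integer(1))
n_train <- vapply(cv$train, n_rows, integer(1))

# Each split assigns every row to either the test or the training set
n_test + n_train
```

The exact test-set size depends on how the proportion is rounded against nrow(data), so it is safer to inspect n_test than to assume a particular count.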
Value
A data frame with columns test, train, and .id. test and train are list-columns containing resample() objects. The number of rows is n for crossv_mc(), k for crossv_kfold() and nrow(data) for crossv_loo().
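The list-columns hold lightweight resample objects rather than copies of the data. A minimal sketch of inspecting them, assuming modelr is installed and using as.data.frame() to materialise one split of a leave-one-out scheme (variable names are illustrative):

```r
library(modelr)

loo <- crossv_loo(mtcars)

# One test-training pair per row of the input data
nrow(loo)

# Materialise the first split as ordinary data frames
train1 <- as.data.frame(loo$train[[1]])
test1  <- as.data.frame(loo$test[[1]])

dim(train1)  # all rows but one
dim(test1)   # the single held-out row
```

Converting to a data frame copies the selected rows, so for large data it is usually preferable to convert only the split currently being used.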
Examples
library(modelr)
cv1 <- crossv_kfold(mtcars, 5)
cv1
#> # A tibble: 5 × 3
#>   train                test                .id
#>   <named list>         <named list>        <chr>
#> 1 <resample [25 x 11]> <resample [7 x 11]> 1
#> 2 <resample [25 x 11]> <resample [7 x 11]> 2
#> 3 <resample [26 x 11]> <resample [6 x 11]> 3
#> 4 <resample [26 x 11]> <resample [6 x 11]> 4
#> 5 <resample [26 x 11]> <resample [6 x 11]> 5
library(purrr)
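# Generate 100 random test-training splits (default test = 0.2)
cv2 <- crossv_mc(mtcars, 100)
# Fit a linear model on each training set
models <- map(cv2$train, ~ lm(mpg ~ wt, data = .))
# Score each model on its matching test set with modelr's rmse()
errs <- map2_dbl(models, cv2$test, rmse)
# Plot the distribution of out-of-sample errors
hist(errs)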