Time Series Cross-validation 5

January 24, 2013 • By Zach Deane-Mayer

Note: This post is NOT financial advice! This is just a fun way to explore some of the capabilities R has for importing and manipulating data.

The caret package for R now supports time series cross-validation! (Look for version 5.15-052 in the news file.) You can use the createTimeSlices function to do time-series cross-validation with either a fixed window or a growing window. This function generates a list of indexes for the training sets and a matching list of indexes for the test sets, which you can then pass to trainControl through its index and indexOut arguments.
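To make the two window types concrete, here is a minimal sketch on a toy series of 10 observations. The toy series and the variable names (`y`, `fixed`, `growing`, `ctrl`) are made up for illustration and are not part of the demo in this post.

```r
library(caret)

# Toy series: 10 observations, used only to show the index layout
y <- 1:10

# Fixed window: every training set has exactly 5 points and slides forward
fixed <- createTimeSlices(y, initialWindow = 5, horizon = 2, fixedWindow = TRUE)

# Growing window: every training set starts at observation 1
growing <- createTimeSlices(y, initialWindow = 5, horizon = 2, fixedWindow = FALSE)

fixed$train[[1]]    # training rows 1 to 5
fixed$test[[1]]     # the 2 rows that follow: 6 and 7
fixed$train[[4]]    # last slice: rows 4 to 8
growing$train[[4]]  # last slice: rows 1 to 8

# The two lists can be handed to trainControl directly
ctrl <- trainControl(index = fixed$train, indexOut = fixed$test)
```

With 10 points, an initial window of 5 and a horizon of 2, both calls produce 4 slices. Only the training indexes differ between the two window types.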

caret does not currently support univariate time series models (like arima, auto.arima and ets), but perhaps that functionality is coming in the future? I’d also love to see someone write a timeSeriesSummary function for caret that calculates the error at each horizon in the test set, as well as a createTimeResamples function, perhaps using the Maximum Entropy Bootstrap.
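As a rough idea of what a per-horizon summary could look like, here is a small sketch. The name `horizonSummary` is hypothetical and nothing like it ships with caret. It also assumes that the rows of each hold-out set arrive ordered by forecast step, which I have not verified inside train.

```r
# Hypothetical helper (not part of caret): absolute error at each forecast step.
# Assumes the rows of `data` are ordered by horizon, 1 step ahead first.
horizonSummary <- function(data, lev = NULL, model = NULL) {
  err <- abs(data[, "obs"] - data[, "pred"])
  names(err) <- paste0("AE.h", seq_along(err))
  err
}

# Toy hold-out set with a 3-step horizon (made-up numbers)
toy <- data.frame(obs  = c(0.02, -0.01, 0.03),
                  pred = c(0.01,  0.01, 0.00))
res <- horizonSummary(toy)
res
```

It takes the same arguments as any caret summary function and returns a named numeric vector, one value per step of the horizon.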

Here’s a quick demo of how you might use this new functionality:

#Download S&P 500 data, adjust, and convert to monthly
set.seed(42)
library(quantmod)
getSymbols('^GSPC', from='1990-01-01')
#> [1] "GSPC"
GSPC <- adjustOHLC(GSPC, symbol.name='^GSPC')
GSPC <- to.monthly(GSPC, indexAt='lastof')
Target <- ClCl(GSPC)

#Calculate some covariates
periods <- c(3, 6, 9, 12)
Lags <- data.frame(lapply(c(1:2, periods), function(x) Lag(Target, x)))
EMAs <- data.frame(lapply(periods, function(x) {
  out <- EMA(Target, x)
  names(out) <- paste('EMA', x, sep='.')
  return(out)
}))
RSIs <- data.frame(lapply(periods, function(x) {
  out <- RSI(Cl(GSPC), x)
  names(out) <- paste('RSI', x, sep='.')
  return(out)
}))
DVIs <- data.frame(lapply(periods, function(x) {
  out <- DVI(Cl(GSPC), x)
  out <- out$dvi
  names(out) <- paste('DVI', x, sep='.')
  return(out)
}))
dat <- data.frame(Next(Target), Lags, EMAs, RSIs, DVIs)
dat <- na.omit(dat)

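#Custom Summary Function
#caret passes each hold-out set as a data frame with "obs" and "pred" columns
mySummary <- function (data, lev = NULL, model = NULL) {
  #Go long (+1), short (-1) or flat (0) according to the sign of the prediction
  positions <- sign(data[, "pred"])
  #Size of each change in position; the first period counts as one trade
  trades <- abs(c(1,diff(positions)))
  #Per-period return: position times realized return, plus 0.01 per unit traded
  profits <- positions*data[, "obs"] + trades*0.01
  #Compound the per-period returns into a single growth factor
  profit <- prod(1+profits)
  names(profit) <- 'profit'
  return(profit)
}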

#Fit a model
library(caret)
model <- train(dat[,-1], dat[,1], method='rpart', 
               metric='profit', maximize=TRUE,
               trControl=trainControl(
                 method='timeslice',
                 initialWindow=12, fixedWindow=TRUE, 
                 horizon=12, summaryFunction=mySummary,
                 verboseIter=FALSE))
model
#> CART 
#> 
#> 306 samples
#>  18 predictor
#> 
#> No pre-processing
#> Resampling: Rolling Forecasting Origin Resampling (12 held-out with a fixed window) 
#> Summary of sample sizes: 12, 12, 12, 12, 12, 12, ... 
#> Resampling results across tuning parameters:
#> 
#>   cp          profit  
#>   0.01646595  1.072231
#>   0.02408571  1.072231
#>   0.06312675  1.072231
#> 
#> profit was used to select the optimal model using the largest value.
#> The final value used for the model was cp = 0.06312675.
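To see what the profit metric actually measures, it can help to trace mySummary by hand on a tiny hold-out set. The sketch below repeats the function from the demo so that it runs on its own. The four observations in `toy` are made-up numbers, not S&P 500 returns.

```r
# Copy of mySummary from the demo, repeated so this block is self-contained
mySummary <- function(data, lev = NULL, model = NULL) {
  positions <- sign(data[, "pred"])
  trades <- abs(c(1, diff(positions)))
  profits <- positions * data[, "obs"] + trades * 0.01
  profit <- prod(1 + profits)
  names(profit) <- "profit"
  return(profit)
}

# Made-up predictions and realized returns for a 4-period hold-out set
toy <- data.frame(pred = c(0.02, -0.01, -0.03, 0.01),
                  obs  = c(0.01,  0.02, -0.02, 0.03))

# positions: 1, -1, -1, 1
# trades:    1,  2,  0, 2
# profits:   0.02, 0.00, 0.02, 0.05
toy_profit <- mySummary(toy)
toy_profit
```

The result is 1.02 * 1.00 * 1.02 * 1.05 = 1.09242, so a value above 1 means the sign-following strategy grew the starting capital over the hold-out window.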
