
Scraping Web Data in R

Note: This post is NOT financial advice! This is just a fun way to explore some of the capabilities R has for importing and manipulating data.

In my last post, I went to a lot of effort to scrape the PMI off the ISM website. That effort turned out to be unnecessary: commenter “senne” pointed out that the index is available from FRED under the symbol NAPM. I’ve updated my code, which now pulls all the data straight from FRED.

However, it was surprisingly easy to scrape web data into R, using the readHTMLTable function in the XML package. I thought I’d keep the code I used on my blog, as it’s a good example of how easily you can pull web data into R.
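
To see what readHTMLTable hands back without touching the network, here is a minimal sketch that parses a tiny inline HTML table. The HTML string and its numbers are made up for illustration and are not real PMI readings; passing header = TRUE and stringsAsFactors = FALSE is my choice for the toy case, not something the scraping code in this post does.

```r
library(XML)

# Hypothetical stand-in for a downloaded page: one table, years in rows, months in columns.
html <- '<html><body>
<table>
  <tr><th>Year</th><th>JAN</th><th>FEB</th></tr>
  <tr><td>2012</td><td>50.0</td><td>51.0</td></tr>
  <tr><td>2013</td><td>52.0</td><td>53.0</td></tr>
</table>
</body></html>'

# Parse the string as HTML content rather than treating it as a file name.
doc <- htmlParse(html, asText = TRUE)

# readHTMLTable returns a list with one element per <table> found in the document.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
toy <- tables[[1]]
str(toy)
```

Every cell comes back as text, which is why the numeric columns need an explicit as.numeric conversion before they can be used.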

library(RCurl)
library(XML)
library(xts)

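#Scrape data from the website (an archived snapshot of the ISM page)
url <- 'https://web.archive.org/web/20131224165222/http://www.ism.ws/ISMReport/content.cfm?ItemNumber=10752'
html_data <- getURL(url)
#readHTMLTable returns a list with one element per table on the page; the PMI table is the first
rawPMI <- readHTMLTable(html_data)
PMI <- data.frame(rawPMI[[1]])
#The first column holds the year; the remaining columns are months
names(PMI)[1] <- 'Year'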

#Reshape
library(reshape2)
PMI <- melt(PMI,id.vars='Year')
names(PMI) <- c('Year','Month','PMI')
PMI$PMI <- as.numeric(as.character(PMI$PMI))
PMI <- na.omit(PMI)

#Convert to XTS
#Map three-letter month abbreviations (any case) to month numbers 1-12
#Unmatched labels become NA; also avoids shadowing base R's months()
numMonth <- function(x) {
    match(tolower(as.character(x)), tolower(month.abb))
}
PMI$Month <- numMonth(PMI$Month)
PMI$Date <- paste(PMI$Year,PMI$Month,'1',sep='-')
PMI$Date <- as.Date(PMI$Date,format='%Y-%m-%d')
PMI <- xts(PMI$PMI,order.by=PMI$Date)
names(PMI) <- 'PMI'
plot(PMI)

Figure: A line plot of the historical Purchasing Managers' Index (PMI) from January 1948 to November 2013. The PMI values fluctuate significantly over time, with peaks often exceeding 70 and troughs dropping below 40. The plot highlights economic cycles, with periods of expansion and contraction in the manufacturing sector, as indicated by the PMI's movements above and below the 50 threshold.
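
The reshape and date-conversion steps can also be tried on a toy table, which makes it easier to see what each one does. This is only a sketch: the values are made up, and it uses base R (rep, unlist and month.abb) in place of reshape2's melt and the numMonth helper, so it is a stand-in for the code above rather than a copy of it.

```r
# Toy wide table shaped like the scraped one: one row per year, one column per month.
# The values are invented for illustration and are not real PMI readings.
wide <- data.frame(
  Year = c(2012, 2013),
  JAN  = c("50.0", "52.0"),
  FEB  = c("51.0", NA),
  stringsAsFactors = FALSE
)

# Wide to long: one row per (Year, Month) pair, with the text values made numeric.
month_cols <- setdiff(names(wide), "Year")
long <- data.frame(
  Year  = rep(wide$Year, times = length(month_cols)),
  Month = rep(month_cols, each = nrow(wide)),
  PMI   = as.numeric(unlist(wide[month_cols], use.names = FALSE)),
  stringsAsFactors = FALSE
)

# Drop months that have no reading yet.
long <- na.omit(long)

# Month abbreviation to month number, using the built-in month.abb vector.
long$MonthNum <- match(tolower(long$Month), tolower(month.abb))

# Build first-of-month dates and sort chronologically.
long$Date <- as.Date(sprintf("%d-%02d-01", long$Year, long$MonthNum))
long <- long[order(long$Date), ]
print(long[, c("Date", "PMI")])
```

From here, xts(long$PMI, order.by = long$Date) would give the same kind of time series object that the post plots.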
