Using the Google Prediction API From R
Google offers a “black box” prediction API for tasks such as building recommender systems or filtering spam. They also provide an R package for interfacing with that API, but try as I might, I cannot get it to work under Windows. Here are the instructions for setting up the API to run in R under Linux. I haven’t tried these out yet, so let me know in the comments if they work, or if you can get the package to run on Windows.
First we have to set up the Google Prediction API, as well as some dependencies:
1. Go to the Google APIs Console. This is your home base for managing Google APIs.
2. In the upper left-hand corner of the website (under the Google APIs logo) is a dropdown menu. Use this to create a new project, called something informative like “R predictions.”
3. Activate the Google Storage API and turn it on. Activating may require opening a new page.
4. Activate the Google Prediction API and turn it on. Activating may require opening a new page.
5. Click on the “Billing” tab, and make sure billing is enabled. You may have to enter your billing information. Note that you get 5GB of free storage through the end of 2011, and there’s a free quota on the prediction API of 5MB trained per day and 100 predictions per day, up to 20,000 total predictions.
6. Click the “Google Storage” tab, and make a note of the “x-goog-project-id.” You will need this when installing GSUtil.
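To put the free quota in perspective, here is a rough sketch of the arithmetic in R. The limits are the ones quoted in the billing step; treating the 5MB training limit as the size of the uploaded CSV file is my assumption, not something Google's documentation is quoted as saying here.

```r
# Free-tier limits as quoted in the billing step above.
daily_predictions <- 100
total_predictions <- 20000
daily_training_mb <- 5

# Number of days the free predictions last if the full daily
# allowance is used every day.
days_at_full_rate <- total_predictions / daily_predictions
days_at_full_rate

# Size in MB of a candidate training file, using the built-in iris data.
# Assumption: the 5MB limit applies to the size of the uploaded CSV.
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp)
size_mb <- file.size(tmp) / 1024^2
size_mb < daily_training_mb
```

A small data set like iris is far below the training limit, so the daily prediction count is the number to watch while experimenting.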
Next we have to install some software on our computer to enable communication between R and the prediction API:
1. Install python, if you do not already have it.
2. Make sure you can run python from the command prompt. You may need to add python to your “path” or “environment” variables to do this. On Windows, run the command prompt as administrator.
3. Install the R packages rjson and RCurl using install.packages() in R.
4. Make sure you can open .tar archives. This is no problem on Mac/Linux systems, but on Windows you need 7zip.
5. Download GSUtil, and follow the directions to install it on your system. This is the tricky part.
6. When you run GSUtil for the first time, make sure to use the following command: python gsutil config -b to allow gsutil to open a web page and authorize access to your Google Storage account.
7. When prompted, enter the x-goog-project-id you noted in the first section.
8. Download the googlepredictionapi package.
9. Open R, and setwd() to the folder containing the downloaded package.
10. Install the R package from source using install.packages() with repos = NULL and type = "source".
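The R side of the steps above might look something like the sketch below. The function name install_prediction_api and the archive name in the example call are hypothetical, so substitute the file you actually downloaded. Note that gsutil may not show up on the PATH if you run it as python gsutil from its own folder, as in the command above.

```r
# Check whether python and gsutil are visible on the PATH.
# Sys.which() returns an empty string for commands it cannot find.
tools <- Sys.which(c("python", "gsutil"))
print(tools != "")

# Install the CRAN dependencies and then the downloaded package.
# Wrapped in a function so nothing is installed until you call it.
# pkg_file is the path to the archive you downloaded (hypothetical name below).
install_prediction_api <- function(pkg_file, deps = c("rjson", "RCurl")) {
  # Only fetch the dependencies that are not already installed.
  missing_deps <- setdiff(deps, rownames(installed.packages()))
  if (length(missing_deps) > 0) install.packages(missing_deps)

  if (!file.exists(pkg_file)) {
    stop("Cannot find ", pkg_file, " in ", getwd())
  }
  # repos = NULL tells R to install from a local file rather than CRAN.
  install.packages(pkg_file, repos = NULL, type = "source")
}

# Example call (not run):
# install_prediction_api("googlepredictionapi.tar.gz")
```

Installing from source with repos = NULL is standard R behavior, but I have not confirmed that it works for this particular package on every platform.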
Now we’re all set to start using the prediction API:
1. First we need to create a bucket to store our data. Do this from the Google Storage Web Console. Name your bucket something useful, like rdata. Don’t use capital letters or symbols.
2. Run the following script to test that everything works. Note that you have to save your data frame as a .csv file before GSUtil can upload it to Google Storage for modeling:
## Load googlepredictionapi and dependent libraries
library(rjson)
library(RCurl)
library(googlepredictionapi)
set.seed(42)
#Save dataframe to a file, upload to google storage, and train a model
write.csv(iris,'iris.csv')
model <- PredictionApiTrain(data='iris.csv', remote.file="gs://rdata/iris")
#Summarize model and predict for new data
summary(model)
predict(model, iris[10,])
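Before running the script, it may help to sanity-check the bucket name and the local file. The sketch below uses only base R; the helper names is_simple_bucket_name and remote_path are hypothetical, and the naming rule simply encodes the advice above (lowercase letters and digits only), which is stricter than whatever Google Storage actually allows.

```r
# Assumption: follow the advice above and accept only lowercase
# letters and digits in a bucket name.
is_simple_bucket_name <- function(name) grepl("^[a-z0-9]+$", name)

# Build a gs:// path in the form passed to remote.file in the script above.
remote_path <- function(bucket, object) {
  stopifnot(is_simple_bucket_name(bucket))
  paste0("gs://", bucket, "/", object)
}

remote_path("rdata", "iris")

# Write the data frame to a CSV and confirm the file exists locally
# before handing it to the upload step.
csv_file <- file.path(tempdir(), "iris.csv")
write.csv(iris, csv_file)
file.exists(csv_file)
```

A name like "RData" fails the check, so remote_path() stops before anything is sent to Google Storage.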
Good luck! Here are some links for future reference:
- Google directions for installing the googlepredictionapi package in R
- Google directions for installing gsutil
- Google API console for managing APIs and billing
- Google storage console for managing buckets
- Google APIs overview/introduction