
Data Scientist Spotlight

This post first appeared on DataRobot’s blog.

You’ve decided: DataRobot is cool. You saw a demo. Your people tell you they like it. You like the way it makes data scientists more productive. And you love the way it helps you introduce new people to machine learning.

There’s just one question. Can you trust DataRobot?

Seriously? How do you know that the models it produces are good? You don’t care about winning data science contests. You just want to be sure that DataRobot builds models you can use. Without wrecking your business, or getting into trouble with regulators.

Meet Zach Deane-Mayer

Zach Deane-Mayer at DataRobot

Zach graduated from Dartmouth ten years ago with a degree in Biology and Math. Since then, he’s worked hands-on as a data scientist.

Three years ago, Zach joined DataRobot. Today, he leads the Core Modeling Team. Zach and his team make sure that DataRobot builds models you can use.

Inside DataRobot, there’s a bit of DNA that we call “blueprints.” This is the logic that drives the thousands of tasks that go into a machine learning project. If DataRobot were a person, we would call it “knowledge” or “expertise.”

Zach’s team develops, tests, and monitors the DataRobot blueprints. When there is a new idea or capability in data science, the team builds and tests a new blueprint. If the new blueprint performs well and meets DataRobot standards, it goes into production.

Otherwise, it goes into the circular file. That is the nature of science. We keep ideas and innovations that survive an empirical test. We discard those that don’t.

Testing doesn’t stop when a blueprint goes into production. Zach’s team tests the production blueprints. Every. Single. Night. If a blueprint consistently underperforms, Zach and his team yank it.

Blueprint, you’re fired.
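To make that gating idea concrete, here is a purely illustrative sketch of a "retire consistent underperformers" check. The function name, metric, and thresholds below are hypothetical; DataRobot's real testing platform is not public and operates at a very different scale.

```python
# Purely illustrative: the "retire consistent underperformers" check, sketched
# as a few lines of Python. The function name, metric, and thresholds are
# hypothetical -- DataRobot's real testing platform is not public.
def should_retire(nightly_scores, baseline_scores, tolerance=0.01, min_nights=5):
    """Flag a blueprint whose nightly metric has trailed the baseline by more
    than `tolerance` on each of the last `min_nights` benchmark runs."""
    recent = list(zip(nightly_scores, baseline_scores))[-min_nights:]
    if len(recent) < min_nights:
        return False  # not enough evidence yet to fire anything
    return all(score < base - tolerance for score, base in recent)

# Example: a blueprint that has lagged its baseline five nights in a row.
blueprint_auc = [0.71, 0.70, 0.69, 0.70, 0.68]
baseline_auc = [0.75, 0.74, 0.75, 0.74, 0.75]
print(should_retire(blueprint_auc, baseline_auc))  # True
```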

DataRobot has thousands of blueprints. Testing thousands of blueprints every night requires an awesome testing platform. The DataRobot testing platform is so cool we’re writing a separate blog about it.

Some of our competitors use blueprints to drive automation. One of them just proudly announced a new blueprint.

So now they have three.

But let’s get back to the main question. Why should you trust DataRobot to build models you can use?

“Empirical testing drives everything we do,” says Zach. “We test things and evaluate them honestly. We’ve tried lots of fun new data science techniques and found they didn’t add any value to the models. Our goal is to deliver tools that help our customers solve real business problems.”

Like a Center of Excellence for machine learning?

“Exactly.”

What are some examples of innovations that don’t live up to the hype?

“Deep neural nets. A few years ago, people thought deep learning would eat the world. It works great with dense data: image recognition, speech recognition, things like that. But for standard predictive analytics, it doesn’t add much value compared to other techniques, like gradient boosted machines.”

Some of our competitors say that you only need one or two machine learning algorithms. Does our testing support that view?

“It’s true that some techniques outperform others. But they don’t deliver the best results on every problem, and there is no way to know in advance which technique will work best. So if you want consistent results over a wide range of problems, you have to test as many different algorithms as you can.

“It’s like an investment portfolio. You don’t put all of your money in one stock. For returns you can trust, you spread the risk across many companies. It’s the same with machine learning. You test many different algorithms so you can trust the results.”
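To see what that looks like in practice, here is a minimal sketch in scikit-learn. It illustrates the "test many algorithms and compare them honestly" idea, not DataRobot's blueprints; the dataset, candidate models, and metric are stand-ins.

```python
# Illustrative only: a small-scale version of "test many algorithms and compare
# them empirically" using scikit-learn. DataRobot's blueprints and nightly
# testing platform are far more extensive; this just shows the idea.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# A "portfolio" of candidate algorithms -- no single one wins on every problem.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "neural_net": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)),
}

# Score every candidate with the same cross-validation splits and metric,
# then rank them on out-of-sample performance rather than on theory or hype.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```

Whichever candidate scores best on held-out data wins, regardless of which technique happens to be fashionable that year.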

Does DataRobot use deep learning?

“Yes, we include deep learning blueprints in the mix, and sometimes they do well. But that’s the point. We don’t force customers to use one or two techniques. We build as many as we can into DataRobot. That way, the data and the business problem determine the technique.”

Any other examples?

“Sure. There’s a lot of interest in using LSTM deep neural nets for text analysis. Our testing shows that, regardless of theory, the technique doesn’t outperform simpler techniques like ‘bag of words.’ Simpler techniques run faster and take less computing power.”
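For readers who want to see what such a baseline looks like, here is a minimal sketch using scikit-learn. It shows a generic "bag of words" pipeline (TF-IDF features plus a linear model); it is an illustration, not DataRobot's implementation.

```python
# A minimal sketch (not DataRobot's implementation) of the kind of "bag of
# words" baseline described above: TF-IDF features plus a linear model,
# which is fast and often hard to beat on standard text problems.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Two-class slice of the classic 20 Newsgroups dataset (downloaded on first use).
data = fetch_20newsgroups(
    subset="train",
    categories=["sci.space", "rec.autos"],
    remove=("headers", "footers", "quotes"),
)

# Bag-of-words pipeline: tokenize, weight terms with TF-IDF, fit a linear classifier.
baseline = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
scores = cross_val_score(baseline, data.data, data.target, cv=5, scoring="accuracy")
print(f"bag-of-words baseline accuracy: {scores.mean():.3f}")
```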

Testing thousands of blueprints every night. How big is your team?

Zach laughs. “We’re hiring. Right now I’m looking for machine learning engineers and Python developers.”

A team of experts, with a disciplined approach. Comprehensive and rigorous testing. A guiding philosophy that says: nothing goes into the product unless we can prove that it works.

It’s why you can trust DataRobot.