ComScore, Rubicon And Others Look To MapR For Big Data Processing

As digital marketers hone the complementary sciences of behavioral metrics, segmentation and targeting, the deluge of consumer data consistently overwhelms traditional processing technology.

The conventional data warehouse and relational database setup may be a workhorse for rigid systems like financial software, but for slicing and dicing marketing and advertising information, the infrastructure shows its age.

The problem: Traditional systems require schemas and data modeling that can’t keep up with trillions of pieces of data from a wide variety of sources in a mixture of formats. It’s the problem posed by the so-called “3 Vs of big data” — volume, velocity and variety — all of which make it a challenge to collect, store and process everything in real time.

That’s where nonrelational data-management technology comes in by offering greater flexibility in how data is stored, queried and processed. It’s the tack taken by MapR, a company that powers big data analytics in a number of different fields, including online marketing and advertising. Built upon the open-source software framework of Apache Hadoop, MapR’s technology crunches data on behalf of heavyweights such as comScore and Rubicon Project.

Hadoop allows a business to take commodity servers with local disks and cluster them to store data while running analytics or modeling jobs in parallel. The workload can be distributed across thousands of machines in a cluster so that the system can store and analyze petabytes of data at much lower cost. The outcome is an analytics tool that costs anywhere from 20 to 40 times less to run than a traditional data warehouse, said Tomer Shiran, MapR’s VP of product management.
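To make the idea of parallel processing across a cluster concrete, here is a minimal Python sketch of the map-and-reduce pattern that Hadoop popularized. It is an illustration only, not MapR's or Hadoop's actual API: the click data, the partition layout and the function names are all hypothetical, and threads on one machine stand in for the separate servers of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical click events. In a real cluster, each partition would sit
# on a different server's local disk; here they are just in-memory lists.
PARTITIONS = [
    ["home", "pricing", "home"],
    ["pricing", "signup"],
    ["home", "signup", "signup"],
]


def map_partition(events):
    """Map step: count page views in one partition, independently of the others."""
    return Counter(events)


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into a single total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_page_views(partitions):
    """Run the map step on every partition concurrently, then reduce the results."""
    # Assumption: worker threads play the role of cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_counts(partials)


totals = count_page_views(PARTITIONS)
print(dict(totals))
```

Because each partition is counted without reference to the others, adding more machines lets the same job cover more data, which is the property the cost comparison above relies on.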

Competing with other companies running their own flavors of Hadoop distributions, such as Cloudera and Hortonworks, MapR brings to the table a management and infrastructure layer that’s meant to support big-time enterprise deployments, Shiran said. He recently spoke with AdExchanger about the challenges of processing marketing data in real time and how his firm has helped some of digital advertising’s biggest firms.

What are the key challenges online marketers face in collecting data and making real-time decisions based on that information?

The data that’s being considered in many of these applications is very big. They’re looking at every single click that’s happening on a site or every single event that’s being generated, every interaction or customer behavior, and that really can’t be done with traditional systems.

A big reason Hadoop is driving a lot of marketing is the fact that it allows companies to break down the silos. It’s much less rigid, in terms of the schema requirements. You don’t have to define a schema for every table in advance and maintain that schema and keep it up to date all the time. So, traditionally if you look at enterprises, they can have 10,000 Oracle databases with data that is spread across all these systems, and it prevents them from having that single view of the customer that spans all of that customer’s activities.

So one large MapR customer, on the retail side of financial services, went from taking months to understand customers to minutes. I’ll give you an example. They want a query that shows all the consumers that have been skiing in California. In the past, there’d be many separate systems and you’d have to get some information from one system, get another analyst to email a spreadsheet from another, and there’s much more manual processing that could take months. With MapR, that takes several minutes to do because now they have a single system that has all that information and the power to actually run such a query.

This allows you to do much more accurate customer targeting because it’s not just based on what you bought over the last year, but it’s based on what you are looking at right now, and what you have been looking at in the last week, and so forth.

For example, one of the large cable companies was able to do ad insertions in video on demand, based on what you’re doing with your set-top box, whether you clicked stop, rewind or fast-forward, or skipped certain sections of the show.

Your use cases are all over the map — everything from cybersecurity to sales performance-management systems. Where does marketing fall into the mix?

Marketing is one of the major use cases that our customers use our technology for, across all the verticals and across many channels.

Cisco, for example, developed a 360-degree customer application that’s collecting all the information that it knows about its customers, from the billing system, support system, social media and every interaction point it has with its customers. And it uses that for lead generation. It’s helping its own partners identify new sales opportunities and making decisions as to which partner to provide that opportunity data to. It’s analyzing the dial-home data, the behavior of its customers on its websites and when using its products. It increased revenue by $40 million just in the first year of deploying MapR.

Another example is one of the world’s largest retailers, which makes decisions on pricing based on MapR. It looks at competitors’ pricing and social media data to determine which products to stock in which stores. In that example you see the marketing spanning all four Ps of marketing — product, place, price and promotion.

Also, one of the leading IT vendors uses our product to make decisions on its website as to how to create a customized flow for the user to increase the probability that the customer will buy and increase the amount that they will spend on the site. So every customer is getting a personalized flow on the website, and that’s all based on MapR and using machine-learning technologies.

What about ad tech?

Online advertising or ad tech is a very large vertical for us and for Hadoop in general. A lot of the early adopters were in the online advertising space. If you look at companies like Rubicon, it’s the largest ad exchange in the US by reach. People are bidding on the exchange and it analyzes all of those bids and auctions that are taking place – we’re talking 90 billion events every day. It’s predicting the price that the next auction is going to close at and making decisions on which ad to show for each slot that’s available. So it’s doing that matching of the publishers and advertisers, and to do that in an optimal way, it needs to analyze all that data that’s coming out of that system.

The data is about what people are clicking, what they are looking at and how much they are engaging after that initial view or click. That helps Rubicon do better matching. If you’re looking at the amounts of data here, we’re talking many petabytes of data analyzed to make a decision. The scale is way too significant to use a relational database or data warehouse for this use case. Also, the type of analysis it does is beyond a SQL query. And the data changes often.

Meanwhile, as the de facto standard measurement company on the Internet, comScore tracks who is looking at which different online properties. It also has a panel of users about whom it collects all online behavioral data. ComScore then analyzes that data and produces information that it provides to advertisers. It’s analyzing 1.7 trillion events every month, so it’s reaching more than 90 percent of the Internet population now, in almost 200 countries.

So, chances are, if you’ve done something on your phone or using your browser this morning, then you’ve generated an event on comScore’s system. And that is running on MapR and is being analyzed and aggregated on a MapR cluster.
