The conventional data warehouse and relational database setup may be a workhorse for rigid systems like financial software, but for slicing and dicing marketing and advertising information, the infrastructure shows its age.
The problem: Traditional systems require schemas and data modeling that can’t keep up with trillions of pieces of data from a wide variety of sources in a mixture of formats. It’s the problem posed by the so-called “3 Vs of big data” (volume, velocity and variety), all of which make it a challenge to collect, store and process everything in real time.
That’s where nonrelational data-management technology comes in, offering greater flexibility in how data is stored, queried and processed. It’s the tack taken by MapR, a company that powers big data analytics in a number of fields, including online marketing and advertising. Built upon the open-source software framework of Apache Hadoop, MapR’s technology crunches data on behalf of heavyweights such as comScore and Rubicon Project.
Hadoop allows a business to take commodity servers with local disks and cluster them to store data while running analytics or modeling jobs in parallel. The workload can be distributed across thousands of machines in a cluster so that the system can store and analyze petabytes of data at a much lower cost. The outcome is an analytics tool that costs between 20 and 40 times less to run than a traditional data warehouse, said Tomer Shiran, MapR’s VP of product management.
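
To make the parallel-processing idea concrete, here is a minimal sketch of the map-and-reduce pattern that Hadoop popularized. It is not Hadoop or MapR code: the impression log, the campaign names and the function names are all hypothetical, and a local thread pool stands in for the machines of a cluster. Each partition of the data is counted on its own, and the partial counts are then merged into one total.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(records):
    """Map step: count ad impressions per campaign within one partition."""
    return Counter(campaign for campaign, _user in records)


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_job(partitions, workers=4):
    # Assumption: threads stand in for cluster nodes. Each partition is
    # processed independently, which is what lets the work spread out.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_counts(partials)


# Hypothetical impression log, already split into three partitions.
partitions = [
    [("spring_sale", "u1"), ("brand", "u2")],
    [("spring_sale", "u3")],
    [("brand", "u4"), ("spring_sale", "u5")],
]
totals = run_job(partitions)
```

Because no partition depends on another, adding more workers (or, in a real cluster, more servers) lets the same job cover more data without changing the logic.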