[This blog post is a repost of http://blog.atos.net/sc/2011/12/09/watch-this-space-big-data-%e2%80%93-big-problems/ ]
When you run out of space in your cupboard, you go out and buy a new cupboard. You might even choose a similar model so it looks good in your bedroom or kitchen.
If you run out of floor-space in your house, the problem is a bit more complex from a financial point of view – but the solution is similar.
I think we had, for many years, the same expectation in IT. If we ran out of storage, we would buy additional storage. Well, it seems we need to wake up and face the problem, because the solution is not that simple anymore. In a published whitepaper from the Atos Scientific Community (“Open Source Solutions for Big Data Management”) I read:
“[…] several major changes in the IT world have dramatically increased [data storage and processing needs] rate of growth.
[…] Computer capabilities have not increased fast enough to meet these new requirements. When data is counted in terabytes or petabytes, traditional data and computing models can no longer cope.”
This problem forces us to have a different view on storage and database technologies.
Traditional databases that use a relational model cannot process the data quickly enough, and adding computing power and memory is not the solution.
Luckily, the issue is being addressed by storage and database vendors – they coined the term “Big Data” and are developing new solutions to make sure we can cope with the rapid increase in the information we want to have available online.
Unfortunately the impact of these new technologies is big (no pun intended), and there is limited experience in applying them successfully and sustainably.
Some vendors are looking towards changes in hardware and provide dedicated storage-boxes that are hardwired to handle large databases or large data-files. Others are looking to provide solutions using new database software.
Most of the software developers and vendors that are facing big data issues are reconsidering the ‘traditional’ relational database model and are bringing new ‘NoSQL’ database models into view.
Based on the amount of marketing and buzz, ‘NoSQL’ seems to be the next best thing to go with for these types of solutions.
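To make the difference concrete, here is a minimal sketch (in plain Python, with hypothetical customer/order data) of the two data models. A relational system normalizes data into separate tables and joins them at query time; a document-oriented NoSQL store denormalizes, embedding related data in one document so a single key lookup retrieves everything – which is what makes spreading data over many machines straightforward.

```python
import json

# Relational style: normalized tables, reassembled with a join at query time.
customers = [{"id": 1, "name": "Alice"}]
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.0},
]

def orders_for(customer_id):
    # Simulates the join an RDBMS performs for every such query.
    return [o for o in orders if o["customer_id"] == customer_id]

# Document style: one denormalized document per customer, fetched by key.
# No cross-table (or cross-machine) join is needed to read it back.
document_store = {
    "customer:1": json.dumps({
        "name": "Alice",
        "orders": [
            {"order_id": 100, "total": 25.0},
            {"order_id": 101, "total": 40.0},
        ],
    })
}

doc = json.loads(document_store["customer:1"])
print(len(orders_for(1)))   # relational read: scan + join
print(len(doc["orders"]))   # document read: single key lookup
```

Both reads return the same two orders; the difference is that the second needs no join, which is the property NoSQL systems exploit when data no longer fits on one machine.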
So, do we really need all of this stuff? The Scientific Community whitepaper claims:
“In most situations, using NoSQL solutions instead of RDBMS (relational database management systems – paj) does not make sense in cases where the limits of the database have not been reached. Although, given the current exponential growth of data storage requirements, these limits are increasingly likely to be reached in the future. Unless RDBMS evolves quickly to include more advanced data distribution features, NoSQL solutions will become more and more important.”
The specialists have spoken – it is important. We need to care and we need to take action.
An additional problem is that the field is evolving quickly: good solutions are provided by small companies that will soon become part of large providers through acquisitions or other business activities.
I also expect some patent-conflicts (do we not love those?) and maybe some bad choices leading to loss of data.
My recommendation is that you start looking for areas in your organization where this challenge will become a problem very soon. Ask your systems administrators how long their database backups and restores take. Ask your system developers if they foresee issues with your next-generation document management or transaction processing system.
And while you are at it – ask your business analyst about the data they need to create meaningful business intelligence reports (and how much time it takes to create them). This will give you a good overview of your Big Data improvement areas.
Do not ask your vendor before having done an internal assessment. You do not want to be stuck with the wrong technology.
The Atos whitepaper can be downloaded here