In the Big Data ecosystem, one of the most important concerns is storing
huge volumes of data. Today, businesses continuously deal with structured,
unstructured, and semi-structured datasets continuously. Data comes
from transactional systems (ATM, POS), ERPs, CRM software, and IoT devices.
In addition, a large amount of external data comes from social
media and other websites.
To use this data, which arrives continuously from varied sources and in
huge volumes, data storage is of extreme importance. While a data
warehouse is used to store large volumes of structured, denormalized
datasets, a data lake stores huge volumes of data (structured,
unstructured, and semi-structured) in raw format. Apart from storing the
data, the capability to process it is also important, so that it can be
put to further use in various use cases.