Optimizing your Data Strategy: On-Premise vs On cloud vs Hybrid?

Author : Aina Raj

Traditionally, organizations have invested in on-premise infrastructure for applications and data. The IT head would provision infrastructure based on annual data storage and computing needs. Today the scenario has changed. With the volume of data being generated every second, predicting and managing workloads and provisioning for upcoming needs has become a major challenge for the IT departments of large and medium enterprises.

Until a few years ago, organizations were reluctant to adopt a cloud strategy. Today, with data integration, enterprise data warehousing and big data analytics in play, organizations are recognizing the importance of a cloud-first strategy in realizing their digital transformation goals. This article will help you understand whether an on-premise, on-cloud or hybrid strategy will serve you better as you optimize your data strategy.

Key features of a robust data architecture:

As businesses make data an integral part of their decision making, it is important to build a robust data architecture that can capture and ingest different forms of data and make clean, good-quality data available for analysis at your fingertips. Before deciding whether to adopt cloud or deploy an on-premise model as part of your data strategy, consider the key parameters below, which will influence this decision.

  • High availability: A critical feature to consider, especially in data-centric projects, is High Availability (HA) of your data. HA ensures data is available in real time for the execution of simple and complex queries. For operational flexibility, it is recommended that a copy of the data and code be maintained in a secondary location, often geographically separated.

In an on-premise data warehouse, provisioning for HA means the added cost of hardware, software and storage. Computing resources, which also come at an additional cost, are required to create redundancy of current data. Modern cloud data warehouse architecture separates the storage, compute and service layers to ensure data resilience and data consistency in case of planned or unplanned interruptions or node failures.
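
To make the idea of a secondary copy concrete, here is a minimal sketch of a query that falls back to a second location when the first one is unreachable. The names `query_with_failover`, `primary_site` and `secondary_site` are hypothetical stand-ins invented for illustration, not the API of any particular warehouse, and a real HA setup would also need replication and consistency checks that are left out here.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when no location could answer the query."""


def query_with_failover(query: str, replicas: Sequence[Callable[[str], List]]) -> List:
    """Try each location in order and return the first successful result."""
    errors = []
    for run_query in replicas:
        try:
            return run_query(query)
        except ConnectionError as exc:
            # Remember the failure and move on to the next copy of the data.
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} location(s) failed for query: {query}")


# Hypothetical stand-ins for a primary site and a geographically separate copy.
def primary_site(query: str) -> List:
    raise ConnectionError("primary data center unreachable")


def secondary_site(query: str) -> List:
    return [("customer_count", 1250)]  # illustrative value only


result = query_with_failover(
    "SELECT COUNT(*) FROM customers", [primary_site, secondary_site]
)
print(result)
```

In this sketch the caller still gets an answer even though the primary location is down, which is the behavior HA is meant to guarantee.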

  • Connecting to data sources:
    Gone are the days when organizations used one or two data sources to generate reports and dashboards. Today data is available everywhere, and we should think of ways to connect to the relevant data sources to optimize results. Data sources can be on-premise, on cloud, or both.

Let’s analyze the scenario with a use case such as creating a customer 360 view. Data is captured from multiple sources: on-premise CRM and other internal systems, third-party data, and cloud data sources such as social media, portals, surveys, campaigns, videos, call logs, images, voice, APIs and more. Transferring data from multiple sources can become a challenge, especially when the transfer is between on-premise data sources and cloud services.
ETL platforms like Talend and Informatica come with more than 900 in-built data connectors that can quickly and securely connect to several data sources. Depending on where most of the data resides, these platforms can be deployed on-premise, on cloud or in a hybrid model.
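
As a rough illustration of what a customer 360 view involves once the data has been extracted, the sketch below groups records from several sources under a shared customer key. It is plain Python rather than Talend or Informatica code, and the function name `build_customer_360`, the `customer_id` field and all sample records are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, List


def build_customer_360(sources: Dict[str, List[dict]], key: str = "customer_id") -> dict:
    """Group records from every source under the customer they belong to."""
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            customer = record.get(key)
            if customer is None:
                continue  # records without the key cannot be matched to a customer
            attributes = {k: v for k, v in record.items() if k != key}
            profiles[customer].setdefault(source_name, []).append(attributes)
    return dict(profiles)


# Illustrative records: one on-premise source and two cloud sources.
sources = {
    "crm": [{"customer_id": "C001", "name": "Asha", "segment": "retail"}],
    "surveys": [{"customer_id": "C001", "nps": 9}, {"customer_id": "C002", "nps": 6}],
    "call_logs": [{"customer_id": "C001", "duration_sec": 340}, {"duration_sec": 45}],
}

view = build_customer_360(sources)
print(view["C001"])
```

Keeping each source under its own name inside the profile makes it easy to see where an attribute came from, which matters when on-premise and cloud sources disagree.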

  • Scalability & Elasticity: If you are a company just starting off or looking to expand into other geographic markets, cloud infrastructure is a good idea as it does not involve an initial setup cost. You only pay for what you use. You may be aiming for rapid growth and not be in a position to predict and provision for future storage and computing needs. In this case, the flexibility to scale up and down on the go is extremely important, especially if decisions are made based on real-time analytics.
  • Cost: Cloud service providers typically offer a pay-per-use model, unlike on-premise investment, which involves initial CAPEX (capital expenditure). CAPEX involves investing in new hardware, software, infrastructure and so on. In a public cloud, these expenses are borne by the cloud provider. Hence cloud solutions are cost effective, with minimal upfront costs. Many cloud solutions are SaaS-based models in which the ongoing costs are a monthly or annual subscription, with flexible pricing options available on the go.
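
The trade-off between upfront CAPEX and pay-per-use pricing can be sketched as a simple break-even calculation. All figures below are made-up illustrative numbers, not taken from any vendor or report, and the function names are hypothetical; real comparisons would also account for staffing, depreciation and usage that varies month to month.

```python
from typing import Optional


def cumulative_on_premise_cost(months: int, capex: int, monthly_opex: int) -> int:
    """Upfront capital expenditure plus running costs to date."""
    return capex + monthly_opex * months


def cumulative_cloud_cost(months: int, monthly_fee: int) -> int:
    """Pay-per-use spend to date, assuming a flat monthly bill."""
    return monthly_fee * months


def break_even_month(capex: int, monthly_opex: int, monthly_fee: int,
                     horizon_months: int = 120) -> Optional[int]:
    """First month in which cloud spend catches up with on-premise spend, if any."""
    for month in range(1, horizon_months + 1):
        cloud = cumulative_cloud_cost(month, monthly_fee)
        on_prem = cumulative_on_premise_cost(month, capex, monthly_opex)
        if cloud >= on_prem:
            return month
    return None  # cloud stays cheaper over the whole horizon


# Illustrative assumptions: 300,000 upfront, 5,000/month to run on-premise,
# versus a 15,000/month cloud subscription.
month = break_even_month(capex=300_000, monthly_opex=5_000, monthly_fee=15_000)
print(month)
```

With these assumed numbers the cloud option is cheaper for the first two and a half years, which is why minimal upfront cost matters most to companies that cannot yet predict their long-term needs.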

Forrester’s Total Economic Impact report evaluates the cost benefits associated with cloud data platforms. According to this report, the cloud data warehouse Snowflake delivered the following cost savings to its customers over a period of 3 years:

  •    Legacy storage cost savings of over $3.5 million
  •    Legacy compute cost savings of over $731,000
  •    ETL labor savings of nearly $995,000

  • Nature of your business: Another key parameter to consider while optimizing your data strategy is the nature of your business. If you are in a regulated industry with data privacy concerns, hosting your data on-premise will give you better control. If you are aiming to expand into different markets, with remote teams and the flexibility to scale up and down on the fly, cloud is a better option.

Concluding Thoughts:

If you are looking for control and accessibility and are in a position to predict the future requirements of your organization, on-premise may be a good option. But if you wish to minimize vendor dependencies and would like scalability, elasticity and high availability of data, then cloud is a better fit.
If you are still unsure which solution best suits your business needs, you could go with a hybrid approach: storing your data in an on-prem data center and using the cloud for data processing and analytics, or vice versa.