To Store in the Cloud or on Premises? How about Door No. 3?

A walk through the pros and cons of using cloud and on-premises storage as well as a third, best-of-both-worlds option: cloud-adjacent storage.
Feb 28th, 2024 10:37am
Image from Javier Brosch on Shutterstock

A data platform is how a company stores, manages and analyzes its data, its most valuable asset. The more powerful and efficient the platform, the more effectively the data can be put to use.

Data platforms enable conversion of data streams from sources of all kinds, from line-of-business apps to IoT platforms to AI tools, into plans of action for achieving business outcomes like product improvements, better processes and new commercial opportunities.

It’s important to get your data platform architecture right. A big part of that is having optimal infrastructure — a combination of storage, network and compute resources that’s efficient and that doesn’t lead to cost and complexity spinning out of control.

The large data volumes and complex analysis procedures in modern data platforms require specialized infrastructure to support performance and reliability at scale. All aspects of the infrastructure must be considered, but for data platforms, compute, networking and reliability considerations are usually secondary to storage.

Let’s walk through the pros and cons of using cloud and on-premises storage infrastructure for a data platform as well as a third, best-of-both-worlds option: cloud-adjacent storage.

Generally speaking, cloud storage gives you the advantage of scalability and access to cloud-based tools, while on-premises storage infrastructure gives you full control of your data. Cloud-adjacent storage gives you the control of on-premises without giving up the ability to use cloud tools.

Advantages of Cloud Storage

Wholly cloud-based storage solutions place your data in remote infrastructure that you provision through a service provider. The main benefits of this approach are that it:

  • Eliminates the need to purchase and maintain physical storage hardware: Cloud has no (or very low) upfront costs since you don’t need to procure any storage devices yourself. You pay for what you need as you go, thus reducing capital expenditures. The infrastructure is fully managed, so you also don’t have the operational cost of system maintenance.
  • Supports integrated ingest and analysis capabilities: Many cloud storage providers directly integrate with data platforms or offer their own all-in-one services. You can ingest, analyze, transform and output data all in one place, minimizing costly and time-consuming data transfers.
  • Offers seamless scalability, reliability and high availability: Effortless scalability is arguably the biggest advantage of cloud storage. You pay for what you use, adding storage capacity and compute performance as needed. You also get a convenient way to build resiliency and high availability.
  • Has automated security and compliance protections: Cloud storage usually comes with integrated security options to help protect your data and prevent unauthorized access. Providers can also support compliance requirements through specialized services.

Drawbacks of Cloud Storage

Using cloud storage for a data platform isn’t without its drawbacks, which include:

  • Less control over your resources: Because your data lives in the cloud, you don’t have full control over the infrastructure it runs on or how internal aspects of the storage service are managed. This can be restrictive for large and complex data sets with specific performance requirements.
  • Fewer options for achieving exact compute, networking and storage combinations: Infrastructure configuration options available to you are limited to what your cloud provider offers. This can lead to overprovisioning (and thus overpaying for) some elements of the infrastructure to create the platform you need (more vCPU cores than necessary, for example).
  • Higher security and privacy risks: Public cloud platforms are shared infrastructure, so there is a risk of lateral attacks. You also don’t have full control of your network’s exposure to the public internet. Services can require complex configuration policies to apply essential security protections, which increases the risk of security oversights.
  • Higher costs over time: Cloud has low upfront costs, but this doesn’t necessarily translate to a lower cost of ownership throughout the life cycle of your data platform. Cloud storage fees can add up, especially when using a dedicated data warehouse solution. On-demand plans from Amazon Redshift range from around $1.08 to $13.04 per hour, for example, whereas Snowflake’s credit model results in typical fees that range from $2 per hour to over $1,024 per hour on the largest plans.
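
To put those hourly rates in perspective, the sketch below converts the figures quoted above into approximate monthly costs. It assumes a warehouse that runs around the clock and a 730-hour month; both are illustrative simplifications rather than vendor billing rules, and the `utilization` parameter is a hypothetical knob. Actual bills depend on region, node type, pausing and discounts.

```python
# Rough monthly cost of a data warehouse billed at an hourly rate.
# HOURS_PER_MONTH is an assumption: 365 * 24 / 12 = 730.
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float, utilization: float = 1.0) -> float:
    """Return the approximate monthly cost in dollars.

    utilization is the fraction of the month the warehouse is running
    (1.0 means always on). It is a hypothetical parameter for illustration.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return round(hourly_rate * HOURS_PER_MONTH * utilization, 2)


# Hourly rates as quoted in the article.
quoted_rates = {
    "Redshift on-demand (low end)": 1.08,
    "Redshift on-demand (high end)": 13.04,
    "Snowflake (low end)": 2.00,
    "Snowflake (largest plans)": 1024.00,
}

for name, rate in quoted_rates.items():
    print(f"{name}: ${monthly_cost(rate):,.2f} per month")
```

Even at the low end, an always-on warehouse under these assumptions costs several hundred dollars a month, and the gap between the low and high ends spans several orders of magnitude.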

Advantages of On-Premises Storage

On-premises infrastructure refers to resources that you own and operate within your own organization. For a data platform, this means purchasing and configuring servers and storage drives and then deploying them into your data center environment. On-premises infrastructure is more complex to set up and maintain, but it gives you complete control over your platform.

Other advantages include:

  • Your infrastructure is an asset: Buying your own servers and storage drives carries a high upfront cost, but the hardware is an asset for your organization. If you’re confident you can predict your storage requirements, owning your infrastructure can significantly reduce long-term costs compared to cloud storage options.
  • Full control over your hardware and data: You have full control of your infrastructure, down to the storage servers, file systems and operating systems. You can configure any combination of compute, networking and storage and craft a platform that meets your exact requirements.
  • Zero resource contention: Running everything on your own hardware means there’s no resource contention, unlike cloud services that can be affected by “noisy neighbor” issues. You are also insulated from cloud outages, although uptime then depends on the reliability of your own systems.
  • Option to run air-gapped data stores: Critically sensitive data might not be suitable for the cloud. On-premises infrastructure lets you keep these resources private and air-gap them if required. Zero contact with the internet drastically cuts your potential security exposure.
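
The long-term cost argument in the first advantage above comes down to a break-even calculation: how many months of lower running costs it takes to recover the upfront hardware spend. The sketch below shows one simple way to estimate that. All dollar figures are hypothetical placeholders, not data from any vendor, and the model ignores hardware refresh cycles, depreciation and staffing changes.

```python
import math
from typing import Optional


def break_even_month(upfront_cost: float,
                     onprem_monthly: float,
                     cloud_monthly: float) -> Optional[int]:
    """Return the first month in which cumulative on-premises cost
    is no higher than cumulative cloud cost, or None if that never happens.

    Cumulative costs after m months are assumed to be:
      on-premises: upfront_cost + onprem_monthly * m
      cloud:       cloud_monthly * m
    """
    if upfront_cost <= 0:
        return 0
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        # On-premises never catches up if it is not cheaper to run.
        return None
    return math.ceil(upfront_cost / monthly_saving)


# Hypothetical figures, chosen only to illustrate the arithmetic.
months = break_even_month(upfront_cost=250_000,
                          onprem_monthly=4_000,
                          cloud_monthly=9_500)
print(f"Break-even after {months} months")
```

With these placeholder numbers, the hardware pays for itself in just under four years, which is why the ability to predict storage requirements over that horizon matters so much.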

Drawbacks of On-Premises Storage

Standing up and running storage infrastructure that supports a large data platform is no small task:

  • High upfront costs: Assembling an on-premises data platform is expensive up front. You’ll need to purchase or lease data center space, then fill it with your compute, networking and storage devices. Additionally, storage media need to be regularly added and replaced to ensure scalability and reliability.
  • Requires expertise: Building, operating and maintaining an on-premises data platform requires dedicated expertise, and there’s a relatively small pool of skilled engineers available. You need to have this talent on your team or pay for outsourcing it.
  • Challenging to scale the platform: Scaling on-premises data platforms is tricky and time-consuming. You need to purchase additional drives, add them to your storage arrays and check for correct operation. Similarly, increasing compute and networking capacity requires complex hardware installation that may necessitate platform downtime. In practice, this work is often outsourced to infrastructure management specialists, but depending on contractors can erode the cost and control benefits of on-premises systems.
  • Harder to integrate with analytics services: The pool of tools and services that you can use to analyze data on-premises is much smaller than the pool of cloud-based ones. Many AI and ML data engines are designed for cloud use, which can limit the variety of ways in which you can leverage your data.

Cloud-Adjacent Storage: The Best of Both Worlds

The cloud-adjacent storage model combines the benefits of cloud and on-premises models for a data platform, leaving out the major drawbacks.

“Cloud adjacency” here refers to maintaining private storage infrastructure but connecting it privately to public cloud platforms via cloud onramps located within the same data center campuses.

You can do this by colocating your own storage infrastructure with a data center operator that provides access to cloud onramps. You can also do this using a dedicated cloud solution like Equinix’s, which gives you fully managed compute and storage that’s single tenant yet provisioned remotely and on demand.

The benefits of this approach include:

  • Control over configurations: You have full control over your infrastructure resources, including the server hardware and software that underpin your data platform, as well as networking.
  • Direct cloud-to-cloud networking: Cloud onramps allow you to network between clouds using private connections, providing more options for where you process your data. This extends to direct Layer 2 and 3 networking, allowing you to reduce latency, accelerate intensive operations such as ingest and backups, and minimize the risk of traffic being intercepted or manipulated.
  • Reduced costs: Cloud adjacency can cost less than either a full cloud or an on-premises setup. You can save not only on storage itself but also on data egress costs by designing your network in a way that avoids transferring a lot of data out of a cloud region or service.
  • Access to detailed cloud analytics: Cloud-adjacent storage allows you to use the full spectrum of cloud-based data analytics, processing and transformation tools. You can use a cloud-based tool on privately stored data via a fast and reliable private connection.
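
The egress point in the cost bullet above can be made concrete with a small estimate. The sketch below compares a design that moves a full data set out of a cloud each month with one that moves only the analysis results. The per-GB rate and both volumes are hypothetical placeholders, and the flat rate is a simplification, since real providers typically use tiered pricing that varies by region and destination.

```python
def egress_cost(gb_out: float, rate_per_gb: float) -> float:
    """Cost in dollars of moving gb_out gigabytes out of a cloud
    at a flat per-GB rate (a simplifying assumption)."""
    if gb_out < 0 or rate_per_gb < 0:
        raise ValueError("volume and rate must be non-negative")
    return round(gb_out * rate_per_gb, 2)


# Hypothetical placeholders, not any provider's published pricing.
RATE_PER_GB = 0.09        # dollars per GB transferred out
FULL_DATASET_GB = 50_000  # monthly volume if the whole data set leaves the cloud
RESULTS_ONLY_GB = 500     # monthly volume if only results leave the cloud

cost_moving_everything = egress_cost(FULL_DATASET_GB, RATE_PER_GB)
cost_moving_results = egress_cost(RESULTS_ONLY_GB, RATE_PER_GB)
savings = round(cost_moving_everything - cost_moving_results, 2)

print(f"Moving the full data set: ${cost_moving_everything:,.2f} per month")
print(f"Moving results only:      ${cost_moving_results:,.2f} per month")
print(f"Estimated savings:        ${savings:,.2f} per month")
```

The exact numbers matter less than the shape of the calculation: egress charges scale with the volume that leaves the cloud, so keeping bulk data on private storage and sending out only results shrinks that volume.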

A robust data platform needs scalable storage infrastructure that meets your current and future data requirements.

Cloud and on-premises solutions each accommodate a different set of use cases, but you don’t have to choose only one of the two. Many organizations find that a hybrid approach using dedicated cloud and cloud-adjacent storage is the most flexible and cost-effective overall option. It facilitates secure, private networking and data analytics across clouds while offering reliable performance and configuration control.
