It’s not uncommon for organizations that have migrated to Google Cloud Platform (GCP) to encounter sticker shock.
After all, one of the major selling points of GCP, and of public clouds in general, is the ability to reduce costs compared to running your own datacenter on-premises or in a colocation facility.
There are a number of reasons why costs on GCP may be higher than expected, including:
- Your applications were simply “lifted and shifted” into the cloud without being modernized to use their new environment efficiently
- Your compute usage is not being properly monitored
- GCP-provided alerting tools for cloud usage aren’t being utilized
Still, while these issues can certainly add to your overall costs, in our experience the main culprit for skyrocketing expenses in GCP is also one of its most powerful features: BigQuery.
What is BigQuery?
BigQuery is Google’s platform for enterprise data warehousing and analytics. It’s essentially a search engine on steroids, powered by Google’s own infrastructure and capable of running queries over very large datasets very quickly.
It’s also a single tool for both storage and compute in GCP. It works with a wide range of popular tools (Looker, Microsoft Power BI, Tableau), and it can be accessed in several ways, including:
- The GCP Cloud Console
- The bq command-line tool on Windows, Linux, and macOS
- The BigQuery REST API, called directly or through client libraries in seven different programming languages
Combined, these and other features add up to a uniquely powerful analytics platform for enterprises. But in order to reap the benefits of BigQuery—and avoid running up costs—you need to use it as it’s been designed to be used.
In other words, the ways you’re used to running queries in a traditional database are not going to be effective in BigQuery. In fact, they can be outright detrimental to your business.
Avoiding big costs with BigQuery
The first thing to know about BigQuery is what you’re charged for: the amount of data you store and, under on-demand pricing, the amount of data your queries process.
The second thing to know is that, unlike most traditional databases, BigQuery is column-based. It stores each column’s data separately rather than storing data row by row.
This may not seem like a big difference, but failing to keep it in mind when using the platform is usually the #1 reason for unexpected costs.
Because BigQuery is column-based, it’s able to return results very quickly compared to row-centric traditional databases.
The tradeoff, though, is that unless you limit the number of columns you ask BigQuery to read, it’s easy to overspend. The old database habit of simply using SELECT * forces BigQuery to scan every column in the table, whether or not you need it, and you pay for every byte it reads.
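To make column-based pricing concrete, here is a toy model in Python. It is a rough sketch, not BigQuery’s actual billing logic. It assumes each column’s stored size is known and that every column a query references is read in full. The table (`events`), its column sizes, and the price per TiB are hypothetical placeholders, not quoted rates.

```python
# Toy model of on-demand billing in a column-oriented store.
# Assumptions (illustrative only): per-column sizes are known, every referenced
# column is read in full, and PRICE_PER_TIB is a placeholder, not a real rate.
from typing import Dict, Iterable, Optional

GIB = 1024 ** 3
TIB = 1024 ** 4
PRICE_PER_TIB = 5.0  # hypothetical placeholder; check current BigQuery pricing


def bytes_scanned(column_sizes: Dict[str, int],
                  columns: Optional[Iterable[str]] = None) -> int:
    """Bytes read by a query referencing `columns`; None models SELECT *."""
    if columns is None:
        return sum(column_sizes.values())
    # Each distinct referenced column is read once, in full.
    return sum(column_sizes[name] for name in set(columns))


def estimated_cost(num_bytes: int, price_per_tib: float = PRICE_PER_TIB) -> float:
    """Convert bytes scanned into an estimated on-demand charge."""
    return num_bytes / TIB * price_per_tib


# Hypothetical table: three narrow columns and one wide free-text column.
events = {
    "event_id": 80 * GIB,
    "user_id": 80 * GIB,
    "event_date": 40 * GIB,
    "payload": 1800 * GIB,
}

star = bytes_scanned(events)                                # SELECT * FROM events
narrow = bytes_scanned(events, ["user_id", "event_date"])   # SELECT user_id, event_date ...

print(f"SELECT *:       {star / GIB:.0f} GiB, ~${estimated_cost(star):.2f}")
print(f"Two columns:    {narrow / GIB:.0f} GiB, ~${estimated_cost(narrow):.2f}")
```

In this model, the wide `payload` column dominates the bill. Leaving it out of the SELECT list cuts the bytes scanned by more than 90%. Adding a LIMIT clause would not change these numbers, and in BigQuery it generally doesn’t reduce the bytes billed either.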
Use the tools you’re provided
One of the best ways to start cutting your costs is to use the built-in cost control measures of BigQuery.
For specific projects, you can set a soft limit on spending by creating a billing budget alert that emails your administrator as you approach your monthly budget. You can also set hard limits on the amount of query data processed per day, per project or per user, using custom quotas.
Beyond setting limits, BigQuery provides relatively easy-to-use cost-monitoring tools in the Cloud Console.
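The difference between a soft limit and a hard limit can be sketched as a small toy model. The threshold percentages, budget, and quota sizes are hypothetical examples. The `DailyQueryQuota` class illustrates the behavior only; it is not how Google Cloud enforces quotas internally. The real settings are configured in the Cloud Console.

```python
# Toy model contrasting a soft limit (budget alert) with a hard limit (quota).
# Assumptions: thresholds, budgets, and quota sizes are hypothetical examples;
# this illustrates the behavior, not Google Cloud's internal enforcement.
from collections import defaultdict
from typing import DefaultDict, List, Sequence


def crossed_thresholds(spend: float, budget: float,
                       thresholds: Sequence[float] = (0.5, 0.9, 1.0)) -> List[float]:
    """Soft limit: report which alert thresholds current spend has reached.

    Nothing is blocked; spending can keep going past 100% of the budget.
    """
    return [t for t in thresholds if spend >= budget * t]


class DailyQueryQuota:
    """Hard limit: refuse queries once a user's bytes processed today would exceed the quota."""

    def __init__(self, per_user_bytes: int) -> None:
        self.per_user_bytes = per_user_bytes
        self.used: DefaultDict[str, int] = defaultdict(int)

    def try_run(self, user: str, query_bytes: int) -> bool:
        if self.used[user] + query_bytes > self.per_user_bytes:
            return False  # the query fails instead of being billed
        self.used[user] += query_bytes
        return True

    def reset(self) -> None:
        """Start a new day: clear every user's usage."""
        self.used.clear()


alerts = crossed_thresholds(spend=950.0, budget=1000.0)
quota = DailyQueryQuota(per_user_bytes=100)
```

The key point is that a budget alert only notifies: spending continues past 100% of the budget. A quota, by contrast, causes further queries to fail until the daily quota resets.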
Another critical step is to use the query validator or a dry run to estimate how much data a query will process, and therefore what it will cost, before you run it.
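A dry run can also be scripted. The sketch below assumes the `google-cloud-bigquery` Python client library is installed and credentials are configured. It uses `QueryJobConfig(dry_run=True)` to read `total_bytes_processed` without actually running the query. It then converts that figure into a cost using a placeholder price. The `should_run` gate and the price constant are our own illustrative additions, not part of the library.

```python
# Estimate a query's cost with a dry run before running it for real.
# Assumptions: google-cloud-bigquery is installed and credentials are configured;
# PRICE_PER_TIB is a placeholder, not a quoted rate.

TIB = 1024 ** 4
PRICE_PER_TIB = 5.0  # hypothetical placeholder; check current BigQuery pricing


def dry_run_bytes(sql: str) -> int:
    """Return the bytes BigQuery reports it would process, without running the query."""
    # Imported lazily so the pure helpers below work without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)  # dry runs are not billed
    return job.total_bytes_processed


def cost_from_bytes(num_bytes: int, price_per_tib: float = PRICE_PER_TIB) -> float:
    """Convert bytes processed into an estimated on-demand charge."""
    return num_bytes / TIB * price_per_tib


def should_run(num_bytes: int, max_cost: float,
               price_per_tib: float = PRICE_PER_TIB) -> bool:
    """Hypothetical gate: only run queries whose estimated cost is within max_cost."""
    return cost_from_bytes(num_bytes, price_per_tib) <= max_cost
```

For example, calling `dry_run_bytes` with a query against a hypothetical `my_project.my_dataset.events` table returns the byte estimate. You can then pass that estimate to `should_run` before submitting the real query. From the command line, `bq query --use_legacy_sql=false --dry_run` performs the same check.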
As for setting up your data sets for queries, here are some important tips to follow:
- Partition your tables whenever possible, and filter on the partitioning column so that BigQuery reads only the partitions a query needs
- Reduce the cost of storage by setting expiration dates on your data sets, particularly if you’re conducting experimental queries
- Materialize your query results in stages to limit the amount of data BigQuery reads
- Avoid streaming data when you can: batch loading data into BigQuery is free, but streaming inserts are billed
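Partition pruning, the first tip above, can be illustrated the same way. This toy model assumes a hypothetical table with 365 daily partitions of 10 GiB each. It also assumes that a filter on the partitioning column lets BigQuery skip every partition outside the requested date range. Real savings depend on how your table is partitioned and whether the query filters on the partitioning column itself.

```python
# Toy model of partition pruning on a date-partitioned table.
# Assumptions: partition sizes are hypothetical, and a filter on the
# partitioning column lets the engine skip partitions outside the range.
from datetime import date, timedelta
from typing import Dict, Optional

GIB = 1024 ** 3


def bytes_read(partitions: Dict[date, int],
               start: Optional[date] = None,
               end: Optional[date] = None) -> int:
    """Sum the sizes of partitions within [start, end]; no filter means a full scan."""
    return sum(
        size
        for day, size in partitions.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


# 365 daily partitions of 10 GiB each (hypothetical).
first_day = date(2023, 1, 1)
daily = {first_day + timedelta(days=i): 10 * GIB for i in range(365)}

full_scan = bytes_read(daily)                                      # no partition filter
one_week = bytes_read(daily, date(2023, 3, 1), date(2023, 3, 7))   # one week of partitions
```

Here, a one-week filter reads 70 GiB instead of the 3,650 GiB a full scan would read.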
Together, these tips and tools will go a long way toward helping you avoid unexpected costs in GCP.
They will also help you fully leverage BigQuery for rapid, quality results as you mine your data for insights.