Caching in Google BigQuery Cloud Data Warehouse
What is BigQuery Caching ?
When you execute a SQL in BigQuery the results are cached for 24 hours. If you happen to rerun the same query within the 24 hours, Results will be served from the cache (There is no charge for results served from cache). Caveat to note: Caching is offered at the user account level, which means the caching benefits are available only to you, If you teammate runs the same query, his query will not utilize your cache, it will scan the disk instead
How does Caching work Internally ?
Under the hood, whenever you execute a SQL, Once BigQuery completes the query processing and fetches the output result set, it automatically creates a temp table and stores the results in it. The next time you execute the same query, BigQuery will directly serve the results from the temp table. (There is no storage cost for the temp tables, and they will be automatically dropped after 24 hours)
What happens to Cache results when the underlying data changes ?
BigQuery Cache results are invalidated when the data in the underlying partition changes. Finally, results are normally retained for 24 hours, after which results query the remote disk
How to disable BigQuery Results Caching?
From the BigQuery WebUI:
- Click on "More" and choose "Query Settings"
- Uncheck "Use Cached Results" Checkbox and hit "Save"
If you are not using the WebUI, You can slightly modify the query by adding a whitespace or changing indentation, it will force the query engine to reprocess the query from disk
BigQuery Caching Best Practices:
- Always enable "Cached Results" in the settings
- Use "Service Accounts" instead of Human accounts for shared dashboards to benefit from caching