Monitoring Query Costs in BigQuery by user using SQL and the Query Logs

BigQuery is a SAAS Software, you get charged based on the amount of data scanned by your SQL Queries. Monitoring cost is very crucial in a cloud data warehouse. It helps us optimize the bad queries and reports and helps us get the maximum ROI from the data warehouse.  BigQuery recently introduced a new INFORMATION_SCHEMA views for viewing the query logs. This view will give us,

  • List of all the queries executed
  • Who and when was the query executed
  • Query metrics, including bytes processed, time taken, and the actual query itself

Example:

DECLARE timezone STRING DEFAULT "US/Eastern";
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;
SELECT
A.project_id,
DATASET.dataset_id as DATASET,
USER_EMAIL,
DATE(end_time, "America/Tijuana") as end_date,
SUM(ROUND(IF(cache_hit = true,0, total_bytes_billed)/(1024 * 1024))/(1024 * 1024)) AS TOTAL_TB_SCANNED,
SUM(IF(cache_hit = true,0, (ROUND(total_bytes_billed/ (1024 * 1024)) * cost_per_tb_in_dollar)/(1024 * 1024))) as TOTAL_COST_IN_DOLLAR
FROM
(
SELECT
job_id,
J.project_id,
USER_EMAIL,
end_time,
total_bytes_billed,
cache_hit,
ANY_VALUE(REF) AS DATASET
FROM
`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT J
CROSS JOIN UNNEST(J.referenced_tables) AS REF
WHERE
DATE(end_time, "America/Tijuana") BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY) and CURRENT_DATE()
AND statement_type<>'SCRIPT'
AND REF.DATASET_ID NOT LIKE '_script%'
GROUP BY 1,2,3,4,5,6
)A
GROUP BY 1,2,3,4
ORDER BY TOTAL_COST_IN_DOLLAR DESC;
  • feel free to replace JOBS_BY_PROJECT with JOBS_BY_USER or JOBS_BY_ORGANIZATION

Execute the Query in the Big Query Console:

Usage Notes:

  • the JOBS_BY_* views are regionalized, i.e. we must prefix the region (see region-us in the view specification) and must run the job in that region
  • The views will need additional IAM permission in order to work
  • The data in these views are currently only available for the past 180 days

nVector

posted on 23 Nov 20

Enjoy great content like this and a lot more !

Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds