BigQuery ROW_NUMBER Order by Error: Resources exceeded during query execution: The query could not be executed in allotted memory
Error Message: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 144% of limit. Top memory consumer(s): ORDER BY operations: 100%
Why does this happen?
Even though Big query is a distributed database and most queries are executed on multiple nodes, there are certain operations that requires all data to be processed on a single node due to calculation needs, which means that all the data needed for the calculation needs to exist on the same node. When your query requests for too much data to be processed on a single node that can no longer fit onto that node, you will get the “Resources exceeded during query execution” error and the whole query fails. You will not be billed for any failed queries in BigQuery
Which SQL Operations cause this?
- Un-partitioned window functions like RANK() OVER() or ROW_NUMBER() OVER() will operate on a single node
If your query contains an ORDER BY clause, all the data is still going to be passed to a single node and then sorted
Solution:
If you are using a ROW_NUMBER() OVER() as id to generate a sequence number for a each row in your table. Use the below workaround.
Simply change from
ROW_NUMBER() OVER(ORDER BY eventdate) AS STRING)
to
ROW_NUMBER() OVER(PARTITION BY eventdate) AS STRING)
The above command generates a non-unique value, try the below commands instead if you want a unique key
or
CONCAT(CAST(ROW_NUMBER() OVER(PARTITION BY eventdate) AS STRING),'|', (CAST(event_date AS STRING)) as id
or use
GENERATE_UUID()
nVector
posted on 15 Jan 20Enjoy great content like this and a lot more !
Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds
Post Comment