Bigquery Error: Table too large for JOIN. Consider using JOIN EACH

Problem:

I'm trying to run a query in BigQuery which has joins and a Group By Clause, It throws the error:

\n\n
Table too large for JOIN. Consider using JOIN EACH. For more details, \nplease see https://developers.google.com/bigquery/docs/query-reference#joins
\n\n

Below is the problematic SQL,

\n\n
select\nt1.device_uuid,\nt1.session_uuid,\nt1.nth,\nt1.Diamonds_Launch,\nt2.Diamonds_Close\nfrom (\n    select\n    device_uuid,\n    session_uuid,\n    nth,\n    sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Launch\n    from [project_id].[table_id]\n    where name = 'App Launch'\n    and attributes.Name = 'Inventory - Diamonds'\n    group by device_uuid, session_uuid, nth\n    ) as t1\njoin each (\n    select\n    device_uuid,\n    session_uuid,\n    nth,\n    sum(cast([project_id].[table_id].attributes.Value as integer)) as Diamonds_Close\n    from [project_id].[table_id]\n    where name = 'App Close'\n    and attributes.Name = 'Inventory - Diamonds'\n    group by device_uuid, session_uuid, nth\n    ) as t2\non t1.device_uuid = t2.device_uuid\nand t1.session_uuid = t2.session_uuid\n

Solution:

The Quey has got a GROUP BY inside a JOIN EACH. "GROUP BY" hits limits with cardinality (the number of distinct values) and the final grouping is not parallelizable. This limits BigQuery's ability to do the join.

\n\n

If you change the GROUP BY to GROUP EACH BY, this will most likely work

dan-irving

posted on

Enjoy great content like this and a lot more !

Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds