Home » Bigquery » Tutorials » Create table

BigQuery Create table

Use the Create table statement to create a new permanent table in BigQuery

Create table example

CREATE TABLE mydataset.Employee
    (
    Id INT64 NOT NULL, 
    Name STRING(300) NOT NULL,
    Join_Date DATE NOT NULL,
    Branch INT64,
    Salary NUMERIC
    )
    PARTITION BY Join_Date
    CLUSTER BY Branch, Id

Create or replace table, if exists

CREATE OR REPLACE TABLE mydataset.Employee
    (
    Id INT64 NOT NULL, 
    Name STRING(300) NOT NULL,
    Join_Date DATE NOT NULL,
    Branch INT64,
    Salary NUMERIC
    )
    PARTITION BY Join_Date
    CLUSTER BY Branch, Id

Create table, if not exists

CREATE TABLE IF NOT EXISTS mydataset.Employee
    (
    Id INT64 NOT NULL, 
    Name STRING(300) NOT NULL,
    Join_Date DATE NOT NULL,
    Branch INT64,
    Salary NUMERIC
    )
    PARTITION BY Join_Date
    CLUSTER BY Branch, Id

Partition table by month

CREATE TABLE mydataset.Employee
    (
    Id INT64 NOT NULL, 
    Name STRING(300) NOT NULL,
    Join_Date DATE NOT NULL,
    Branch INT64,
    Salary NUMERIC
    )
    PARTITION BY DATE_TRUNC(Join_Date, MONTH)
    CLUSTER BY Branch, Id

Partition by Timestamp column

CREATE TABLE mydataset.Employee
    (
    Id INT64 NOT NULL, 
    Name STRING(300) NOT NULL,
    Join_Date DATE NOT NULL,
    Branch INT64,
    Salary NUMERIC,
    Create_TS Timestamp
    )
    PARTITION BY TIMESTAMP_TRUNC(Join_Date, DAY)
    CLUSTER BY Branch, Id

Partition by Integer column

CREATE TABLE mydataset.Employee
    (
    Id INT64 NOT NULL, 
    Name STRING(300) NOT NULL,
    Join_Date DATE NOT NULL,
    Branch INT64,
    Salary NUMERIC,
    Create_TS Timestamp
    )
    PARTITION BY RANGE_BUCKET(Id, GENERATE_ARRAY(1, 100000, 1000))
    CLUSTER BY Branch, Id;

Syntax for Partition by

  • DATE({ timestamp_column | datetime_column }). Partition by a TIMESTAMP or DATETIME column with daily partitions.
  • DATETIME_TRUNC(datetime_column, { DAY | HOUR | MONTH | YEAR }). Partition by a DATETIME column with the specified partitioning type.
  • TIMESTAMP_TRUNC(timestamp_column, { DAY | HOUR | MONTH | YEAR }). Partition by a TIMESTAMP column with the specified partitioning type.
  • DATE_TRUNC(date_column, { MONTH | YEAR }). Partition by a DATE column with the specified partitioning type.
  • RANGE_BUCKET(int64_column, GENERATE_ARRAY(start, end[, interval])). Partition by an integer column with the specified range, where: start is the start of range partitioning, inclusive. end is the end of range partitioning, exclusive. interval is the width of each range within the partition. Defaults to 1.

Explanation

This BigQuery CREATE TABLE example creates a new table called Employee which has 5 columns.

  • The first column is called "Id" which is created as a number datatype and can not contain null values
  • The second column is called Name which is a Varchar datatype (300 maximum characters in length) and also can not contain null values.
  • and so on...

Choosing Partition and Cluster keys: tips

You can select 1 column as part of parition key and upto 4 columns as part of cluster keys

The Partition column is used to split the data and store in respective chunks

  • Choose a column that gets you even data distribution. eg. the partition key results in equal sized partitions
  • Make sure the partition key doesn't result in too many partitions. Having lesser than 4000 partitions is recommended
  • The partition key is frequently used in the filter clause during data retrieval

The cluster by columns are used to sort the data within the parititions

  • Make sure the most popular filter column is specified in the leftmost followed by the less frequently used ones. The column order matters
  • There is no performance or cost overhead for using cluster keys, so feel free to define them in all your tables

Syntax reference

CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ]
[[project_name.]dataset_name.]table_name
[(
  column[, ...]
)]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(table_option_list)]
[AS query_statement]