Home » Bigquery » Tutorials » Create table

BigQuery Create table

Use the Create table statement to create a new permanent table in BigQuery

Create table example

CREATE TABLE mydataset.Employee\n    (\n    Id INT64 NOT NULL, \n    Name STRING(300) NOT NULL,\n    Join_Date DATE NOT NULL,\n    Branch INT64,\n    Salary NUMERIC\n    )\n    PARTITION BY Join_Date\n    CLUSTER BY Branch, Id
\n\n

Create or replace table, if exists

\n
CREATE OR REPLACE TABLE mydataset.Employee\n    (\n    Id INT64 NOT NULL, \n    Name STRING(300) NOT NULL,\n    Join_Date DATE NOT NULL,\n    Branch INT64,\n    Salary NUMERIC\n    )\n    PARTITION BY Join_Date\n    CLUSTER BY Branch, Id
\n\n

Create table, if not exists

\n
CREATE TABLE IF NOT EXISTS mydataset.Employee\n    (\n    Id INT64 NOT NULL, \n    Name STRING(300) NOT NULL,\n    Join_Date DATE NOT NULL,\n    Branch INT64,\n    Salary NUMERIC\n    )\n    PARTITION BY Join_Date\n    CLUSTER BY Branch, Id
\n\n

Partition table by month

\n
CREATE TABLE mydataset.Employee\n    (\n    Id INT64 NOT NULL, \n    Name STRING(300) NOT NULL,\n    Join_Date DATE NOT NULL,\n    Branch INT64,\n    Salary NUMERIC\n    )\n    PARTITION BY DATE_TRUNC(Join_Date, MONTH)\n    CLUSTER BY Branch, Id
\n\n

Partition by Timestamp column

\n
CREATE TABLE mydataset.Employee\n    (\n    Id INT64 NOT NULL, \n    Name STRING(300) NOT NULL,\n    Join_Date DATE NOT NULL,\n    Branch INT64,\n    Salary NUMERIC,\n    Create_TS Timestamp\n    )\n    PARTITION BY TIMESTAMP_TRUNC(Join_Date, DAY)\n    CLUSTER BY Branch, Id
\n\n

Partition by Integer column

\n
CREATE TABLE mydataset.Employee\n    (\n    Id INT64 NOT NULL, \n    Name STRING(300) NOT NULL,\n    Join_Date DATE NOT NULL,\n    Branch INT64,\n    Salary NUMERIC,\n    Create_TS Timestamp\n    )\n    PARTITION BY RANGE_BUCKET(Id, GENERATE_ARRAY(1, 100000, 1000))\n    CLUSTER BY Branch, Id;\n
\n

Syntax for Partition by

\n
    \n
  • DATE({ timestamp_column | datetime_column }). Partition by a TIMESTAMP or DATETIME column with daily partitions.
  • \n
  • DATETIME_TRUNC(datetime_column, { DAY | HOUR | MONTH | YEAR }). Partition by a DATETIME column with the specified partitioning type.
  • \n
  • TIMESTAMP_TRUNC(timestamp_column, { DAY | HOUR | MONTH | YEAR }). Partition by a TIMESTAMP column with the specified partitioning type.
  • \n
  • DATE_TRUNC(date_column, { MONTH | YEAR }). Partition by a DATE column with the specified partitioning type.
  • \n
  • RANGE_BUCKET(int64_column, GENERATE_ARRAY(start, end[, interval])). Partition by an integer column with the specified range, where: start is the start of range partitioning, inclusive. end is the end of range partitioning, exclusive. interval is the width of each range within the partition. Defaults to 1.
  • \n

Explanation

This BigQuery CREATE TABLE example creates a new table called Employee which has 5 columns.

\n
    \n
  • The first column is called "Id" which is created as a number datatype and can not contain null values
  • \n
  • The second column is called Name which is a Varchar datatype (300 maximum characters in length) and also can not contain null values.
  • \n
  • and so on...
  • \n
\n

Choosing Partition and Cluster keys: tips

\n

You can select 1 column as part of parition key and upto 4 columns as part of cluster keys

\n

The Partition column is used to split the data and store in respective chunks

\n
    \n
  • Choose a column that gets you even data distribution. eg. the partition key results in equal sized partitions
  • \n
  • Make sure the partition key doesn't result in too many partitions. Having lesser than 4000 partitions is recommended
  • \n
  • The partition key is frequently used in the filter clause during data retrieval
  • \n
\n

The cluster by columns are used to sort the data within the parititions

\n
    \n
  • Make sure the most popular filter column is specified in the leftmost followed by the less frequently used ones. The column order matters
  • \n
  • There is no performance or cost overhead for using cluster keys, so feel free to define them in all your tables
  • \n

Syntax reference

CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ]\n[[project_name.]dataset_name.]table_name\n[(\n  column[, ...]\n)]\n[PARTITION BY partition_expression]\n[CLUSTER BY clustering_column_list]\n[OPTIONS(table_option_list)]\n[AS query_statement]