Learn Bigquery
Introduction
Architecture
Architecture Compute Storage
Unique Features
Enncryption Column Level Security Time Travel Caching Compression
Database Management
Create Table Create View Create Procedure
Best Practices
Execution plans
Interview Questions
Top Questions
11 Dec 20 · mCollins · #Bigquery ·   Bookmark  

Column Level access control and security in Google BigQuery

Protecting PII & PHI and other sensitive information and controlling fine grained access to them has always been a challenge in Data Warehousing. Traditionally, we use to create multiple views for different levels of user access. BigQuery elegantly solves this issue with column-level security

  • PII - Personally identifiable information eg. SSN, Passport No, Drivers License numbers
  • PHI - Protected health information
  • PCI - Protected Card information eg. Credit Card Numbers, CVV etc.

What is BigQuery Column Level Security

Using BigQuery Column-level security, you can create policies that check, at query time, whether a user has proper access to read that specific column. Column-level security is enforced on top of the existing dataset level access control. A user needs both dataset permission and policy tag permission in order to access data protected by column-level security

How to setup Column-level security in BigQuery

It all starts with the Data Catalog, which is the data discovery and metadata management service from Google. First, you define a taxonomy (Collection) and policy tags (Individual columns types) in data Catalog

Assign policy tags to your BigQuery columns. In BigQuery Web UI, Click on the table name to show its schema, use schema annotations to assign a policy tag to each column where you want to restrict access

Then click on "Add policy tag" to assign a access policy to that column

The next steps is to use Identity and Access Management (IAM) policies to restrict access to each policy tag. You need to provide "Fine grained access" to the "Policy tag" that you defined on the column to the IAM group in order for the user to query the columns that are  level security enforced

Enforcing Column Level security using bq CLI

You can also set the policy tag using the bq update command. Copy the policy id url from the Data Catalog and run the update

[
...
{
"name": "ssn",
"type": "STRING",
"mode": "REQUIRED",
"policyTags": {
"names": ["projects/project-id/locations/location/taxonomies/taxonomy-id/policyTags/policytag-id",]
}
},
...
]

Reading data from tables with Column level security

If you don't have access to the policy, you will not be able to query only the columns with column level security enforced.

  • If you have dataset access and has the Data Catalog Fine-Grained Reader role, the column data is available to the user. The user runs a query as normal.
  • If you have dataset access but does not have the Data Catalog Fine-Grained Reader role, the column data is not available to you. If you run SELECT *, you will receive an error which lists the column(s) that you cannot access. To resolve the error, you can modify the query to exclude the columns that you don't have access to
SELECT * EXCEPT (ssn) FROM ...
  • Views behave the same way. Only when you have access to table and column you can query the view column. otherwise you will have to choose the list of columns instead of a SELECT *

Loading data to a table with Column Level Security

If you have full access (both Table and the Columns) all data load / modify operations will work fine without any issues

If you only have table access and no column level clearance,

  • Inserts will work fine
  • Updates / Deletes / Merge will not work
  • Loading files is not allowed
  • Streaming inserts are allowed

Limitations of BigQuery Column Level Security

  • If you overwrite the destination table, you will lose the policy tags on the table
  • The inability to define the column policies on the table DDLs
  • A column can have only one policy tag

BigQuery Column Level Security Pricing

In addition to the standard BigQuery pricing, there is also a cost associated with using the Data Catalog. Data Catalog has two part pricing

1. You pay for the storage size of the metadata in Data Catalog
2. You pay for the Data Catalog API operations