Column Level access control and security in Google BigQuery
Protecting PII & PHI and other sensitive information and controlling fine grained access to them has always been a challenge in Data Warehousing. Traditionally, we use to create multiple views for different levels of user access. BigQuery elegantly solves this issue with column-level security
- PII - Personally identifiable information eg. SSN, Passport No, Drivers License numbers
- PHI - Protected health information
- PCI - Protected Card information eg. Credit Card Numbers, CVV etc.
What is BigQuery Column Level Security
Using BigQuery Column-level security, you can create policies that check, at query time, whether a user has proper access to read that specific column. Column-level security is enforced on top of the existing dataset level access control. A user needs both dataset permission and policy tag permission in order to access data protected by column-level security
How to setup Column-level security in BigQuery
It all starts with the Data Catalog, which is the data discovery and metadata management service from Google. First, you define a taxonomy (Collection) and policy tags (Individual columns types) in data Catalog
Assign policy tags to your BigQuery columns. In BigQuery Web UI, Click on the table name to show its schema, use schema annotations to assign a policy tag to each column where you want to restrict access
Then click on "Add policy tag" to assign a access policy to that column
The next steps is to use Identity and Access Management (IAM) policies to restrict access to each policy tag. You need to provide "Fine grained access" to the "Policy tag" that you defined on the column to the IAM group in order for the user to query the columns that are level security enforced
Enforcing Column Level security using bq CLI
You can also set the policy tag using the bq update command. Copy the policy id url from the Data Catalog and run the update
[
...
{
"name": "ssn",
"type": "STRING",
"mode": "REQUIRED",
"policyTags": {
"names": ["projects/project-id/locations/location/taxonomies/taxonomy-id/policyTags/policytag-id",]
}
},
...
]
Reading data from tables with Column level security
If you don't have access to the policy, you will not be able to query only the columns with column level security enforced.
- If you have dataset access and has the Data Catalog Fine-Grained Reader role, the column data is available to the user. The user runs a query as normal.
- If you have dataset access but does not have the Data Catalog Fine-Grained Reader role, the column data is not available to you. If you run SELECT *, you will receive an error which lists the column(s) that you cannot access. To resolve the error, you can modify the query to exclude the columns that you don't have access to
SELECT * EXCEPT (ssn) FROM ...
- Views behave the same way. Only when you have access to table and column you can query the view column. otherwise you will have to choose the list of columns instead of a SELECT *
Loading data to a table with Column Level Security
If you have full access (both Table and the Columns) all data load / modify operations will work fine without any issues
If you only have table access and no column level clearance,
- Inserts will work fine
- Updates / Deletes / Merge will not work
- Loading files is not allowed
- Streaming inserts are allowed
Limitations of BigQuery Column Level Security
- If you overwrite the destination table, you will lose the policy tags on the table
- The inability to define the column policies on the table DDLs
- A column can have only one policy tag
BigQuery Column Level Security Pricing
In addition to the standard BigQuery pricing, there is also a cost associated with using the Data Catalog. Data Catalog has two part pricing
1. You pay for the storage size of the metadata in Data Catalog
2. You pay for the Data Catalog API operations
Post Comment