Hashing functions in Google Bigquery (MD5, SHA1..) - Generate Unique Id concatenating multiple fields
A hash function can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes and they do not change over time (are constant). Hashing is a one way algorithm, you cannot decrypt a hashed key.
Why do you need to generate Hashed keys?
In certain tables, the key columns can contain sensitive data like SSN, or other PII. Instead of exposing them as join keys, its a good idea to hash them and then store them in the table as the join field
In this guide we will go through the various hashing functions in BigQuery
FARM_FINGERPRINT
This function hashes the input string and returns a fixed integer. The output of this function for a particular input will never change.
WITH example AS (
SELECT 1 AS x, "foo" AS y, true AS z UNION ALL
SELECT 2 AS x, "apple" AS y, false AS z UNION ALL
SELECT 3 AS x, "" AS y, true AS z
)
SELECT
*,
FARM_FINGERPRINT(CONCAT(CAST(x AS STRING), y, CAST(z AS STRING)))
AS row_fingerprint
FROM example;
+---+-------+-------+----------------------+
| x | y | z | row_fingerprint |
+---+-------+-------+----------------------+
| 1 | foo | true | -1541654101129638711 |
| 2 | apple | false | 2794438866806483259 |
| 3 | | true | -4880158226897771312 |
+---+-------+-------+----------------------+
MD5 Algorithm
The MD5 Algorithm computes the hash of the input. The input can either be STRING or BYTES. The string version treats the input as an array of bytes. This function returns 16 bytes hashed output
SELECT MD5("Hello World") as md5;
+--------------------------+
| md5 |
+--------------------------+
| sQqNsWTgdUEFt6mb5y4/5Q== |
+--------------------------+
SHA1 Algorithm
Computes the hash of the input using the SHA-1 algorithm. The input can either be STRING or BYTES. The string version treats the input as an array of bytes. This function returns 20 bytes
SELECT SHA1("Hello World") as sha1;
+------------------------------+
| sha1 |
+------------------------------+
| Ck1VqNd45QIvq3AZd8XYQLvEhtA= |
+------------------------------+
SHA256 & SHA512 Algorithm
Computes the hash of the input using the SHA-256 / 512 algorithms respectively. The input can either be STRING or BYTES. The string version treats the input as an array of bytes. This function returns 32 bytes
SELECT SHA256("Hello World") as sha256;
SELECT SHA512("Hello World") as sha512;
nVector
posted on 05 Sep 20Enjoy great content like this and a lot more !
Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds
Post Comment