BigQuery Migration Interview questions

Migrating data from on premise to the Cloud

To migrate the data from on-premise to the cloud, use Google "Transfers". It involves installing an agent provided by google in one of the VMs in on premise. The Google data transfer service communicates with the agent and initiates any transfer. Here's the general process

  • Create a new transfer in the web user interface. You specify the source, target and frequency for the data transfer
  • The Google data transfer service talks to the agent
  • The agent extracts the data from the on-premise data warehouse and uploads it to a cloud storage bucket and eventually inserts into a BigQuery table

Migrating database tables, views and procedures

  • Understand the differences in datatypes, functions and dialects between the source and the target database (BigQuery)
  • Document the list of objects
  • Convert the objects (Combination of manual efforts and using some automated code conversion tools)
  • Create the objects in the BigQuery dataset

Cloud Data Migration Testing Strategy

Data testing can be heavily automated using tools like python scripts. Its a good rule of thumb to test the data thrice

  • Once the history data is copied over to the cloud. Compare the history data between the cloud and the on-premise database
  • Once the day-1 data is loaded to the cloud using ETL. Compare the day-1 data between the cloud and the on-premise database
  • Once the day-2 data is loaded to the cloud using ETL. Compare the day-2 data between the cloud and the on-premise database

Teams involved in a Cloud migration project

The upfront ground work is done by the,

  • The Security team (to review and establish security best practices)
  • The Networking team (to setup the VPNs, bridges, whitelisting firewall settings, etc)
  • The Cloud Vendor team
  • The Leadership team

The ongoing cloud migration project consists of the core team,

  • The Infrastructure team (to setup projects, datasets, roles and permissions)
  • The DBA team (to create, validate and migrate the database objects and copy data)
  • The ETL developers (to repoint the existing ETL workloads to the cloud)
  • The QAS testing team (to compare the data and make sure)

Cloud Migration Best Practices

  • Heavily invest in building reusable generic frameworks and components that can accelerate the migration process
  • Do frequent scanning of data to ensure data protection
  • Involve all stakeholders, especially the end users (line of business), we need the users to adopt the platform for a successful cloud migration project
  • Involve the admins and experts from the consumption tools (Tableau, Qlik, PowerBI, SAS, Python, R...) to asses the impact
  • Provide ample time for the users to learn and adopt to the new platform
  • Provide ample training and technical support for the end users to learn and switch to the new platform