Using Apache Beam to Automate Your Preprocessing in Data Science

Photo by Danil Sorokin on Unsplash
Overview of Data Source Structure (Icons: Freepik)
python traffic_beam.py
python traffic_beam.py --query_date 2021-11-17
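The two invocations above suggest that `--query_date` is an optional flag. A minimal sketch of how such a flag could be parsed is shown here; the function name `parse_query_date` and the fallback to today's date are assumptions for illustration, not taken from the original script. Unrecognized arguments are passed through so they can be handed to Beam as pipeline options.

```python
import argparse
from datetime import date


def parse_query_date(argv=None):
    """Split the command line into the query date and the remaining args.

    Hypothetical helper: the fallback to today's date is an assumption,
    the original script may choose a different default.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--query_date",
        type=date.fromisoformat,  # expects YYYY-MM-DD, e.g. 2021-11-17
        default=None,
        help="Date to process, in YYYY-MM-DD format.",
    )
    # parse_known_args leaves unknown flags (e.g. --runner) untouched.
    known_args, pipeline_args = parser.parse_known_args(argv)
    query_date = known_args.query_date or date.today()
    return query_date, pipeline_args


if __name__ == "__main__":
    query_date, pipeline_args = parse_query_date()
    print(f"Processing data for {query_date.isoformat()}")
```

The leftover `pipeline_args` list can be passed on to Beam's `PipelineOptions`, so runner-specific flags do not need to be declared in the script itself.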
  • table: The BigQuery table that the data is written to
  • custom_gcs_temp_location: A temporary Google Cloud Storage bucket; the results are uploaded there first, before they are written to your BigQuery table
  • write_disposition: Controls whether new data is appended to the table or, optionally, overwrites everything already in it
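As a rough sketch, the three parameters above could be assembled and passed to Beam's `WriteToBigQuery` transform as follows. The helper names, the table `my-project:traffic.daily_counts`, and the bucket `gs://my-temp-bucket/tmp` are hypothetical placeholders, and the sketch assumes the target table already exists, so no schema is supplied.

```python
def bigquery_sink_kwargs(table, temp_bucket, overwrite=False):
    """Collect the keyword arguments for beam.io.WriteToBigQuery.

    Hypothetical helper. The disposition strings match the values of
    beam.io.BigQueryDisposition.WRITE_APPEND and WRITE_TRUNCATE.
    """
    return {
        "table": table,  # "PROJECT:DATASET.TABLE"
        "custom_gcs_temp_location": temp_bucket,  # staging area on GCS
        "write_disposition": "WRITE_TRUNCATE" if overwrite else "WRITE_APPEND",
    }


def build_sink(kwargs):
    """Create the write transform; Beam is imported only when needed."""
    import apache_beam as beam

    return beam.io.WriteToBigQuery(**kwargs)


# Placeholder names, not taken from the original article.
sink_kwargs = bigquery_sink_kwargs(
    table="my-project:traffic.daily_counts",
    temp_bucket="gs://my-temp-bucket/tmp",
)
```

Inside a pipeline, the transform would be applied as the last step, for example `rows | "Write" >> build_sink(sink_kwargs)`. Passing `overwrite=True` switches from appending to replacing the table contents.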

Felix Ude

Data Scientist and Economist — based in Hamburg