Accessing Big-query (Google Cloud Platform) using command line:

Posted on Sept. 9, 2018
GCP
Big-Query
Google cloud platform
511

This article contains high level information about the cloud shell command to communicate with Cloud Datawarehouse( BigQuery)

Requirements: Project should be created in GCP and must Initialize the cloud shell/Install the cloud SDK.

Either you can use the command line installed (Cloud SDK) on machine or Cloud Shell

bq is a python-based, command-line tool for BigQuery

bq help will gives you complete details of commands that you can use with bq

bq help command : will gives you description about the given command

USAGE: bq [--global_flags] <command> [--command_flags] [args]

Creating datasets:

you can create the datasets/table using bq mk command

bq --location=[LOCATION] mk --dataset --default_table_expiration [INTEGER] --description [DESCRIPTION] [PROJECT_ID]:[DATASET]

ex: bq mk new_dataset.new_table

bq --location=EU mk -d --default_table_expiration 3600 --description "This is my new dataset." [My First Project]:testing_dataset

listing the objects under the dataset :

ex:

satya_apiiit@learning-platform-project:~$ bq ls  learning-platform-project:new_dataset

   tableId        Type       Labels   Time Partitioning
 ------------ ------- -------- -------------------
  firsttable      TABLE
  new_table    TABLE

bq show gives you the metadata level details about dataset/tables

ex:

satya_apiiit@learning-platform-project:~$ bq show learning-platform-project:new_dataset

Dataset learning-platform-project:new_dataset

   Last modified              ACLs             Labels

 ----------------- -------------------------- --------

  08 Sep 19:52:15           Owners:

                                 projectOwners,

                                 satya.apiiit@gmail.com

                                 Writers:

                                projectWriters

                                Readers:

                                projectReaders

Creating a Table:

We can create the table in following ways:

1. Creating an empty table with a schema definition

satya_apiiit@learning-platform-project:~$ bq mk new_dataset.firsttable name:string,value:integer

Table 'learning-platform-project:new_dataset.firsttable' successfully created.

satya_apiiit@learning-platform-project:~$ bq show new_dataset.firsttable

Table learning-platform-project:new_dataset.firsttable

   Last modified         Schema         Total Rows   Total Bytes   Expiration   Time Partitioning   Labels

 ----------------- ------------------- ------------ ------------- ------------ ------------------- --------

  08 Sep 20:05:11    |- name: string          0            0

                              |- value: integer

2. Creating a table from a query result

To create a table from a query result, write the results to a destination table.

bq --location=[LOCATION] query --destination_table [PROJECT_ID]:[DATASET].[TABLE] --use_legacy_sql=false '[QUERY]'

ex:

bq --location=US query --destination_table new_dataset.mytable --use_legacy_sql=false 'SELECT name,number

 FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE gender = "M" ORDER BY number DESC'

3. Loading data into table from csv file:

In the Cloud Shell, create an empty CSV file.

ex: touch customer_transactions.csv

open the csv file, edit & enter the data in it

ex: cloudshell edit customer_transactions.csv

above command will open a new browser window with a code editor and Cloud Shell panel.

enter some comma-separated values to load into BigQuery as below.

 

ID,Zipcode,Timestamp,Amount,Feedback,SKU

c123,78757,2018-02-14 17:01:39Z,1.20,4.7,he4rt5

c456,10012,2018-03-14 15:09:26Z,53.60,3.1,ppiieee

c123,78741,2018-04-01 05:59:47Z,5.98,2.0,ch0c0

Save the file in CSV format and run the below command to load the data from CSV file to table namely customer_transactions_new

ex:

satya_apiiit@learning-platform-project:~$ bq load   --skip_leading_rows=1   new_dataset.customer_transactions_new     ./customer_transactions.csv 

 id:string,zip:string,ttime:timestamp,amount:numeric,fdbk:float,sku:string

Upload complete. Waiting on bqjob_r5bd883f92d4db78d_00000165bd68d959_1 ... (0s) Current status: DONE

Querying the table :          

satya_apiiit@learning-platform-project:~$ bq query 'select count(*) from [bigquery-public-data.samples.shakespeare]'

Waiting on bqjob_r4863e503ca3d0eca_00000165b9c09bf8_1 ... (0s) Current status: DONE

Limiting the row counts:

Command : bq head --max_rows [INTEGER1] --start_row [INTEGER2] --selected_fields "[FIELDS]" [PROJECT_ID]

following commands list all fields in the first 5 rows

bq head --max_rows=5  bigquery-public-data:medicare.inpatient_charges_2011 

following commands list two fields in the first 3 rows

bq head --max_rows=3 --selected_fields "provider_id,provider_name" bigquery-public-data:medicare.inpatient_charges_2011

bq head -n 3 --selected_fields "provider_id ,provider_name" bigquery-public-data:medicare.inpatient_charges_2011

+-------------+-------------------------------------------------------------+

| provider_id |          provider_name                                      |

+-------------+----------------------------------+--------------------------+

|       20001 | PROVIDENCE ALASKA MEDICAL CENTER |

|       20001 | PROVIDENCE ALASKA MEDICAL CENTER |

|       20001 | PROVIDENCE ALASKA MEDICAL CENTER |

+-------------+----------------------------------+--------------------------+

following command to list two fields from the 11th row(skip first 10 rows)

bq head --start_row=10 --selected_fields "provider_id,provider_name" bigquery-public-data:medicare.inpatient_charges_2011

following command to list two fields of five rows from the 11th row

bq head --start_row=10 --max_rows=5 --selected_fields "provider_id,provider_name" bigquery-public-data:medicare.inpatient_charges_2011

+-------------+----------------------------------+-----------------------+

| provider_id |          provider_name                                   |

+-------------+----------------------------------+-----------------------+

|       20001 | PROVIDENCE ALASKA MEDICAL CENTER |

|       20001 | PROVIDENCE ALASKA MEDICAL CENTER |

|       20001 | PROVIDENCE ALASKA MEDICAL CENTER |

|       20001 | PROVIDENCE ALASKA MEDICAL CENTER |

|       20001 | PROVIDENCE ALASKA MEDICAL CENTER |

+-------------+----------------------------------+-----------------------+

Updating table properties:

ex:

satya_apiiit@learning-platform-project:~$ bq update --description "This is firt table in dataset" new_dataset.firsttable

Table 'learning-platform-project:new_dataset.firsttable' successfully updated.

Getting dataset metadata using meta-tables:

BigQuery offers some special tables whose contents represent metadata, such as the list of tables and views in a dataset. The "meta-tables" are read-only. To access metadata about the tables and views in a dataset, use the __TABLES_SUMMARY__ meta-table in a query's SELECT statement.

ex:

satya_apiiit@learning-platform-project:~$ bq query 'select * from new_dataset.__TABLES_SUMMARY__'

Waiting on bqjob_r63a4891ae31e2519_00000165ba5d96bd_1 ... (0s) Current status: DONE  

+---------------------------+-------------+------------+------------+--------------------+---------+

|        project_id           |        dataset_id    |   table_id    | creation_time  |    type  |

+---------------------------+-------------+------------+------------+--------------------+----------+

| learning-platform-project | new_dataset | firsttable  |1536417311869   |     1     |

| learning-platform-project | new_dataset | newview  |1536417475416   |     2     |

+---------------------------+-------------+------------+------------+--------------------+----------+

copying the tables:

copying a single table

satya_apiiit@learning-platform-project:~$ bq cp new_dataset.mytable new_dataset.newtable1

Waiting on bqjob_r6208d23cc5e24855_00000165bd01499d_1 ... (0s) Current status: DONE  

Tables 'learning-platform-project:new_dataset.mytable' successfully copied to 'learning-platform-project:new_dataset.newtable1'

The -n shortcut is used to prevent overwriting a table with the same name.

satya_apiiit@learning-platform-project:~$ bq cp -a new_dataset.mytable new_dataset.firsttable

Waiting on bqjob_r747380c9a4ea12ca_00000165bd0bfc01_1 ... (0s) Current status: DONE  

Tables 'learning-platform-project:new_dataset.mytable' successfully copied to 'learning-platform-project:new_dataset.firsttable'

The -a shortcut is used to append to the destination table.

satya_apiiit@learning-platform-project:~$ bq cp -n new_dataset.mytable new_dataset.firsttable

Table 'learning-platform-project:new_dataset.firsttable' already exists, skipping

copying multiple tables

satya_apiiit@learning-platform-project:~$ bq cp new_dataset.mytable,new_dataset.mytable1 new_dataset.copytable

Waiting on bqjob_r6adee27a4f495748_00000165bd14738b_1 ... (0s) Current status: DONE  

Tables 'learning-platform-project:new_dataset.mytable, learning-platform-project:new_dataset.mytable1' successfully copied to 'learning-platform-project:new_dataset.copytable'

Deleting the tables:

satya_apiiit@learning-platform-project:~$ bq rm new_dataset.mytable1

rm: remove table 'learning-platform-project:new_dataset.mytable1'? (y/N) y

The command uses the -f shortcut to bypass confirmation.

satya_apiiit@learning-platform-project:~$ bq rm -f new_dataset.mytable1

Creating a View:

satya_apiiit@learning-platform-project:~$ bq mk --view='select 1 as num' new_dataset.newview

View 'learning-platform-project:new_dataset.newview' successfully created.

satya_apiiit@learning-platform-project:~$ bq show new_dataset.newview

Table learning-platform-project:new_dataset.newview

   Last modified              Schema                   Type   Expiration   Labels

 ----------------- ---------------------------- ------ ------------ --------

  08 Sep 20:07:55   |- num: integer (required)   VIEW

 

satya_apiiit@learning-platform-project:~$ bq mk --use_legacy_sql=false --expiration 3600 --description "This is my view" --label organization:devel

opment --view 'SELECT name,number FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE gender = "M" ORDER BY number DESC' new_dataset.myview

View 'learning-platform-project:new_dataset.myview' successfully created.

Loading the data into big query table from cloud storage(bucket) :

bq load --source_format=CSV \ new_dataset.names_2011 \ gs://bucket_command/babynames/yob2011.txt \ name:string,gender:string,count:integer

here we are loading the yob2011.txt data into the table names_2011 with the schema mentioned in the command

we can load the multiple files in one go as mentioned below:

bq load --source_format=CSV \ new_dataset.all_names \ gs://bucket_command/babynames/yob*.txt \ name:str

ing,gender:string,count:integer

Exporting a CSV file on cloud storage bucket from big query table:

bq extract bigquery-public-data:samples.shakespeare gs://bucket_command/export/all_names.csv

Waiting on bqjob_r3eaa1d58a72e2b67_00000165b9d92e8c_1 ... (0s) Current status: DONE  

Working in interactive mode:

you can write commands in interactive mode using 'bq shell'.

Adavantage: you dont need to prefix the commands with bq

ex:

satya_apiiit@learning-platform-project:~$ bq shell

Welcome to BigQuery! (Type help for more information.)

learning-platform-project> show new_dataset

Dataset learning-platform-project:new_dataset

   Last modified              ACLs             Labels

 ----------------- -------------------------- --------

  08 Sep 19:52:15        Owners:

                                     projectOwners,

                                    satya.apiiit@gmail.com

                                    Writers:

                                   projectWriters

                                   Readers:

                                   projectReaders

to come out of interactive mode you can use exit command

learning-platform-project> exit

 

 

!! Thanks for Reading ! Keep watching for more topics on Google cloud platform !!

 




0 comments

Please log in to leave a comment.

Share this