Create Google Drive external tables
This document describes how to create an external table over data stored in Google Drive.
BigQuery supports external tables over both personal Drive files and shared files. For more information on Drive, see Drive training and help.
You can create external tables over files in Drive that have the following formats:
- Comma-separated values (CSV)
- Newline-delimited JSON
- Avro
- Google Sheets
Before you begin
Before you create an external table, gather some information and make sure you have permission to create the table.
Retrieve Drive URIs
To create an external table for a Google Drive data source, you must provide the Drive URI. You can retrieve the Drive URI directly from the URL of your Drive data:
URI format
https://docs.google.com/spreadsheets/d/FILE_ID
or
https://drive.google.com/open?id=FILE_ID
where FILE_ID
is the alphanumeric ID for your
Drive file.
Enable Drive access
Accessing data hosted within Drive requires an additional OAuth scope. While not enabled by default, it can be included in the Google Cloud console, in the bq command-line tool, or via the API via the following mechanisms:
Console
Follow the web-based authentication steps when creating a permanent table in the Google Cloud console. When you are prompted, click Allow to give BigQuery Client Tools access to Drive.
gcloud
To enable Drive access:
Enter the following command to ensure that you have the latest version of the Google Cloud CLI.
gcloud components update
Enter the following command to authenticate with Drive.
gcloud auth login --enable-gdrive-access
API
If you are using the BigQuery API, request the OAuth scope for Drive in addition to the scope for BigQuery.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Required roles
To create an external table, you need the bigquery.tables.create
BigQuery Identity and Access Management (IAM) permission.
Each of the following predefined Identity and Access Management roles includes this permission:
- BigQuery Data Editor (
roles/bigquery.dataEditor
) - BigQuery Data Owner (
roles/bigquery.dataOwner
) - BigQuery Admin (
roles/bigquery.admin
)
If you are not a principal in any of these roles, ask your administrator to grant you access or to create the external table for you.
For more information on Identity and Access Management roles and permissions in BigQuery, see Predefined roles and permissions.
Create external tables
You can create a permanent table linked to your external data source by:
- Using the Google Cloud console
- Using the bq command-line tool's
mk
command - Creating an
ExternalDataConfiguration
when you use thetables.insert
API method - Using the client libraries
To create an external table:
Console
- In the Google Cloud console, open the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
Expand the
Actions option and click Open.In the details panel, click Create table
.On the Create table page, in the Source section:
For Create table from, select Drive.
In the Select Drive URI field, enter the Drive URI. Note that wildcards are not supported for Drive URIs.
For File format, select the format of your data. Valid formats for Drive data include:
- Comma-separated values (CSV)
- Newline delimited JSON
- Avro
- Sheets
(Optional) If you choose Sheets, in the Sheet range (Optional) box, specify the sheet and cell range to query. You can specify a sheet name, or you can specify
sheet_name!top_left_cell_id:bottom_right_cell_id
for a cell range; for example, "Sheet1!A1:B20". If Sheet range is not specified, the first sheet in the file is used.On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset, and in the Table name field, enter the name of the table you're creating in BigQuery.
Verify that Table type is set to External table.
In the Schema section, enter the schema definition.
- For JSON or CSV files, you can check the Auto-detect option to enable schema auto-detect. Auto-detect is not available for Datastore exports, Firestore exports, and Avro files. Schema information for these file types is automatically retrieved from the self-describing source data.
- Enter schema information manually by:
- Enabling Edit as text and entering the table schema as a JSON
array.
Note: You can view the schema of an existing table in JSON
format by entering the following command in the bq command-line tool:
bq show --format=prettyjson DATASET.TABLE
. - Using Add field to manually input the schema.
- Enabling Edit as text and entering the table schema as a JSON
array.
Note: You can view the schema of an existing table in JSON
format by entering the following command in the bq command-line tool:
Click Create table.
If necessary, select your account and then click Allow to give the BigQuery client tools access to Drive.
You can then run a query against the table as if it were a standard BigQuery table, subject to the limitations on external data sources.
After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Sheets. See Download, save, and export data for more information.
bq
You create a table in the bq command-line tool using the bq mk
command. When you use
the bq command-line tool to create a table linked to an external data source, you can
identify the table's schema using:
- A table definition file (stored on your local machine)
- An inline schema definition
- A JSON schema file (stored on your local machine)
To create a permanent table linked to your Drive data source using a table definition file, enter the following command.
bq mk \ --external_table_definition=DEFINITION_FILE \ DATASET.TABLE
Where:
DEFINITION_FILE
is the path to the table definition file on your local machine.DATASET
is the name of the dataset that contains the table.TABLE
is the name of the table you're creating.
For example, the following command creates a permanent table named mytable
using a table definition file named mytable_def
.
bq mk --external_table_definition=/tmp/mytable_def mydataset.mytable
To create a permanent table linked to your external data source using an inline schema definition, enter the following command.
bq mk \ --external_table_definition=SCHEMA@SOURCE_FORMAT=DRIVE_URI \ DATASET.TABLE
Where:
SCHEMA
is the schema definition in the formatFIELD:DATA_TYPE,FIELD:DATA_TYPE
.SOURCE_FORMAT
isCSV
,NEWLINE_DELIMITED_JSON
,AVRO
, orGOOGLE_SHEETS
.DRIVE_URI
is your Drive URI.DATASET
is the name of the dataset that contains the table.TABLE
is the name of the table you're creating.
For example, the following command creates a permanent table named sales
linked to a Sheets file stored in Drive with the following
schema definition: Region:STRING,Quarter:STRING,Total_sales:INTEGER
.
bq mk \
--external_table_definition=Region:STRING,Quarter:STRING,Total_sales:INTEGER@GOOGLE_SHEETS=https://drive.google.com/open?id=1234_AbCD12abCd \
mydataset.sales
To create a permanent table linked to your external data source using a JSON schema file, enter the following command.
bq mk \ --external_table_definition=SCHEMA_FILE@SOURCE_FORMAT=DRIVE_URI \ DATASET.TABLE
Where:
SCHEMA_FILE
is the path to the JSON schema file on your local machine.SOURCE_FORMAT
isCSV
,NEWLINE_DELIMITED_JSON
,AVRO
, orGOOGLE_SHEETS
.DRIVE_URI
is your Drive URI.DATASET
is the name of the dataset that contains the table.TABLE
is the name of the table you're creating.
For example, the following command creates a table named sales
linked
to a CSV file stored in Drive using the /tmp/sales_schema.json
schema file.
bq mk \
--external_table_definition=/tmp/sales_schema.json@CSV=https://drive.google.com/open?id=1234_AbCD12abCd \
mydataset.sales
After the permanent table is created, you can then run a query against the table as if it were a standard BigQuery table, subject to the limitations on external data sources.
After your query completes, you can download the results as CSV or JSON, save the results as a table, or save the results to Sheets. See Download, save, and export data for more information.
API
Create an ExternalDataConfiguration
when you use the tables.insert
API method. Specify the schema
property or set the autodetect
property
to true
to enable schema auto detection for supported data sources.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Query external tables
For more information, see Query Drive data.
The _FILE_NAME pseudo-column
Tables based on external data sources provide a pseudo column named _FILE_NAME
. This
column contains the fully qualified path to the file to which the row belongs. This column is
available only for tables that reference external data stored in Cloud Storage
and Google Drive.
The _FILE_NAME
column name is reserved, which means that you cannot
create a column by that name in any of your tables.