GoogleCloudPlatform/terraform-genai-doc-summarization

Generative AI Document Summarization

Description

Tagline

Create summaries of a large corpus of documents using Generative AI.

Detailed

This solution shows how to summarize a large corpus of documents using Generative AI. It provides an end-to-end demonstration of document summarization: starting from raw documents, it detects the text in each document with Document AI Optical Character Recognition (OCR), summarizes that text on demand with Vertex AI LLM APIs, and stores the results in BigQuery.

PreDeploy

To deploy this blueprint, you must have an active billing account and billing permissions.

Architecture

Document Summarization using Generative AI

  • A user uploads a new document, which triggers the webhook Cloud Function.
  • Document AI extracts the text from the document file.
  • A Vertex AI Large Language Model summarizes the document text.
  • The document summaries are stored in BigQuery.

Documentation

Deployment Duration

Configuration: 1 min
Deployment: 5 mins

Cost

Cost Details

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| disable_services_on_destroy | Whether project services will be disabled when the resources are destroyed. | bool | false | no |
| documentai_location | Document AI location, see https://cloud.google.com/document-ai/docs/regions | string | "us" | no |
| labels | A set of key/value label pairs to assign to the resources deployed by this blueprint. | map(string) | {} | no |
| project_id | The Google Cloud project ID to deploy to | string | n/a | yes |
| region | The Google Cloud region to deploy to | string | "us-central1" | no |
| unique_names | Whether to use unique names for resources | bool | false | no |
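
As an illustration of how these inputs fit together, the following is a minimal sketch of a root configuration that calls this module. The module label `doc_summarization` and the GitHub `source` address (derived from the repository name GoogleCloudPlatform/terraform-genai-doc-summarization) are assumptions and are not stated in this README; only `project_id` is required, and the optional inputs are shown with the defaults listed above.

```hcl
# Sketch only: the module label and the source address are assumptions.
variable "project_id" {
  description = "The Google Cloud project ID to deploy to"
  type        = string
}

module "doc_summarization" {
  # Assumed source address, based on the repository name.
  source = "github.com/GoogleCloudPlatform/terraform-genai-doc-summarization"

  # Required input.
  project_id = var.project_id

  # Optional inputs, set here to their documented defaults.
  region                      = "us-central1"
  documentai_location         = "us"
  disable_services_on_destroy = false
  unique_names                = false
  labels                      = {}
}
```

Setting `unique_names = true` may be useful when more than one deployment shares a project, since the description says it controls whether resource names are made unique.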

Outputs

| Name | Description |
|------|-------------|
| bigquery_dataset_id | The name of the BigQuery dataset created |
| bucket_docs_name | The name of the docs bucket created |
| bucket_main_name | The name of the main bucket created |
| documentai_processor_id | The full Document AI processor path ID |
| neos_walkthrough_url | The URL to launch the in-console tutorial for the Generative AI Document Summarization solution |
| unique_id | The unique ID for this deployment |
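
A calling configuration can re-export these values so that they appear in `terraform output`. The standalone sketch below assumes a module block labelled `doc_summarization` with a GitHub source address and a placeholder project ID; all three are hypothetical, as are the names of the root-level outputs.

```hcl
# Sketch only: module label, source address, and project ID are assumptions.
module "doc_summarization" {
  source     = "github.com/GoogleCloudPlatform/terraform-genai-doc-summarization"
  project_id = "my-project-id" # placeholder value
}

# Re-export selected module outputs from the root module.
output "summaries_dataset_id" {
  description = "The name of the BigQuery dataset created"
  value       = module.doc_summarization.bigquery_dataset_id
}

output "docs_bucket_name" {
  description = "The name of the docs bucket created"
  value       = module.doc_summarization.bucket_docs_name
}

output "walkthrough_url" {
  description = "URL of the in-console tutorial"
  value       = module.doc_summarization.neos_walkthrough_url
}
```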

Requirements

These sections describe requirements for using this module.

Software

The following dependencies must be available:

Service Account

A service account with the following roles must be used to provision the resources of this module:

  • Storage Admin: roles/storage.admin
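
One way to grant this role, sketched with the Google provider's `google_project_iam_member` resource. The variable `deployer_service_account_email` and the resource label are hypothetical, and the sketch assumes the service account already exists and that whoever applies it is allowed to change the project's IAM policy.

```hcl
# Sketch only: variable names and the resource label are assumptions.
variable "project_id" {
  description = "The Google Cloud project ID to deploy to"
  type        = string
}

variable "deployer_service_account_email" {
  description = "Email of the service account used to provision the module (hypothetical)"
  type        = string
}

# Grants Storage Admin on the project to the provisioning service account.
resource "google_project_iam_member" "deployer_storage_admin" {
  project = var.project_id
  role    = "roles/storage.admin"
  member  = "serviceAccount:${var.deployer_service_account_email}"
}
```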

APIs

A project with the following APIs enabled must be used to host the resources of this module:

  • Google Cloud Storage JSON API: storage-api.googleapis.com
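
If the API is not yet enabled on the host project, it could be enabled with the Google provider's `google_project_service` resource, as in this minimal sketch. The resource label is hypothetical, and leaving the API enabled on destroy is an illustrative choice rather than something this README prescribes.

```hcl
# Sketch only: the resource label is an assumption.
variable "project_id" {
  description = "The Google Cloud project ID to deploy to"
  type        = string
}

# Enables the Google Cloud Storage JSON API on the host project.
resource "google_project_service" "storage_api" {
  project = var.project_id
  service = "storage-api.googleapis.com"

  # Keep the API enabled if this resource is later destroyed.
  disable_on_destroy = false
}
```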

Contributing

Refer to the contribution guidelines for information on contributing to this module.

Security Disclosures

Please see our security disclosure process.