Create long-form audio

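This document walks you through synthesizing long-form audio. Long Audio Synthesis asynchronously synthesizes up to 1 million bytes of input. To learn more about the fundamental concepts in Text-to-Speech, read Text-to-Speech basics.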

Before you begin

Before you can send a request to the Text-to-Speech API, you must complete the following steps. See the Before you begin page for details.

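1. Enable Text-to-Speech on a GCP project.
2. Make sure billing is enabled for Text-to-Speech.
3. Make sure you have the following Identity and Access Management (IAM) roles on the output GCS bucket.
  - Storage Object Creator
  - Storage Object Viewer
4. Install the Google Cloud CLI, then initialize it by running the following command: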
```
gcloud init
```

Synthesize long-form audio from text using the command line

You can convert long-form text to audio by making an HTTP POST request to the https://texttospeech.googleapis.com/v1beta1/projects/{$project_number}/locations/global:synthesizeLongAudio endpoint. In the body of the POST request, specify the following fields.

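• voice: the type of voice to synthesize.

• input.text: the text to synthesize.

• audioConfig: the type of audio to create.

• output_gcs_uri: the GCS output file path, in the format "gs://bucket_name/file_name.wav".

• parent: the parent path, in the format "projects/{your project number}/locations/{your project location}".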

The input can contain up to 1 MB of characters; the exact limit varies with the input.
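A rough pre-check of the input size can catch oversized text before a request is sent. The sketch below assumes the limit is measured in UTF-8 bytes and uses the 1 million byte figure from this guide as a ceiling; both the constant and the helper names are illustrative, and because the exact limit varies with the input, passing this check does not guarantee that the API accepts the text.

```python
# Assumed ceiling, taken from the "up to 1 million bytes" figure in this guide.
# The real limit varies by input, so treat this as a rough pre-check only.
ASSUMED_MAX_INPUT_BYTES = 1_000_000


def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_assumed_limit(text: str, limit: int = ASSUMED_MAX_INPUT_BYTES) -> bool:
    """Return True if the UTF-8 encoded text is no larger than the limit."""
    return utf8_size(text) <= limit
```

Counting bytes rather than characters matters for non-ASCII text, where a single character can take several bytes.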

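Create a Google Cloud Storage bucket under the project used to run the synthesis. Make sure the service account used to run the synthesis has read and write access to the output GCS bucket.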

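To synthesize audio from text with Text-to-Speech, execute the following request from the command line. The commands use gcloud auth application-default print-access-token to retrieve an authorization token for the request.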

Make sure the service account running the GET operation has the Text-to-Speech Editor role.

HTTP method and URL:

POST https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio

Request JSON body:

{
  "parent": "projects/12345/locations/global",
  "audio_config":{
      "audio_encoding":"LINEAR16"
  },
  "input":{
      "text":"hello"
  },
  "voice":{
      "language_code":"en-us",
      "name":"en-us-Standard-A"
  },
  "output_gcs_uri": "gs://bucket_name/file_name.wav"
}
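For longer inputs it may be easier to generate the request body from a script than to edit JSON by hand. The following sketch builds a dictionary with the same fields as the body above and writes it to a file such as request.json. The function names, the default voice, and the simple "gs://" prefix check are assumptions made for illustration, not part of the API.

```python
import json


def build_request_body(project_number, text, output_gcs_uri,
                       language_code="en-US", voice_name="en-US-Standard-A",
                       location="global"):
    """Build a request body with the fields listed in this guide."""
    # Minimal sanity check only; it does not verify that the bucket exists.
    if not output_gcs_uri.startswith("gs://"):
        raise ValueError("output_gcs_uri must look like gs://bucket_name/file_name.wav")
    return {
        "parent": f"projects/{project_number}/locations/{location}",
        "audio_config": {"audio_encoding": "LINEAR16"},
        "input": {"text": text},
        "voice": {"language_code": language_code, "name": voice_name},
        "output_gcs_uri": output_gcs_uri,
    }


def write_request_file(body, path="request.json"):
    """Write the body as UTF-8 JSON so it can be passed to curl with -d @file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f, ensure_ascii=False, indent=2)
```

For example, `write_request_file(build_request_body("12345", "hello", "gs://bucket_name/file_name.wav"))` would produce a file equivalent to the body shown above.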

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio"

PowerShell (Windows)

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio" | Select-Object -Expand Content
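The same POST request could also be assembled in Python with only the standard library. This is a minimal sketch rather than an official client: the function name is hypothetical, and the access token is passed in as a parameter (for example, the output of gcloud auth application-default print-access-token). The code only builds the request object; sending it is shown in a comment.

```python
import json
import urllib.request

BASE_URL = "https://texttospeech.googleapis.com/v1beta1"


def build_synthesize_request(project_number, body, access_token, location="global"):
    """Return a urllib Request for the synthesizeLongAudio endpoint (not yet sent)."""
    url = f"{BASE_URL}/projects/{project_number}/locations/{location}:synthesizeLongAudio"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# To send the request and print the JSON response:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```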

You should receive a JSON response similar to the following:

{
  "name": "23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 0,
    "startTime": "2022-12-20T00:46:56.296191037Z",
    "lastUpdateTime": "2022-12-20T00:46:56.296191037Z"
  },
  "done": false
}

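The JSON output of the REST command contains the name of the long-running operation in the name field. To query the status of the long-running operation, execute the following REST request from the command line.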
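The status URL can be derived from the name field of the response. The sample responses in this guide show the name both as a bare ID ("23456") and as a full resource path, so this sketch accepts either form. The helper name and the default location are assumptions.

```python
def operation_url(response, project_number, location="global"):
    """Build the GET URL for a long-running operation from a synthesis response."""
    name = response["name"]
    # Expand a bare operation ID into the full resource path.
    if not name.startswith("projects/"):
        name = f"projects/{project_number}/locations/{location}/operations/{name}"
    return f"https://texttospeech.googleapis.com/v1beta1/{name}"
```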

Make sure the service account running the GET operation is in the same project as the one used for synthesis.

HTTP method and URL:

GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456"

PowerShell (Windows)

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/12345/locations/global/operations/23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 100
  },
  "done": true
}
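Because synthesis is asynchronous, the status request usually has to be repeated until done is true. The following sketch shows one possible polling loop. It takes a caller-supplied fetch_status function (hypothetical; it would perform the GET request above and return the parsed JSON), and the interval and attempt count are arbitrary illustrative defaults.

```python
import time


def wait_for_operation(fetch_status, interval_seconds=10.0, max_attempts=30,
                       sleep=time.sleep):
    """Call fetch_status() until the operation reports done, then return it."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status.get("done"):
            return status
        # progressPercentage lives under metadata in the sample responses.
        progress = status.get("metadata", {}).get("progressPercentage", 0)
        print(f"attempt {attempt + 1}: {progress}% complete")
        sleep(interval_seconds)
    raise TimeoutError(f"operation not done after {max_attempts} attempts")
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.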

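To list all operations running under a given project, execute the following REST request.

Make sure the service account running the LIST operation is in the same project as the one used for synthesis.

HTTP method and URL: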

GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations"

PowerShell (Windows)

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "operations": [
    {
      "name": "12345",
      "done": false
    },
    {
      "name": "23456",
      "done": false
    }
  ],
  "nextPageToken": ""
}
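A list response like the one above can be filtered to find the operations that are still running. This is a small sketch over the parsed JSON; the helper names are illustrative, and it assumes that an empty nextPageToken means there are no further pages.

```python
def pending_operation_names(list_response):
    """Return the names of operations that have not finished yet."""
    return [
        op["name"]
        for op in list_response.get("operations", [])
        if not op.get("done", False)
    ]


def has_more_pages(list_response):
    """Return True if the response carries a non-empty nextPageToken."""
    return bool(list_response.get("nextPageToken"))
```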

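After the long-running operation completes successfully, find the output audio file at the bucket URI given in the output_gcs_uri field. If the operation did not complete successfully, use the GET REST command to find the error, correct it, and issue the RPC again.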
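To tell success from failure in a script, the parsed GET response can be inspected. The sketch below relies on the standard long-running operation shape, in which a finished operation that failed carries an error object with code and message fields. The function name and the returned labels are illustrative choices.

```python
def operation_outcome(operation):
    """Classify a parsed operation as running, failed, or succeeded."""
    if not operation.get("done"):
        return ("running", "")
    error = operation.get("error")
    if error:
        # error follows the google.rpc.Status shape: code, message, details.
        return ("failed", f"code {error.get('code')}: {error.get('message', '')}")
    return ("succeeded", "")
```

A "failed" result includes the error message, which is the starting point for correcting the request before issuing it again.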

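Synthesize long-form audio from text using client libraries

Install the client library

Python

Before installing the library, make sure that you've prepared your environment for Python development.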

pip install --upgrade google-cloud-texttospeech

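Create audio data

You can use Text-to-Speech to create long audio files of synthetic human speech. Use the following code to create a long audio file in a GCS bucket.

Python

Before running the example, make sure that you've prepared your environment for Python development.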

# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.cloud import texttospeech

def synthesize_long_audio(project_id, location, output_gcs_uri):
    """
    Synthesizes long input, writing the resulting audio to `output_gcs_uri`.

    Example usage: synthesize_long_audio('12345', 'us-central1', 'gs://{BUCKET_NAME}/{OUTPUT_FILE_NAME}.wav')

    """
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'YOUR_PROJECT_ID'
    # location = 'YOUR_LOCATION'
    # output_gcs_uri = 'YOUR_OUTPUT_GCS_URI'

    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

    # Named synthesis_input to avoid shadowing Python's built-in input().
    synthesis_input = texttospeech.SynthesisInput(
        text="Test input. Replace this with any text you want to synthesize, up to 1 million bytes long!"
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )

    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Standard-A"
    )

    parent = f"projects/{project_id}/locations/{location}"

    request = texttospeech.SynthesizeLongAudioRequest(
        parent=parent,
        input=synthesis_input,
        audio_config=audio_config,
        voice=voice,
        output_gcs_uri=output_gcs_uri,
    )

    operation = client.synthesize_long_audio(request=request)
    # Set a deadline for your LRO to finish. 300 seconds is reasonable, but can be adjusted depending on the length of the input.
    # If the operation times out, that likely means there was an error. In that case, inspect the error, and try again.
    result = operation.result(timeout=300)
    print(
        "\nFinished processing, check your GCS bucket to find your audio file! Printing what should be an empty result: ",
        result,
    )

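Clean up

To avoid unnecessary Google Cloud Platform charges, use the Google Cloud console to delete your project if you do not need it.

What's next

To learn more about Cloud Text-to-Speech, read the basics.
See the list of voices available for speech synthesis.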