{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2024 Google LLC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "0etRtS83RcWS" }, "source": [ "# Gemini API: Audio Quickstart" ] }, { "cell_type": "markdown", "metadata": { "id": "r1IzNLho-NqV" }, "source": [ "This notebook provides an example of how to prompt Gemini 1.5 Flash using an audio file. In this case, you'll use a [sound recording](https://www.jfklibrary.org/asset-viewer/archives/jfkwha-006) of President John F. Kennedy’s 1961 State of the Union address." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "id": "Y6eH_Aq_NyNi" }, "outputs": [], "source": [ "!pip install -q -U google-generativeai" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "id": "LSe1pMEpR2L2" }, "outputs": [], "source": [ "import google.generativeai as genai"
] }, { "cell_type": "markdown", "metadata": { "id": "TXiv-NeZR5WA" }, "source": [ "## Configure your API key\n", "\n", "To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "id": "dm-iaNMGPdid" }, "outputs": [], "source": [ "from google.colab import userdata\n", "GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')\n", "\n", "genai.configure(api_key=GOOGLE_API_KEY)" ] }, { "cell_type": "markdown", "metadata": { "id": "2YoxMrCdR7hf" }, "source": [ "## Upload an audio file with the File API\n", "\n", "To use an audio file in your prompt, you must first upload it using the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).\n" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "id": "OHvNLws4RRjx" }, "outputs": [], "source": [ "URL = \"https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3\"" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "id": "Cxq31LDwSFH6" }, "outputs": [], "source": [ "!wget -q $URL -O sample.mp3" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "id": "MAObE0BpaAwG" }, "outputs": [], "source": [ "your_file = genai.upload_file(path='sample.mp3')" ] }, { "cell_type": "markdown", "metadata": { "id": "m01XDoo4UQvN" }, "source": [ "## Use the file in your prompt" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "id": "YmISEsqpafRb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "## Summary of President John F. 
Kennedy's 1961 State of the Union Address:\n", "\n", "**Domestic Concerns:**\n", "\n", "* The address primarily focused on the concerning state of the American economy, highlighting issues like recession, unemployment, and falling farm incomes.\n", "* Kennedy pledged to address these issues through measures such as improving unemployment benefits, expanding food assistance programs, and stimulating economic growth.\n", "* He acknowledged other domestic problems like inadequate housing, education, and healthcare, promising to introduce new programs and initiatives to tackle them.\n", "\n", "**International Challenges:**\n", "\n", "* Kennedy emphasized the rising tensions of the Cold War and the threat posed by communist expansion in Asia, Africa, and Latin America.\n", "* He reaffirmed the nation's commitment to containing communism and supporting allies across the globe.\n", "* He proposed a multifaceted approach involving strengthening the military, improving economic aid programs, and utilizing diplomacy to achieve international stability. \n", "\n", "**Specific Actions:**\n", "\n", "* Kennedy outlined plans to bolster the nation's military capabilities by increasing airlift capacity, expanding the Polaris submarine program, and accelerating missile development.\n", "* He advocated for a new and more effective foreign aid program to assist developing nations and promote economic growth in the non-communist world. 
\n", "* He expressed the desire for increased cooperation with the Soviet Union in areas like scientific exploration and weather prediction, while also remaining firm against communist aggression and subversion.\n", "* He pledged support for the United Nations as a crucial instrument for maintaining peace and international order.\n", "\n", "**Overall Tone and Message:**\n", "\n", "* Despite acknowledging the critical challenges faced by the nation, Kennedy's address maintained a tone of optimism and determination.\n", "* He called upon the American people and government to rise to the occasion, embracing the spirit of sacrifice and service to overcome these obstacles.\n", "* He emphasized the importance of unity, perseverance, and dedication to the national interest in navigating the turbulent years ahead. \n", "\n" ] } ], "source": [ "prompt = \"Listen carefully to the following audio file. Provide a brief summary.\"\n", "model = genai.GenerativeModel('models/gemini-1.5-flash')\n", "response = model.generate_content([prompt, your_file])\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "ln36O5eNLltg" }, "source": [ "## Inline Audio" ] }, { "cell_type": "markdown", "metadata": { "id": "AVaX93lvLqQB" }, "source": [ "For small requests you can inline the audio data into the request, like you can with images. 
Use PyDub to extract the first 10 seconds of the audio:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "id": "XLZT7rrzLpzp" }, "outputs": [], "source": [ "!pip install -Uq pydub" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "id": "0Kbji0xRMIhr" }, "outputs": [], "source": [ "from pydub import AudioSegment" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "id": "umFIVVlHLlQD" }, "outputs": [], "source": [ "sound = AudioSegment.from_mp3(\"sample.mp3\")" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "id": "hKoLR5mdMMdn" }, "outputs": [], "source": [ "sound[:10000] # slices are in ms" ] }, { "cell_type": "markdown", "metadata": { "id": "66MNT0mFP4x-" }, "source": [ "Add the clip to the list of parts in the prompt:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "id": "420qRCkGNg9j" }, "outputs": [], "source": [ "response = model.generate_content([\n", "    \"Please transcribe this recording:\",\n", "    {\n", "        \"mime_type\": \"audio/mp3\",\n", "        \"data\": sound[:10000].export().read()\n", "    }\n", "])" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "id": "Rhpeo_3uPLtq" }, "outputs": [ { "data": { "text/markdown": [ "## Transcription of Recording:\n", "\n", "\"The President's State of the Union Address to a joint session of the Congress from the rostrum of the House of Representatives...\" \n" ], "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython import display\n", "\n", "display.Markdown(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "WVFm2MOLWJO5" }, "source": [ "## Count audio tokens\n", "\n", "You can count the number of tokens in your audio file like this." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "O0xk2-6CWLfC" }, "outputs": [ { "data": { "text/plain": [ "total_tokens: 83552" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.count_tokens([your_file])" ] }, { "cell_type": "markdown", "metadata": { "id": "zxxIUR8SV6dK" }, "source": [ "## Learning more" ] }, { "cell_type": "markdown", "metadata": { "id": "zudj6gxEWR2Q" }, "source": [ "* Learn more about the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) with the quickstart.\n", "\n", "* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length for audio files." ] } ], "metadata": { "colab": { "name": "Audio.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }