Detailed impressions, written back at the hotel after Day 1 of Google I/O 2024. (Obviously, these are subjective thoughts.)
1. Google mentioned "Gemini" more often than "AI" this time.
(By Sundar Pichai's own on-stage count, "AI" was said 121 times.)
2. Google is focused on integrating Gemini's multimodality across every area, covering both recognition and generation of video, audio, and text.
3. Expanding Gemini's context window seems aimed at detailed personalized information and integrated services. This was clear in Project Astra, with its real-time Gemini Pro integration, and in Context Caching, which cuts costs and speeds up processing (rough code sketches follow the list below).
- Comparing Google’s Gemini and OpenAI’s GPT:
- Both have significant multimodal achievements. OpenAI leads in generative capability, while Google excels in context length. OpenAI has better API integration, but Google is superior in on-device integration.
- OpenAI's massive model has high reasoning abilities, but Google's strategy includes:
3-1. On-device capabilities (Gemini Nano in the Android OS, Circle to Search on Samsung phones, PaliGemma). Google's mobile OS and hardware partnerships give it an edge.
3-2. Workspace (Docs, Sheets, Slides, Gmail, Photos). Examples include organizing receipts from Drive into Sheets, or Google Photos creating albums around specific moments, showing the business applications of a long-context model.
3-3. As a search engine company, Google is developing grounding technology that verifies model output against search data to curb hallucinations, which gives it a competitive edge.
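To make 3-3 concrete, here is a minimal sketch of what search grounding looks like from the developer side, assuming the Vertex AI Python SDK's Google Search retrieval tool; the project ID and prompt are my own placeholders, not anything shown on stage:

```python
# A sketch of grounding a Gemini answer in Google Search results, assuming
# the Vertex AI Python SDK (pip install google-cloud-aiplatform).
# The project ID, location, and prompt are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="my-gcp-project", location="us-central1")

# Attach a Google Search retrieval tool so the model can check its claims
# against live search data instead of relying only on parametric memory.
search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

model = GenerativeModel("gemini-1.5-pro", tools=[search_tool])
response = model.generate_content("What was announced at Google I/O 2024?")
print(response.text)  # grounded answers also carry citation metadata
```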
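And on the Context Caching point from item 3 above, a minimal sketch, assuming the google-generativeai Python SDK's caching interface; the API key, file name, model version, and TTL are illustrative guesses, not figures from the sessions:

```python
# A sketch of Context Caching, assuming the google-generativeai Python SDK
# (pip install google-generativeai). The API key, file name, model version,
# and TTL are illustrative placeholders.
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# Upload a large document once. Caching its tokens means later prompts do
# not pay to re-process the same context on every request.
doc = genai.upload_file(path="long_product_manual.pdf")

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # caching needs a pinned model version
    system_instruction="Answer questions using only the attached manual.",
    contents=[doc],
    ttl=datetime.timedelta(minutes=30),  # how long the cached tokens live
)

# Requests built on the cache reuse the stored tokens: lower cost, lower latency.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("How do I reset the device?")
print(response.text)
```

The design point is that a long, stable context (manuals, transcripts, codebases) gets tokenized once and reused afterward, which is exactly the cost-and-latency argument above.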
4. The real-time capability of the multimodal model was surprising. I tried the Project Astra demo; multimodal audio recognition was accurate and real-time, with seamless handling of interruptions. The pipeline isn't synchronous, but latency is low enough to mark a significant improvement in the usability of voice interfaces and multimodal models.
5. The main takeaway from this I/O was that multimodal models can be responsive enough for actual products and are currently available.
6. Examples were shown of how Gemini can be integrated into Chrome, Search, Android, and Google Workspace to improve user convenience. ("Here's an example" was a recurring phrase.) Combining these surfaces with multimodal capability could significantly change everyday life.
I have a Gemini Office Hour with a Googler tomorrow, and I'm full of questions about Gemini's multimodal latency. Up to 2 PM of this event, I hadn't heard keywords like TensorFlow/JAX, and there was little talk about Cloud. Vertex AI was mentioned occasionally alongside Gemini, and the rest was about the 6th-generation TPU (Trillium).
7. The core question was whether multimodal models can positively impact our lives. Even now, there are few significant use cases beyond chatbots. Who will make this a reality? That has been my interest, and at this I/O I saw numerous examples. If Google integrates Gemini into its chat and document apps, we'll end up spending most of our time with Gemini. For code, it seems to be entering through Project IDX, taking a share of the IDE space.