OpenAI on May 13 announced GPT-4o, its first artificial intelligence model with multimodal capabilities to reason across audio, vision and text. A day later, on May 14, Google kicked off its annual developer-centric conference (Google I/O) with announcements focused on AI integration with its platforms and services, including Android and Search. These two technology behemoths have set the bar high with AI-focused events, but all eyes are now on Apple, which has scheduled its annual Worldwide Developers Conference (WWDC) for June 10.

Apple has been trailing in the AI space while the competition has made significant strides. OpenAI, for example, made its most advanced GPT-4o model free for all. It also announced a dedicated app for its AI chatbot ChatGPT for Apple’s macOS. While the ChatGPT app has been available on iPhones, the macOS app brings deeper integration into the desktop platform. With the macOS app, ChatGPT users will be able to take a screenshot of what’s on the display and share it directly with the chatbot for discussion. This gives OpenAI an early mover advantage in Apple’s ecosystem, especially due to the lack of a native alternative.

Apple’s exploration of the AI space reportedly began years ago. However, the company accelerated development only after AI technology entered the mainstream, fuelled by OpenAI’s ChatGPT, Microsoft’s Copilot, and Google’s Gemini.

Apple is playing catch-up with big technology rivals in the AI space, but that may change on June 10, when it is poised to lay out its AI strategy at WWDC. A hint of this came at Apple’s May 7 launch event, where it debuted the iPad Pro and iPad Air, with the former featuring its next-generation M4 silicon with a new 16-core neural engine. The company said this new neural engine, or Neural Processing Unit (NPU), makes the “M4 an outrageously powerful chip for AI.” This was the first instance of Apple mentioning “AI” at an event.

Earlier, on Apple’s quarterly earnings call on May 2, CEO Tim Cook identified generative artificial intelligence as the company’s next frontier. He said Apple continues to make significant investments in generative AI and that the company will share “some very exciting things” soon.

These instances, along with the fact that researchers at Apple have been continuously publishing papers on new generative AI tools, suggest that Apple is poised to enter the AI space very soon. However, whether it will be able to catch up with the competition remains to be seen. For context, Bloomberg has reported that Apple has been in talks with Google and OpenAI to bring AI features to iOS 18 for iPhones. A recent Bloomberg report stated that Apple has closed in on an agreement with OpenAI to use its AI technology on the iPhone. According to the report, the two companies are finalising terms for a pact to use ChatGPT features in iOS 18, Apple’s next operating system for the iPhone, which is likely to be unveiled at WWDC 2024.

OpenAI GPT-4o: What’s new

OpenAI’s GPT-4o is said to be its most advanced AI model for understanding and interpreting text, images and audio. The model can take any combination of text, audio and image inputs and produce output in any combination of the same formats. OpenAI said that the GPT-4o model can respond to audio inputs in as little as 232 milliseconds, which it said is similar to a human’s response time during a conversation. The model also brings improvements in understanding images. For example, if a user shares an image of a food menu, the AI chatbot powered by the GPT-4o model can translate it, explain the food’s history, and offer recommendations based on it. The model also improves ChatGPT’s Voice Mode by reducing processing latency, allowing a more natural conversational experience.

Google Gemini: What’s new

Google’s Gemini AI model has been updated with new capabilities. Called Gemini 1.5 Pro, the new and improved version comes with a longer context window and added support for more languages. The improved Gemini AI also powers new features and tools across Google’s services such as Search, Photos, Workspace, and Android. For example, the Gemini-powered “Ask Photos” feature for Google Photos lets users search their entire photo library and follow up on the results with more complex prompts.

The Gemini Assistant for Android devices is set to get new capabilities too. Google said that its AI-powered digital assistant will soon be able to use multimodality to understand the video playing on the display and let users ask questions about it. It is also getting a new “Live” feature, which allows Gemini to understand live video in real time.

First Published: May 16 2024 | 1:50 PM IST
