Meta Launches Llama 3.2 and Gives Its AI a Voice
- September 27, 2024
Meta is launching the multimodal Llama 3.2, a free model with visual skills, which means Meta’s AI assistants can now talk and see the world.
Today Mark Zuckerberg announced that Meta, the social-media-turned-metaverse-turned-artificial-intelligence conglomerate, is updating its AI assistants with a range of celebrity voices, including those of Dame Judi Dench and John Cena. Another big upgrade for Meta is the new ability of its models to see users’ photos and other visual information.
Meta also announced today that Llama 3.2 is the first version of its open-source AI models with visual capabilities, increasing the models’ applicability and relevance for virtual reality, robotics, and so-called AI agents.
Additionally, several Llama 3.2 versions are the first designed to run on mobile devices. This could make it easier for developers to build AI-powered apps that run on smartphones and use the camera or screen to operate apps on your behalf.
Mark Zuckerberg said on stage at Connect, a Meta event held in California, “This is our first open source, multimodal model, and it’s going to enable a lot of interesting applications that require visual understanding.”
The assistant update could introduce a large number of people to a new generation of more talkative and visually capable AI assistants, especially given Meta’s massive reach across Facebook, Instagram, WhatsApp, and Messenger.
According to a statement made by Meta today, more than 180 million people use Meta AI.
At Connect, Zuckerberg gave a demonstration of several new AI features. He played clips of a pair of Ray-Ban smart glasses running Llama 3.2 commenting on clothes spotted on a store rack and offering food recommendations based on what ingredients are visible.
Additionally, the CEO of Meta demonstrated a number of the company’s experimental AI features. These include software that automatically dubs videos into multiple languages, live translation between Spanish and English, and an avatar for creators that can respond to questions from fans on their behalf.
Recently, Meta has increased the prominence of its AI in its apps, for instance by integrating it into the search bar of Messenger and Instagram. Among the new celebrity voice choices available to users are Awkwafina, Kristen Bell, and Keegan-Michael Key.
Meta had previously given its text-based assistants celebrity personas, but those characters didn’t catch on. In July, the company introduced AI Studio, a platform that lets users design chatbots with any persona they like.
Users in the US, Canada, Australia, and New Zealand will gain access to the new voices over the course of the next month, according to Meta. The company has not said when the features will reach other regions, but the Meta AI image capabilities will roll out in the US.
Along with offering commentary and information about user-submitted images, the new version of Meta AI will be able to identify the species of a bird you’re unsure of. It will also be able to help with image editing, for example by instantly adding new backdrops or details. Google launched a comparable tool for Google Photos and its Pixel handsets in April.
An enhanced Llama, Meta’s flagship large language model, powers Meta AI’s new capabilities. Given how many developers and startups have already embraced the Llama family, the free model unveiled today could also have a significant impact.
Unlike OpenAI’s models, Llama is free to download and run locally, although there are some limits on large-scale commercial use. With additional training, Llama can also be more easily adjusted or customized for particular jobs.
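For developers who want to try running the model locally, one common route is Hugging Face’s transformers library. The sketch below is an illustrative assumption rather than Meta’s official quickstart: the repository name, model size, and generation settings may differ, and downloading the weights requires accepting Meta’s license terms.

```python
# A minimal sketch of running a Llama 3.2 text model locally with the
# Hugging Face transformers library. The repo name and settings are
# assumptions for illustration; the weights are gated behind Meta's license.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "In one sentence, what does a multimodal language model do?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```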
According to Patrick Wendell, vice president of engineering and cofounder of Databricks, an organization that hosts Llama and other AI models, many businesses choose open models because they provide them with greater control over their own data security.
Large language models are increasingly becoming “multimodal,” meaning they are trained to handle input other than text, such as images and audio. This expands a model’s functionality and lets programmers build new kinds of AI applications on top of it, such as so-called AI agents that can operate computers to perform useful tasks. With Llama 3.2, developers should find it simpler to build AI agents that, for example, search the web and, given a brief description, look for discounts on a specific kind of product.
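As a rough illustration of what multimodal input looks like in practice, the sketch below passes an image and a text question to a Llama 3.2 vision model through the Hugging Face transformers integration. The class names, repository ID, and file path are assumptions based on that library’s conventions, not code from Meta’s announcement.

```python
# Hypothetical sketch: asking a Llama 3.2 vision model about a local image
# via Hugging Face transformers. Model ID, class names, and image path are
# illustrative assumptions.
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed repo name
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("store_rack.jpg")  # placeholder image path
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Which of these shirts is on sale, judging by the tags?"},
    ]}
]

# Build the chat prompt, combine it with the image, and generate a reply.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```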
“Multimodal models are important because the data that people and businesses use is not limited to text; it can also be in a variety of other formats, such as images, audio, or more specialized formats like financial ledgers or protein sequences,” explains MIT professor Phillip Isola. “Our language models have improved over the past several years, and we now have models that perform well with voices and visuals as well. More data modalities are becoming available to these systems annually.”
“Meta’s Llama 3.1 demonstrated that open models could ultimately surpass their proprietary equivalents,” argues Nathan Benaich, general partner and founder of Air Street Capital, as well as the author of a well-known yearly study on artificial intelligence. Multimodal models typically outperform bigger text-only models, says Benaich. “I’m eager to see how 3.2 develops,” he remarks.
Earlier today, the Seattle-based research institute the Allen Institute for AI (AI2) released Molmo, an advanced open-source multimodal model. Molmo comes with a less restrictive license than Llama, and AI2 is also making available the details of its training data, which will let researchers and developers experiment with and tweak the model.
Meta announced today that Llama 3.2 will be available in multiple sizes with corresponding capabilities. In addition to two more powerful versions with 11 billion and 90 billion parameters (a measure of a model’s complexity as well as its size), Meta is offering less capable 1 billion and 3 billion parameter versions designed to work well on portable devices. These versions, according to Meta, are optimized for MediaTek and Qualcomm Arm-based mobile processors.
Meta’s AI overhaul comes at a competitive moment, as tech companies race to offer the most sophisticated AI. The company’s decision to make its most valuable models available for free may give it an advantage in supplying the foundation for many AI products and services, especially as businesses begin exploring the potential of AI agents.