maadaa AI Data News: Audio-Visual AI by Alibaba, Tesla’s AI Assistant, Pika’s LipSync, Figure’s Robotics and More

June 24, 2024 (updated 7:54 am)

maadaa AI Data News (Feb. 27 to Mar. 4)

1. Alibaba’s EMO Framework: Audio-Driven Portrait Videos from Images

Researchers at Alibaba have developed EMO (Emote Portrait Alive), a framework that generates audio-driven portrait videos with realistic facial expressions and head poses. It takes a single reference image and an audio clip as input and produces a video of the subject talking or singing in time with the audio. The team has published its findings on arXiv.

Paper: arxiv.org/pdf/2402.17485.pdf

Project homepage: humanaigc.github.io/emote-portrait-alive/

2. Pika Launches Lip Sync Feature: Voice Your Characters

Pika, an AI video generation platform, has introduced LipSync, a feature that lets users give the characters in their generated videos a voice, with lip movements synced to the audio. The feature is currently available only to Pro users and uses audio generation technology from ElevenLabs, an AI voice cloning startup.

3. Tesla App Launches Beta AI Chat Assistant

According to Electrek, Tesla’s mobile app has received a beta chat assistant that can answer questions about Tesla or other products.

4. From Text to Music: Adobe Releases AIGC Music Creation Tool

Adobe introduced Project Music GenAI Control on February 28. The tool uses generative AI to create and modify custom audio: users can generate music from text prompts and then edit the result in detail. Nicholas Bryan, the inventor of the technology, compares it to Photoshop, the popular image editing software, but for audio.

5. Figure’s $675M Boost to Revolutionize Humanoid Robots

Sunnyvale-based robotics startup Figure announced that it has raised $675 million in a funding round. The investors include tech giants Nvidia, Microsoft, Amazon, and Amazon founder Jeff Bezos. The company is now valued at $2.6 billion.

In addition to the funding, Figure has signed a collaboration agreement with OpenAI to develop generative AI for its humanoid robots. Figure’s models will be based on OpenAI’s latest GPT models and trained specifically on robot action data collected by Figure. According to Brett Adcock, Figure’s founder and CEO, this will enable the humanoid robots to interact with people, perceive their surroundings, and perform physical tasks.

6. Altman Responds to Musk Lawsuit: Attacks Will Keep Happening

Musk recently filed a 46-page lawsuit alleging that OpenAI violated its founding agreement, in which he laid out five “sins” of OpenAI. On Friday, shortly after the allegations came to light, OpenAI CEO Sam Altman replied on social media platform X to a tweet he himself had posted in May 2019. In that tweet, he expressed disappointment in people who were rooting against Tesla and urged support for innovation, adding that historically, betting against Elon is a mistake and that the best product usually wins. Musk thanked him for the tweet at the time. In a memo, Altman acknowledged that this year will be challenging for the company and said he expects the attacks to continue.

Related Open and Commercial Datasets:

maadaa.ai has also gathered some datasets related to this week’s news. We hope they help. Stay tuned!

1. AI Photo-Video Editing Open Dataset

This open collection of datasets includes specialized fine segmentation datasets for precise object recognition and manipulation, human body segmentation for advanced body-based manipulation, face segmentation for realistic and personalized face manipulation, and more.

2. Face Parsing Dataset (MD-Image-006)

This dataset features over 100k internet-sourced facial images, meticulously selected to ensure a globally diverse representation. It encompasses a balanced mix of genders, a wide range of ages, and a variety of hairstyles, providing a comprehensive and authentic spectrum for enhanced facial recognition and parsing applications.

3. Lips Segmentation Dataset (MD-Image-037)

The dataset is tailored for the beauty and media & entertainment industries, featuring a collection of internet-collected images with resolutions spanning from 231 x 231 to 987 x 987 pixels. This dataset is dedicated to the semantic segmentation of the lip area, including both the upper and lower lips, to support detailed makeup applications and digital content creation.

Citations:

1. https://venturebeat.com/ai/alibabas-new-ai-system-emo-creates-realistic-talking-and-singing-videos-from-photos/

2. https://petapixel.com/2024/02/28/pika-labs-makes-ai-generated-characters-talk-with-lip-sync-feature/

3. https://electrek.co/2024/02/27/tesla-updates-mobile-app-ai-assistant-highlights-non-car-products/

4. https://techcrunch.com/2024/02/28/adobe-reveals-a-genai-tool-for-music/

5. https://www.reuters.com/technology/robotics-startup-figure-raises-675-mln-microsoft-nvidia-other-big-techs-2024-02-29/

6. https://www.reuters.com/legal/elon-musk-sues-openai-ceo-sam-altman-breach-contract-2024-03-01/

7. https://maadaa.ai/datasets/OpenDatasetDetail/AI-Photo-Video-Editing-Open-Dataset

8. https://maadaa.ai/datasets/DatasetsDetail/Face-Parsing-Dataset

9. https://maadaa.ai/datasets/DatasetsDetail/Lips-Segmentation-Dataset

For further information, please contact us.