(maadaa AI News Weekly: May 14 ~ May 20)

1. Reddit Strikes $60M Deal to Feed the AI Giant

News:

Reddit has partnered with an unnamed major AI company, likely OpenAI or Google, to license its vast repository of user-generated content for training AI language models like ChatGPT. The deal is reportedly worth around $60 million per year. It lets AI companies access Reddit’s data trove legally while giving Reddit a new revenue stream as it prepares for an IPO.

Key Points:

1. Reddit licenses its user-generated content to AI companies for training purposes.

2. The $60 million annual deal helps Reddit boost revenue ahead of going public.

3. The partnership feeds Reddit data into AI models like ChatGPT to make their outputs more relevant.

4. The FTC is investigating potential privacy violations arising from the sale of user content without consent.

Why It Matters?

This agreement is significant because it grants AI companies access to a vast, continually updated dataset of user-generated text from Reddit’s 100,000+ communities, covering nearly every topic. Such diverse, dynamic content can greatly improve how large language models like ChatGPT handle current events, cover niche topics, and generate natural responses. The variety of perspectives and informal language on Reddit makes it especially valuable for training.

2. Slack Faces Backlash Over AI Training Policy

News:

Slack, the popular workplace messaging platform, is under fire for its controversial policy of using user data to train AI models without explicit consent. The issue came to light when Slack’s updated privacy policy revealed plans to leverage public data, including user information, for training machine learning and AI systems.

Key Points:

1. Slack’s updated privacy policy allows the company to use collected user data and publicly available information to train AI models.

2. The move parallels a similar policy change at X (formerly Twitter), whose data Elon Musk’s AI company, xAI, plans to use to train its models.

3. In the X case, Musk confirmed the policy change and clarified that only public data, not private messages, would be used for AI training.

Why It Matters?

The controversy surrounding Slack’s policy highlights the growing importance of data acquisition for training AI models. By tapping into its vast user base and public data sources, Slack aims to enhance its AI training datasets, potentially leading to more accurate and capable AI assistants and tools. This approach aligns with industry trends, as tech giants increasingly leverage user data to fuel their AI ambitions, raising concerns over privacy and consent.

3. From Couch Potatoes to AI Tutors: Sitcoms Teach Machines Sarcasm

News:

Researchers at the University of Groningen have developed an AI model that can detect sarcasm with around 75% accuracy by training it on video clips, audio, and text from popular sitcoms like “Friends” and “The Big Bang Theory”. The model analyzed the transcripts and the emotional cues in the actors’ speech to identify sarcastic exchanges.

Key Points:

1. The AI was trained on the Multimodal Sarcasm Detection Dataset (MUStARD) containing annotated sarcastic scenes.

2. Examples include Sheldon’s sarcastic remark to Leonard and Chandler’s unenthusiastic response about assembling furniture.

3. Researchers aim to improve accuracy by incorporating visual cues like facial expressions and using synthetic data.

4. Understanding sarcasm is crucial for seamless human-machine interaction because people use it so often.

Why It Matters?

Popular sitcoms like “Friends” and “The Big Bang Theory” give the AI model a rich and diverse dataset to learn from. These shows are full of sarcastic exchanges and nuanced emotional cues that help the model grasp how sarcasm works in human communication. Training on a well-annotated dataset of this kind lets the researchers recognize sarcasm more accurately, which could make AI assistants more attuned to the subtleties of human language and emotion and their conversations more natural.


Additional News:

Tesla is moving forward with plans to use Chinese data to develop its self-driving system as part of a strategic shift by Elon Musk.
Writers at the Democrat & Chronicle, a Rochester, New York daily newspaper owned by Gannett, are upset after discovering that the company updated their contracts to allow extensive use of AI in “news content.”
Apple unveiled new accessibility features for iOS 18, including eye tracking, music haptics, and vocal shortcuts.
Researchers at MIT and the University of Basel have created a machine learning framework that uses generative AI to map phase diagrams for complex systems.
Meta is developing a new wearable product called “Camerabuds” to integrate cameras and onboard artificial intelligence into headphones.
Sony Music Group sent letters to 700+ tech firms and AI developers, cautioning them against using its music for AI training without a license.

Shared Open & Commercial Datasets

Open Dataset 1: DeepFashion2

A comprehensive fashion dataset containing 491K images of 13 popular clothing categories from both commercial and consumer sources. Each item is meticulously labeled with attributes such as scale, occlusion, viewpoint, category, and style, making it a versatile benchmark for clothing image understanding.

https://www.v7labs.com/open-datasets/deepfashion2
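
As a rough illustration of how per-item attributes like these are typically used, the sketch below filters a few clothing items by occlusion level. The `ClothingItem` class, its field names, the integer encodings, and the sample records are all hypothetical and chosen only for illustration; they are not the dataset's actual annotation schema.

```python
from dataclasses import dataclass


# Hypothetical record layout: the field names and value encodings below
# are illustrative assumptions, not the dataset's real annotation format.
@dataclass
class ClothingItem:
    category: str
    scale: int      # assumed encoding: higher means the item fills more of the image
    occlusion: int  # assumed encoding: 1 = slight, 2 = medium, 3 = heavy
    viewpoint: int  # assumed encoding: arbitrary viewpoint identifier
    style: int      # assumed encoding: arbitrary style identifier


def select_clear_items(items, max_occlusion=1):
    """Keep items whose occlusion level is at or below max_occlusion."""
    return [item for item in items if item.occlusion <= max_occlusion]


# Made-up sample records, for demonstration only.
items = [
    ClothingItem("short sleeve top", scale=2, occlusion=1, viewpoint=2, style=1),
    ClothingItem("trousers", scale=3, occlusion=3, viewpoint=3, style=0),
    ClothingItem("skirt", scale=1, occlusion=1, viewpoint=2, style=2),
]

clear = select_clear_items(items)
print([item.category for item in clear])
```

Filtering on labeled attributes in this way is one common reason such annotations matter: a model can be trained or evaluated separately on easy (lightly occluded) and hard (heavily occluded) examples.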

Open Dataset 2: Fashionpedia

This dataset includes 48,825 clothing images annotated with segmentation masks and fine-grained attributes, built upon an expertly crafted ontology that encompasses 27 main apparel categories and 19 apparel parts. It is designed to advance tasks that combine detection and attribute classification in the fashion domain.

https://fashionpedia.github.io/home/

Commercial Dataset 3: Multi-modal Generative AI Large Datasets — Licensed

maadaa.ai’s large dataset is developed for state-of-the-art multi-modal large language models and includes structured datasets such as image-text pairs, video-text pairs, and e-books in Markdown. The data is licensed in line with international copyright rules and is intended to bring authentic, diverse material to generative AI model training.

https://maadaa.ai/datasets/GenDatasetDetail/Multi-modal-Generative-AI-Large-Datasets---Licensed

Commercial Dataset 4: Cloudy Day Crossroad Dash Cam Video Dataset

The “Cloudy Day Crossroad Dash Cam Video Dataset” captures crossroad navigation under cloudy weather conditions. It features high-resolution recordings and annotates typical urban objects, essential for developing autonomous driving systems for complex urban intersections.

https://maadaa.ai/datasets/DatasetsDetail/Cloudy-Day-Crossroad-Dash-Cam-Video-Dataset

Sources:

1. https://www.businessinsider.com/reddit-openai-deal-ai-data-partnership-2024-5

2. https://techcrunch.com/2024/05/17/slack-under-attack-over-sneaky-ai-training-policy/

3. https://www.theguardian.com/technology/article/2024/may/16/researchers-build-ai-driven-sarcasm-detector

4. https://www.reuters.com/business/autos-transportation/musk-pushes-plan-china-data-power-teslas-ai-ambitions-2024-05-17/

5. https://futurism.com/the-byte/gannett-ai-news-content

6. https://www.thehindu.com/sci-tech/technology/internet/apple-s-ios-18-update-to-bring-eye-tracking-vocal-shortcuts-features-to-iphones-and-ipads/article68185595.ece

7. https://news.mit.edu/2024/scientists-use-generative-ai-complex-questions-physics-0516

8. https://www.techtimes.com/articles/304763/20240517/meta-camerabuds-ai-powered-headphones-built-cameras-development-insiders-reveal.htm

9. https://www.musicbusinessworldwide.com/metro-boomin-billie-eilish-more-tell-ai-developers-stop-using-artificial-intelligence-to-infringe-upon-and-devalue-the-rights-of-human-artists/