(maadaa AI News Weekly: June 18 ~ June 24)
1. Claude 3.5 Sonnet: Anthropic’s AI Powerhouse Redefines Intelligence and Collaboration
News:
Anthropic has launched Claude 3.5 Sonnet, a state-of-the-art AI model that excels in reasoning, coding, and vision tasks. It introduces the Artifacts feature for enhanced collaboration and places a strong emphasis on safety and privacy. The model is free on Claude.ai and the Claude iOS app, with additional access through various platforms.
Key Points:
- Claude 3.5 Sonnet outperforms leading models in benchmarks for reasoning, knowledge, and coding.
- Operates at twice the speed of Claude 3 Opus, Anthropic’s previous top model.
- Introduces Artifacts for real-time collaboration on AI-generated content.
- Emphasizes safety and privacy, with rigorous testing and no training on user data without permission.
- Available for free on Claude.ai and the Claude iOS app, with higher usage limits for subscribers.
Why It Matters?
Claude 3.5 Sonnet’s stronger reasoning, coding, and vision capabilities, together with new features like Artifacts, can make building and refining training datasets more efficient. By handling complex tasks and providing a collaborative workspace, it supports more accurate and nuanced data generation and analysis, which is crucial for developing advanced AI applications.
2. Elon Musk’s xAI Collaborates With Dell and Super Micro To Create A Supercomputer For AI Development
News:
Elon Musk’s AI startup xAI is building a supercomputer for its chatbot Grok, with Dell Technologies and Super Micro Computer providing the server racks. Dell is assembling half of the racks, while Super Micro is supplying the other half.
Key Points:
- xAI aims to have the supercomputer operational by fall 2025.
- The Grok 2 model required 20,000 Nvidia H100 GPUs for training.
- Grok 3 and future versions will need 100,000 Nvidia H100 chips.
- Dell is collaborating with Nvidia on an “AI factory” for xAI.
Why It Matters?
Building a dedicated supercomputer for xAI underscores how much computational power advanced AI development now requires and reflects the rising demand for high-performance computing in AI R&D. More capable hardware makes it feasible to train on larger, more sophisticated datasets, paving the way for more advanced AI systems.
3. AI vs. Publishers: Forbes Takes on Perplexity in High-Stakes Copyright Showdown
News:
Forbes has accused AI search startup Perplexity of copyright infringement, sending a letter demanding the removal of infringing content and compensation for ad revenues. This incident highlights the growing tension between publishers and AI companies over intellectual property rights in the digital age.
Key Points:
- Forbes alleges that Perplexity’s AI chatbot plagiarized its reporting without proper attribution.
- Perplexity’s CEO acknowledged the issue, describing the product involved as a new feature with “rough edges”.
- Forbes demands removal of the infringing content, reimbursement, and assurances against future infringement.
- Perplexity has raised $165 million in funding and is valued at over $1 billion.
Why It Matters?
This news highlights the difficulties in creating high-quality AI training datasets while complying with copyright laws. AI companies must navigate complex intellectual property rights to improve their models with accurate information. This case may set precedents for ethically and legally incorporating published content into AI training data, potentially leading to more transparent attribution practices and licensing agreements.
4. Additional News
1. Stability AI has a new CEO, Prem Akkaraju, following Emad Mostaque’s departure. Sean Parker and other investors are providing a cash infusion.
2. Ilya Sutskever, co-founder and former chief scientist of OpenAI, has launched Safe Superintelligence Inc. to develop a superintelligent AI system prioritizing safety.
3. Robotics funding has decreased since its peak in 2021–2022, but the issues exposed by the pandemic persist. The main driver for venture funding in robotics is the ongoing labor shortage.
4. Universal Music Group is teaming up with AI startup SoundLabs to use its voice cloning technology for creative applications involving UMG artists’ voices.
5. Apple announced it will add AI training to its Developer Academy program starting this fall.
6. Microsoft unveiled Florence-2, a vision model that uses text prompts to handle a variety of vision tasks. It was trained on the FLD-5B dataset, which contains 5.4 billion visual annotations.
maadaa.ai Shared Open & Commercial Datasets
Open Dataset 1: Stanford Cars Dataset
Description: This dataset contains images of 196 car models, annotated with bounding boxes and model labels. It is well suited to training and testing object recognition algorithms for the automotive industry.
URL: https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset
Open Dataset 2: GDXray (Welds Dataset)
Description: Part of the GDXray dataset, the Welds collection contains X-ray images of welds from industrial settings. It is useful for computer vision applications in defect detection and quality control.
URL: https://domingomery.ing.puc.cl/material/gdxray/
Commercial Dataset 1: Large-Scale Professional Domain Corpus Dataset — Chinese
Key Features:
- 120M electronic documents
- 2 PB of finely structured data
- Most popular e-book formats
- Hundreds of professional domains
- Comprehensive Format Support: Covers most popular e-book formats, such as PDF, EPUB, mobi, azw (3), and DjVu.
- Advanced OCR Engine for Formulas: Equations and multiline formulas in PDFs are converted into LaTeX text.
- Precise Layout Reproduction: Preserves the original formatting of PDFs, including text arrangement, headings, and diagrams.
URL: https://maadaa.ai/datasets/GenDatasetDetail/Large-Scale-Professional-Domain-Corpus-Dataset---Chinese
Commercial Dataset 2: Multi-modal Generative AI Large Datasets — Licensed
maadaa.ai’s large dataset is developed for state-of-the-art multi-modal large language models and includes structured datasets such as image-text pairs, video-text pairs, and e-books in Markdown. Licensed in accordance with international copyright authorization rules, it brings authentic, diverse data to generative AI model training, with the aim of improving model accuracy.
URL: https://maadaa.ai/datasets/GenDatasetDetail/Multi-modal-Generative-AI-Large-Datasets---Licensed