maadaa.ai

Generative AI Datasets

Multi-modal Generative AI Large Datasets - Licensed

Multi-modal large language models (MLLMs), known for their ability to understand and generate content across various data types, have attracted widespread interest from both the research community and the tech industry. maadaa.ai’s large dataset is developed specifically for state-of-the-art MLLMs and includes various structured datasets such as image-text pairs, video-text pairs, and e-books in Markdown. Licensed in compliance with international copyright authorization rules, the dataset brings authenticity and diversity to generative AI model training, helping models reach higher accuracy and supporting further innovation.

Product Highlights

Over 300 Million Image-Text Pairs: covers an extensive range of high-resolution, professionally shot images, including humans, animals, scenes, photography, and vector images.
More than 6 Million Video-Text Pairs: provides rich text descriptions of characters, scenes, relationships, actions, and more.
More than 2 Million E-books and 15,000 Journals: enriches the dataset with literary and academic depth.
Genuine Media Reporting Data: incorporates text data from major domestic media outlets, ensuring the inclusion of current and relevant content.

Product Statistics

Image-Text Pairs Statistics:

Video-Text Pairs Statistics:

For further information, please contact us.