Modern fashion and e-commerce AI relies on high-quality, large-scale datasets for model training, multimodal understanding, and RAG-powered applications. At maadaa.ai, we have built one of the industry’s most comprehensive dataset collections, covering clothing classification, keypoints, segmentation, fabric recognition, human body parsing, and millions of e-commerce product images.
Quick Overview
- Covering classification, segmentation, keypoints, fabric, patterns, human & clothing parsing
- Millions of curated images across real-world scenarios
- Designed for e-commerce, virtual try-on, digital humans, content moderation, and recommendation AI
- Supports custom data collection & annotation for enterprise applications
1. Human-Centric Datasets for Fashion & E-Commerce AI
These datasets provide foundational human body parsing, face parsing, and multi-person segmentation—critical for try-on, apparel overlay, human–clothing interaction, and UGC moderation.
1.1 Portrait & Face Segmentation Datasets
- Single-person Portrait Matting (50k) – MD-Image-003
- Eastern Asia Portrait Segmentation (50k) – MD-Image-004
- Face Parsing (100k) – MD-Image-006
- Facial Parts Semantic Segmentation (2.79M) – MD-Image-019
- Hair, Lips, Upper Eyelid Segmentation – MD-Image-035 / 037 / 032
1.2 Human Body Semantic Datasets
- Human Body Semantic Segmentation (100k) – MD-Image-005
- Human Body High Precision Segmentation (424.8k) – MD-Image-022
- Human Body Segmentation (85.7k) – MD-Image-016
- Contour Segmentation & Keypoints (14.4k) – MD-Image-048
- High-precision Human Body Segmentation (Video) – MD-Video-005
1.3 Multi-person & Scene-Level Segmentation Datasets
- Indoor Multi-person Segmentation (7.5k) – MD-Image-009
- Multiple Scenarios & Persons Segmentation (54k) – MD-Image-018
- Human & Accessories Segmentation (74.3k) – MD-Image-017
These datasets support:
- try-on model preprocessing
- human vs. clothing separation
- precise masks for VFX or AR
- content quality control in e-commerce platforms
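As a rough illustration of the "human vs. clothing separation" use case, the sketch below splits a semantic label map into per-class binary masks. The label IDs (`BACKGROUND`, `SKIN`, `CLOTHING`) and the toy 4x4 map are assumptions made for this example; the actual datasets ship their own label definitions and image formats.

```python
# Hypothetical label IDs; the real datasets define their own label maps.
BACKGROUND, SKIN, CLOTHING = 0, 1, 2


def split_mask(label_map, target):
    """Return a binary mask: 1 where the pixel has label `target`, else 0."""
    return [[1 if px == target else 0 for px in row] for row in label_map]


def pixel_share(label_map, target):
    """Fraction of non-background pixels that carry label `target`."""
    foreground = [px for row in label_map for px in row if px != BACKGROUND]
    if not foreground:
        return 0.0
    return sum(1 for px in foreground if px == target) / len(foreground)


# A toy 4x4 label map standing in for one annotated image.
label_map = [
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
    [0, 1, 1, 0],
]

clothing_mask = split_mask(label_map, CLOTHING)
body_mask = split_mask(label_map, SKIN)
clothing_share = pixel_share(label_map, CLOTHING)
```

In a try-on preprocessing step, a mask like `clothing_mask` would typically select the garment region to replace, while `body_mask` marks the pixels to preserve.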
2. Fashion-Specific Datasets (Core for Clothing AI Models)
These datasets are the core of the collection, covering clothing classification, pattern and fabric classification, segmentation, and keypoints.
2.1 Clothing Classification (2M images)
80+ clothing types, gender tags, scenarios, styles, UGC & e-commerce images.
Ideal for search ranking, recommendation, attribute extraction.
2.2 Clothing Pattern Classification (200k)
30+ pattern categories: stripes, floral, plaid, geometric, etc.
2.3 Clothing Segmentation (500k)
Pixel-level segmentation of clothing, body parts, and accessories. Suited to virtual try-on and segmentation-based editing.
2.4 Clothing Fabric Classification + Segmentation (200k)
11 fabric types + 80 clothing categories; essential for material recognition, sustainability analysis, texture modeling.
2.5 Clothing Keypoints (1M)
Keypoints for sleeves, collars, hems, waistlines, necklines, etc. Foundation for shape estimation & 3D try-on.
2.6 Specialty Fashion Datasets
- Person & Clothes Semantic Segmentation (197k) – MD-Image-026
- Clothes Segmentation (14.3k) – MD-Image-027
- Glasses Segmentation (13.9k) – MD-Image-036
- Scarf Segmentation (2k) – MD-Image-061
These datasets fill niche needs like accessory parsing, scarf modeling, glasses reflection removal, etc.
3. E-Commerce Datasets for Product AI
Beyond fashion, these large-scale datasets support multimodal search, product recognition, OCR, and product attribute extraction.
3.1 E-Commerce Product Dataset (2M images, 200k SKUs)
16 categories: shoes, bags, jewelry, home goods, digital devices, etc.
Enables product matching, search, classification, and deduplication.
3.2 Multilingual OCR Datasets
- Arabic • Thai • Vietnamese • Hindi • English • Chinese (150k) – MD-OCR-008
- Japanese • Korean (40k) – MD-OCR-009
3.3 Object & Receipt Datasets
- Common Objects Segmentation (140.7k) – MD-Image-024
- Object Contour Matting (50k) – MD-Image-007
- Chinese Bill Dataset (6k) – MD-OCR-002
Dataset Details
1. Human-Centric Datasets
1.1 Single-person Portrait Matting Dataset
Data ID: MD-Image-003 | Volume: 50k
Fine segmentation of hair, ears, fingers, with varied postures and hairstyles.
1.2 Eastern Asia Single-person Portrait Segmentation Dataset
Data ID: MD-Image-004 | Volume: 50k
Pixel-level segmentation across indoor, outdoor, street, and sport scenarios.
1.3 Human Body Semantic Segmentation Dataset
Data ID: MD-Image-005 | Volume: 100k
Segmentation of 19 human body areas including face, torso, arms.
1.4 Face Parsing Dataset
Data ID: MD-Image-006 | Volume: 100k
Segmentation of 18 facial areas including eyes, eyebrows, hair.
1.5 Indoor Multiple Person & Object Segmentation Dataset
Data ID: MD-Image-009 | Volume: 7.5k
Segmentation of human body, clothing, objects; 5–6 persons per image.
1.6 Indoor Facial 75 Expressions Dataset
Data ID: MD-Image-011 | Volume: 20k
75 expression categories with keypoints for 60 identities.
1.7 Indoor Facial 182 Keypoints Dataset
Data ID: MD-Image-012 | Volume: 28k
182 facial keypoints annotated for 60 individuals.
1.8 Portrait Segmentation
Data ID: MD-Image-015 | Volume: 294.5k
Contour, instance, and semantic segmentation of live screenshots.
1.9 Human Body Segmentation
Data ID: MD-Image-016 | Volume: 85.7k
Semantic segmentation of trunks, arms, backgrounds, and more.
1.10 Multiple Scenarios & Persons Semantic Segmentation
Data ID: MD-Image-018 | Volume: 54k
Segmentation of persons in urban, natural, indoor scenes.
1.11 Facial Parts Semantic Segmentation
Data ID: MD-Image-019 | Volume: 2.79M
Segmentation of face parts including eyes, eyebrows, nose, accessories.
1.12 Human Body High Precision Segmentation
Data ID: MD-Image-022 | Volume: 424.8k
High-precision segmentation of body, clothing, face, hair, accessories.
1.13 Hair Semantic Segmentation
Data ID: MD-Image-035 | Volume: 32.2k
High-precision contour and semantic segmentation of hair.
1.14 Lips Segmentation
Data ID: MD-Image-037 | Volume: 13.9k
Semantic segmentation of upper and lower lip areas.
1.15 Human Contour Segmentation & Keypoints
Data ID: MD-Image-048 | Volume: 14.4k
Keypoints for nose, arms, shoulders, elbows, wrists.
1.16 Nails Contour Segmentation
Data ID: MD-Image-051 | Volume: 5.9k
Semantic segmentation of fingernail contours.
1.17 Segmentation & Key Points of Human Body
Data ID: MD-Image-053 | Volume: 6.6k
Instance & semantic segmentation with 27 body parts & 24 keypoints.
1.18 Human Portrait Matting
Data ID: MD-Video-005 | Volume: 1.7k
Instance & semantic segmentation across dancing, shows, movies, TV.
1.19 Upper Eyelid Segmentation
Data ID: MD-Image-032 | Volume: 2.4k
Semantic segmentation of upper eyelid areas.
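Each entry in this section uses the same `Data ID: ... | Volume: ...` line, so the listing can be loaded into a small index for filtering by size. The sketch below is one way to do that; the function name `parse_entry`, the regular expression, and the output field names are assumptions for this example, not part of any maadaa.ai tooling.

```python
import re

# Matches lines such as "Data ID: MD-Image-003 | Volume: 50k".
ENTRY = re.compile(
    r"Data ID:\s*(?P<id>MD-[A-Za-z]+-\d+)\s*\|\s*"
    r"Volume:\s*(?P<vol>\d+(?:\.\d+)?)(?P<unit>[kM]?)"
)
MULTIPLIER = {"": 1, "k": 1_000, "M": 1_000_000}


def parse_entry(line):
    """Parse one catalog line into a dict, or return None if it does not match."""
    m = ENTRY.search(line)
    if m is None:
        return None
    # round() absorbs floating-point error from values like 2.79 * 1_000_000.
    volume = round(float(m["vol"]) * MULTIPLIER[m["unit"]])
    return {"data_id": m["id"], "volume": volume}


lines = [
    "Data ID: MD-Image-003 | Volume: 50k",
    "Data ID: MD-Image-019 | Volume: 2.79M",
    "Data ID: MD-Image-032 | Volume: 2.4k",
]
catalog = [entry for entry in map(parse_entry, lines) if entry is not None]

# Keep only datasets with at least 50,000 samples.
large = [entry["data_id"] for entry in catalog if entry["volume"] >= 50_000]
```

The same pattern does not cover the "Volume: About 2M" layout used in the fashion and e-commerce sections; that would need a second expression.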
2. Fashion Datasets
2.1 Clothing Classification Dataset
Data ID: MD-Fashion-1
Volume: About 2M
Internet-collected images covering typical scenarios such as e-commerce, fashion shows, social media, and offline user-generated content. Labels span more than 80 categories, covering gender, clothing type, style, and scenario, with bounding-box and classification annotations.
Link: https://maadaa.ai/dataset/clothing-classification-dataset/

2.2 Clothing Pattern Classification Dataset
Data ID: MD-Fashion-2
Volume: About 200k
Internet-collected images covering typical scenarios such as e-commerce, fashion shows, social media, and offline user-generated content. Labels span more than 30 common pattern categories, with bounding-box and classification annotations.
Link: https://maadaa.ai/dataset/clothing-pattern-classification-dataset/

2.3 Clothing Segmentation Dataset
Data ID: MD-Fashion-3
Volume: About 500k
Internet-collected images across scenarios such as e-commerce, fashion shows, social media, and offline UGC with semantic segmentation of clothing, human parts, and accessories.
Link: https://maadaa.ai/dataset/clothing-segmentation-dataset/

2.4 Clothing Segmentation and Fabrics Classification
Data ID: MD-Fashion-4
Volume: About 200k
Internet-collected images across typical scenarios including e-commerce, fashion shows, social media, and UGC, with segmentation and classification annotations. Includes 11 fabric categories and 80 clothing types.
Link: https://maadaa.ai/dataset/fabrics-classification-dataset/

2.5 Clothing Keypoints Dataset
Data ID: MD-Fashion-5
Volume: About 1M
Internet-collected images from scenarios such as e-commerce, fashion shows, social media, and UGC. Includes bounding boxes, keypoints, and labels across 80+ clothing types.
Link: https://maadaa.ai/dataset/clothing-keypoints-dataset/

2.6 Person and Clothes Semantic Segmentation
Data ID: MD-Image-026
Volume: About 197.1k
Instance / semantic segmentation of persons and clothing, including categories like background, outfit, and body parts.
Link: https://maadaa.ai/dataset/person-and-clothes-semantic-segmentation/

2.7 Clothes Segmentation
Data ID: MD-Image-027
Volume: About 14.3k
Semantic segmentation of clothes with 30+ target categories including backgrounds, outfits, human bodies, mobile phones, and accessories.
Link: https://maadaa.ai/dataset/clothes-segmentation/

2.8 Glasses Segmentation
Data ID: MD-Image-036
Volume: About 13.9k
Semantic segmentation of transparent glasses, sunglasses, and translucent glasses.
Link: https://maadaa.ai/dataset/glasses-segmentation/

2.9 Scarf Segmentation
Data ID: MD-Image-061
Volume: About 2k
High-precision contour segmentation of scarf areas.
Link: https://maadaa.ai/dataset/scarf-segmentation/

2.10 Human and Accessories Segmentation
Data ID: MD-Image-017
Volume: About 74.3k
Semantic segmentation of humans and accessories like mobile phones, suitcases, skateboards, and animals such as horses, cattle, dogs, etc.
Link: https://maadaa.ai/dataset/human-and-accessories-segmentation/

3. E-Commerce Datasets
3.1 E-commerce Product Dataset
Data ID: MD-Image-010
Volume: About 2M
Internet-collected e-commerce product images spanning more than 200k SKUs and 16 main categories, including shoes, hats, bags, furniture, digital devices, and jewelry.
Link: https://maadaa.ai/dataset/e-commerce-product-dataset/

3.2 Arabic & Thai & Vietnamese & Hindi & English & Chinese Language Dataset
Data ID: MD-OCR-008
Volume: About 150k
Over 10 types of text-bearing scenes, including product packaging, signboards, signposts, and posters, collected via phone, camera, and tablet across multiple languages.
Link: https://maadaa.ai/dataset/arabic-thai-vietnamese-hindi-english-chinese-language-dataset/

3.3 Common Objects Segmentation Dataset
Data ID: MD-Image-024
Volume: About 140.7k
Instance / semantic segmentation of common scenes and objects in life such as people, animals, furniture, food, and more.
Link: https://maadaa.ai/dataset/common-objects-segmentation/

3.4 Japanese & Korean Language Dataset
Data ID: MD-OCR-009
Volume: About 40k
Collected via phone, camera, and tablet, covering menu images, product instructions, product packaging, and more.
Link: https://maadaa.ai/dataset/japanese-korean-language-dataset/

3.5 Object Contour Matting Dataset
Data ID: MD-Image-007
Volume: About 50k
Semantic segmentation dataset of clothing, accessories, merchandise, and other objects.
Link: https://maadaa.ai/dataset/object-contour-segmentation-dataset/

3.6 Chinese Bill Dataset
Data ID: MD-OCR-002
Volume: About 6k
Includes flight tickets, train tickets, hotel receipts, and commercial invoices widely used in mainland China, covering over 20 labeling categories.
Link: https://maadaa.ai/dataset/bill-dataset/
