Fashion & E-Commerce Datasets Collection
December 9, 2025 (updated 10:21 am)

Modern fashion and e-commerce AI relies on high-quality, large-scale datasets for model training, multimodal understanding, and RAG-powered applications. At maadaa.ai, we have built one of the industry’s most comprehensive dataset collections, covering clothing classification, keypoints, segmentation, fabric recognition, human body parsing, and millions of e-commerce product images.

Quick Overview

  • Covers classification, segmentation, keypoints, fabrics, patterns, and human & clothing parsing
  • Millions of curated images across real-world scenarios
  • Designed for e-commerce, virtual try-on, digital humans, content moderation, and recommendation AI
  • Supports custom data collection & annotation for enterprise applications

1. Human-Centric Datasets for Fashion & E-Commerce AI

These datasets provide foundational human body parsing, face parsing, and multi-person segmentation—critical for try-on, apparel overlay, human–clothing interaction, and UGC moderation.

1.1 Portrait & Face Segmentation Datasets

  • Single-person Portrait Matting (50k) – MD-Image-003
  • Eastern Asia Portrait Segmentation (50k) – MD-Image-004
  • Face Parsing (100k) – MD-Image-006
  • Facial Parts Semantic Segmentation (2.79M) – MD-Image-019
  • Hair, Lips, Upper Eyelid Segmentation – MD-Image-035 / 037 / 032

1.2 Human Body Semantic Datasets

  • Human Body Semantic Segmentation (100k) – MD-Image-005
  • Human Body High Precision Segmentation (424.8k) – MD-Image-022
  • Human Body Segmentation (85.7k) – MD-Image-016
  • Contour Segmentation & Keypoints (14.4k) – MD-Image-048
  • High-precision Human Body Segmentation (Video) – MD-Video-005

1.3 Multi-person & Scene-Level Segmentation Datasets

  • Indoor Multi-person Segmentation (7.5k) – MD-Image-009
  • Multiple Scenarios & Persons Segmentation (54k) – MD-Image-018
  • Human & Accessories Segmentation (74.3k) – MD-Image-017

These datasets support:

  • try-on model preprocessing
  • human vs. clothing separation
  • precise masks for VFX or AR
  • content quality control in e-commerce platforms
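As an illustration of human vs. clothing separation, a semantic segmentation label map (one integer class ID per pixel) can be split into separate body and clothing masks. The sketch below is a minimal example under stated assumptions: the class IDs (0 = background, 1 = skin, 2 = upper clothes, 3 = lower clothes) are hypothetical, and the real label schemes and file formats of these datasets are defined in each dataset's own documentation.

```python
import numpy as np

# Hypothetical class IDs for illustration only; each real dataset defines its own.
BACKGROUND, SKIN, UPPER_CLOTHES, LOWER_CLOTHES = 0, 1, 2, 3
CLOTHING_IDS = (UPPER_CLOTHES, LOWER_CLOTHES)


def split_person_and_clothing(label_map: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (body_mask, clothing_mask) boolean arrays from an H x W label map."""
    clothing_mask = np.isin(label_map, CLOTHING_IDS)
    # Every non-background pixel that is not clothing is treated as body.
    body_mask = (label_map != BACKGROUND) & ~clothing_mask
    return body_mask, clothing_mask


def extract_garment(image: np.ndarray, clothing_mask: np.ndarray) -> np.ndarray:
    """Zero out every pixel of an H x W x 3 image that is not clothing."""
    return image * clothing_mask[..., None]


label_map = np.array([[0, 1, 2], [2, 3, 0]])
body, clothes = split_person_and_clothing(label_map)
print(int(body.sum()), int(clothes.sum()))  # 1 3
```

The same masks can feed try-on preprocessing (garment crops) or a quality-control check that flags images whose clothing area is implausibly small.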

2. Fashion-Specific Datasets (Core for Clothing AI Models)

This section covers the core of the collection for clothing AI models: e-commerce fashion data, clothing classification, fabric classification, and related tasks.

2.1 Clothing Classification (2M images)

80+ label categories covering clothing type, gender, style, and scenario, with bounding boxes; images drawn from UGC and e-commerce sources.

Suited to search ranking, recommendation, and attribute extraction.

2.2 Clothing Pattern Classification (200k)

30+ pattern categories: stripes, floral, plaid, geometric, etc.

2.3 Clothing Segmentation (500k)

Pixel-level segmentation of clothing, body parts, and accessories, for use in virtual try-on and segmentation-based editing.

2.4 Clothing Fabric Classification + Segmentation (200k)

11 fabric types and 80 clothing categories; useful for material recognition, sustainability analysis, and texture modeling.

2.5 Clothing Keypoints (1M)

Keypoints for sleeves, collars, hems, waistlines, necklines, etc. Foundation for shape estimation & 3D try-on.
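Garment keypoints become measurements once they are connected. The sketch below estimates a sleeve length in pixels as the polyline length through shoulder, elbow, and cuff keypoints. The keypoint names and the `(x, y, v)` layout with a COCO-style visibility flag (0 = not labeled) are assumptions for illustration, not this dataset's documented schema.

```python
import math
from typing import Dict, List, Optional, Tuple

# Assumed layout: (x, y, v), where v follows the COCO convention
# (0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible).
Keypoint = Tuple[float, float, int]


def polyline_length(kps: Dict[str, Keypoint], names: List[str]) -> Optional[float]:
    """Sum of pixel distances along the named keypoints, or None if any is unlabeled."""
    points = []
    for name in names:
        x, y, v = kps.get(name, (0.0, 0.0, 0))
        if v == 0:  # missing or unlabeled keypoint: the measurement is undefined
            return None
        points.append((x, y))
    return sum((math.dist(p, q) for p, q in zip(points, points[1:])), 0.0)


# Hypothetical keypoint names for a left sleeve.
SLEEVE = ["left_shoulder", "left_elbow", "left_cuff"]

kps = {"left_shoulder": (0.0, 0.0, 2), "left_elbow": (3.0, 4.0, 2), "left_cuff": (3.0, 10.0, 1)}
print(polyline_length(kps, SLEEVE))  # 11.0
```

Occluded keypoints (v = 1) still carry an annotated position, so they are counted; only unlabeled points make the measurement undefined.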

2.6 Specialty Fashion Datasets

  • Person & Clothes Semantic Segmentation (197k) – MD-Image-026
  • Clothes Segmentation (14.3k) – MD-Image-027
  • Glasses Segmentation (13.9k) – MD-Image-036
  • Scarf Segmentation (2k) – MD-Image-061

These datasets fill niche needs like accessory parsing, scarf modeling, glasses reflection removal, etc.

3. E-Commerce Datasets for Product AI

Beyond fashion, these large-scale datasets support multimodal search, product recognition, OCR, and product attribute extraction.

3.1 E-Commerce Product Dataset (2M images, 200k SKUs)

16 categories: shoes, bags, jewelry, home goods, digital devices, etc.

Enables product matching, search, classification, and deduplication.
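Near-duplicate product images (the same SKU photographed or re-uploaded several times) are often caught with a perceptual hash before any learned model is involved. The sketch below implements a simple average hash (aHash) in NumPy as one possible deduplication baseline; it assumes images are already decoded to 2-D grayscale arrays, and the 5-bit Hamming threshold is an illustrative assumption rather than a recommended setting.

```python
import numpy as np

HASH_SIZE = 8  # 8 x 8 grid -> 64-bit hash


def average_hash(gray: np.ndarray, hash_size: int = HASH_SIZE) -> np.ndarray:
    """Average hash of a 2-D grayscale image, returned as a flat boolean array."""
    h, w = gray.shape
    if h < hash_size or w < hash_size:
        raise ValueError("image is smaller than the hash grid")
    # Crop so both sides divide evenly, then average each block (block-mean downsampling).
    bh, bw = h // hash_size, w // hash_size
    cropped = gray[: bh * hash_size, : bw * hash_size].astype(np.float64)
    blocks = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    # Each bit records whether a block is brighter than the mean of all blocks.
    return (blocks > blocks.mean()).ravel()


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))


def is_near_duplicate(img_a: np.ndarray, img_b: np.ndarray, max_bits: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in at most max_bits bits."""
    return hamming(average_hash(img_a), average_hash(img_b)) <= max_bits
```

Because each bit only compares a block against the image-wide mean, a uniform brightness shift leaves the hash unchanged, while a mirrored layout can flip every bit. Learned image embeddings are a common next step when near-duplicates differ in crop or background.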

3.2 Multilingual OCR Datasets

  • Arabic, Thai, Vietnamese, Hindi, English, Chinese (150k) – MD-OCR-008
  • Japanese, Korean (40k) – MD-OCR-009

3.3 Object & Receipt Datasets

  • Common Objects Segmentation (140.7k) – MD-Image-024
  • Object Contour Matting (50k) – MD-Image-007
  • Chinese Bill Dataset (6k) – MD-OCR-002


Dataset Details


1.1 Single-person Portrait Matting Dataset

Data ID: MD-Image-003  |  Volume: 50k

Fine segmentation of hair, ears, and fingers across varied postures and hairstyles.
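Matting differs from plain segmentation in that the target is typically a soft alpha matte rather than a hard mask; a compositor then blends foreground and background as I = αF + (1 - α)B. The sketch below shows that blend, assuming the alpha matte is stored as an 8-bit grayscale array (0 = background, 255 = foreground); this storage format is an assumption, not the dataset's documented layout.

```python
import numpy as np


def composite(foreground: np.ndarray, background: np.ndarray, alpha_u8: np.ndarray) -> np.ndarray:
    """Blend H x W x 3 uint8 images with an H x W uint8 alpha matte: I = a*F + (1 - a)*B."""
    # Scale alpha to [0, 1] and add a channel axis so it broadcasts over RGB.
    alpha = alpha_u8.astype(np.float64)[..., None] / 255.0
    blended = alpha * foreground.astype(np.float64) + (1.0 - alpha) * background.astype(np.float64)
    # Round back to valid 8-bit pixel values.
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


fg = np.full((1, 3, 3), 200, dtype=np.uint8)
bg = np.zeros((1, 3, 3), dtype=np.uint8)
alpha = np.array([[255, 0, 128]], dtype=np.uint8)
print(composite(fg, bg, alpha)[0, :, 0])  # [200   0 100]
```

Intermediate alpha values (such as 128 above) are what let fine structures like hair strands blend smoothly into a new background, which a binary mask cannot do.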


1.2 Eastern Asia Single-person Portrait Segmentation Dataset

Data ID: MD-Image-004  |  Volume: 50k

Pixel-level segmentation across indoor, outdoor, street, and sport scenarios.


1.3 Human Body Semantic Segmentation Dataset

Data ID: MD-Image-005  |  Volume: 100k

Segmentation of 19 human body areas including face, torso, arms.


1.4 Face Parsing Dataset

Data ID: MD-Image-006  |  Volume: 100k

Segmentation of 18 facial areas including eyes, eyebrows, hair.


1.5 Indoor Multiple Person & Object Segmentation Dataset

Data ID: MD-Image-009  |  Volume: 7.5k

Segmentation of human body, clothing, objects; 5–6 persons per image.


1.6 Indoor Facial 75 Expressions Dataset

Data ID: MD-Image-011  |  Volume: 20k

75 expression categories with keypoints for 60 identities.


1.7 Indoor Facial 182 Keypoints Dataset

Data ID: MD-Image-012  |  Volume: 28k

182 facial keypoints annotated for 60 individuals.


1.8 Portrait Segmentation

Data ID: MD-Image-015  |  Volume: 294.5k

Contour, instance, and semantic segmentation of live screenshots.


1.9 Human Body Segmentation

Data ID: MD-Image-016  |  Volume: 85.7k

Semantic segmentation of trunks, arms, backgrounds, and more.


1.10 Multiple Scenarios & Persons Semantic Segmentation

Data ID: MD-Image-018  |  Volume: 54k

Segmentation of persons in urban, natural, indoor scenes.


1.11 Facial Parts Semantic Segmentation

Data ID: MD-Image-019  |  Volume: 2.79M

Segmentation of face parts including eyes, eyebrows, nose, accessories.


1.12 Human Body High Precision Segmentation

Data ID: MD-Image-022  |  Volume: 424.8k

High-precision segmentation of body, clothing, face, hair, accessories.


1.13 Hair Semantic Segmentation

Data ID: MD-Image-035  |  Volume: 32.2k

High-precision contour and semantic segmentation of hair.


1.14 Lips Segmentation

Data ID: MD-Image-037  |  Volume: 13.9k

Semantic segmentation of upper and lower lip areas.


1.15 Human Contour Segmentation & Keypoints

Data ID: MD-Image-048  |  Volume: 14.4k

Contour segmentation plus keypoints for the nose, arms, shoulders, elbows, and wrists.


1.16 Nails Contour Segmentation

Data ID: MD-Image-051  |  Volume: 5.9k

Semantic segmentation of fingernail contours.


1.17 Segmentation & Key Points of Human Body

Data ID: MD-Image-053  |  Volume: 6.6k

Instance & semantic segmentation with 27 body parts & 24 keypoints.


1.18 Human Portrait Matting (Video)

Data ID: MD-Video-005  |  Volume: 1.7k

Instance & semantic segmentation of footage from dancing, shows, movies, and TV.


1.19 Upper Eyelid Segmentation

Data ID: MD-Image-032  |  Volume: 2.4k

Semantic segmentation of upper eyelid areas.


2. Fashion Datasets

2.1 Clothing Classification Dataset

Data ID: MD-Fashion-1

Volume: About 2M

Internet-collected images covering typical scenarios such as e-commerce, fashion shows, social media, and offline user-generated content. More than 80 label categories covering gender, clothing type, style, scenario, and more, annotated with bounding boxes and classification labels.

Link: https://maadaa.ai/dataset/clothing-classification-dataset/

2.2 Clothing Pattern Classification Dataset

Data ID: MD-Fashion-2

Volume: About 200k

Internet-collected images covering typical scenarios such as e-commerce, fashion shows, social media, and offline user-generated content. More than 30 common pattern categories, annotated with bounding boxes and classification labels.

Link: https://maadaa.ai/dataset/clothing-pattern-classification-dataset/

2.3 Clothing Segmentation Dataset

Data ID: MD-Fashion-3

Volume: About 500k

Internet-collected images across scenarios such as e-commerce, fashion shows, social media, and offline UGC with semantic segmentation of clothing, human parts, and accessories.

Link: https://maadaa.ai/dataset/clothing-segmentation-dataset/

2.4 Clothing Segmentation and Fabrics Classification

Data ID: MD-Fashion-4

Volume: About 200k

Internet-collected images covering typical scenarios such as e-commerce, fashion shows, social media, and UGC, annotated with segmentation masks and classification labels. Includes 11 fabric categories and 80 clothing types.

Link: https://maadaa.ai/dataset/fabrics-classification-dataset/

2.5 Clothing Keypoints Dataset

Data ID: MD-Fashion-5

Volume: About 1M

Internet-collected images from scenarios such as e-commerce, fashion shows, social media, and UGC. Includes bounding boxes, keypoints, and labels across 80+ clothing types.

Link: https://maadaa.ai/dataset/clothing-keypoints-dataset/

2.6 Person and Clothes Semantic Segmentation

Data ID: MD-Image-026

Volume: About 197.1k

Instance / semantic segmentation of persons and clothing, including categories like background, outfit, and body parts.

Link: https://maadaa.ai/dataset/person-and-clothes-semantic-segmentation/

2.7 Clothes Segmentation

Data ID: MD-Image-027

Volume: About 14.3k

Semantic segmentation of clothes with 30+ target categories including backgrounds, outfits, human bodies, mobile phones, and accessories.

Link: https://maadaa.ai/dataset/clothes-segmentation/

2.8 Glasses Segmentation

Data ID: MD-Image-036

Volume: About 13.9k

Semantic segmentation of transparent glasses, sunglasses, and translucent glasses.

Link: https://maadaa.ai/dataset/glasses-segmentation/

2.9 Scarf Segmentation

Data ID: MD-Image-061

Volume: About 2k

High-precision contour segmentation of scarf areas.

Link: https://maadaa.ai/dataset/scarf-segmentation/

2.10 Human and Accessories Segmentation

Data ID: MD-Image-017

Volume: About 74.3k

Semantic segmentation of humans; accessories such as mobile phones, suitcases, and skateboards; and animals such as horses, cattle, and dogs.

Link: https://maadaa.ai/dataset/human-and-accessories-segmentation/

3. E-Commerce Datasets

3.1 E-commerce Product Dataset

Data ID: MD-Image-010

Volume: About 2M

Internet-collected e-commerce product images covering more than 200k SKUs across 16 main categories, such as shoes, hats, bags, furniture, digital devices, and jewelry.

Link: https://maadaa.ai/dataset/e-commerce-product-dataset/

3.2 Arabic & Thai & Vietnamese & Hindi & English & Chinese Language Dataset

Data ID: MD-OCR-008

Volume: About 150k

Text images from over 10 types of sources, including product packaging, signboards, signposts, and posters, captured with phones, cameras, and tablets across multiple languages.

Link: https://maadaa.ai/dataset/arabic-thai-vietnamese-hindi-english-chinese-language-dataset/

3.3 Common Objects Segmentation Dataset

Data ID: MD-Image-024

Volume: About 140.7k

Instance / semantic segmentation of common everyday scenes and objects such as people, animals, furniture, food, and more.

Link: https://maadaa.ai/dataset/common-objects-segmentation/

3.4 Japanese & Korean Language Dataset

Data ID: MD-OCR-009

Volume: About 40k

Collected via phone, camera, and tablet, covering menu images, product instructions, product packaging, and more.

Link: https://maadaa.ai/dataset/japanese-korean-language-dataset/

3.5 Object Contour Matting Dataset

Data ID: MD-Image-007

Volume: About 50k

Semantic segmentation dataset of clothing, accessories, merchandise, and other objects.

Link: https://maadaa.ai/dataset/object-contour-segmentation-dataset/

3.6 Chinese Bill Dataset

Data ID: MD-OCR-002

Volume: About 6k

Includes flight tickets, train tickets, hotel receipts, and commercial invoices widely used in mainland China, covering over 20 labeling categories.

Link: https://maadaa.ai/dataset/bill-dataset/

For further information, please contact us.