Keywords: clothing classification, clothing pattern classification, clothing fabric classification, clothing key point detection, clothing & human body semantic segmentation, scarf fabric segmentation, E-Commerce datasets
Drawing on our accumulated experience with Fashion and E-Commerce technologies and application scenarios, maadaa.ai has been developing a series of standard datasets to help industrial and academic customers accelerate AI innovation in Fashion and E-Commerce.
These datasets cover tasks such as clothing classification, clothing pattern classification, clothing fabric classification, clothing key point detection, clothing and human body semantic segmentation, and scarf fabric segmentation, as well as E-Commerce-related OCR.
1. Human-related Datasets
1.1 Single-person Portrait Matting Dataset
Data ID: MD-Image-003
Volume: About 50k
Fine segmentation of personal portrait areas, including hair, ears, fingers, etc. The images cover varied postures and hairstyles.
Link: https://maadaa.ai/dataset/single-person-portrait-matting-dataset/
1.2 Eastern Asia Single-person Portrait Segmentation Dataset
Data ID: MD-Image-004
Volume: About 50k
Pixel-level fine segmentation of portrait images of East Asian people, with varied postures in varied scenarios such as indoor, outdoor, street, sport, etc.
Link: https://maadaa.ai/dataset/eastern-asia-single-person-portrait-segmentation-dataset/
1.3 Human Body Semantic Segmentation Dataset
Data ID: MD-Image-005
Volume: About 100k
Semantic segmentation of nineteen human body areas including face, left/right arm, etc.
Link: https://maadaa.ai/dataset/human-body-semantic-segmentation-dataset/
1.4 Face Parsing Dataset
Data ID: MD-Image-006
Volume: About 100k
Semantic segmentation of eighteen facial areas including hair, left/right eyebrow, etc.
Link: https://maadaa.ai/dataset/face-parsing-dataset/
1.5 Indoor Multiple Person & Object Segmentation Dataset
Data ID: MD-Image-009
Volume: 7500 images
Semantic segmentation of human body areas, clothing, accessories and indoor objects. There are 5-6 individuals in each drama image, covering Asian, American and English.
Link: https://maadaa.ai/dataset/indoor-multiple-person-object-segmentation-dataset/
1.6 Indoor Facial 75 Expressions Dataset
Data ID: MD-Image-011
Volume: About 20k
75 facial expression category tags with keypoints for 60 people in indoor scenarios.
Link: https://maadaa.ai/dataset/facial-75-expressions-dataset/
1.7 Indoor Facial 182 Keypoints Dataset
Data ID: MD-Image-012
Volume: 28000 images
182 facial keypoints for 60 people in indoor scenarios.
Link: https://maadaa.ai/dataset/facial-182-keypoints-dataset/
1.8 Portrait Segmentation
Data ID: MD-Image-015
Volume: About 294.5k
Contour/instance/semantic segmentation of live screenshots, covering single/multiple people and accessories.
Link: https://maadaa.ai/dataset/portrait-segmentation/
1.9 Human Body Segmentation
Data ID: MD-Image-016
Volume: About 85.7k
Semantic segmentation of human bodies including trunks, arms, backgrounds, etc.
Link: https://maadaa.ai/dataset/human-body-segmentation-2/
1.10 Indoor Multiple Scenarios and Persons Semantic Segmentation
Data ID: MD-Image-018
Volume: About 54k
Semantic segmentation of images with multiple persons in urban, natural and indoor scenes. The human body parts include the head, trunk, accessories and backgrounds.
Link: https://maadaa.ai/dataset/multiple-scenarios-and-persons-semantic-segmentation/
1.11 Facial Parts Semantic Segmentation
Data ID: MD-Image-019
Volume: About 2,791.7k
Bounding box and semantic segmentation of the face area. The facial part categories include skin, left/right eyes, left/right eyebrows, nose, accessories, etc.
Link: https://maadaa.ai/dataset/facial-parts-semantic-segmentation/
1.12 Human Body High Precision Segmentation
Data ID: MD-Image-022
Volume: About 424.8k
High-precision segmentation of the human body, clothing, face, skin, caps, hair, accessories, backgrounds, etc.
Link: https://maadaa.ai/dataset/human-body-high-precision-segmentation/
1.13 Hair Semantic Segmentation
Data ID: MD-Image-035
Volume: About 32.2k
High-precision contour segmentation and semantic segmentation of hair.
Link: https://maadaa.ai/dataset/hair-semantic-segmentation/
1.14 Lips Segmentation
Data ID: MD-Image-037
Volume: About 13.9k
Semantic segmentation of lip areas including upper and lower lip areas.
Link: https://maadaa.ai/dataset/lips-segmentation/
1.15 Human Contour Segmentation and Keypoints
Data ID: MD-Image-048
Volume: About 14.4k
Key points of the human body include the nose, arms, shoulders, elbows, wrists, etc.
Link: https://maadaa.ai/dataset/human-contour-segmentation-and-keypoints/
1.16 Nails Contour Segmentation
Data ID: MD-Image-051
Volume: About 5.9k
Semantic segmentation of the contour of fingernails.
Link: https://maadaa.ai/dataset/nails-contour-segmentation/
1.17 Segmentation and Key Points of Human Body
Data ID: MD-Image-053
Volume: About 6.6k
Instance/semantic segmentation of the human body with 27 categories of body parts and 24 key points.
Link: https://maadaa.ai/dataset/segmentation-and-key-points-of-human-body/
1.18 Human Portrait Matting
Data ID: MD-Video-005
Volume: About 1.7k
Instance/semantic segmentation of humans in diverse scenes such as dancing, talent shows, movies, and TV stories. A total of 19 categories are labeled, including background, face, hair, top, etc.
Link: https://maadaa.ai/dataset/high-precision-human-body-segmentation/
1.19 Upper Eyelid Segmentation
Data ID: MD-Image-032
Volume: About 2.4k
Semantic segmentation of the upper eyelid.
Link: https://maadaa.ai/dataset/upper-eyelid-segmentation/
2. Fashion Datasets
2.1 Clothing Classification Dataset
Data ID: MD-Fashion-1
Volume: About 2M
Internet-collected images cover typical scenarios such as e-commerce, fashion shows, social media, offline user-generated content, etc. More than 80 label categories covering gender, clothing types, styles, scenarios, etc., with bounding box and classification annotations.
Link: https://maadaa.ai/dataset/clothing-classification-dataset/
2.2 Clothing Pattern Classification Dataset
Data ID: MD-Fashion-2
Volume: About 200k
Internet-collected images cover typical scenarios such as e-commerce, fashion shows, social media, offline user-generated content, etc. More than 30 categories of common patterns, with bounding box and classification annotations.
Link: https://maadaa.ai/dataset/clothing-pattern-classification-dataset/
2.3 Clothing Segmentation Dataset
Data ID: MD-Fashion-3
Volume: About 500k
Internet-collected images cover typical scenarios such as e-commerce, fashion shows, social media, offline user-generated content, etc., with semantic segmentation of the main human parts, clothing, accessories, etc.
Link: https://maadaa.ai/dataset/clothing-segmentation-dataset/
2.4 Clothing Segmentation and Fabrics Classification
Data ID: MD-Fashion-4
Volume: About 200k
Internet-collected images cover typical scenarios such as e-commerce, fashion shows, social media, offline user-generated content, etc., with classification and segmentation covering 11 common fabric categories and 80 clothing types.
Link: https://maadaa.ai/dataset/fabrics-classification-dataset/
2.5 Clothing Keypoints Dataset
Data ID: MD-Fashion-5
Volume: About 1M
Internet-collected images cover typical scenarios such as e-commerce, fashion shows, social media, offline user-generated content, etc. Classification labels, bounding boxes, and key points for 80 clothing types.
Link: https://maadaa.ai/dataset/clothing-keypoints-dataset/
2.6 Person and Clothes Semantic Segmentation
Data ID: MD-Image-026
Volume: About 197.1k
Instance/semantic segmentation of person and clothes, including categories of background, outfit and body parts.
Link: https://maadaa.ai/dataset/person-and-clothes-semantic-segmentation/
2.7 Clothes Segmentation
Data ID: MD-Image-027
Volume: About 14.3k
Semantic segmentation of clothes with 30 target categories like backgrounds, outfits, human bodies, mobile phones and other large accessories.
Link: https://maadaa.ai/dataset/clothes-segmentation/
2.8 Glasses Segmentation
Data ID: MD-Image-036
Volume: About 13.9k
Semantic segmentation of glasses including pure transparent glasses, sunglasses and translucent glasses.
Link: https://maadaa.ai/dataset/glasses-segmentation/
2.9 Scarf Segmentation
Data ID: MD-Image-061
Volume: About 2000
High-precision segmentation of scarf areas with contour segmentation.
Link: https://maadaa.ai/dataset/scarf-segmentation/
2.10 Human and Accessories Segmentation
Data ID: MD-Image-017
Volume: About 74.3k
Semantic segmentation of humans and accessories like mobile phones, suitcases, skateboards, etc., as well as animals including horses, cattle, dogs, etc.
Link: https://maadaa.ai/dataset/human-and-accessories-segmentation/
3. E-Commerce Datasets
3.1 E-commerce Product Dataset
Data ID: MD-Image-010
Volume: About 2M
Internet-collected e-commerce products, more than 200k SKUs, covering 16 main categories like shoes, hats, bags, furniture, digital, jewelry, etc.
Link: https://maadaa.ai/dataset/e-commerce-product-dataset/
3.2 Arabic & Thai & Vietnamese & Hindi & English & Chinese Language Dataset
Data ID: MD-OCR-008
Volume: About 150k
Characters from over 10 types of scenes, including product packaging, signboards, signposts, posters, etc. The dataset is collected via phones, cameras, and tablets, covering Arabic, Thai, Vietnamese, Hindi, English, and Chinese.
Link: https://maadaa.ai/dataset/arabic-thai-vietnamese-hindi-english-chinese-language-dataset/
3.3 Common Objects Segmentation Dataset
Data ID: MD-Image-024
Volume: About 140.7k
Instance/semantic segmentation of common scenes and objects in daily life such as people, animals, furniture, food, etc.
Link: https://maadaa.ai/dataset/common-objects-segmentation/
3.4 Japanese & Korean Language Dataset
Data ID: MD-OCR-009
Volume: About 40k
The dataset is collected via phones, cameras and tablets, and includes menus, product instructions, product packaging, etc.
Link: https://maadaa.ai/dataset/japanese-korean-language-dataset/
3.5 Object Contour Matting Dataset
Data ID: MD-Image-007
Volume: About 50k
Semantic segmentation of internet-collected images with objects like clothing, accessories, merchandise, etc.
Link: https://maadaa.ai/dataset/object-contour-segmentation-dataset/
3.6 Chinese Bill Dataset
Data ID: MD-OCR-002
Volume: About 6k
This dataset covers over 10 types of commercial receipts and invoices used in mainland China, including flight tickets, train tickets, hotel receipts, etc. It has more than 20 labeling categories such as type, province, quality, etc.
Link: https://maadaa.ai/dataset/bill-dataset/
Further reading:
AI in fashion & E-Commerce Industry: application scenarios and technologies
AI Datasets for Fashion & E-Commerce: Open vs. Commercial and the Trends
AI for virtual fitting: inspired by datasets (Open & Commercial)
AI for fake Detection in Fashion and E-commerce industries: The related open & commercial datasets
AI-powered personalization of E-commerce and Fashion: open and commercial datasets
Face Parsing: use cases and open datasets