#OpenDatasets #fashion #Ecommerce #AI #Trend
1. Open Datasets
Most deep learning models are trained through supervised, data-driven optimization, so providing diverse and accurately labeled datasets is the cornerstone of research in this field. Typical clothing datasets include DeepFashion, DeepFashion2, WAB, Chictopia10K, and Deep Fashion3D.
1.1 DeepFashion
DeepFashion is an open-source, large-scale clothing dataset from the Chinese University of Hong Kong. It contains approximately 800,000 images, each with detailed clothing annotations.
DeepFashion consists of four subsets:
- Category and Attribute Prediction Benchmark: 289,222 images for classification; each image carries a category annotation, attribute annotations, a BBox, and landmarks.
- In-shop Clothes Retrieval Benchmark: 52,712 images, with multiple pose images per item ID; the main tasks are image retrieval and content retrieval.
- Consumer-to-Shop Clothes Retrieval Benchmark: 239,557 images; the folder for each product ID contains one seller-show photo and several buyer-show photos. The main tasks are likewise image retrieval and content retrieval.
- Fashion Landmark Detection Benchmark: 123,016 images for landmark detection; each image has landmarks and a BBox, plus a category label of upper-body, lower-body, or full-body clothes.
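The BBox annotations above ship as plain text files. As a minimal sketch, the parser below assumes the layout of DeepFashion's distributed `list_bbox.txt` files (a count line, a header line, then whitespace-separated rows); the sample data and helper function are illustrative, so verify the format against your own download.

```python
# Sketch: parsing a DeepFashion-style bounding-box annotation file.
from io import StringIO

# Small stand-in for the real list_bbox.txt (count, header, rows).
SAMPLE = """\
2
image_name  x_1  y_1  x_2  y_2
img/Sheer_Pleated-Front_Blouse/img_00000001.jpg  072 079 232 273
img/Sheer_Pleated-Front_Blouse/img_00000002.jpg  067 059 155 161
"""

def parse_bbox_file(fh):
    """Return {image_path: (x1, y1, x2, y2)} from a DeepFashion-style bbox file."""
    n = int(fh.readline())   # first line: number of entries
    fh.readline()            # second line: column header
    boxes = {}
    for _ in range(n):
        name, *coords = fh.readline().split()
        boxes[name] = tuple(int(c) for c in coords)
    return boxes

boxes = parse_bbox_file(StringIO(SAMPLE))
print(len(boxes))  # 2
```

The same pattern applies to the landmark and category files, which only differ in the number and meaning of the columns.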
1.2 DeepFashion2
DeepFashion2 is a further refinement of DeepFashion. Each DeepFashion2 image contains between one and seven clothing items, whereas DeepFashion carries just one category tag per image, and every garment is manually annotated with a BBox and landmarks. The dataset contains 491,000 images with 801,000 clothing items across 13 popular clothing categories, collected from commercial venues and the Internet.
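Because one image can hold several garments, DeepFashion2 stores a per-image JSON file with one entry per item. The field names below (item1..itemN, category_id, bounding_box, landmarks) follow the format described in the DeepFashion2 repository, but treat them as assumptions and check them against your copy; the sample annotation itself is made up.

```python
# Sketch: reading a DeepFashion2-style per-image JSON annotation.
import json

sample = json.loads("""{
  "source": "shop", "pair_id": 1,
  "item1": {"category_id": 1, "category_name": "short sleeve top",
            "bounding_box": [50, 30, 180, 220]},
  "item2": {"category_id": 8, "category_name": "trousers",
            "bounding_box": [55, 200, 175, 390]}
}""")

# Collect only the garment entries, skipping image-level metadata.
items = {k: v for k, v in sample.items() if k.startswith("item")}
for name, item in sorted(items.items()):
    x1, y1, x2, y2 = item["bounding_box"]
    print(name, item["category_name"], (x2 - x1) * (y2 - y1))  # name, class, box area
```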
1.3 WAB
The WAB dataset was collected by Alibaba from Taobao daily-wear listings and live streams. It contains 1,042,178 labeled images, 1,654,780 labeled detection-box instances, and 70,000 transcribed and labeled video texts. The data is annotated with 23 clothing detection categories and detection-box positions, supporting object detection research. Box-level instance IDs are also annotated, and about 80,000 groups of same-item commodity sequences are constructed, supporting object retrieval and recognition research. In addition, the dataset provides clip-level transcripts and commodity-title description texts, supporting research on visual-text multimodal retrieval.
1.4 Chictopia10K
The Chictopia10K dataset contains 17,706 images collected from the Chictopia fashion website. It has 18 categories: 12 clothing categories, a background category, and 5 human-body categories.
1.5 Deep Fashion3D
This is the largest 3D garment scanning dataset available. Deep Fashion3D contains more than 2,000 scanned 3D models spanning ten clothing categories, and each model carries rich annotations such as ground-truth point clouds, multi-view real photographs, and 3D body pose.
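Scanned garment point clouds like these are usually normalized before being fed to a reconstruction network. The snippet below is a generic sketch of that common preprocessing step (centering on the centroid and scaling into the unit sphere) on a synthetic cloud; it is not taken from the Deep Fashion3D codebase.

```python
# Sketch: normalizing a garment point cloud to the unit sphere.
import numpy as np

# Synthetic (2048, 3) point cloud standing in for one scanned garment.
rng = np.random.default_rng(42)
points = rng.normal(size=(2048, 3)) * [0.3, 0.8, 0.2] + [0.0, 1.0, 0.0]

center = points.mean(axis=0)          # translate the centroid to the origin
points = points - center
scale = np.linalg.norm(points, axis=1).max()
points = points / scale               # fit all points inside the unit sphere

print(round(float(np.linalg.norm(points, axis=1).max()), 6))  # 1.0
```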
2. Standard Fashion Datasets from maadaa.ai
Based on maadaa.ai's accumulated experience in fashion technology and its applications, we summarized the typical fashion application scenarios and designed and completed seven datasets, totaling about 3.6 million images, with about 2.4 million classification annotations and about 3.4 million detection and segmentation annotations. They cover fashion-related tasks such as clothing classification, clothing pattern classification, clothing fabric classification, clothing key-point detection, clothing and human-body semantic segmentation, and scarf fabric segmentation.
MD-Fashion-1 is a clothing classification dataset of about 2,000,000 images collected from e-commerce, fashion shows, social media, and other scenes. It is annotated with image category labels and BBoxes, covering 80 category tags across different clothing styles and scenes.
MD-Fashion-2 is a clothing pattern classification dataset of 200,000 images. Unlike MD-Fashion-1, it focuses on classifying clothing pattern features and is annotated with image classification labels across 30 common categories.
MD-Fashion-4 is a clothing fabric classification dataset of about 200,000 images. It provides classification labels and masks for clothing materials, covering 11 common fabric categories.
MD-Fashion-5 is a clothing key-point dataset of 1,000,000 images. It covers 80 clothing types, annotated with key-point coordinates and BBoxes.
MD-image-027 is a clothing segmentation dataset collected mainly from the Internet. It contains 14,300 images with resolutions between 183 x 275 and 3024 x 4032. Pixel-level semantic segmentation covers about 30 target categories, including background, hat, hair, sunglasses, coat, skirt, pants, dress, tie, left shoe, right shoe, face, left leg, right leg, left arm, right arm, bag, scarf, mobile phone, and other large accessories, giving the dataset important application value in e-commerce, visual entertainment, metaverse virtual humans, and many other scenes.
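When working with pixel-level masks like these, a routine first step is auditing the class balance. The sketch below computes per-class pixel counts on a tiny synthetic mask; the label IDs and names are made up for illustration and are not MD-image-027's actual label map.

```python
# Sketch: per-class pixel statistics for a semantic segmentation mask.
import numpy as np

# Hypothetical label map (illustrative, not the real MD-image-027 ids).
LABELS = {0: "background", 1: "hat", 2: "hair", 5: "coat"}

# Synthetic 4x4 mask standing in for one annotated image.
mask = np.array([[0, 0, 1, 1],
                 [0, 5, 5, 1],
                 [2, 5, 5, 0],
                 [2, 2, 0, 0]])

# np.unique with return_counts gives each label id and its pixel count.
ids, counts = np.unique(mask, return_counts=True)
for i, c in zip(ids, counts):
    print(f"{LABELS.get(int(i), '?'):>10}: {c:2d} px ({c / mask.size:.0%})")
```

Aggregating these counts over a whole dataset reveals which of the ~30 categories (small accessories especially) are underrepresented and may need loss reweighting.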
MD-image-026 is a person and clothing segmentation dataset. It contains 197,000 images with resolutions between 92 x 153 and 3024 x 5381. Its categories include clothing classes such as background, hat, hair, sunglasses, coat, gloves, socks, skirt, pants, and shoes, as well as body parts such as the face, left and right legs, and left and right arms. Compared with MD-image-027, MD-image-026 adds more semantic segmentation categories for body parts.
MD-image-061 is a scarf fabric segmentation dataset. It contains 2,000 images with resolutions between 504 x 678 and 192 x 2880, each annotated with high-precision semantic segmentation of the scarf region.
3. Dataset comparison
3.1 MD-fashion vs. Deepfashion
In total data volume, MD-Fashion has about 3.6 million images while DeepFashion has 800,000. DeepFashion was released in 2016, so its image content is somewhat outdated and lacks current fashionable clothing styles, whereas MD-Fashion's content is newly collected from the Internet, fashion shows, and other sources. In image labeling, DeepFashion lacks mask annotation; for detection it uses key points as labeling information and includes only three categories (top, bottom, and whole body). MD-Fashion covers a large number of key points and mask annotations, with more than 30 segmentation categories, making it more fine-grained. DeepFashion's segmentation labeling has only one category per picture, while MD-Fashion's contains multiple categories. DeepFashion additionally provides annotation data for the text-to-image retrieval task, with detailed descriptions of clothing attributes; MD-Fashion provides a clothing-material classification dataset that helps computers understand clothing further.
3.2 MD-fashion vs. Deepfashion2
First, MD-Fashion exceeds DeepFashion2 in total dataset size. The DeepFashion2 segmentation set contains 492,000 images, larger than MD-Fashion's 211,000, but the MD-Fashion segmentation set covers more categories: not only clothing but also the human body and accessories such as mobile phones, for 30 segmentation categories versus DeepFashion2's 13. DeepFashion2 also retains text descriptions of clothing.
3.3 MD-fashion vs. WAB
WAB leans toward retrieval and object detection and lacks semantic segmentation annotations. Its object detection covers 23 BBox categories, while MD-Fashion covers 80 apparel categories.
4. Trends In Fashion Datasets
With the rapid development of e-commerce platforms, clothing has become a trillion-level market. With more than $30 billion in apparel sales on its platform in 2018, Amazon surpassed Walmart to become the No. 1 apparel retailer in the United States. The COVID-19 epidemic in 2020 did not change this trend of market development but further accelerated the pace of business reform and spurred new consumption patterns in the clothing market. Consumers are more receptive to online retail channels than before the epidemic, the digital transformation of physical retailers is in full swing, and new consumer innovations such as livestream selling, online experiences, and the sharing economy are flourishing.
The demand characteristics of the clothing category are closely tied to the gender, age, ethnicity, region, and spending power of buyers, whose demand for clothing is in fact diversified and fragmented. As clothing styles multiply and application scenarios grow richer, existing public datasets are evolving in the following directions:
- Multiple Tasks. Datasets are expanding beyond classification to detection, segmentation, retrieval, and 3D reconstruction.
- Larger Volume. To some extent, deep learning is based on data-driven approaches. Massive data plays an important role in model learning and understanding.
- Fine-Grained Annotation. This shows in the refinement of both segmentation and classification: segmentation tasks add annotations for accessories and other small objects, and classification tasks expand their category sets.
4.1 Multiple Tasks
Early fashion datasets may contain only classification tasks. For example, Fashion-MNIST resampled 762 x 1000 JPEG images down to 28 x 28 grayscale; such early datasets labeled only clothing categories and other information suited to classification. After its launch, DeepFashion added annotations such as text descriptions, BBoxes, and landmarks, meaning the dataset could also train text retrieval and object detection. MD-Fashion and DeepFashion2 further provide mask annotation, extending the task set to semantic segmentation for more fine-grained detection. Deep Fashion3D adds 3D scanning data to the fashion dataset family, serving the 3D reconstruction tasks arising from increasingly popular metaverse and virtual-avatar scenes.
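The Fashion-MNIST downsampling described above can be sketched in a few lines. This toy version uses plain nearest-neighbor index sampling on a synthetic grayscale array; the real pipeline used a higher-quality resampler, so this only illustrates the shape of the transformation.

```python
# Sketch: downsampling a 762x1000 grayscale image to 28x28, Fashion-MNIST style.
import numpy as np

# Synthetic grayscale image standing in for one 762x1000 product JPEG.
src = np.random.default_rng(0).integers(0, 256, size=(1000, 762), dtype=np.uint8)

# Pick 28 evenly spaced rows and columns (nearest-neighbor sampling).
rows = np.linspace(0, src.shape[0] - 1, 28).astype(int)
cols = np.linspace(0, src.shape[1] - 1, 28).astype(int)
thumb = src[np.ix_(rows, cols)]

print(thumb.shape)  # (28, 28)
```

Shrinking images this aggressively discards texture and fine structure, which is exactly why later datasets moved to full-resolution images with richer annotations.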
At present, fashion datasets already include annotations for classification, segmentation, detection, generation, cross-modal retrieval, 3D reconstruction, and other tasks, and the list may continue to grow in the future.
4.2 Larger Volume
Under the task definition of supervised learning, deep learning can be regarded as a data-driven process, and massive, diverse data matters greatly for improving network performance. As time passes and fashion research deepens, the number and volume of datasets keep growing, not only because application demands keep rising but also because of the development of distributed training and growing GPU memory capacity. The original Fashion-MNIST had 60,000 training images and 10,000 test images, followed by DeepFashion with 800,000 images and WAB with 1,040,000 images. MD-Fashion has so far maintained the largest count, at 3.6 million images.
Unlike datasets of animal or fruit species, fashion datasets are strongly time-sensitive: the latest styles, new clothing materials, clothing accessories, and so on are updated over time, so fashion datasets will keep being refreshed and will tend to grow larger in the future.
4.3 Fine-Grained Annotation
As technology develops and fashion application scenarios become more refined and specialized, datasets are moving in the same direction. In classification, earlier datasets might have fewer than five categories, while MD-Fashion has now grown to 80. Earlier classification focused on clothing attributes such as short sleeves, jackets, and trousers; current datasets add finer classification of clothing materials, such as cotton and silk. In segmentation, the early DeepFashion series had only three segmentation types, while MD-Fashion now supports up to 30 category labels in a single picture. In detection, coordinate estimation is no longer limited to the object itself: key-point estimation of clothing patterns has been added.
The continuous specialization and refinement of the fashion application field requires more fine-grained labeling information to better serve downstream tasks.
References
Mohammadi, Seyed Omid, and Ahmad Kalhor. “Smart Fashion: A Review of AI Applications in the Fashion & Apparel Industry.” arXiv preprint arXiv:2111.00905 (2021).
 He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
 He, Kaiming, et al. “Mask r-cnn.” Proceedings of the IEEE international conference on computer vision. 2017.
 Li, Xirong, et al. “W2vv++ fully deep learning for ad-hoc video search.” Proceedings of the 27th ACM International Conference on Multimedia. 2019.
 Redmon, Joseph, et al. “You only look once: Unified, real-time object detection.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
 Lassner, Christoph, Gerard Pons-Moll, and Peter V. Gehler. “A generative model of people in clothing.” Proceedings of the IEEE International Conference on Computer Vision. 2017.
 Zhu, Heming, et al. “Deep Fashion3D: A dataset and benchmark for 3D garment reconstruction from single images.” European Conference on Computer Vision. Springer, Cham, 2020.
 Ge, Yuying, et al. “Deepfashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
Brouet, Rémi, et al. “Design preserving garment transfer.” ACM Transactions on Graphics (TOG) – SIGGRAPH 2012 Conference Proceedings (2012).
Guan, Peng, et al. “Drape: Dressing any person.” ACM Transactions on Graphics (TOG) 31.4 (2012): 1-10.
 Karras, Tero, et al. “Progressive Growing of GANs for Improved Quality, Stability, and Variation.” International Conference on Learning Representations. 2018.