Standard Dataset

Large-scale, high-quality, scenario-based datasets have become key to AI algorithm research and technology development. On the one hand, academic AI research is placing higher requirements on dataset scale and data annotation

Datasets List (22)

Dataset ID

MD-Fashion-1

DESCRIPTION

[ Source ] Collected from the web, covering typical scenarios such as e-commerce, fashion shows, social networking and offline user-generated content.


[ Annotation ] Classification label, bounding box.
More than 80 label categories, covering gender, clothing type and style, scenario, etc.

VOLUME

~2M

APPLICATION SCENARIO

E-commerce, Smart Retail, O2O, Social Media, Visual Entertainment

Dataset ID

MD-Fashion-2

DESCRIPTION

[ Source ] Collected from the web, covering typical scenarios such as e-commerce, fashion shows, social networking and offline user-generated content.


[ Annotation ] Classification label.
More than 30 categories of common patterns.

VOLUME

~200K

APPLICATION SCENARIO

E-commerce, Smart Retail, O2O, Social Media, Visual Entertainment

Dataset ID

MD-Fashion-3

DESCRIPTION

[ Source ] Collected from the web, covering typical scenarios such as e-commerce, fashion shows, social networking and offline user-generated content.

 

[ Annotation ] Contour segmentation.
Covers the main human body parts, clothing, accessories, etc.

VOLUME

~500K

APPLICATION SCENARIO

E-commerce, Smart Retail, O2O, Social Media, Visual Entertainment

Dataset ID

MD-Fashion-4

DESCRIPTION

[ Source ] Collected from the web, covering typical scenarios such as e-commerce, fashion shows, social networking and offline user-generated content.


[ Annotation ] Segmentation, label.
Region segmentation of 11 common fabric categories across 80 clothing types.

VOLUME

~200K

APPLICATION SCENARIO

E-commerce, Smart Retail, O2O, Social Media, Visual Entertainment

Dataset ID

MD-Fashion-5

DESCRIPTION

[ Source ] Collected from the web, covering typical scenarios such as e-commerce, fashion shows, social networking and offline user-generated content.

 
[ Annotation ] Classification label, bounding box, key point.
Includes key points for 80 clothing types.

VOLUME

~1M

APPLICATION SCENARIO

E-commerce, Smart Retail, O2O, Social Media, Visual Entertainment

Dataset ID

MD-Fashion-6

DESCRIPTION

[ Source ] Real scenes. Collection devices include phones, cameras and tablet PCs. Image resolution is above 4000×3000, and the image format is JPG.

 

[ Annotation ] Polygon+OCR

VOLUME

~20K

APPLICATION SCENARIO

E-commerce, Smart Retail, O2O, Social Media, Visual Entertainment

Dataset ID

MD-OCR-001

DESCRIPTION

[ Source ] Real-scene signboards in English and Chinese. Collection devices include phones, cameras and tablet PCs.

 

[ Annotation ] Polygon + Text.
Covers individual characters and sentences.

 

[ Definition ] Includes enterprise name, branch name, slogan, business scope, address, telephone number, etc.

VOLUME

~30K

APPLICATION SCENARIO

Retail, Tourism, Catering

Dataset ID

MD-OCR-002

DESCRIPTION

[ Source ] Bills in different scenes. Collection devices include phones, cameras and tablet PCs.
Covers more than 10 kinds of common bills in mainland China, including flight itineraries, train tickets, hotel bills, tickets, taxi receipts, quota invoices, value-added tax invoices, toll invoices and coach ticket invoices.

 

[ Annotation ] Polygon+OCR

 

[ Definition ] More than 20 label types, including category, province, quality, code, number, invoice date, enterprise/certificate number, tax, telephone, car license, ID, boarding time, drop-off time, price, mileage, wait time, surcharge, service charge and official receipt.

VOLUME

~6K

APPLICATION SCENARIO

Retail, Tourism, Catering

Dataset ID

MD-OCR-003

DESCRIPTION

[ Source ] Real scenes. Collection devices include phones, cameras and tablet PCs. Image resolution is above 4000×3000, and the image format is JPG.

 

[ Annotation ] Polygon+OCR

VOLUME

~12K

APPLICATION SCENARIO

Mobile

Dataset ID

MD-OCR-004

DESCRIPTION

[ Source ] Real scenes. Collection devices include phones, cameras and tablet PCs.

[ Annotation ] Polygon+OCR. Includes IP address and password.

VOLUME

~1K

APPLICATION SCENARIO

Mobile, Tourism, Catering

Dataset ID

MD-OCR-005

DESCRIPTION

[ Source ] Real scenes. Collection devices include phones, cameras and tablet PCs. Image resolution is above 4000×3000, and the image format is JPG.

[ Annotation ] Polygon+OCR

VOLUME

~20K

APPLICATION SCENARIO

Catering, Tourism

Dataset ID

MD-OCR-006

DESCRIPTION

[ Source ] Collected from the web, covering indoor, outdoor, natural, garden and other typical scenes.

[ Annotation ] Classification label.
Thousands of different animals, including mammals, aquatic animals and amphibians found in Asia, Europe, Africa, North America and South America.

VOLUME

~33K

APPLICATION SCENARIO

Tourism, Retail, Catering

Dataset ID

MD-OCR-007

DESCRIPTION

[ Source ] Real scenes. Collection devices include phones, cameras and tablet PCs.
Images include product packaging, signboards, signposts, posters, parking lots, vehicle body advertising, food packaging, text on buildings, signs and book covers, etc.

 

[ Annotation ] Polygon+OCR
Text includes Simplified Chinese, English, Arabic numerals and common symbols (commas, periods, spaces, etc.).

VOLUME

~38K

APPLICATION SCENARIO

Tourism, Retail, Catering

Dataset ID

MD-OCR-008

DESCRIPTION

[ Source ] Real scenes. Collection devices include phones, cameras and tablet PCs. Image resolution is above 4000×3000, and the image format is JPG.
Covers more than 10 scenes with text in Arabic, Thai, Vietnamese, Hindi, English and Chinese, including product packaging, signboards, signposts, posters, text on electrical appliances, parking lots, text on clothing, text on buildings, signs, menus, book covers, shopping notices and text at tourist spots.

[ Annotation ] Polygon+OCR

VOLUME

~150K

APPLICATION SCENARIO

Tourism, Retail, Catering

Dataset ID

MD-OCR-009

DESCRIPTION

[ Source ] Collected from the web, covering indoor, outdoor, natural, garden and other typical scenes.

[ Annotation ] Classification label.
Thousands of different animals, including mammals, aquatic animals and amphibians found in Asia, Europe, Africa, North America and South America.

VOLUME

~40K

APPLICATION SCENARIO

Tourism, Retail, Catering

Dataset ID

MD-OCR-010

DESCRIPTION

[ Source ] Screenshots of web pages and manuscripts. The image format is JPG.


[ Annotation ] Polygon+OCR

VOLUME

~1K

APPLICATION SCENARIO

News, Tourism

Dataset ID

MD-OCR-011

DESCRIPTION

[ Source ] Common medical documents, including medical invoices, billing lists, expense lists and medical records.

 

[ Annotation ] Polygon+OCR, with preprocessing of personal privacy information.

 

[ Definition ] Text and category information of medical invoices, billing lists, expense lists and case reports.

VOLUME

~10K

APPLICATION SCENARIO

Healthcare

Dataset ID

MD-OCR-012

DESCRIPTION

[ Source ] Health examination reports.

[ Annotation ] Polygon+OCR, with preprocessing of personal privacy information.

VOLUME

~3K

APPLICATION SCENARIO

Healthcare

Dataset ID

MD-OCR-013

DESCRIPTION

[ Source ] Photographs of handwritten compositions and screenshots of manuscripts.

 

[ Annotation ] Polygon+OCR

VOLUME

~1K

APPLICATION SCENARIO

Education

Dataset ID

MD-OCR-014

DESCRIPTION

[ Source ] Real scenes. Collection devices include phones, cameras and tablet PCs. Image resolution is above 4000×3000, and the image format is JPG.
Irregularly shaped text in Chinese and English, such as dense text, vertical text, curved text, artistic text and hard samples, covering posters, advertisements, product packaging, book pages, magazine pages, newspapers, clothing, logos, jerseys, home emblems, seals, shop signs, home decoration, etc.

[ Annotation ] Polygon + Text

VOLUME

~30K

APPLICATION SCENARIO

Tourism, Retail, Mobile

Dataset ID

MD-Image-001

DESCRIPTION

[ Source ] Collected from the web, covering indoor, outdoor, natural, garden and other typical scenes.

[ Annotation ] Classification label.
Thousands of different animals, including mammals, aquatic animals and amphibians found in Asia, Europe, Africa, North America and South America.

VOLUME

~280K

APPLICATION SCENARIO

Tourism, Entertainment, Education

Dataset ID

MD-Image-002

DESCRIPTION

[ Source ] Collected from the web, indoor scenes.

[ Annotation ] Contour segmentation.
Nearly 200 kinds of food, including Chinese food, Western food, Japanese food, fast food, bread and dessert.

VOLUME

~30K

APPLICATION SCENARIO

Catering, Tourism