Datasets Overview

Ultralytics supports a range of datasets for computer vision tasks such as detection, instance segmentation, pose estimation, classification, and multi-object tracking. Each section below gives a short summary of one task, followed by the main Ultralytics datasets available for it.

Detection Datasets

Bounding box object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object.

  • Argoverse: A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
  • COCO: A large-scale dataset designed for object detection, segmentation, and captioning with over 200K labeled images.
  • COCO8: Contains the first 4 images from each of COCO train and COCO val (8 images in total), suitable for quick tests.
  • Global Wheat 2020: A dataset of wheat head images collected from around the world for object detection and localization tasks.
  • Objects365: A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
  • SKU-110K: A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes.
  • VisDrone: A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
  • VOC: The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
  • xView: A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.
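
To make the bounding-box idea above concrete, here is a minimal sketch of how a single detection annotation might be represented and converted between two common coordinate conventions. The `Box` class and `to_normalized_xywh` function are hypothetical helpers written for illustration only. The normalized center format is an assumption about how labels are often stored, not something stated in this overview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str


def to_normalized_xywh(box: Box, img_w: int, img_h: int) -> tuple:
    """Convert corner coordinates to (x_center, y_center, width, height) in [0, 1]."""
    x_center = (box.x_min + box.x_max) / 2 / img_w
    y_center = (box.y_min + box.y_max) / 2 / img_h
    width = (box.x_max - box.x_min) / img_w
    height = (box.y_max - box.y_min) / img_h
    return (x_center, y_center, width, height)


# One labeled object in a 640x480 image (made-up values).
box = Box(x_min=100, y_min=50, x_max=300, y_max=250, label="dog")
normalized = to_normalized_xywh(box, img_w=640, img_h=480)
print(box.label, normalized)
```

Normalizing by image size keeps the annotation valid when the image is resized, which is one reason this convention is popular for detection labels.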

Instance Segmentation Datasets

Instance segmentation is a computer vision technique that involves identifying and localizing objects in an image at the pixel level, producing a separate mask for each object instance rather than only a bounding box.

  • COCO: A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images.
  • COCO8-seg: A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations.
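
As a rough illustration of what "pixel level" means here, the sketch below treats one object instance as a binary mask and derives its pixel area and tight bounding box. The mask is made-up data stored as nested lists, and `mask_to_box` is a hypothetical helper. Real datasets may store masks in other forms, such as polygons.

```python
def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) covering the nonzero pixels, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


# A 5x4 binary mask for a single object instance (1 = object pixel).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

area = sum(sum(row) for row in mask)  # number of object pixels
box = mask_to_box(mask)               # tight box in inclusive pixel indices
print(area, box)
```

The mask carries strictly more information than the box: the box can always be recovered from the mask, but not the other way around.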

Pose Estimation

Pose estimation is a technique used to determine the pose of an object relative to the camera or the world coordinate system.

  • COCO: A large-scale dataset with human pose annotations designed for pose estimation tasks.
  • COCO8-pose: A smaller dataset for pose estimation tasks, containing a subset of 8 COCO images with human pose annotations.
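
One way to picture a human pose annotation is as a set of named 2D keypoints. The sketch below assumes that representation, which this overview does not spell out, and computes the angle at a joint from three keypoints. The keypoint names and coordinates are made up, and `joint_angle` is a hypothetical helper.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 on |cross| and dot yields an unsigned angle in [0, 180] degrees.
    return math.degrees(math.atan2(abs(cross), dot))


# Made-up pixel coordinates for three keypoints on one arm.
keypoints = {
    "shoulder": (100.0, 100.0),
    "elbow": (100.0, 200.0),
    "wrist": (200.0, 200.0),
}

angle = joint_angle(keypoints["shoulder"], keypoints["elbow"], keypoints["wrist"])
print(f"elbow angle: {angle:.1f} degrees")
```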

Classification

Image classification is a computer vision task that involves categorizing an image into one or more predefined classes or categories based on its visual content.

  • Caltech 101: A dataset containing images of 101 object categories for image classification tasks.
  • Caltech 256: An extended version of Caltech 101 with 256 object categories and more challenging images.
  • CIFAR-10: A dataset of 60K 32x32 color images in 10 classes, with 6K images per class.
  • CIFAR-100: An extended version of CIFAR-10 with 100 object categories and 600 images per class.
  • Fashion-MNIST: A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks.
  • ImageNet: A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories.
  • ImageNet-10: A smaller subset of ImageNet with 10 categories for faster experimentation and testing.
  • Imagenette: A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing.
  • Imagewoof: A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks.
  • MNIST: A dataset of 70,000 grayscale images of handwritten digits for image classification tasks.
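
The classification task described in this section can be sketched as turning per-class scores into probabilities and picking the most likely classes. The scores and the three class names below are made-up inputs, and `softmax` and `top_k` are small helpers written for this example, not part of any particular library.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, class_names, k=1):
    """Return the k (class_name, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


class_names = ["airplane", "automobile", "bird"]  # illustrative class names
scores = [1.0, 3.0, 0.5]                          # made-up model outputs

predictions = top_k(scores, class_names, k=2)
for name, prob in predictions:
    print(f"{name}: {prob:.3f}")
```

Returning more than one class (k > 1) covers the "one or more predefined classes" case, where several labels may be plausible for the same image.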

Multi-Object Tracking

Multi-object tracking is a computer vision technique that involves detecting and tracking multiple objects over time in a video sequence.

  • Argoverse: A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations for multi-object tracking tasks.
  • VisDrone: A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
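
To illustrate "tracking multiple objects over time", the sketch below links detections in consecutive frames by greedy intersection-over-union (IoU) matching. This is a deliberately simple stand-in chosen for illustration, not the method used by any specific tracker. The boxes, the threshold, and the `iou` and `associate` helpers are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, threshold=0.3):
    """Greedily match existing tracks (id -> box) to the new frame's detections.

    Unmatched detections start new tracks; unmatched tracks are simply dropped
    in this sketch.
    """
    next_id = max(tracks, default=-1) + 1
    unmatched = list(range(len(detections)))
    updated = {}
    for track_id, box in tracks.items():
        best = max(unmatched, key=lambda i: iou(box, detections[i]), default=None)
        if best is not None and iou(box, detections[best]) >= threshold:
            updated[track_id] = detections[best]
            unmatched.remove(best)
    for i in unmatched:
        updated[next_id] = detections[i]
        next_id += 1
    return updated


# Two tracked objects in frame 1, three detections in frame 2 (made-up boxes).
frame1_tracks = {0: (10, 10, 50, 50), 1: (100, 100, 150, 150)}
frame2_detections = [(102, 101, 152, 151), (12, 11, 52, 51), (300, 300, 320, 320)]

frame2_tracks = associate(frame1_tracks, frame2_detections)
print(frame2_tracks)
```

Objects that moved only slightly keep their identifiers, while the detection with no overlapping predecessor receives a new one.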