ultralytics 8.0.97 confusion matrix, windows, docs updates (#2511)
Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-authored-by: Dowon <ks2515@naver.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Laughing <61612323+Laughing-q@users.noreply.github.com>
@ -1,5 +1,6 @@
---
comments: true
description: Learn how torchvision organizes classification image datasets. Use this code to create and train models. CLI and Python code shown.
---

# Image Classification Datasets Overview
@ -77,6 +78,7 @@ cifar-10-/
In this example, the `train` directory contains subdirectories for each class in the dataset, and each class subdirectory contains all the images for that class. The `test` directory has a similar structure. The `root` directory also contains other files that are part of the CIFAR10 dataset.
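
The folder layout described above can be read with nothing more than the standard library. The following is a minimal sketch, not the loader Ultralytics uses: the helper names `index_classes`, `list_samples`, and `make_toy_dataset` are hypothetical, the accepted file extensions are an assumption, and class indices are assigned in alphabetical order of the folder names (the same convention torchvision's `ImageFolder` follows).

```python
import tempfile
from pathlib import Path


def index_classes(split_dir):
    """Map each class subdirectory name to an integer index (sorted by name)."""
    split_dir = Path(split_dir)
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    return {name: i for i, name in enumerate(classes)}


def list_samples(split_dir, extensions=(".png", ".jpg", ".jpeg")):
    """Return (image_path, class_index) pairs for every image under split_dir."""
    split_dir = Path(split_dir)
    samples = []
    for name, idx in index_classes(split_dir).items():
        for path in sorted((split_dir / name).iterdir()):
            # The extension filter is an assumption made for this sketch.
            if path.suffix.lower() in extensions:
                samples.append((path, idx))
    return samples


def make_toy_dataset(root, classes=("airplane", "automobile"), per_class=2):
    """Create a tiny placeholder tree: root/train/<class>/<n>.png (empty files)."""
    root = Path(root)
    for name in classes:
        class_dir = root / "train" / name
        class_dir.mkdir(parents=True, exist_ok=True)
        for n in range(per_class):
            (class_dir / f"{n}.png").touch()
    return root


if __name__ == "__main__":
    root = make_toy_dataset(tempfile.mkdtemp())
    print(index_classes(root / "train"))
    print(len(list_samples(root / "train")))
```

Because the class index comes only from the folder name, renaming or adding a class folder changes the mapping, so the `train` and `test` splits should contain the same set of class subdirectories.
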

## Usage

!!! example ""

    === "Python"
@ -98,4 +100,5 @@ In this example, the `train` directory contains subdirectories for each class in
```

## Supported Datasets

TODO
@ -1,5 +1,6 @@
---
comments: true
description: Learn about the COCO dataset, designed to encourage research on object detection, segmentation, and captioning with standardized evaluation metrics.
---

# COCO Dataset

@ -1,5 +1,6 @@
---
comments: true
description: Learn about supported dataset formats for training YOLO detection models, including Ultralytics YOLO and COCO, in this Object Detection Datasets Overview.
---

# Object Detection Datasets Overview
@ -15,11 +16,12 @@ The dataset format used for training YOLO detection models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:

    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
    - Object width and height: The width and height of the object, normalized to be between 0 and 1.

The format for a single row in the detection dataset file is as follows:

```
<object-class> <x> <y> <width> <height>
```
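
To make the row format concrete, here is a small sketch that converts a pixel-space box into a row of this shape and parses it back. The function names `to_yolo_row` and `parse_yolo_row` are hypothetical helpers written for this illustration, the corner-based input box `(x1, y1, x2, y2)` is an assumption about how the source annotations are stored, and the six-decimal output precision is an arbitrary choice.

```python
def to_yolo_row(class_index, box_xyxy, image_width, image_height):
    """Build '<object-class> <x> <y> <width> <height>' from a pixel-space corner box."""
    x1, y1, x2, y2 = box_xyxy
    # Center and size are divided by the image dimensions so they fall in [0, 1].
    x_center = (x1 + x2) / 2 / image_width
    y_center = (y1 + y2) / 2 / image_height
    width = (x2 - x1) / image_width
    height = (y2 - y1) / image_height
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def parse_yolo_row(row):
    """Parse one label row into (class_index, x, y, width, height)."""
    fields = row.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    class_index = int(fields[0])
    x, y, width, height = (float(v) for v in fields[1:])
    for value in (x, y, width, height):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"value {value} is not normalized to [0, 1]")
    return class_index, x, y, width, height


if __name__ == "__main__":
    row = to_yolo_row(0, (100, 50, 300, 250), image_width=640, image_height=480)
    print(row)
    print(parse_yolo_row(row))
```

Each image's label file would then hold one such row per object, written to a `.txt` file that shares the image's base name.
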
@ -55,6 +57,7 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined; defining both is not mandatory.

Alternatively, you can directly define class names like this:

```yaml
names:
  0: person
@ -72,6 +75,7 @@ names: ['person', 'car']
```
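
The list form and the index-to-name form of `names` carry the same information, and `nc` can be derived from either. The sketch below shows one way to normalize them after the YAML has been loaded into Python objects. `normalize_names` is a hypothetical helper, and the `class0`, `class1`, ... placeholder names used when only `nc` is given are an assumption of this sketch, not documented library behavior.

```python
def normalize_names(names=None, nc=None):
    """Return class names as an {index: name} dict from either supported form."""
    if names is None:
        if nc is None:
            raise ValueError("either 'nc' or 'names' must be defined")
        # Placeholder names are an assumption made for this sketch.
        return {i: f"class{i}" for i in range(nc)}
    if isinstance(names, dict):
        result = {int(k): str(v) for k, v in names.items()}
    else:
        # A list is indexed by position: ['person', 'car'] -> {0: 'person', 1: 'car'}
        result = dict(enumerate(names))
    if nc is not None and nc != len(result):
        raise ValueError(f"nc={nc} does not match {len(result)} names")
    return result


if __name__ == "__main__":
    print(normalize_names(names=["person", "car"]))
    print(normalize_names(names={0: "person", 1: "car"}))
    print(len(normalize_names(names=["person", "car"])))  # derived class count
```
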

## Usage

!!! example ""

    === "Python"
@ -93,6 +97,7 @@ names: ['person', 'car']
```

## Supported Datasets

TODO

## Port or Convert label formats
@ -103,4 +108,4 @@ TODO
from ultralytics.yolo.data.converter import convert_coco

convert_coco(labels_dir='../coco/annotations/')
```
@ -1,5 +1,6 @@
---
comments: true
description: Ultralytics provides support for various datasets to facilitate multiple computer vision tasks. Check out our list of main datasets and their summaries.
---

# Datasets Overview
@ -10,48 +11,48 @@ Ultralytics provides support for various datasets to facilitate computer vision

Bounding box object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object.

* [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
* [COCO](detect/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning with over 200K labeled images.
* [COCO8](detect/coco8.md): Contains the first 4 images from COCO train and COCO val, suitable for quick tests.
* [Global Wheat 2020](detect/globalwheat2020.md): A dataset of wheat head images collected from around the world for object detection and localization tasks.
* [Objects365](detect/objects365.md): A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
* [SKU-110K](detect/sku-110k.md): A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes.
* [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
* [VOC](detect/voc.md): The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
* [xView](detect/xview.md): A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.

## [Instance Segmentation Datasets](segment/index.md)

Instance segmentation is a computer vision technique that involves identifying and localizing objects in an image at the pixel level.

* [COCO](segment/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images.
* [COCO8-seg](segment/coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations.

## [Pose Estimation](pose/index.md)

Pose estimation is a technique used to determine the pose of the object relative to the camera or the world coordinate system.

* [COCO](pose/coco.md): A large-scale dataset with human pose annotations designed for pose estimation tasks.
* [COCO8-pose](pose/coco8-pose.md): A smaller dataset for pose estimation tasks, containing a subset of 8 COCO images with human pose annotations.

## [Classification](classify/index.md)

Image classification is a computer vision task that involves categorizing an image into one or more predefined classes or categories based on its visual content.

* [Caltech 101](classify/caltech101.md): A dataset containing images of 101 object categories for image classification tasks.
* [Caltech 256](classify/caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images.
* [CIFAR-10](classify/cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class.
* [CIFAR-100](classify/cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class.
* [Fashion-MNIST](classify/fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks.
* [ImageNet](classify/imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories.
* [ImageNet-10](classify/imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing.
* [Imagenette](classify/imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing.
* [Imagewoof](classify/imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks.
* [MNIST](classify/mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks.

## [Multi-Object Tracking](track/index.md)

Multi-object tracking is a computer vision technique that involves detecting and tracking multiple objects over time in a video sequence.

* [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations for multi-object tracking tasks.
* [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
@ -1,5 +1,6 @@
---
comments: true
description: Learn how to format your dataset for training YOLO models with Ultralytics YOLO format using our concise tutorial and example YAML files.
---

# Pose Estimation Datasets Overview
@ -15,26 +16,26 @@ The dataset format used for training YOLO segmentation models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
    - Object width and height: The width and height of the object, normalized to be between 0 and 1.
    - Object keypoint coordinates: The keypoints of the object, normalized to be between 0 and 1.

Here is an example of the label format for the pose estimation task:

Format with Dim = 2

```
<class-index> <x> <y> <width> <height> <px1> <py1> <px2> <py2> ... <pxn> <pyn>
```

Format with Dim = 3

```
<class-index> <x> <y> <width> <height> <px1> <py1> <p1-visibility> <px2> <py2> <p2-visibility> ... <pxn> <pyn> <pn-visibility>
```

In this format, `<class-index>` is the index of the class for the object, `<x> <y> <width> <height>` are the coordinates of the bounding box, and `<px1> <py1> <px2> <py2> ... <pxn> <pyn>` are the normalized coordinates of the keypoints. The values are separated by spaces.
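
The two row layouts differ only in how many values each keypoint contributes. As an illustration, the sketch below splits a row into class index, bounding box, and keypoint tuples for either `dim`. `parse_pose_row` is a hypothetical helper, and the sample row in the usage block is made up for this example rather than taken from a real dataset.

```python
def parse_pose_row(row, dim=2):
    """Split a pose label row into (class_index, box, keypoints).

    dim=2 expects '<px> <py>' per keypoint; dim=3 expects '<px> <py> <visibility>'.
    """
    if dim not in (2, 3):
        raise ValueError("dim must be 2 or 3")
    fields = row.split()
    if len(fields) < 5 or (len(fields) - 5) % dim != 0:
        raise ValueError(f"row does not match the layout for dim={dim}")
    class_index = int(fields[0])
    box = tuple(float(v) for v in fields[1:5])  # x, y, width, height
    values = [float(v) for v in fields[5:]]
    # Group the remaining values into one tuple per keypoint.
    keypoints = [tuple(values[i:i + dim]) for i in range(0, len(values), dim)]
    return class_index, box, keypoints


if __name__ == "__main__":
    sample = "0 0.5 0.5 0.4 0.6 0.45 0.3 2 0.55 0.3 1"  # made-up row, dim=3
    print(parse_pose_row(sample, dim=3))
```
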

**Dataset file format**

@ -62,6 +63,7 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined; defining both is not mandatory.

Alternatively, you can directly define class names like this:

```
names:
  0: person
@ -69,7 +71,7 @@ names:
```

(Optional) If the keypoints are symmetric, such as the left and right sides of a human body or face, `flip_idx` is also needed.
For example, given five facial landmark keypoints [left eye, right eye, nose, left point of mouth, right point of mouth] with original indices [0, 1, 2, 3, 4], `flip_idx` is [1, 0, 2, 4, 3]: only the left-right pairs are exchanged (i.e. 0-1 and 3-4), and other keypoints such as the nose keep their index.
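
To see why the index swap is needed, consider what a horizontal flip does: every x coordinate is mirrored, so the point that was labeled "left eye" now sits where a right eye should be. The sketch below applies the mirror and then reorders the keypoints with `flip_idx`. It is an illustration under stated assumptions, not the library's augmentation code: `flip_keypoints_horizontally` is a hypothetical helper, and keypoints are assumed to be normalized `(x, y)` pairs.

```python
def flip_keypoints_horizontally(keypoints, flip_idx):
    """Mirror normalized (x, y) keypoints left-right and swap symmetric labels."""
    if sorted(flip_idx) != list(range(len(keypoints))):
        raise ValueError("flip_idx must be a permutation of the keypoint indices")
    # Step 1: mirror every x coordinate around the vertical image axis.
    mirrored = [(1.0 - x, y) for x, y in keypoints]
    # Step 2: slot k takes the mirrored point of its left-right partner.
    return [mirrored[i] for i in flip_idx]


if __name__ == "__main__":
    # Five facial landmarks: left eye, right eye, nose, left mouth, right mouth.
    face = [(0.25, 0.4), (0.5, 0.4), (0.375, 0.5), (0.25, 0.7), (0.5, 0.75)]
    print(flip_keypoints_horizontally(face, flip_idx=[1, 0, 2, 4, 3]))
```

Without the reordering step, a model trained on flipped images would see "left" labels on the right side of the subject, which is the inconsistency `flip_idx` is meant to prevent.
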

**Example**

@ -86,6 +88,7 @@ flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```

## Usage

!!! example ""

    === "Python"
@ -107,6 +110,7 @@ flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```

## Supported Datasets

TODO

## Port or Convert label formats
@ -117,4 +121,4 @@ TODO
from ultralytics.yolo.data.converter import convert_coco

convert_coco(labels_dir='../coco/annotations/', use_keypoints=True)
```
@ -1,5 +1,6 @@
---
comments: true
description: Learn about the Ultralytics YOLO dataset format for segmentation models. Use YAML to train Detection Models. Convert COCO to YOLO format using Python.
---

# Instance Segmentation Datasets Overview
@ -15,8 +16,8 @@ The dataset format used for training YOLO segmentation models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object bounding coordinates: The bounding coordinates around the mask area, normalized to be between 0 and 1.

The format for a single row in the segmentation dataset file is as follows:

@ -24,7 +25,7 @@ The format for a single row in the segmentation dataset file is as follows:
<class-index> <x1> <y1> <x2> <y2> ... <xn> <yn>
```

In this format, `<class-index>` is the index of the class for the object, and `<x1> <y1> <x2> <y2> ... <xn> <yn>` are the bounding coordinates of the object's segmentation mask. The coordinates are separated by spaces.

Here is an example of the YOLO dataset format for a single image with two object instances:

@ -32,6 +33,7 @@ Here is an example of the YOLO dataset format for a single image with two object
0 0.6812 0.48541 0.67 0.4875 0.67656 0.487 0.675 0.489 0.66
1 0.5046 0.0 0.5015 0.004 0.4984 0.00416 0.4937 0.010 0.492 0.0104
```

Note: The length of each row does not have to be equal.
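
Since rows may carry different numbers of points, a reader has to pair up the coordinates row by row. The sketch below parses one segmentation row into polygon points and derives the enclosing box in the normalized center/width/height form used for detection labels. Both function names are hypothetical, the sample row is made up, and requiring complete `x y` pairs with at least three points is an assumption of this sketch, so a truncated row would be rejected.

```python
def parse_segment_row(row):
    """Parse '<class-index> <x1> <y1> ... <xn> <yn>' into (class_index, points)."""
    fields = row.split()
    coords = [float(v) for v in fields[1:]]
    # Assumption: a valid polygon has complete (x, y) pairs and at least 3 points.
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("expected at least three complete x y pairs")
    points = list(zip(coords[0::2], coords[1::2]))
    return int(fields[0]), points


def polygon_to_box(points):
    """Return the enclosing box as (x_center, y_center, width, height)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)


if __name__ == "__main__":
    sample = "1 0.25 0.25 0.75 0.25 0.75 0.5 0.25 0.5"  # made-up rectangle
    class_index, points = parse_segment_row(sample)
    print(class_index, points)
    print(polygon_to_box(points))
```
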

**Dataset file format**
@ -56,6 +58,7 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined; defining both is not mandatory.

Alternatively, you can directly define class names like this:

```yaml
names:
  0: person
@ -73,6 +76,7 @@ names: ['person', 'car']
```

## Usage

!!! example ""

    === "Python"
@ -103,4 +107,4 @@ names: ['person', 'car']
from ultralytics.yolo.data.converter import convert_coco

convert_coco(labels_dir='../coco/annotations/', use_segments=True)
```
@ -1,5 +1,6 @@
---
comments: true
description: Discover the datasets compatible with Multi-Object Detector. Train your trackers and make your detections more efficient with Ultralytics' YOLO.
---

# Multi-object Tracking Datasets Overview
@ -25,5 +26,4 @@ Support for training trackers alone is coming soon

```bash
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" conf=0.3 iou=0.5 show
```