ultralytics 8.0.97 confusion matrix, windows, docs updates (#2511)

Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-authored-by: Dowon <ks2515@naver.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Laughing <61612323+Laughing-q@users.noreply.github.com>
Glenn Jocher
2023-05-09 21:20:34 +02:00
committed by GitHub
parent 6ee3a9a74b
commit d1107ca4cb
138 changed files with 744 additions and 351 deletions

View File

@@ -1,5 +1,6 @@
---
comments: true
description: Learn how torchvision organizes classification image datasets. Use this code to create and train models. CLI and Python code shown.
---
# Image Classification Datasets Overview
@@ -77,6 +78,7 @@ cifar-10-/
In this example, the `train` directory contains subdirectories for each class in the dataset, and each class subdirectory contains all the images for that class. The `test` directory has a similar structure. The `root` directory also contains other files that are part of the CIFAR10 dataset.
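As an illustration, a dataset organized this way can be loaded directly with torchvision's `ImageFolder` (a minimal sketch; the `cifar-10-/train` path follows the example layout above):

```python
from torchvision import datasets, transforms

# Each subdirectory of `train` becomes one class; labels follow the sorted folder names
dataset = datasets.ImageFolder(root='cifar-10-/train', transform=transforms.ToTensor())
print(dataset.classes)  # class names inferred from the subdirectory names
```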
## Usage
!!! example ""
=== "Python"
@@ -98,4 +100,5 @@ In this example, the `train` directory contains subdirectories for each class in
```
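For orientation, the elided Python tab typically boils down to a training call like this minimal sketch (assuming the pretrained `yolov8n-cls.pt` weights and a placeholder dataset folder laid out as described above):

```python
from ultralytics import YOLO

# Load a pretrained classification model and train it on a folder-structured dataset
model = YOLO('yolov8n-cls.pt')
model.train(data='path/to/dataset', epochs=100, imgsz=64)
```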
## Supported Datasets
TODO

View File

@@ -1,5 +1,6 @@
---
comments: true
description: Learn about the COCO dataset, designed to encourage research on object detection, segmentation, and captioning with standardized evaluation metrics.
---
# COCO Dataset

View File

@@ -1,5 +1,6 @@
---
comments: true
description: Learn about supported dataset formats for training YOLO detection models, including Ultralytics YOLO and COCO, in this Object Detection Datasets Overview.
---
# Object Detection Datasets Overview
@@ -15,11 +16,12 @@ The dataset format used for training YOLO detection models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
    - Object width and height: The width and height of the object, normalized to be between 0 and 1.
The format for a single row in the detection dataset file is as follows:
```
<object-class> <x> <y> <width> <height>
```
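For instance, a label file describing one person and one car might contain the following two rows (hypothetical values, following the format above):

```
0 0.505 0.480 0.210 0.620
1 0.620 0.740 0.310 0.180
```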
@@ -55,6 +57,7 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined. Defining both is not necessary.
Alternatively, you can directly define class names like this:
```yaml
names:
0: person
@@ -72,6 +75,7 @@ names: ['person', 'car']
```
## Usage
!!! example ""
=== "Python"
@@ -93,6 +97,7 @@ names: ['person', 'car']
```
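For orientation, the elided Python tab contains a training call along these lines (a minimal sketch, assuming the pretrained `yolov8n.pt` weights and the small `coco128.yaml` dataset config):

```python
from ultralytics import YOLO

# Load a pretrained detection model and train it on a dataset described by a YAML file
model = YOLO('yolov8n.pt')
model.train(data='coco128.yaml', epochs=100, imgsz=640)
```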
## Supported Datasets
TODO
## Port or Convert label formats
@@ -103,4 +108,4 @@ TODO
from ultralytics.yolo.data.converter import convert_coco

# Convert COCO JSON annotations into YOLO-format detection labels
convert_coco(labels_dir='../coco/annotations/')
```

View File

@@ -1,5 +1,6 @@
---
comments: true
description: Ultralytics provides support for various datasets to facilitate multiple computer vision tasks. Check out our list of main datasets and their summaries.
---
# Datasets Overview
@@ -10,48 +11,48 @@ Ultralytics provides support for various datasets to facilitate computer vision
Bounding box object detection is a computer vision technique that involves detecting and localizing objects in an image by drawing a bounding box around each object.
* [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
* [COCO](detect/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning with over 200K labeled images.
* [COCO8](detect/coco8.md): Contains the first 4 images from COCO train and COCO val, suitable for quick tests.
* [Global Wheat 2020](detect/globalwheat2020.md): A dataset of wheat head images collected from around the world for object detection and localization tasks.
* [Objects365](detect/objects365.md): A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
* [SKU-110K](detect/sku-110k.md): A dataset featuring dense object detection in retail environments with over 11K images and 1.7 million bounding boxes.
* [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.
* [VOC](detect/voc.md): The Pascal Visual Object Classes (VOC) dataset for object detection and segmentation with 20 object classes and over 11K images.
* [xView](detect/xview.md): A dataset for object detection in overhead imagery with 60 object categories and over 1 million annotated objects.
## [Instance Segmentation Datasets](segment/index.md)
Instance segmentation is a computer vision technique that involves identifying and localizing objects in an image at the pixel level.
* [COCO](segment/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images.
* [COCO8-seg](segment/coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations.
## [Pose Estimation](pose/index.md)
Pose estimation is a technique used to determine the pose of the object relative to the camera or the world coordinate system.
* [COCO](pose/coco.md): A large-scale dataset with human pose annotations designed for pose estimation tasks.
* [COCO8-pose](pose/coco8-pose.md): A smaller dataset for pose estimation tasks, containing a subset of 8 COCO images with human pose annotations.
## [Classification](classify/index.md)
Image classification is a computer vision task that involves categorizing an image into one or more predefined classes or categories based on its visual content.
* [Caltech 101](classify/caltech101.md): A dataset containing images of 101 object categories for image classification tasks.
* [Caltech 256](classify/caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images.
* [CIFAR-10](classify/cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class.
* [CIFAR-100](classify/cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class.
* [Fashion-MNIST](classify/fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks.
* [ImageNet](classify/imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories.
* [ImageNet-10](classify/imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing.
* [Imagenette](classify/imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing.
* [Imagewoof](classify/imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks.
* [MNIST](classify/mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks.
## [Multi-Object Tracking](track/index.md)
Multi-object tracking is a computer vision technique that involves detecting and tracking multiple objects over time in a video sequence.
* [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations for multi-object tracking tasks.
* [VisDrone](detect/visdrone.md): A dataset containing object detection and multi-object tracking data from drone-captured imagery with over 10K images and video sequences.

View File

@@ -1,5 +1,6 @@
---
comments: true
description: Learn how to format your dataset for training YOLO models with Ultralytics YOLO format using our concise tutorial and example YAML files.
---
# Pose Estimation Datasets Overview
@@ -15,26 +16,26 @@ The dataset format used for training YOLO pose models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
    - Object width and height: The width and height of the object, normalized to be between 0 and 1.
    - Object keypoint coordinates: The keypoints of the object, normalized to be between 0 and 1.
Here is an example of the label format for the pose estimation task:
Format with Dim = 2
```
<class-index> <x> <y> <width> <height> <px1> <py1> <px2> <py2> ... <pxn> <pyn>
```
Format with Dim = 3
```
<class-index> <x> <y> <width> <height> <px1> <py1> <p1-visibility> <px2> <py2> <p2-visibility> ... <pxn> <pyn> <pn-visibility>
```
In this format, `<class-index>` is the index of the class for the object, `<x> <y> <width> <height>` are the coordinates of the bounding box, and `<px1> <py1> <px2> <py2> ... <pxn> <pyn>` are the pixel coordinates of the keypoints. The coordinates are separated by spaces.
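As a concrete (hypothetical) illustration, a single object of class 0 with three keypoints in the Dim = 2 format could be written as:

```
0 0.521 0.470 0.310 0.690 0.501 0.302 0.548 0.301 0.520 0.360
```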
**Dataset file format**
@@ -62,6 +63,7 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined. Defining both is not necessary.
Alternatively, you can directly define class names like this:
```yaml
names:
0: person
@@ -69,7 +71,7 @@ names:
```
(Optional) If the points are symmetric, a `flip_idx` entry is required, e.g. for the left-right sides of a human body or face.
For example, say there are five keypoints of a facial landmark: [left eye, right eye, nose, left corner of mouth, right corner of mouth], and the original index is [0, 1, 2, 3, 4]; then `flip_idx` is [1, 0, 2, 4, 3] (just exchange the left-right indices, i.e. 0-1 and 3-4, and do not modify the others, like the nose in this example).
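To make this concrete, a dataset YAML for the five-keypoint face example could carry the two keypoint fields like this (illustrative values, assuming 2-dimensional keypoints):

```yaml
kpt_shape: [5, 2]  # 5 keypoints, each stored as (x, y)
flip_idx: [1, 0, 2, 4, 3]  # swap left-right pairs 0-1 and 3-4; nose (2) is unchanged
```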
**Example**
@@ -86,6 +88,7 @@ flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```
## Usage
!!! example ""
=== "Python"
@@ -107,6 +110,7 @@ flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
```
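For orientation, the elided Python tab follows the same pattern as the other tasks; a minimal sketch (assuming the pretrained `yolov8n-pose.pt` weights and the small `coco8-pose.yaml` dataset config) would be:

```python
from ultralytics import YOLO

# Load a pretrained pose model and train it on a pose dataset YAML
model = YOLO('yolov8n-pose.pt')
model.train(data='coco8-pose.yaml', epochs=100, imgsz=640)
```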
## Supported Datasets
TODO
## Port or Convert label formats
@@ -117,4 +121,4 @@ TODO
from ultralytics.yolo.data.converter import convert_coco

# Convert COCO annotations, including keypoints, into YOLO pose labels
convert_coco(labels_dir='../coco/annotations/', use_keypoints=True)
```

View File

@@ -1,5 +1,6 @@
---
comments: true
description: Learn about the Ultralytics YOLO dataset format for segmentation models. Use YAML to train Detection Models. Convert COCO to YOLO format using Python.
---
# Instance Segmentation Datasets Overview
@@ -15,8 +16,8 @@ The dataset format used for training YOLO segmentation models is as follows:
1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
2. One row per object: Each row in the text file corresponds to one object instance in the image.
3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object bounding coordinates: The bounding coordinates around the mask area, normalized to be between 0 and 1.
The format for a single row in the segmentation dataset file is as follows:
@@ -24,7 +25,7 @@ The format for a single row in the segmentation dataset file is as follows:
<class-index> <x1> <y1> <x2> <y2> ... <xn> <yn>
```
In this format, `<class-index>` is the index of the class for the object, and `<x1> <y1> <x2> <y2> ... <xn> <yn>` are the bounding coordinates of the object's segmentation mask. The coordinates are separated by spaces.
Here is an example of the YOLO dataset format for a single image with two object instances:
@@ -32,6 +33,7 @@ Here is an example of the YOLO dataset format for a single image with two object
0 0.6812 0.48541 0.67 0.4875 0.67656 0.487 0.675 0.489 0.66
1 0.5046 0.0 0.5015 0.004 0.4984 0.00416 0.4937 0.010 0.492 0.0104
```
Note: Rows do not all have to be the same length; each object may have a different number of mask points.
**Dataset file format**
@@ -56,6 +58,7 @@ The `names` field is a list of the names of the object classes. The order of the
NOTE: Either `nc` or `names` must be defined. Defining both is not necessary.
Alternatively, you can directly define class names like this:
```yaml
names:
0: person
@@ -73,6 +76,7 @@ names: ['person', 'car']
```
## Usage
!!! example ""
=== "Python"
@@ -103,4 +107,4 @@ names: ['person', 'car']
from ultralytics.yolo.data.converter import convert_coco

# Convert COCO annotations, including segment masks, into YOLO segmentation labels
convert_coco(labels_dir='../coco/annotations/', use_segments=True)
```
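The elided Usage tab above follows the same pattern as the other tasks; a minimal training sketch for segmentation (assuming the pretrained `yolov8n-seg.pt` weights and the small `coco8-seg.yaml` dataset config) would be:

```python
from ultralytics import YOLO

# Load a pretrained segmentation model and train it on a segmentation dataset YAML
model = YOLO('yolov8n-seg.pt')
model.train(data='coco8-seg.yaml', epochs=100, imgsz=640)
```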

View File

@@ -1,5 +1,6 @@
---
comments: true
description: Discover the datasets compatible with multi-object tracking. Train your trackers and make your detections more efficient with Ultralytics YOLO.
---
# Multi-object Tracking Datasets Overview
@@ -25,5 +26,4 @@ Support for training trackers alone is coming soon
```bash
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" conf=0.3 iou=0.5 show
```
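For reference, the equivalent Python call is sketched below (same arguments as the CLI line above):

```python
from ultralytics import YOLO

# Run the tracker on a video stream with custom confidence and IoU thresholds
model = YOLO('yolov8n.pt')
results = model.track(source='https://youtu.be/Zgi9g1ksQHc', conf=0.3, iou=0.5, show=True)
```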