diff --git a/README.md b/README.md
index c6ec78f..eaab62f 100644
--- a/README.md
+++ b/README.md
@@ -57,12 +57,14 @@ See below for a quickstart installation and usage example, and see the [YOLOv8 D
Install
-Pip install the ultralytics package including all [requirements](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt) in a [**Python>=3.7**](https://www.python.org/) environment with [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/).
+Pip install the ultralytics package including all [requirements](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt) in a [**Python>=3.8**](https://www.python.org/) environment with [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/).
```bash
pip install ultralytics
```
+For alternative installation methods including Conda, Docker, and Git, please refer to the [Ultralytics Quickstart Guide](https://docs.ultralytics.com/quickstart).
+
diff --git a/README.zh-CN.md b/README.zh-CN.md
index 6e4ca42..b171f32 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -57,12 +57,14 @@
安装
-在一个 [**Python>=3.7**](https://www.python.org/) 环境中,使用 [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/),通过 pip 安装 ultralytics 软件包以及所有[依赖项](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt)。
+使用Pip在一个[**Python>=3.8**](https://www.python.org/)环境中安装`ultralytics`包,此环境还需包含[**PyTorch>=1.7**](https://pytorch.org/get-started/locally/)。这也会安装所有必要的[依赖项](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt)。
```bash
pip install ultralytics
```
+如需使用包括Conda、Docker和Git在内的其他安装方法,请参考[Ultralytics快速入门指南](https://docs.ultralytics.com/quickstart)。
+
diff --git a/docs/datasets/classify/index.md b/docs/datasets/classify/index.md
index e20c456..ab6ca5c 100644
--- a/docs/datasets/classify/index.md
+++ b/docs/datasets/classify/index.md
@@ -102,4 +102,19 @@ In this example, the `train` directory contains subdirectories for each class in
## Supported Datasets
-TODO
+Ultralytics supports the following datasets with automatic download:
+
+* [Caltech 101](caltech101.md): A dataset containing images of 101 object categories for image classification tasks.
+* [Caltech 256](caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images.
+* [CIFAR-10](cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class.
+* [CIFAR-100](cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class.
+* [Fashion-MNIST](fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks.
+* [ImageNet](imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories.
+* [ImageNet-10](imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing.
+* [Imagenette](imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing.
+* [Imagewoof](imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks.
+* [MNIST](mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks.
+
+### Adding your own dataset
+
+If you have your own dataset and would like to use it for training classification models with Ultralytics, ensure that it follows the format specified above under "Dataset format" and then point your `data` argument to the dataset directory.
\ No newline at end of file
diff --git a/docs/datasets/detect/index.md b/docs/datasets/detect/index.md
index 7ce8cf0..ce7902f 100644
--- a/docs/datasets/detect/index.md
+++ b/docs/datasets/detect/index.md
@@ -1,81 +1,53 @@
---
comments: true
-description: Learn about supported dataset formats for training YOLO detection models, including Ultralytics YOLO and COCO, in this Object Detection Datasets Overview.
-keywords: object detection, datasets, formats, Ultralytics YOLO, label format, dataset file format, dataset definition, YOLO dataset, model configuration
+description: Explore supported dataset formats for training YOLO detection models, including Ultralytics YOLO and COCO. This guide covers various dataset formats and their specific configurations for effective object detection training.
+keywords: object detection, datasets, formats, Ultralytics YOLO, COCO, label format, dataset file format, dataset definition, YOLO dataset, model configuration
---
# Object Detection Datasets Overview
+Training a robust and accurate object detection model requires a comprehensive dataset. This guide introduces various formats of datasets that are compatible with the Ultralytics YOLO model and provides insights into their structure, usage, and how to convert between different formats.
+
## Supported Dataset Formats
### Ultralytics YOLO format
-** Label Format **
-
-The dataset format used for training YOLO detection models is as follows:
-
-1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
-2. One row per object: Each row in the text file corresponds to one object instance in the image.
-3. Object information per row: Each row contains the following information about the object instance:
- - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
- - Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
- - Object width and height: The width and height of the object, normalized to be between 0 and 1.
-
-The format for a single row in the detection dataset file is as follows:
-
-```
-
-```
-
-Here is an example of the YOLO dataset format for a single image with two object instances:
-
-```
-0 0.5 0.4 0.3 0.6
-1 0.3 0.7 0.4 0.2
-```
-
-In this example, the first object is of class 0 (person), with its center at (0.5, 0.4), width of 0.3, and height of 0.6. The second object is of class 1 (car), with its center at (0.3, 0.7), width of 0.4, and height of 0.2.
-
-** Dataset file format **
-
-The Ultralytics framework uses a YAML file format to define the dataset and model configuration for training Detection Models. Here is an example of the YAML format used for defining a detection dataset:
+The Ultralytics YOLO format is a dataset configuration format that allows you to define the dataset root directory, the relative paths to training/validation/testing image directories or *.txt files containing image paths, and a dictionary of class names. Here is an example:
```yaml
-train:
-val:
-
-nc:
-names: [, , ..., ]
-```
-
-The `train` and `val` fields specify the paths to the directories containing the training and validation images, respectively.
-
-The `nc` field specifies the number of object classes in the dataset.
+# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
+path: ../datasets/coco128 # dataset root dir
+train: images/train2017 # train images (relative to 'path') 128 images
+val: images/train2017 # val images (relative to 'path') 128 images
+test: # test images (optional)
-The `names` field is a list of the names of the object classes. The order of the names should match the order of the object class indices in the YOLO dataset files.
-
-NOTE: Either `nc` or `names` must be defined. Defining both are not mandatory
-
-Alternatively, you can directly define class names like this:
-
-```yaml
+# Classes (80 COCO classes)
names:
0: person
1: bicycle
+ 2: car
+ ...
+ 77: teddy bear
+ 78: hair drier
+ 79: toothbrush
```
-** Example **
+Labels for this format should be exported to YOLO format with one `*.txt` file per image. If there are no objects in an image, no `*.txt` file is required. The `*.txt` file should be formatted with one row per object in `class x_center y_center width height` format. Box coordinates must be in **normalized xywh** format (from 0 - 1). If your boxes are in pixels, you should divide `x_center` and `width` by image width, and `y_center` and `height` by image height. Class numbers should be zero-indexed (start with 0).
-```yaml
-train: data/train/
-val: data/val/
+
-nc: 2
-names: ['person', 'car']
-```
+The label file corresponding to the above image contains 2 persons (class `0`) and a tie (class `27`):
+
+
+
+When using the Ultralytics YOLO format, organize your training and validation images and labels as shown in the example below.
+
+
## Usage
+Here's how you can use these formats to train your model:
+
!!! example ""
=== "Python"
@@ -98,14 +70,34 @@ names: ['person', 'car']
## Supported Datasets
-TODO
+Here is a list of the supported datasets and a brief description for each:
+
+- [**Argoverse**](./argoverse.md): A collection of sensor data collected from autonomous vehicles. It contains 3D tracking annotations for car objects.
+- [**COCO**](./coco.md): Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset with 80 object categories.
+- [**COCO8**](./coco8.md): A smaller subset of the COCO dataset, COCO8 is more lightweight and faster to train.
+- [**GlobalWheat2020**](./globalwheat2020.md): A dataset containing images of wheat heads for the Global Wheat Challenge 2020.
+- [**Objects365**](./objects365.md): A large-scale object detection dataset with 365 object categories and 600k images, aimed at advancing object detection research.
+- [**SKU-110K**](./sku-110k.md): A dataset containing images of densely packed retail products, intended for retail environment object detection.
+- [**VisDrone**](./visdrone.md): A dataset focusing on drone-based images, containing various object categories like cars, pedestrians, and cyclists.
+- [**VOC**](./voc.md): PASCAL VOC is a popular object detection dataset with 20 object categories including vehicles, animals, and furniture.
+- [**xView**](./xview.md): A dataset containing high-resolution satellite imagery, designed for the detection of various object classes in overhead views.
+
+### Adding your own dataset
+
+If you have your own dataset and would like to use it for training detection models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file.
-## Port or Convert label formats
+## Port or Convert Label Formats
-### COCO dataset format to YOLO format
+### COCO Dataset Format to YOLO Format
+
+You can easily convert labels from the popular COCO dataset format to the YOLO format using the following code snippet:
```python
from ultralytics.yolo.data.converter import convert_coco
convert_coco(labels_dir='../coco/annotations/')
-```
\ No newline at end of file
+```
+
+This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format.
+
+Remember to double-check if the dataset you want to use is compatible with your model and follows the necessary format conventions. Properly formatted datasets are crucial for training successful object detection models.
\ No newline at end of file
diff --git a/docs/datasets/detect/sku-110k.md b/docs/datasets/detect/sku-110k.md
index 508f82a..1d366b3 100644
--- a/docs/datasets/detect/sku-110k.md
+++ b/docs/datasets/detect/sku-110k.md
@@ -61,6 +61,7 @@ To train a YOLOv8n model on the SKU-110K dataset for 100 epochs with an image si
```bash
# Start training from a pretrained *.pt model
yolo detect train data=SKU-110K.yaml model=yolov8n.pt epochs=100 imgsz=640
+ ```
## Sample Data and Annotations
diff --git a/docs/datasets/detect/visdrone.md b/docs/datasets/detect/visdrone.md
index 2723601..cd7a309 100644
--- a/docs/datasets/detect/visdrone.md
+++ b/docs/datasets/detect/visdrone.md
@@ -10,22 +10,6 @@ The [VisDrone Dataset](https://github.com/VisDrone/VisDrone-Dataset) is a large-
VisDrone is composed of 288 video clips with 261,908 frames and 10,209 static images, captured by various drone-mounted cameras. The dataset covers a wide range of aspects, including location (14 different cities across China), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). The dataset was collected using various drone platforms under different scenarios and weather and lighting conditions. These frames are manually annotated with over 2.6 million bounding boxes of targets such as pedestrians, cars, bicycles, and tricycles. Attributes like scene visibility, object class, and occlusion are also provided for better data utilization.
-## Citation
-
-If you use the VisDrone dataset in your research or development work, please cite the following paper:
-
-```bibtex
-@ARTICLE{9573394,
- author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
- journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
- title={Detection and Tracking Meet Drones Challenge},
- year={2021},
- volume={},
- number={},
- pages={1-1},
- doi={10.1109/TPAMI.2021.3119563}}
-```
-
## Dataset Structure
The VisDrone dataset is organized into five main subsets, each focusing on a specific task:
diff --git a/docs/datasets/pose/index.md b/docs/datasets/pose/index.md
index aac2724..c9695fa 100644
--- a/docs/datasets/pose/index.md
+++ b/docs/datasets/pose/index.md
@@ -111,14 +111,40 @@ flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
## Supported Datasets
-TODO
+This section outlines the datasets that are compatible with Ultralytics YOLO format and can be used for training pose estimation models:
-## Port or Convert label formats
+### COCO-Pose
-### COCO dataset format to YOLO format
+- **Description**: COCO-Pose is a large-scale object detection, segmentation, and pose estimation dataset. It is a subset of the popular COCO dataset and focuses on human pose estimation. COCO-Pose includes multiple keypoints for each human instance.
+- **Label Format**: Same as Ultralytics YOLO format as described above, with keypoints for human poses.
+- **Number of Classes**: 1 (Human).
+- **Keypoints**: 17 keypoints including nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.
+- **Usage**: Suitable for training human pose estimation models.
+- **Additional Notes**: The dataset is rich and diverse, containing over 200k labeled images.
+- [Read more about COCO-Pose](./coco.md)
+
+### COCO8-Pose
+
+- **Description**: [Ultralytics](https://ultralytics.com) COCO8-Pose is a small, but versatile pose detection dataset composed of the first 8 images of the COCO train 2017 set, 4 for training and 4 for validation.
+- **Label Format**: Same as Ultralytics YOLO format as described above, with keypoints for human poses.
+- **Number of Classes**: 1 (Human).
+- **Keypoints**: 17 keypoints including nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.
+- **Usage**: Suitable for testing and debugging object detection models, or for experimenting with new detection approaches.
+- **Additional Notes**: COCO8-Pose is ideal for sanity checks and CI checks.
+- [Read more about COCO8-Pose](./coco8-pose.md)
+
+### Adding your own dataset
+
+If you have your own dataset and would like to use it for training pose estimation models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file.
+
+### Conversion Tool
+
+Ultralytics provides a convenient conversion tool to convert labels from the popular COCO dataset format to YOLO format:
```python
from ultralytics.yolo.data.converter import convert_coco
convert_coco(labels_dir='../coco/annotations/', use_keypoints=True)
```
+
+This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format. The `use_keypoints` parameter specifies whether to include keypoints (for pose estimation) in the converted labels.
diff --git a/docs/datasets/segment/index.md b/docs/datasets/segment/index.md
index 928d9f9..a793f2f 100644
--- a/docs/datasets/segment/index.md
+++ b/docs/datasets/segment/index.md
@@ -46,7 +46,7 @@ train:
val:
nc:
-names: [ , , ..., ]
+names: [, , ..., ]
```
@@ -73,7 +73,7 @@ train: data/train/
val: data/val/
nc: 2
-names: [ 'person', 'car' ]
+names: ['person', 'car']
```
## Usage
@@ -100,9 +100,18 @@ names: [ 'person', 'car' ]
## Supported Datasets
-## Port or Convert label formats
+* [COCO](coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images.
+* [COCO8-seg](coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations.
-### COCO dataset format to YOLO format
+### Adding your own dataset
+
+If you have your own dataset and would like to use it for training segmentation models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file.
+
+## Port or Convert Label Formats
+
+### COCO Dataset Format to YOLO Format
+
+You can easily convert labels from the popular COCO dataset format to the YOLO format using the following code snippet:
```python
from ultralytics.yolo.data.converter import convert_coco
@@ -110,6 +119,10 @@ from ultralytics.yolo.data.converter import convert_coco
convert_coco(labels_dir='../coco/annotations/', use_segments=True)
```
+This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format.
+
+Remember to double-check if the dataset you want to use is compatible with your model and follows the necessary format conventions. Properly formatted datasets are crucial for training successful object detection models.
+
## Auto-Annotation
Auto-annotation is an essential feature that allows you to generate a segmentation dataset using a pre-trained detection model. It enables you to quickly and accurately annotate a large number of images without the need for manual labeling, saving time and effort.
diff --git a/docs/models/index.md b/docs/models/index.md
index cce8af1..2b98bad 100644
--- a/docs/models/index.md
+++ b/docs/models/index.md
@@ -20,21 +20,25 @@ In this documentation, we provide information on four major models:
8. [YOLO-NAS](./yolo-nas.md): YOLO Neural Architecture Search (NAS) Models.
9. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
-You can use these models directly in the Command Line Interface (CLI) or in a Python environment. Below are examples of how to use the models with CLI and Python:
+You can use many of these models directly in the Command Line Interface (CLI) or in a Python environment. Below are examples of how to use the models with CLI and Python:
## CLI Example
+Use the `model` argument to pass a model YAML such as `model=yolov8n.yaml` or a pretrained *.pt file such as `model=yolov8n.pt`
+
```bash
-yolo task=detect mode=train model=yolov8n.yaml data=coco128.yaml epochs=100
+yolo task=detect mode=train model=yolov8n.pt data=coco128.yaml epochs=100
```
## Python Example
+PyTorch pretrained models as well as model YAML files can also be passed to the `YOLO()`, `SAM()`, `NAS()` and `RTDETR()` classes to create a model instance in python:
+
```python
from ultralytics import YOLO
-model = YOLO("model.yaml") # build a YOLOv8n model from scratch
-# YOLO("model.pt") use pre-trained model if available
+model = YOLO("yolov8n.pt") # load a pretrained YOLOv8n model
+
model.info() # display model information
model.train(data="coco128.yaml", epochs=100) # train the model
```
diff --git a/docs/modes/predict.md b/docs/modes/predict.md
index 581d422..8708933 100644
--- a/docs/modes/predict.md
+++ b/docs/modes/predict.md
@@ -1,6 +1,6 @@
---
comments: true
-description: Get started with YOLOv8 Predict mode and input sources. Accepts various input sources such as images, videos, and directories.
+description: Get started with YOLOv8 Predict mode and input sources. Accepts various input sources such as images, videos, and directories.
keywords: YOLOv8, predict mode, generator, streaming mode, input sources, video formats, arguments customization
---
@@ -12,60 +12,279 @@ passing `stream=True` in the predictor's call method.
!!! example "Predict"
- === "Return a list with `Stream=False`"
+ === "Return a list with `stream=False`"
```python
- inputs = [img, img] # list of numpy arrays
- results = model(inputs) # list of Results objects
+ from ultralytics import YOLO
+
+ # Load a model
+ model = YOLO('yolov8n.pt') # pretrained YOLOv8n model
+
+ # Run batched inference on a list of images
+ results = model(['im1.jpg', 'im2.jpg']) # return a list of Results objects
+ # Process results list
for result in results:
boxes = result.boxes # Boxes object for bbox outputs
masks = result.masks # Masks object for segmentation masks outputs
+ keypoints = result.keypoints # Keypoints object for pose outputs
probs = result.probs # Class probabilities for classification outputs
```
- === "Return a generator with `Stream=True`"
+ === "Return a generator with `stream=True`"
```python
- inputs = [img, img] # list of numpy arrays
- results = model(inputs, stream=True) # generator of Results objects
+ from ultralytics import YOLO
+
+ # Load a model
+ model = YOLO('yolov8n.pt') # pretrained YOLOv8n model
+
+ # Run batched inference on a list of images
+ results = model(['im1.jpg', 'im2.jpg'], stream=True) # return a generator of Results objects
+ # Process results generator
for result in results:
boxes = result.boxes # Boxes object for bbox outputs
masks = result.masks # Masks object for segmentation masks outputs
+ keypoints = result.keypoints # Keypoints object for pose outputs
probs = result.probs # Class probabilities for classification outputs
```
-!!! tip "Tip"
-
- Streaming mode with `stream=True` should be used for long videos or large predict sources, otherwise results will accumuate in memory and will eventually cause out-of-memory errors.
+## Inference Sources
-## Sources
+YOLOv8 can process different types of input sources for inference, as shown in the table below. The sources include static images, video streams, and various data formats. The table also indicates whether each source can be used in streaming mode with the argument `stream=True` ✅. Streaming mode is beneficial for processing videos or live streams as it creates a generator of results instead of loading all frames into memory.
-YOLOv8 can accept various input sources, as shown in the table below. This includes images, URLs, PIL images, OpenCV,
-numpy arrays, torch tensors, CSV files, videos, directories, globs, YouTube videos, and streams. The table indicates
-whether each source can be used in streaming mode with `stream=True` ✅ and an example argument for each source.
+!!! tip "Tip"
-| source | model(arg) | type | notes |
-|-------------|--------------------------------------------|----------------|------------------|
-| image | `'im.jpg'` | `str`, `Path` | |
-| URL | `'https://ultralytics.com/images/bus.jpg'` | `str` | |
-| screenshot | `'screen'` | `str` | |
-| PIL | `Image.open('im.jpg')` | `PIL.Image` | HWC, RGB |
-| OpenCV | `cv2.imread('im.jpg')` | `np.ndarray` | HWC, BGR |
-| numpy | `np.zeros((640,1280,3))` | `np.ndarray` | HWC |
-| torch | `torch.zeros(16,3,320,640)` | `torch.Tensor` | BCHW, RGB |
-| CSV | `'sources.csv'` | `str`, `Path` | RTSP, RTMP, HTTP |
-| video ✅ | `'vid.mp4'` | `str`, `Path` | |
-| directory ✅ | `'path/'` | `str`, `Path` | |
-| glob ✅ | `'path/*.jpg'` | `str` | Use `*` operator |
-| YouTube ✅ | `'https://youtu.be/Zgi9g1ksQHc'` | `str` | |
-| stream ✅ | `'rtsp://example.com/media.mp4'` | `str` | RTSP, RTMP, HTTP |
+ Use `stream=True` for processing long videos or large datasets to efficiently manage memory. When `stream=False`, the results for all frames or data points are stored in memory, which can quickly add up and cause out-of-memory errors for large inputs. In contrast, `stream=True` utilizes a generator, which only keeps the results of the current frame or data point in memory, significantly reducing memory consumption and preventing out-of-memory issues.
+
+| Source | Argument | Type | Notes |
+|-------------|--------------------------------------------|---------------------------------------|----------------------------------------------------------------------------|
+| image | `'image.jpg'` | `str` or `Path` | Single image file. |
+| URL | `'https://ultralytics.com/images/bus.jpg'` | `str` | URL to an image. |
+| screenshot | `'screen'` | `str` | Capture a screenshot. |
+| PIL | `Image.open('im.jpg')` | `PIL.Image` | HWC format with RGB channels. |
+| OpenCV | `cv2.imread('im.jpg')` | `np.ndarray` of `uint8 (0-255)` | HWC format with BGR channels. |
+| numpy | `np.zeros((640,1280,3))` | `np.ndarray` of `uint8 (0-255)` | HWC format with BGR channels. |
+| torch | `torch.zeros(16,3,320,640)` | `torch.Tensor` of `float32 (0.0-1.0)` | BCHW format with RGB channels. |
+| CSV | `'sources.csv'` | `str` or `Path` | CSV file containing paths to images, videos, or directories. |
+| video ✅ | `'video.mp4'` | `str` or `Path` | Video file in formats like MP4, AVI, etc. |
+| directory ✅ | `'path/'` | `str` or `Path` | Path to a directory containing images or videos. |
+| glob ✅ | `'path/*.jpg'` | `str` | Glob pattern to match multiple files. Use the `*` character as a wildcard. |
+| YouTube ✅ | `'https://youtu.be/Zgi9g1ksQHc'` | `str` | URL to a YouTube video. |
+| stream ✅ | `'rtsp://example.com/media.mp4'` | `str` | URL for streaming protocols such as RTSP, RTMP, or an IP address. |
+
+Below are code examples for using each source type:
+
+!!! example "Prediction sources"
+
+ === "image"
+ Run inference on an image file.
+ ```python
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define path to the image file
+ source = 'path/to/image.jpg'
+
+ # Run inference on the source
+ results = model(source) # list of Results objects
+ ```
+
+ === "screenshot"
+ Run inference on the current screen content as a screenshot.
+ ```python
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define current screenshot as source
+ source = 'screen'
+
+ # Run inference on the source
+ results = model(source) # list of Results objects
+ ```
+
+ === "URL"
+ Run inference on an image or video hosted remotely via URL.
+ ```python
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define remote image or video URL
+ source = 'https://ultralytics.com/images/bus.jpg'
+
+ # Run inference on the source
+ results = model(source) # list of Results objects
+ ```
+
+ === "PIL"
+ Run inference on an image opened with Python Imaging Library (PIL).
+ ```python
+ from PIL import Image
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Open an image using PIL
+ source = Image.open('path/to/image.jpg')
+
+ # Run inference on the source
+ results = model(source) # list of Results objects
+ ```
+
+ === "OpenCV"
+ Run inference on an image read with OpenCV.
+ ```python
+ import cv2
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Read an image using OpenCV
+ source = cv2.imread('path/to/image.jpg')
+
+ # Run inference on the source
+ results = model(source) # list of Results objects
+ ```
+
+ === "numpy"
+ Run inference on an image represented as a numpy array.
+ ```python
+ import numpy as np
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Create a random numpy array of HWC shape (640, 640, 3) with values in range [0, 255] and type uint8
+ source = np.random.randint(low=0, high=255, size=(640, 640, 3), dtype='uint8')
+
+ # Run inference on the source
+ results = model(source) # list of Results objects
+ ```
+
+ === "torch"
+ Run inference on an image represented as a PyTorch tensor.
+ ```python
+ import torch
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Create a random torch tensor of BCHW shape (1, 3, 640, 640) with values in range [0, 1] and type float32
+ source = torch.rand(1, 3, 640, 640, dtype=torch.float32)
+
+ # Run inference on the source
+ results = model(source) # list of Results objects
+ ```
+
+ === "CSV"
+ Run inference on a collection of images, URLs, videos and directories listed in a CSV file.
+ ```python
+ import torch
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define a path to a CSV file with images, URLs, videos and directories
+ source = 'path/to/file.csv'
+
+ # Run inference on the source
+ results = model(source) # list of Results objects
+ ```
+
+ === "video"
+ Run inference on a video file. By using `stream=True`, you can create a generator of Results objects to reduce memory usage.
+ ```python
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define path to video file
+ source = 'path/to/video.mp4'
+
+ # Run inference on the source
+ results = model(source, stream=True) # generator of Results objects
+ ```
+
+ === "directory"
+ Run inference on all images and videos in a directory. To also capture images and videos in subdirectories use a glob pattern, i.e. `path/to/dir/**/*`.
+ ```python
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define path to directory containing images and videos for inference
+ source = 'path/to/dir'
+
+ # Run inference on the source
+ results = model(source, stream=True) # generator of Results objects
+ ```
+
+ === "glob"
+ Run inference on all images and videos that match a glob expression with `*` characters.
+ ```python
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define a glob search for all JPG files in a directory
+ source = 'path/to/dir/*.jpg'
+
+ # OR define a recursive glob search for all JPG files including subdirectories
+ source = 'path/to/dir/**/*.jpg'
+
+ # Run inference on the source
+ results = model(source, stream=True) # generator of Results objects
+ ```
+
+ === "YouTube"
+ Run inference on a YouTube video. By using `stream=True`, you can create a generator of Results objects to reduce memory usage for long videos.
+ ```python
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define source as YouTube video URL
+ source = 'https://youtu.be/Zgi9g1ksQHc'
+
+ # Run inference on the source
+ results = model(source, stream=True) # generator of Results objects
+ ```
+
+ === "Stream"
+ Run inference on remote streaming sources using RTSP, RTMP, and IP address protocols.
+ ```python
+ from ultralytics import YOLO
+
+ # Load a pretrained YOLOv8n model
+ model = YOLO('yolov8n.pt')
+
+ # Define source as RTSP, RTMP or IP streaming address
+ source = 'rtsp://example.com/media.mp4'
+
+ # Run inference on the source
+ results = model(source, stream=True) # generator of Results objects
+ ```
-## Arguments
+## Inference Arguments
`model.predict` accepts multiple arguments that control the prediction operation. These arguments can be passed directly to `model.predict`:
!!! example
- ```
+ ```python
model.predict(source, save=True, imgsz=320, conf=0.5)
```
@@ -97,12 +316,12 @@ All supported arguments:
## Image and Video Formats
-YOLOv8 supports various image and video formats, as specified
-in [yolo/data/utils.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/yolo/data/utils.py). See the
-tables below for the valid suffixes and example predict commands.
+YOLOv8 supports various image and video formats, as specified in [yolo/data/utils.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/yolo/data/utils.py). See the tables below for the valid suffixes and example predict commands.
### Image Suffixes
+The below table contains valid Ultralytics image formats.
+
| Image Suffixes | Example Predict Command | Reference |
|----------------|----------------------------------|-------------------------------------------------------------------------------|
| .bmp | `yolo predict source=image.bmp` | [Microsoft BMP File Format](https://en.wikipedia.org/wiki/BMP_file_format) |
@@ -118,6 +337,8 @@ tables below for the valid suffixes and example predict commands.
### Video Suffixes
+The below table contains valid Ultralytics video formats.
+
| Video Suffixes | Example Predict Command | Reference |
|----------------|----------------------------------|----------------------------------------------------------------------------------|
| .asf | `yolo predict source=video.asf` | [Advanced Systems Format](https://en.wikipedia.org/wiki/Advanced_Systems_Format) |
diff --git a/docs/quickstart.md b/docs/quickstart.md
index b762e5a..6d0db2c 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -4,27 +4,65 @@ description: Install and use YOLOv8 via CLI or Python. Run single-line commands
keywords: YOLOv8, object detection, segmentation, classification, pip, git, CLI, Python
---
-## Install
+## Install Ultralytics
-Install YOLOv8 via the `ultralytics` pip package for the latest stable release or by cloning
-the [https://github.com/ultralytics/ultralytics](https://github.com/ultralytics/ultralytics) repository for the most
-up-to-date version.
+Ultralytics provides various installation methods including pip, conda, and Docker. Install YOLOv8 via the `ultralytics` pip package for the latest stable release or by cloning the [Ultralytics GitHub repository](https://github.com/ultralytics/ultralytics) for the most up-to-date version. Docker can be used to execute the package in an isolated container, avoiding local installation.
!!! example "Install"
- === "pip install (recommended)"
+ === "Pip install (recommended)"
+ Install the `ultralytics` package using pip, or update an existing installation by running `pip install -U ultralytics`. Visit the Python Package Index (PyPI) for more details on the `ultralytics` package: [https://pypi.org/project/ultralytics/](https://pypi.org/project/ultralytics/).
```bash
+ # Install the ultralytics package using pip
pip install ultralytics
```
-
- === "git clone (for development)"
+
+ === "Conda install"
+ Conda is an alternative package manager to pip which may also be used for installation. Visit Anaconda for more details at [https://anaconda.org/conda-forge/ultralytics](https://anaconda.org/conda-forge/ultralytics). Ultralytics feedstock repository for updating the conda package is at [https://github.com/conda-forge/ultralytics-feedstock/](https://github.com/conda-forge/ultralytics-feedstock/).
+ ```bash
+ # Install the ultralytics package using conda
+ conda install ultralytics
+ ```
+
+ === "Git clone"
+ Clone the `ultralytics` repository if you are interested in contributing to the development or wish to experiment with the latest source code. After cloning, navigate into the directory and install the package in editable mode `-e` using pip.
```bash
+ # Clone the ultralytics repository
git clone https://github.com/ultralytics/ultralytics
+
+ # Navigate to the cloned directory
cd ultralytics
+
+ # Install the package in editable mode for development
pip install -e .
```
-See the `ultralytics` [requirements.txt](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt) file for a list of dependencies. Note that `pip` automatically installs all required dependencies.
+ === "Docker"
+ Utilize Docker to execute the `ultralytics` package in an isolated container. By employing the official `ultralytics` image from [Docker Hub](https://hub.docker.com/r/ultralytics/ultralytics), you can avoid local installation. Below are the commands to get the latest image and execute it:
+
+ ```bash
+ # Set image name as a variable
+ t=ultralytics/ultralytics:latest
+
+ # Pull the latest ultralytics image from Docker Hub
+ sudo docker pull $t
+
+ # Run the ultralytics image in a container with GPU support
+ sudo docker run -it --ipc=host --gpus all $t
+ ```
+
+ The above command initializes a Docker container with the latest `ultralytics` image. The `-it` flag assigns a pseudo-TTY and maintains stdin open, enabling you to interact with the container. The `--ipc=host` flag sets the IPC (Inter-Process Communication) namespace to the host, which is essential for sharing memory between processes. The `--gpus all` flag enables access to all available GPUs inside the container, which is crucial for tasks that require GPU computation.
+
+ Note: To work with files on your local machine within the container, use Docker volumes for mounting a local directory into the container:
+
+ ```bash
+ # Mount local directory to a directory inside the container
+ sudo docker run -it --ipc=host --gpus all -v /path/on/host:/path/in/container $t
+ ```
+
+ Alter `/path/on/host` with the directory path on your local machine, and `/path/in/container` with the desired path inside the Docker container for accessibility.
+
+See the `ultralytics` [requirements.txt](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt) file for a list of dependencies. Note that all examples above install all required dependencies.
!!! tip "Tip"
@@ -34,9 +72,9 @@ See the `ultralytics` [requirements.txt](https://github.com/ultralytics/ultralyt
-## Use with CLI
+## Use Ultralytics with CLI
-The YOLO command line interface (CLI) allows for simple single-line commands without the need for a Python environment.
+The Ultralytics command line interface (CLI) allows for simple single-line commands without the need for a Python environment.
CLI requires no customization or Python code. You can simply run all tasks from the terminal with the `yolo` command. Check out the [CLI Guide](usage/cli.md) to learn more about using YOLOv8 from the command line.
!!! example
@@ -103,7 +141,7 @@ CLI requires no customization or Python code. You can simply run all tasks from
[CLI Guide](usage/cli.md){ .md-button .md-button--primary}
-## Use with Python
+## Use Ultralytics with Python
YOLOv8's Python interface allows for seamless integration into your Python projects, making it easy to load, run, and process the model's output. Designed with simplicity and ease of use in mind, the Python interface enables users to quickly implement object detection, segmentation, and classification in their projects. This makes YOLOv8's Python interface an invaluable tool for anyone looking to incorporate these functionalities into their Python projects.
diff --git a/tests/test_python.py b/tests/test_python.py
index f633bd6..10f24fd 100644
--- a/tests/test_python.py
+++ b/tests/test_python.py
@@ -6,6 +6,7 @@ import cv2
import numpy as np
import torch
from PIL import Image
+from torchvision.transforms import ToTensor
from ultralytics import RTDETR, YOLO
from ultralytics.yolo.data.build import load_inference_source
@@ -70,7 +71,7 @@ def test_predict_img():
# Test tensor inference
im = cv2.imread(str(SOURCE)) # OpenCV
t = cv2.resize(im, (32, 32))
- t = torch.from_numpy(t.transpose((2, 0, 1)))
+ t = ToTensor()(t)
t = torch.stack([t, t, t, t])
results = model(t, visualize=True)
assert len(results) == t.shape[0]
diff --git a/ultralytics/__init__.py b/ultralytics/__init__.py
index 4c48d71..bf1f0db 100644
--- a/ultralytics/__init__.py
+++ b/ultralytics/__init__.py
@@ -1,6 +1,6 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license
-__version__ = '8.0.121'
+__version__ = '8.0.122'
from ultralytics.hub import start
from ultralytics.vit.rtdetr import RTDETR
@@ -8,5 +8,6 @@ from ultralytics.vit.sam import SAM
from ultralytics.yolo.engine.model import YOLO
from ultralytics.yolo.nas import NAS
from ultralytics.yolo.utils.checks import check_yolo as checks
+from ultralytics.yolo.utils.downloads import download
-__all__ = '__version__', 'YOLO', 'NAS', 'SAM', 'RTDETR', 'checks', 'start' # allow simpler import
+__all__ = '__version__', 'YOLO', 'NAS', 'SAM', 'RTDETR', 'checks', 'start', 'download' # allow simpler import
diff --git a/ultralytics/yolo/data/dataloaders/stream_loaders.py b/ultralytics/yolo/data/dataloaders/stream_loaders.py
index ba18296..95b22bd 100644
--- a/ultralytics/yolo/data/dataloaders/stream_loaders.py
+++ b/ultralytics/yolo/data/dataloaders/stream_loaders.py
@@ -295,11 +295,19 @@ class LoadPilAndNumpy:
class LoadTensor:
def __init__(self, im0) -> None:
- self.im0 = im0
- self.bs = im0.shape[0]
+ self.im0 = self._single_check(im0)
+ self.bs = self.im0.shape[0]
self.mode = 'image'
self.paths = [getattr(im, 'filename', f'image{i}.jpg') for i, im in enumerate(im0)]
+ @staticmethod
+ def _single_check(im):
+ """Validate and format an image to numpy array."""
+ if len(im.shape) < 4:
+ LOGGER.warning('WARNING ⚠️ torch.Tensor inputs should be BCHW format, i.e. shape(1,3,640,640).')
+ im = im.unsqueeze(0)
+ return im
+
def __iter__(self):
"""Returns an iterator object."""
self.count = 0
diff --git a/ultralytics/yolo/engine/predictor.py b/ultralytics/yolo/engine/predictor.py
index 8cb6b87..a397db0 100644
--- a/ultralytics/yolo/engine/predictor.py
+++ b/ultralytics/yolo/engine/predictor.py
@@ -116,21 +116,23 @@ class BasePredictor:
"""Prepares input image before inference.
Args:
- im (torch.Tensor | List(np.ndarray)): (N, 3, h, w) for tensor, [(h, w, 3) x N] for list.
+ im (torch.Tensor | List(np.ndarray)): BCHW for tensor, [(HWC) x B] for list.
"""
- if not isinstance(im, torch.Tensor):
+ not_tensor = not isinstance(im, torch.Tensor)
+ if not_tensor:
im = np.stack(self.pre_transform(im))
im = im[..., ::-1].transpose((0, 3, 1, 2)) # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
im = np.ascontiguousarray(im) # contiguous
im = torch.from_numpy(im)
- # NOTE: assuming im with (b, 3, h, w) if it's a tensor
+
img = im.to(self.device)
img = img.half() if self.model.fp16 else img.float() # uint8 to fp16/32
- img /= 255 # 0 - 255 to 0.0 - 1.0
+ if not_tensor:
+ img /= 255 # 0 - 255 to 0.0 - 1.0
return img
def pre_transform(self, im):
- """Pre-tranform input image before inference.
+ """Pre-transform input image before inference.
Args:
im (List(np.ndarray)): (N, 3, h, w) for tensor, [(h, w, 3) x N] for list.
@@ -147,7 +149,7 @@ class BasePredictor:
log_string = ''
if len(im.shape) == 3:
im = im[None] # expand for batch dim
- if self.source_type.webcam or self.source_type.from_img: # batch_size >= 1
+ if self.source_type.webcam or self.source_type.from_img or self.source_type.tensor: # batch_size >= 1
log_string += f'{idx}: '
frame = self.dataset.count
else:
@@ -159,10 +161,11 @@ class BasePredictor:
log_string += result.verbose()
if self.args.save or self.args.show: # Add bbox to image
- plot_args = dict(line_width=self.args.line_width,
- boxes=self.args.boxes,
- conf=self.args.show_conf,
- labels=self.args.show_labels)
+ plot_args = {
+ 'line_width': self.args.line_width,
+ 'boxes': self.args.boxes,
+ 'conf': self.args.show_conf,
+ 'labels': self.args.show_labels}
if not self.args.retina_masks:
plot_args['im_gpu'] = im[idx]
self.plotted_img = result.plot(**plot_args)
@@ -214,17 +217,23 @@ class BasePredictor:
# Setup model
if not self.model:
self.setup_model(model)
+
# Setup source every time predict is called
self.setup_source(source if source is not None else self.args.source)
# Check if save_dir/ label file exists
if self.args.save or self.args.save_txt:
(self.save_dir / 'labels' if self.args.save_txt else self.save_dir).mkdir(parents=True, exist_ok=True)
+
# Warmup model
if not self.done_warmup:
self.model.warmup(imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, 3, *self.imgsz))
self.done_warmup = True
+ # Checks
+ if self.source_type.tensor and (self.args.save or self.args.save_txt or self.args.show):
+ LOGGER.warning("WARNING ⚠️ 'save', 'save_txt' and 'show' arguments not enabled for torch.Tensor inference.")
+
self.seen, self.windows, self.batch, profilers = 0, [], None, (ops.Profile(), ops.Profile(), ops.Profile())
self.run_callbacks('on_predict_start')
for batch in self.dataset:
@@ -255,11 +264,7 @@ class BasePredictor:
'preprocess': profilers[0].dt * 1E3 / n,
'inference': profilers[1].dt * 1E3 / n,
'postprocess': profilers[2].dt * 1E3 / n}
- if self.source_type.tensor: # skip write, show and plot operations if input is raw tensor
- if self.args.save or self.args.save_txt or self.args.show:
- LOGGER.warning('WARNING ⚠️ save, save_txt and show argument not enabled for tensor inference.')
- continue
- p, im0 = path[i], im0s[i].copy()
+ p, im0 = path[i], None if self.source_type.tensor else im0s[i].copy()
p = Path(p)
if self.args.verbose or self.args.save or self.args.save_txt or self.args.show:
@@ -286,7 +291,7 @@ class BasePredictor:
if self.args.verbose and self.seen:
t = tuple(x.t / self.seen * 1E3 for x in profilers) # speeds per image
LOGGER.info(f'Speed: %.1fms preprocess, %.1fms inference, %.1fms postprocess per image at shape '
- f'{(1, 3, *self.imgsz)}' % t)
+ f'{(1, 3, *im.shape[2:])}' % t)
if self.args.save or self.args.save_txt or self.args.save_crop:
nl = len(list(self.save_dir.glob('labels/*.txt'))) # number of labels
s = f"\n{nl} label{'s' * (nl > 1)} saved to {self.save_dir / 'labels'}" if self.args.save_txt else ''
diff --git a/ultralytics/yolo/engine/results.py b/ultralytics/yolo/engine/results.py
index 68e0de2..085752e 100644
--- a/ultralytics/yolo/engine/results.py
+++ b/ultralytics/yolo/engine/results.py
@@ -198,6 +198,10 @@ class Results(SimpleClass):
Returns:
(numpy.ndarray): A numpy array of the annotated image.
"""
+ if img is None and isinstance(self.orig_img, torch.Tensor):
+ LOGGER.warning('WARNING ⚠️ Results plotting is not supported for torch.Tensor image types.')
+ return
+
# Deprecation warn TODO: remove in 8.2
if 'show_conf' in kwargs:
deprecation_warn('show_conf', 'conf')
@@ -305,7 +309,7 @@ class Results(SimpleClass):
file_name (str | pathlib.Path): File name.
"""
if self.probs is not None:
- LOGGER.warning('Warning: Classify task do not support `save_crop`.')
+ LOGGER.warning('WARNING ⚠️ Classify task do not support `save_crop`.')
return
if isinstance(save_dir, str):
save_dir = Path(save_dir)