ultralytics 8.0.122 Fix torch.Tensor inference (#3363)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: krzysztof.gonia <4281421+kgonia@users.noreply.github.com>
2023-06-25 01:36:07 +02:00
parent 51d8cfa9c3
commit 682c9ef70f
16 changed files with 471 additions and 154 deletions
--- a/README.md
+++ b/README.md
@ -57,12 +57,14 @@ See below for a quickstart installation and usage example, and see the [YOLOv8 D
 <details open>
 <summary>Install</summary>
-Pip install the ultralytics package including all [requirements](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt) in a [**Python>=3.7**](https://www.python.org/) environment with [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/).
+Pip install the ultralytics package including all [requirements](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt) in a [**Python>=3.8**](https://www.python.org/) environment with [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/).
 ```bash
 pip install ultralytics
 ```
 For alternative installation methods including Conda, Docker, and Git, please refer to the [Ultralytics Quickstart Guide](https://docs.ultralytics.com/quickstart).
 </details>
 <details open>
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@ -57,12 +57,14 @@
 <details open>
 <summary>安装</summary>
-在一个 [**Python>=3.7**](https://www.python.org/) 环境中，使用 [**PyTorch>=1.7**](https://pytorch.org/get-started/locally/)，通过 pip 安装 ultralytics 软件包以及所有[依赖项](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt)。
+使用Pip在一个[**Python>=3.8**](https://www.python.org/)环境中安装`ultralytics`包，此环境还需包含[**PyTorch>=1.7**](https://pytorch.org/get-started/locally/)。这也会安装所有必要的[依赖项](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt)。
 ```bash
 pip install ultralytics
 ```
 如需使用包括Conda、Docker和Git在内的其他安装方法，请参考[Ultralytics快速入门指南](https://docs.ultralytics.com/quickstart)。
 </details>
 <details open>
--- a/docs/datasets/classify/index.md
+++ b/docs/datasets/classify/index.md
@ -102,4 +102,19 @@ In this example, the `train` directory contains subdirectories for each class in
 ## Supported Datasets
-TODO
+Ultralytics supports the following datasets with automatic download:
 * [Caltech 101](caltech101.md): A dataset containing images of 101 object categories for image classification tasks.
 * [Caltech 256](caltech256.md): An extended version of Caltech 101 with 256 object categories and more challenging images.
 * [CIFAR-10](cifar10.md): A dataset of 60K 32x32 color images in 10 classes, with 6K images per class.
 * [CIFAR-100](cifar100.md): An extended version of CIFAR-10 with 100 object categories and 600 images per class.
 * [Fashion-MNIST](fashion-mnist.md): A dataset consisting of 70,000 grayscale images of 10 fashion categories for image classification tasks.
 * [ImageNet](imagenet.md): A large-scale dataset for object detection and image classification with over 14 million images and 20,000 categories.
 * [ImageNet-10](imagenet10.md): A smaller subset of ImageNet with 10 categories for faster experimentation and testing.
 * [Imagenette](imagenette.md): A smaller subset of ImageNet that contains 10 easily distinguishable classes for quicker training and testing.
 * [Imagewoof](imagewoof.md): A more challenging subset of ImageNet containing 10 dog breed categories for image classification tasks.
 * [MNIST](mnist.md): A dataset of 70,000 grayscale images of handwritten digits for image classification tasks.
 ### Adding your own dataset
 If you have your own dataset and would like to use it for training classification models with Ultralytics, ensure that it follows the format specified above under "Dataset format" and then point your `data` argument to the dataset directory.
--- a/docs/datasets/detect/index.md
+++ b/docs/datasets/detect/index.md
@ -1,81 +1,53 @@
 ---
 comments: true
-description: Learn about supported dataset formats for training YOLO detection models, including Ultralytics YOLO and COCO, in this Object Detection Datasets Overview.
+description: Explore supported dataset formats for training YOLO detection models, including Ultralytics YOLO and COCO. This guide covers various dataset formats and their specific configurations for effective object detection training.
-keywords: object detection, datasets, formats, Ultralytics YOLO, label format, dataset file format, dataset definition, YOLO dataset, model configuration
+keywords: object detection, datasets, formats, Ultralytics YOLO, COCO, label format, dataset file format, dataset definition, YOLO dataset, model configuration
 ---
 # Object Detection Datasets Overview
 Training a robust and accurate object detection model requires a comprehensive dataset. This guide introduces various formats of datasets that are compatible with the Ultralytics YOLO model and provides insights into their structure, usage, and how to convert between different formats.
 ## Supported Dataset Formats
 ### Ultralytics YOLO format
-** Label Format **
+The Ultralytics YOLO format is a dataset configuration format that allows you to define the dataset root directory, the relative paths to training/validation/testing image directories or *.txt files containing image paths, and a dictionary of class names. Here is an example:
 The dataset format used for training YOLO detection models is as follows:
 1. One text file per image: Each image in the dataset has a corresponding text file with the same name as the image file and the ".txt" extension.
 2. One row per object: Each row in the text file corresponds to one object instance in the image.
 3. Object information per row: Each row contains the following information about the object instance:
    - Object class index: An integer representing the class of the object (e.g., 0 for person, 1 for car, etc.).
    - Object center coordinates: The x and y coordinates of the center of the object, normalized to be between 0 and 1.
    - Object width and height: The width and height of the object, normalized to be between 0 and 1.
 The format for a single row in the detection dataset file is as follows:
 ```
 <object-class> <x> <y> <width> <height>
 ```
 Here is an example of the YOLO dataset format for a single image with two object instances:
 ```
 0 0.5 0.4 0.3 0.6
 1 0.3 0.7 0.4 0.2
 ```
 In this example, the first object is of class 0 (person), with its center at (0.5, 0.4), width of 0.3, and height of 0.6. The second object is of class 1 (car), with its center at (0.3, 0.7), width of 0.4, and height of 0.2.
 ** Dataset file format **
 The Ultralytics framework uses a YAML file format to define the dataset and model configuration for training Detection Models. Here is an example of the YAML format used for defining a detection dataset:
 ```yaml
-train: <path-to-training-images>
+# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
-val: <path-to-validation-images>
+path: ../datasets/coco128  # dataset root dir
 train: images/train2017  # train images (relative to 'path') 128 images
 val: images/train2017  # val images (relative to 'path') 128 images
 test:  # test images (optional)
-nc: <number-of-classes>
+# Classes (80 COCO classes)
 names: [<class-1>, <class-2>, ..., <class-n>]
 ```
 The `train` and `val` fields specify the paths to the directories containing the training and validation images, respectively.
 The `nc` field specifies the number of object classes in the dataset.
 The `names` field is a list of the names of the object classes. The order of the names should match the order of the object class indices in the YOLO dataset files.
 NOTE: Either `nc` or `names` must be defined. Defining both are not mandatory
 Alternatively, you can directly define class names like this:
 ```yaml
 names:
  0: person
  1: bicycle
  2: car
  ...
  77: teddy bear
  78: hair drier
  79: toothbrush
 ```
-** Example **
+Labels for this format should be exported to YOLO format with one `*.txt` file per image. If there are no objects in an image, no `*.txt` file is required. The `*.txt` file should be formatted with one row per object in `class x_center y_center width height` format. Box coordinates must be in **normalized xywh** format (from 0 - 1). If your boxes are in pixels, you should divide `x_center` and `width` by image width, and `y_center` and `height` by image height. Class numbers should be zero-indexed (start with 0).
-```yaml
+<p align="center"><img width="750" src="https://user-images.githubusercontent.com/26833433/91506361-c7965000-e886-11ea-8291-c72b98c25eec.jpg"></p>
 train: data/train/
 val: data/val/
-nc: 2
+The label file corresponding to the above image contains 2 persons (class `0`) and a tie (class `27`):
-names: ['person', 'car']
+
-```
+<p align="center"><img width="428" src="https://user-images.githubusercontent.com/26833433/112467037-d2568c00-8d66-11eb-8796-55402ac0d62f.png"></p>
 When using the Ultralytics YOLO format, organize your training and validation images and labels as shown in the example below.
 <p align="center"><img width="700" src="https://user-images.githubusercontent.com/26833433/134436012-65111ad1-9541-4853-81a6-f19a3468b75f.png"></p>
 ## Usage
 Here's how you can use these formats to train your model:
 !!! example ""
    === "Python"
@ -98,14 +70,34 @@ names: ['person', 'car']
 ## Supported Datasets
-TODO
+Here is a list of the supported datasets and a brief description for each:
-## Port or Convert label formats
+- [**Argoverse**](./argoverse.md): A collection of sensor data collected from autonomous vehicles. It contains 3D tracking annotations for car objects.
 - [**COCO**](./coco.md): Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset with 80 object categories.
 - [**COCO8**](./coco8.md): A smaller subset of the COCO dataset, COCO8 is more lightweight and faster to train.
 - [**GlobalWheat2020**](./globalwheat2020.md): A dataset containing images of wheat heads for the Global Wheat Challenge 2020.
 - [**Objects365**](./objects365.md): A large-scale object detection dataset with 365 object categories and 600k images, aimed at advancing object detection research.
 - [**SKU-110K**](./sku-110k.md): A dataset containing images of densely packed retail products, intended for retail environment object detection.
 - [**VisDrone**](./visdrone.md): A dataset focusing on drone-based images, containing various object categories like cars, pedestrians, and cyclists.
 - [**VOC**](./voc.md): PASCAL VOC is a popular object detection dataset with 20 object categories including vehicles, animals, and furniture.
 - [**xView**](./xview.md): A dataset containing high-resolution satellite imagery, designed for the detection of various object classes in overhead views.
-### COCO dataset format to YOLO format
+### Adding your own dataset
 If you have your own dataset and would like to use it for training detection models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file.
 ## Port or Convert Label Formats
 ### COCO Dataset Format to YOLO Format
 You can easily convert labels from the popular COCO dataset format to the YOLO format using the following code snippet:
 ```python
 from ultralytics.yolo.data.converter import convert_coco
 convert_coco(labels_dir='../coco/annotations/')
 ```
 This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format.
 Remember to double-check if the dataset you want to use is compatible with your model and follows the necessary format conventions. Properly formatted datasets are crucial for training successful object detection models.
--- a/docs/datasets/detect/sku-110k.md
+++ b/docs/datasets/detect/sku-110k.md
@ -61,6 +61,7 @@ To train a YOLOv8n model on the SKU-110K dataset for 100 epochs with an image si
        ```bash
        # Start training from a pretrained *.pt model
        yolo detect train data=SKU-110K.yaml model=yolov8n.pt epochs=100 imgsz=640
        ```
 ## Sample Data and Annotations
--- a/docs/datasets/detect/visdrone.md
+++ b/docs/datasets/detect/visdrone.md
@ -10,22 +10,6 @@ The [VisDrone Dataset](https://github.com/VisDrone/VisDrone-Dataset) is a large-
 VisDrone is composed of 288 video clips with 261,908 frames and 10,209 static images, captured by various drone-mounted cameras. The dataset covers a wide range of aspects, including location (14 different cities across China), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). The dataset was collected using various drone platforms under different scenarios and weather and lighting conditions. These frames are manually annotated with over 2.6 million bounding boxes of targets such as pedestrians, cars, bicycles, and tricycles. Attributes like scene visibility, object class, and occlusion are also provided for better data utilization.
 ## Citation
 If you use the VisDrone dataset in your research or development work, please cite the following paper:
 ```bibtex
@ARTICLE{9573394,
  author={Zhu, Pengfei and Wen, Longyin and Du, Dawei and Bian, Xiao and Fan, Heng and Hu, Qinghua and Ling, Haibin},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={Detection and Tracking Meet Drones Challenge}, 
  year={2021},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TPAMI.2021.3119563}}
 ```
 ## Dataset Structure
 The VisDrone dataset is organized into five main subsets, each focusing on a specific task:
--- a/docs/datasets/pose/index.md
+++ b/docs/datasets/pose/index.md
@ -111,14 +111,40 @@ flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
 ## Supported Datasets
-TODO
+This section outlines the datasets that are compatible with Ultralytics YOLO format and can be used for training pose estimation models:
-## Port or Convert label formats
+### COCO-Pose
-### COCO dataset format to YOLO format
+- **Description**: COCO-Pose is a large-scale object detection, segmentation, and pose estimation dataset. It is a subset of the popular COCO dataset and focuses on human pose estimation. COCO-Pose includes multiple keypoints for each human instance.
 - **Label Format**: Same as Ultralytics YOLO format as described above, with keypoints for human poses.
 - **Number of Classes**: 1 (Human).
 - **Keypoints**: 17 keypoints including nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.
 - **Usage**: Suitable for training human pose estimation models.
 - **Additional Notes**: The dataset is rich and diverse, containing over 200k labeled images.
 - [Read more about COCO-Pose](./coco.md)
 ### COCO8-Pose
 - **Description**: [Ultralytics](https://ultralytics.com) COCO8-Pose is a small, but versatile pose detection dataset composed of the first 8 images of the COCO train 2017 set, 4 for training and 4 for validation.
 - **Label Format**: Same as Ultralytics YOLO format as described above, with keypoints for human poses.
 - **Number of Classes**: 1 (Human).
 - **Keypoints**: 17 keypoints including nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.
 - **Usage**: Suitable for testing and debugging object detection models, or for experimenting with new detection approaches.
 - **Additional Notes**: COCO8-Pose is ideal for sanity checks and CI checks.
 - [Read more about COCO8-Pose](./coco8-pose.md)
 ### Adding your own dataset
 If you have your own dataset and would like to use it for training pose estimation models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file.
 ### Conversion Tool
 Ultralytics provides a convenient conversion tool to convert labels from the popular COCO dataset format to YOLO format:
 ```python
 from ultralytics.yolo.data.converter import convert_coco
 convert_coco(labels_dir='../coco/annotations/', use_keypoints=True)
 ```
 This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format. The `use_keypoints` parameter specifies whether to include keypoints (for pose estimation) in the converted labels.
--- a/docs/datasets/segment/index.md
+++ b/docs/datasets/segment/index.md
@ -46,7 +46,7 @@ train: <path-to-training-images>
 val: <path-to-validation-images>
 nc: <number-of-classes>
-names: [ <class-1>, <class-2>, ..., <class-n> ]
+names: [<class-1>, <class-2>, ..., <class-n>]
 ```
@ -73,7 +73,7 @@ train: data/train/
 val: data/val/
 nc: 2
-names: [ 'person', 'car' ]
+names: ['person', 'car']
 ```
 ## Usage
@ -100,9 +100,18 @@ names: [ 'person', 'car' ]
 ## Supported Datasets
-## Port or Convert label formats
+* [COCO](coco.md): A large-scale dataset designed for object detection, segmentation, and captioning tasks with over 200K labeled images.
 * [COCO8-seg](coco8-seg.md): A smaller dataset for instance segmentation tasks, containing a subset of 8 COCO images with segmentation annotations.
-### COCO dataset format to YOLO format
+### Adding your own dataset
 If you have your own dataset and would like to use it for training segmentation models with Ultralytics YOLO format, ensure that it follows the format specified above under "Ultralytics YOLO format". Convert your annotations to the required format and specify the paths, number of classes, and class names in the YAML configuration file.
 ## Port or Convert Label Formats
 ### COCO Dataset Format to YOLO Format
 You can easily convert labels from the popular COCO dataset format to the YOLO format using the following code snippet:
 ```python
 from ultralytics.yolo.data.converter import convert_coco
@ -110,6 +119,10 @@ from ultralytics.yolo.data.converter import convert_coco
 convert_coco(labels_dir='../coco/annotations/', use_segments=True)
 ```
 This conversion tool can be used to convert the COCO dataset or any dataset in the COCO format to the Ultralytics YOLO format.
 Remember to double-check if the dataset you want to use is compatible with your model and follows the necessary format conventions. Properly formatted datasets are crucial for training successful object detection models.
 ## Auto-Annotation
 Auto-annotation is an essential feature that allows you to generate a segmentation dataset using a pre-trained detection model. It enables you to quickly and accurately annotate a large number of images without the need for manual labeling, saving time and effort.
--- a/docs/models/index.md
+++ b/docs/models/index.md
@ -20,21 +20,25 @@ In this documentation, we provide information on four major models:
 8. [YOLO-NAS](./yolo-nas.md): YOLO Neural Architecture Search (NAS) Models.
 9. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
-You can use these models directly in the Command Line Interface (CLI) or in a Python environment. Below are examples of how to use the models with CLI and Python:
+You can use many of these models directly in the Command Line Interface (CLI) or in a Python environment. Below are examples of how to use the models with CLI and Python:
 ## CLI Example
 Use the `model` argument to pass a model YAML such as `model=yolov8n.yaml` or a pretrained *.pt file such as `model=yolov8n.pt`
 ```bash
-yolo task=detect mode=train model=yolov8n.yaml data=coco128.yaml epochs=100
+yolo task=detect mode=train model=yolov8n.pt data=coco128.yaml epochs=100
 ```
 ## Python Example
 PyTorch pretrained models as well as model YAML files can also be passed to the `YOLO()`, `SAM()`, `NAS()` and `RTDETR()` classes to create a model instance in python:
 ```python
 from ultralytics import YOLO
-model = YOLO("model.yaml")  # build a YOLOv8n model from scratch
+model = YOLO("yolov8n.pt")  # load a pretrained YOLOv8n model
-# YOLO("model.pt")  use pre-trained model if available
+
 model.info()  # display model information
 model.train(data="coco128.yaml", epochs=100)  # train the model
 ```
--- a/docs/modes/predict.md
+++ b/docs/modes/predict.md
@ -1,6 +1,6 @@
 ---
 comments: true
-description: Get started with YOLOv8 Predict mode and input sources. Accepts various input sources such as images, videos, and directories.
+description: Get started with YOLOv8 Predict mode and input sources. Accepts various input sources such as images, videos, and directories.
 keywords: YOLOv8, predict mode, generator, streaming mode, input sources, video formats, arguments customization
 ---
@ -12,60 +12,279 @@ passing `stream=True` in the predictor's call method.
 !!! example "Predict"
-    === "Return a list with `Stream=False`"
+    === "Return a list with `stream=False`"
        ```python
-        inputs = [img, img]  # list of numpy arrays
+        from ultralytics import YOLO
        results = model(inputs)  # list of Results objects
        # Load a model
        model = YOLO('yolov8n.pt')  # pretrained YOLOv8n model
        # Run batched inference on a list of images
        results = model(['im1.jpg', 'im2.jpg'])  # return a list of Results objects
        # Process results list
        for result in results:
            boxes = result.boxes  # Boxes object for bbox outputs
            masks = result.masks  # Masks object for segmentation masks outputs
            keypoints = result.keypoints  # Keypoints object for pose outputs
            probs = result.probs  # Class probabilities for classification outputs
        ```
-    === "Return a generator with `Stream=True`"
+    === "Return a generator with `stream=True`"
        ```python
-        inputs = [img, img]  # list of numpy arrays
+        from ultralytics import YOLO
        results = model(inputs, stream=True)  # generator of Results objects
        # Load a model
        model = YOLO('yolov8n.pt')  # pretrained YOLOv8n model
        # Run batched inference on a list of images
        results = model(['im1.jpg', 'im2.jpg'], stream=True)  # return a generator of Results objects
        # Process results generator
        for result in results:
            boxes = result.boxes  # Boxes object for bbox outputs
            masks = result.masks  # Masks object for segmentation masks outputs
            keypoints = result.keypoints  # Keypoints object for pose outputs
            probs = result.probs  # Class probabilities for classification outputs
        ```
 ## Inference Sources
 YOLOv8 can process different types of input sources for inference, as shown in the table below. The sources include static images, video streams, and various data formats. The table also indicates whether each source can be used in streaming mode with the argument `stream=True` ✅. Streaming mode is beneficial for processing videos or live streams as it creates a generator of results instead of loading all frames into memory.
 !!! tip "Tip"
-    Streaming mode with `stream=True` should be used for long videos or large predict sources, otherwise results will accumuate in memory and will eventually cause out-of-memory errors. 
+    Use `stream=True` for processing long videos or large datasets to efficiently manage memory. When `stream=False`, the results for all frames or data points are stored in memory, which can quickly add up and cause out-of-memory errors for large inputs. In contrast, `stream=True` utilizes a generator, which only keeps the results of the current frame or data point in memory, significantly reducing memory consumption and preventing out-of-memory issues.
-## Sources
+| Source      | Argument                                   | Type                                  | Notes                                                                      |
 |-------------|--------------------------------------------|---------------------------------------|----------------------------------------------------------------------------|
 | image       | `'image.jpg'`                              | `str` or `Path`                       | Single image file.                                                         |
 | URL         | `'https://ultralytics.com/images/bus.jpg'` | `str`                                 | URL to an image.                                                           |
 | screenshot  | `'screen'`                                 | `str`                                 | Capture a screenshot.                                                      |
 | PIL         | `Image.open('im.jpg')`                     | `PIL.Image`                           | HWC format with RGB channels.                                              |
 | OpenCV      | `cv2.imread('im.jpg')`                     | `np.ndarray` of `uint8 (0-255)`       | HWC format with BGR channels.                                              |
 | numpy       | `np.zeros((640,1280,3))`                   | `np.ndarray` of `uint8 (0-255)`       | HWC format with BGR channels.                                              |
 | torch       | `torch.zeros(16,3,320,640)`                | `torch.Tensor` of `float32 (0.0-1.0)` | BCHW format with RGB channels.                                             |
 | CSV         | `'sources.csv'`                            | `str` or `Path`                       | CSV file containing paths to images, videos, or directories.               |       
 | video ✅     | `'video.mp4'`                              | `str` or `Path`                       | Video file in formats like MP4, AVI, etc.                                  |
 | directory ✅ | `'path/'`                                  | `str` or `Path`                       | Path to a directory containing images or videos.                           |
 | glob ✅      | `'path/*.jpg'`                             | `str`                                 | Glob pattern to match multiple files. Use the `*` character as a wildcard. |
 | YouTube ✅   | `'https://youtu.be/Zgi9g1ksQHc'`           | `str`                                 | URL to a YouTube video.                                                    |
 | stream ✅    | `'rtsp://example.com/media.mp4'`           | `str`                                 | URL for streaming protocols such as RTSP, RTMP, or an IP address.          |
-YOLOv8 can accept various input sources, as shown in the table below. This includes images, URLs, PIL images, OpenCV,
+Below are code examples for using each source type:
 numpy arrays, torch tensors, CSV files, videos, directories, globs, YouTube videos, and streams. The table indicates
 whether each source can be used in streaming mode with `stream=True` ✅ and an example argument for each source.
-| source      | model(arg)                                 | type           | notes            |
+!!! example "Prediction sources"
 |-------------|--------------------------------------------|----------------|------------------|
 | image       | `'im.jpg'`                                 | `str`, `Path`  |                  |
 | URL         | `'https://ultralytics.com/images/bus.jpg'` | `str`          |                  |
 | screenshot  | `'screen'`                                 | `str`          |                  |
 | PIL         | `Image.open('im.jpg')`                     | `PIL.Image`    | HWC, RGB         |
 | OpenCV      | `cv2.imread('im.jpg')`                     | `np.ndarray`   | HWC, BGR         |
 | numpy       | `np.zeros((640,1280,3))`                   | `np.ndarray`   | HWC              |
 | torch       | `torch.zeros(16,3,320,640)`                | `torch.Tensor` | BCHW, RGB        |
 | CSV         | `'sources.csv'`                            | `str`, `Path`  | RTSP, RTMP, HTTP |         
 | video ✅     | `'vid.mp4'`                                | `str`, `Path`  |                  |
 | directory ✅ | `'path/'`                                  | `str`, `Path`  |                  |
 | glob ✅      | `'path/*.jpg'`                             | `str`          | Use `*` operator |
 | YouTube ✅   | `'https://youtu.be/Zgi9g1ksQHc'`           | `str`          |                  |
 | stream ✅    | `'rtsp://example.com/media.mp4'`           | `str`          | RTSP, RTMP, HTTP |
-## Arguments
+    === "image"
        Run inference on an image file. 
        ```python
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define path to the image file
        source = 'path/to/image.jpg'
        # Run inference on the source
        results = model(source)  # list of Results objects
        ```
    === "screenshot"
        Run inference on the current screen content as a screenshot.
        ```python
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define current screenshot as source
        source = 'screen'
        # Run inference on the source
        results = model(source)  # list of Results objects
        ```
    === "URL"
        Run inference on an image or video hosted remotely via URL.
        ```python
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define remote image or video URL
        source = 'https://ultralytics.com/images/bus.jpg'
        # Run inference on the source
        results = model(source)  # list of Results objects
        ```
    === "PIL"
        Run inference on an image opened with Python Imaging Library (PIL).
        ```python
        from PIL import Image
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Open an image using PIL
        source = Image.open('path/to/image.jpg')
        # Run inference on the source
        results = model(source)  # list of Results objects
        ```
    === "OpenCV"
        Run inference on an image read with OpenCV.
        ```python
        import cv2
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Read an image using OpenCV
        source = cv2.imread('path/to/image.jpg')
        # Run inference on the source
        results = model(source)  # list of Results objects
        ```
    === "numpy"
        Run inference on an image represented as a numpy array.
        ```python
        import numpy as np
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Create a random numpy array of HWC shape (640, 640, 3) with values in range [0, 255] and type uint8
        source = np.random.randint(low=0, high=255, size=(640, 640, 3), dtype='uint8')
        # Run inference on the source
        results = model(source)  # list of Results objects
        ```
    === "torch"
        Run inference on an image represented as a PyTorch tensor.
        ```python
        import torch
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Create a random torch tensor of BCHW shape (1, 3, 640, 640) with values in range [0, 1] and type float32
        source = torch.rand(1, 3, 640, 640, dtype=torch.float32)
        # Run inference on the source
        results = model(source)  # list of Results objects
        ```
    === "CSV"
        Run inference on a collection of images, URLs, videos and directories listed in a CSV file.
        ```python
        import torch
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define a path to a CSV file with images, URLs, videos and directories
        source = 'path/to/file.csv'
        # Run inference on the source
        results = model(source)  # list of Results objects
        ```
    === "video"
        Run inference on a video file. By using `stream=True`, you can create a generator of Results objects to reduce memory usage.
        ```python
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define path to video file
        source = 'path/to/video.mp4'
        # Run inference on the source
        results = model(source, stream=True)  # generator of Results objects
        ```
    === "directory"
        Run inference on all images and videos in a directory. To also capture images and videos in subdirectories use a glob pattern, i.e. `path/to/dir/**/*`.
        ```python
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define path to directory containing images and videos for inference
        source = 'path/to/dir'
        # Run inference on the source
        results = model(source, stream=True)  # generator of Results objects
        ```
    === "glob"
        Run inference on all images and videos that match a glob expression with `*` characters.
        ```python
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define a glob search for all JPG files in a directory
        source = 'path/to/dir/*.jpg'
        # OR define a recursive glob search for all JPG files including subdirectories
        source = 'path/to/dir/**/*.jpg'
        # Run inference on the source
        results = model(source, stream=True)  # generator of Results objects
        ```
    === "YouTube"
        Run inference on a YouTube video. By using `stream=True`, you can create a generator of Results objects to reduce memory usage for long videos.
        ```python
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define source as YouTube video URL
        source = 'https://youtu.be/Zgi9g1ksQHc'
        # Run inference on the source
        results = model(source, stream=True)  # generator of Results objects
        ```
    === "Stream"
        Run inference on remote streaming sources using RTSP, RTMP, and IP address protocols.
        ```python
        from ultralytics import YOLO
        # Load a pretrained YOLOv8n model
        model = YOLO('yolov8n.pt')
        # Define source as RTSP, RTMP or IP streaming address
        source = 'rtsp://example.com/media.mp4'
        # Run inference on the source
        results = model(source, stream=True)  # generator of Results objects
        ```
 ## Inference Arguments
 `model.predict` accepts multiple arguments that control the prediction operation. These arguments can be passed directly to `model.predict`:
 !!! example
-    ```
+    ```python
    model.predict(source, save=True, imgsz=320, conf=0.5)
    ```
@ -97,12 +316,12 @@ All supported arguments:
 ## Image and Video Formats
-YOLOv8 supports various image and video formats, as specified
+YOLOv8 supports various image and video formats, as specified in [yolo/data/utils.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/yolo/data/utils.py). See the tables below for the valid suffixes and example predict commands.
 in [yolo/data/utils.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/yolo/data/utils.py). See the
 tables below for the valid suffixes and example predict commands.
 ### Image Suffixes
 The below table contains valid Ultralytics image formats.
 | Image Suffixes | Example Predict Command          | Reference                                                                     |
 |----------------|----------------------------------|-------------------------------------------------------------------------------|
 | .bmp           | `yolo predict source=image.bmp`  | [Microsoft BMP File Format](https://en.wikipedia.org/wiki/BMP_file_format)    |
@ -118,6 +337,8 @@ tables below for the valid suffixes and example predict commands.
 ### Video Suffixes
 The below table contains valid Ultralytics video formats.
 | Video Suffixes | Example Predict Command          | Reference                                                                        |
 |----------------|----------------------------------|----------------------------------------------------------------------------------|
 | .asf           | `yolo predict source=video.asf`  | [Advanced Systems Format](https://en.wikipedia.org/wiki/Advanced_Systems_Format) |
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@ -4,27 +4,65 @@ description: Install and use YOLOv8 via CLI or Python. Run single-line commands
 keywords: YOLOv8, object detection, segmentation, classification, pip, git, CLI, Python
 ---
-## Install
+## Install Ultralytics
-Install YOLOv8 via the `ultralytics` pip package for the latest stable release or by cloning
+Ultralytics provides various installation methods including pip, conda, and Docker. Install YOLOv8 via the `ultralytics` pip package for the latest stable release or by cloning the [Ultralytics GitHub repository](https://github.com/ultralytics/ultralytics) for the most up-to-date version. Docker can be used to execute the package in an isolated container, avoiding local installation.
 the [https://github.com/ultralytics/ultralytics](https://github.com/ultralytics/ultralytics) repository for the most
 up-to-date version.
 !!! example "Install"
-    === "pip install (recommended)"
+    === "Pip install (recommended)"
        Install the `ultralytics` package using pip, or update an existing installation by running `pip install -U ultralytics`. Visit the Python Package Index (PyPI) for more details on the `ultralytics` package: [https://pypi.org/project/ultralytics/](https://pypi.org/project/ultralytics/).
        ```bash
        # Install the ultralytics package using pip
        pip install ultralytics
        ```
-    === "git clone (for development)"
+    === "Conda install"
        Conda is an alternative package manager to pip which may also be used for installation. Visit Anaconda for more details at [https://anaconda.org/conda-forge/ultralytics](https://anaconda.org/conda-forge/ultralytics). Ultralytics feedstock repository for updating the conda package is at [https://github.com/conda-forge/ultralytics-feedstock/](https://github.com/conda-forge/ultralytics-feedstock/).
        ```bash
        # Install the ultralytics package using conda
        conda install ultralytics
        ```
    === "Git clone"
        Clone the `ultralytics` repository if you are interested in contributing to the development or wish to experiment with the latest source code. After cloning, navigate into the directory and install the package in editable mode `-e` using pip.
        ```bash
        # Clone the ultralytics repository
        git clone https://github.com/ultralytics/ultralytics
        # Navigate to the cloned directory
        cd ultralytics
        # Install the package in editable mode for development
        pip install -e .
        ```
-See the `ultralytics` [requirements.txt](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt) file for a list of dependencies. Note that `pip` automatically installs all required dependencies.
+    === "Docker"
        Utilize Docker to execute the `ultralytics` package in an isolated container. By employing the official `ultralytics` image from [Docker Hub](https://hub.docker.com/r/ultralytics/ultralytics), you can avoid local installation. Below are the commands to get the latest image and execute it:
        ```bash
        # Set image name as a variable
        t=ultralytics/ultralytics:latest
        # Pull the latest ultralytics image from Docker Hub
        sudo docker pull $t
        # Run the ultralytics image in a container with GPU support
        sudo docker run -it --ipc=host --gpus all $t
        ```
        The above command initializes a Docker container with the latest `ultralytics` image. The `-it` flag assigns a pseudo-TTY and maintains stdin open, enabling you to interact with the container. The `--ipc=host` flag sets the IPC (Inter-Process Communication) namespace to the host, which is essential for sharing memory between processes. The `--gpus all` flag enables access to all available GPUs inside the container, which is crucial for tasks that require GPU computation.
        Note: To work with files on your local machine within the container, use Docker volumes for mounting a local directory into the container:
        ```bash
        # Mount local directory to a directory inside the container
        sudo docker run -it --ipc=host --gpus all -v /path/on/host:/path/in/container $t
        ```
        Alter `/path/on/host` with the directory path on your local machine, and `/path/in/container` with the desired path inside the Docker container for accessibility.
 See the `ultralytics` [requirements.txt](https://github.com/ultralytics/ultralytics/blob/main/requirements.txt) file for a list of dependencies. Note that all examples above install all required dependencies.
 !!! tip "Tip"
@ -34,9 +72,9 @@ See the `ultralytics` [requirements.txt](https://github.com/ultralytics/ultralyt
        <img width="800" alt="PyTorch Installation Instructions" src="https://user-images.githubusercontent.com/26833433/228650108-ab0ec98a-b328-4f40-a40d-95355e8a84e3.png">
    </a>
-## Use with CLI
+## Use Ultralytics with CLI
-The YOLO command line interface (CLI) allows for simple single-line commands without the need for a Python environment.
+The Ultralytics command line interface (CLI) allows for simple single-line commands without the need for a Python environment.
 CLI requires no customization or Python code. You can simply run all tasks from the terminal with the `yolo` command. Check out the [CLI Guide](usage/cli.md) to learn more about using YOLOv8 from the command line.
 !!! example
@ -103,7 +141,7 @@ CLI requires no customization or Python code. You can simply run all tasks from
 [CLI Guide](usage/cli.md){ .md-button .md-button--primary}
-## Use with Python
+## Use Ultralytics with Python
 YOLOv8's Python interface allows for seamless integration into your Python projects, making it easy to load, run, and process the model's output. Designed with simplicity and ease of use in mind, the Python interface enables users to quickly implement object detection, segmentation, and classification in their projects. This makes YOLOv8's Python interface an invaluable tool for anyone looking to incorporate these functionalities into their Python projects.
--- a/tests/test_python.py
+++ b/tests/test_python.py
@ -6,6 +6,7 @@ import cv2
 import numpy as np
 import torch
 from PIL import Image
 from torchvision.transforms import ToTensor
 from ultralytics import RTDETR, YOLO
 from ultralytics.yolo.data.build import load_inference_source
@ -70,7 +71,7 @@ def test_predict_img():
    # Test tensor inference
    im = cv2.imread(str(SOURCE))  # OpenCV
    t = cv2.resize(im, (32, 32))
-    t = torch.from_numpy(t.transpose((2, 0, 1)))
+    t = ToTensor()(t)
    t = torch.stack([t, t, t, t])
    results = model(t, visualize=True)
    assert len(results) == t.shape[0]
--- a/ultralytics/init.py
+++ b/ultralytics/init.py
@ -1,6 +1,6 @@
 # Ultralytics YOLO 🚀, AGPL-3.0 license
-__version__ = '8.0.121'
+__version__ = '8.0.122'
 from ultralytics.hub import start
 from ultralytics.vit.rtdetr import RTDETR
@ -8,5 +8,6 @@ from ultralytics.vit.sam import SAM
 from ultralytics.yolo.engine.model import YOLO
 from ultralytics.yolo.nas import NAS
 from ultralytics.yolo.utils.checks import check_yolo as checks
 from ultralytics.yolo.utils.downloads import download
-__all__ = '__version__', 'YOLO', 'NAS', 'SAM', 'RTDETR', 'checks', 'start'  # allow simpler import
+__all__ = '__version__', 'YOLO', 'NAS', 'SAM', 'RTDETR', 'checks', 'start', 'download'  # allow simpler import
--- a/ultralytics/yolo/data/dataloaders/stream_loaders.py
+++ b/ultralytics/yolo/data/dataloaders/stream_loaders.py
@ -295,11 +295,19 @@ class LoadPilAndNumpy:
 class LoadTensor:
    def __init__(self, im0) -> None:
-        self.im0 = im0
+        self.im0 = self._single_check(im0)
-        self.bs = im0.shape[0]
+        self.bs = self.im0.shape[0]
        self.mode = 'image'
        self.paths = [getattr(im, 'filename', f'image{i}.jpg') for i, im in enumerate(im0)]
    @staticmethod
    def _single_check(im):
        """Validate and format an image to numpy array."""
        if len(im.shape) < 4:
            LOGGER.warning('WARNING ⚠️ torch.Tensor inputs should be BCHW format, i.e. shape(1,3,640,640).')
            im = im.unsqueeze(0)
        return im
    def __iter__(self):
        """Returns an iterator object."""
        self.count = 0
--- a/ultralytics/yolo/engine/predictor.py
+++ b/ultralytics/yolo/engine/predictor.py
@ -116,21 +116,23 @@ class BasePredictor:
        """Prepares input image before inference.
        Args:
-            im (torch.Tensor | List(np.ndarray)): (N, 3, h, w) for tensor, [(h, w, 3) x N] for list.
+            im (torch.Tensor | List(np.ndarray)): BCHW for tensor, [(HWC) x B] for list.
        """
-        if not isinstance(im, torch.Tensor):
+        not_tensor = not isinstance(im, torch.Tensor)
        if not_tensor:
            im = np.stack(self.pre_transform(im))
            im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
            im = np.ascontiguousarray(im)  # contiguous
            im = torch.from_numpy(im)
-        # NOTE: assuming im with (b, 3, h, w) if it's a tensor
+
        img = im.to(self.device)
        img = img.half() if self.model.fp16 else img.float()  # uint8 to fp16/32
        if not_tensor:
            img /= 255  # 0 - 255 to 0.0 - 1.0
        return img
    def pre_transform(self, im):
-        """Pre-tranform input image before inference.
+        """Pre-transform input image before inference.
        Args:
            im (List(np.ndarray)): (N, 3, h, w) for tensor, [(h, w, 3) x N] for list.
@ -147,7 +149,7 @@ class BasePredictor:
        log_string = ''
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim
-        if self.source_type.webcam or self.source_type.from_img:  # batch_size >= 1
+        if self.source_type.webcam or self.source_type.from_img or self.source_type.tensor:  # batch_size >= 1
            log_string += f'{idx}: '
            frame = self.dataset.count
        else:
@ -159,10 +161,11 @@ class BasePredictor:
        log_string += result.verbose()
        if self.args.save or self.args.show:  # Add bbox to image
-            plot_args = dict(line_width=self.args.line_width,
+            plot_args = {
-                             boxes=self.args.boxes,
+                'line_width': self.args.line_width,
-                             conf=self.args.show_conf,
+                'boxes': self.args.boxes,
-                             labels=self.args.show_labels)
+                'conf': self.args.show_conf,
                'labels': self.args.show_labels}
            if not self.args.retina_masks:
                plot_args['im_gpu'] = im[idx]
            self.plotted_img = result.plot(**plot_args)
@ -214,17 +217,23 @@ class BasePredictor:
        # Setup model
        if not self.model:
            self.setup_model(model)
        # Setup source every time predict is called
        self.setup_source(source if source is not None else self.args.source)
        # Check if save_dir/ label file exists
        if self.args.save or self.args.save_txt:
            (self.save_dir / 'labels' if self.args.save_txt else self.save_dir).mkdir(parents=True, exist_ok=True)
        # Warmup model
        if not self.done_warmup:
            self.model.warmup(imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, 3, *self.imgsz))
            self.done_warmup = True
        # Checks
        if self.source_type.tensor and (self.args.save or self.args.save_txt or self.args.show):
            LOGGER.warning("WARNING ⚠️ 'save', 'save_txt' and 'show' arguments not enabled for torch.Tensor inference.")
        self.seen, self.windows, self.batch, profilers = 0, [], None, (ops.Profile(), ops.Profile(), ops.Profile())
        self.run_callbacks('on_predict_start')
        for batch in self.dataset:
@ -255,11 +264,7 @@ class BasePredictor:
                    'preprocess': profilers[0].dt * 1E3 / n,
                    'inference': profilers[1].dt * 1E3 / n,
                    'postprocess': profilers[2].dt * 1E3 / n}
-                if self.source_type.tensor:  # skip write, show and plot operations if input is raw tensor
+                p, im0 = path[i], None if self.source_type.tensor else im0s[i].copy()
                    if self.args.save or self.args.save_txt or self.args.show:
                        LOGGER.warning('WARNING ⚠️ save, save_txt and show argument not enabled for tensor inference.')
                    continue
                p, im0 = path[i], im0s[i].copy()
                p = Path(p)
                if self.args.verbose or self.args.save or self.args.save_txt or self.args.show:
@ -286,7 +291,7 @@ class BasePredictor:
        if self.args.verbose and self.seen:
            t = tuple(x.t / self.seen * 1E3 for x in profilers)  # speeds per image
            LOGGER.info(f'Speed: %.1fms preprocess, %.1fms inference, %.1fms postprocess per image at shape '
-                        f'{(1, 3, *self.imgsz)}' % t)
+                        f'{(1, 3, *im.shape[2:])}' % t)
        if self.args.save or self.args.save_txt or self.args.save_crop:
            nl = len(list(self.save_dir.glob('labels/*.txt')))  # number of labels
            s = f"\n{nl} label{'s' * (nl > 1)} saved to {self.save_dir / 'labels'}" if self.args.save_txt else ''
--- a/ultralytics/yolo/engine/results.py
+++ b/ultralytics/yolo/engine/results.py
@ -198,6 +198,10 @@ class Results(SimpleClass):
        Returns:
            (numpy.ndarray): A numpy array of the annotated image.
        """
        if img is None and isinstance(self.orig_img, torch.Tensor):
            LOGGER.warning('WARNING ⚠️ Results plotting is not supported for torch.Tensor image types.')
            return
        # Deprecation warn TODO: remove in 8.2
        if 'show_conf' in kwargs:
            deprecation_warn('show_conf', 'conf')
@ -305,7 +309,7 @@ class Results(SimpleClass):
            file_name (str | pathlib.Path): File name.
        """
        if self.probs is not None:
-            LOGGER.warning('Warning: Classify task do not support `save_crop`.')
+            LOGGER.warning('WARNING ⚠️ Classify task do not support `save_crop`.')
            return
        if isinstance(save_dir, str):
            save_dir = Path(save_dir)