ultralytics 8.0.99 HUB resume fix and Docs updates (#2567)

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
This commit is contained in:
Glenn Jocher
2023-05-12 18:33:32 +02:00
committed by GitHub
parent 229119c376
commit db1c5885d5
19 changed files with 486 additions and 52 deletions

@ -13,6 +13,7 @@ In this documentation, we provide information on four major models:
2. [YOLOv5](./yolov5.md): An improved version of the YOLO architecture, offering better performance and speed tradeoffs compared to previous versions.
3. [YOLOv8](./yolov8.md): The latest version of the YOLO family, featuring enhanced capabilities such as instance segmentation, pose/keypoints estimation, and classification.
4. [Segment Anything Model (SAM)](./sam.md): Meta's Segment Anything Model (SAM).
5. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's RT-DETR model.
You can use these models directly in the Command Line Interface (CLI) or in a Python environment. Below are examples of how to use the models with CLI and Python:
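For example, a minimal sketch using the standard Ultralytics Python API with a YOLOv8 checkpoint (the weights file and image path are placeholders; SAM and RT-DETR use their own classes, as described on their pages):

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8n model (placeholder weights file)
model = YOLO("yolov8n.pt")

# Run inference on a placeholder image and print the detected boxes
results = model("path/to/image.jpg")
for result in results:
    print(result.boxes)
```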

docs/models/rtdetr.md (new file, 52 lines)

@ -0,0 +1,52 @@
---
comments: true
description: Explore RT-DETR, a high-performance real-time object detector. Learn how to use pre-trained models with Ultralytics Python API for various tasks.
---
# RT-DETR
## Overview
Real-Time Detection Transformer (RT-DETR) is an end-to-end object detector that provides real-time performance while maintaining high accuracy. It efficiently processes multi-scale features by decoupling intra-scale interaction and cross-scale fusion, and supports flexible adjustment of inference speed using different decoder layers without retraining. RT-DETR outperforms many real-time object detectors on accelerated backends like CUDA with TensorRT.
### Key Features
- **Efficient Hybrid Encoder:** RT-DETR uses an efficient hybrid encoder that processes multi-scale features by decoupling intra-scale interaction and cross-scale fusion. This design reduces computational costs and allows for real-time object detection.
- **IoU-aware Query Selection:** RT-DETR improves object query initialization by utilizing IoU-aware query selection. This allows the model to focus on the most relevant objects in the scene.
- **Adaptable Inference Speed:** RT-DETR supports flexible adjustments of inference speed by using different decoder layers without the need for retraining. This adaptability facilitates practical application in various real-time object detection scenarios.
## Pre-trained Models
Ultralytics RT-DETR provides several pre-trained models with different scales:
- RT-DETR-L: 53.0% AP on COCO val2017, 114 FPS on T4 GPU
- RT-DETR-X: 54.8% AP on COCO val2017, 74 FPS on T4 GPU
## Usage
### Python API
```python
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")  # load a pretrained RT-DETR-L model
model.info()  # display model information
model.predict("path/to/image.jpg")  # run inference on an image
```
### Supported Tasks
| Model Type | Pre-trained Weights | Tasks Supported |
|---------------------|---------------------|------------------|
| RT-DETR Large | `rtdetr-l.pt` | Object Detection |
| RT-DETR Extra-Large | `rtdetr-x.pt` | Object Detection |
### Supported Modes
| Mode | Supported |
|------------|--------------------|
| Inference | :heavy_check_mark: |
| Validation | :heavy_check_mark: |
| Training | :x: (Coming soon) |
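As an example, validating a pre-trained checkpoint might look like the following sketch (it assumes the standard Ultralytics `val()` API and uses the small `coco8.yaml` demo dataset purely for illustration):

```python
from ultralytics import RTDETR

# Load a pretrained RT-DETR-L checkpoint
model = RTDETR("rtdetr-l.pt")

# Validate on a small demo dataset (coco8.yaml is used here only for illustration)
metrics = model.val(data="coco8.yaml")
print(metrics.box.map)  # mAP50-95, assuming the standard Ultralytics metrics object
```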
For more information about the RT-DETR model, please refer to the [original paper](https://arxiv.org/abs/2304.08069) and the [PaddleDetection repository](https://github.com/PaddlePaddle/PaddleDetection).

@ -1,26 +1,37 @@
---
comments: true
description: Learn about the Vision Transformer (ViT) and segment anything with SAM models. Train and use pre-trained models with Python API.
description: Learn about the Segment Anything Model (SAM) and how it provides promptable image segmentation through an advanced architecture and the SA-1B dataset.
---
# Vision Transformers
# Segment Anything Model (SAM)
Vit models currently support Python environment:
## Overview
The Segment Anything Model (SAM) is a groundbreaking image segmentation model that enables promptable segmentation with real-time performance. It forms the foundation for the Segment Anything project, which introduces a new task, model, and dataset for image segmentation. SAM is designed to be promptable, allowing it to transfer zero-shot to new image distributions and tasks. The model is trained on the [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/), which contains over 1 billion masks on 11 million licensed and privacy-respecting images. SAM has demonstrated impressive zero-shot performance, often surpassing prior fully supervised results.
![Dataset sample image](https://production-media.paperswithcode.com/datasets/540cfe0d-5fe7-43a4-9e1d-4c5813ddeb3e.png)
## Key Features
- **Promptable Segmentation Task:** SAM is designed for a promptable segmentation task, enabling it to return a valid segmentation mask given any segmentation prompt, such as spatial or text information identifying an object.
- **Advanced Architecture:** SAM utilizes a powerful image encoder, a prompt encoder, and a lightweight mask decoder. This architecture enables flexible prompting, real-time mask computation, and ambiguity awareness in segmentation.
- **SA-1B Dataset:** The Segment Anything project introduces the SA-1B dataset, which contains over 1 billion masks on 11 million images. This dataset is the largest segmentation dataset to date, providing SAM with a diverse and large-scale source of data for training.
- **Zero-Shot Performance:** SAM demonstrates remarkable zero-shot performance across a range of segmentation tasks, allowing it to be used out-of-the-box with prompt engineering for various applications.
For more information about the Segment Anything Model and the SA-1B dataset, please refer to the [Segment Anything website](https://segment-anything.com) and the research paper [Segment Anything](https://arxiv.org/abs/2304.02643).
## Usage
SAM can be used for a variety of downstream tasks involving object and image distributions beyond its training data. Examples include edge detection, object proposal generation, instance segmentation, and preliminary text-to-mask prediction. By employing prompt engineering, SAM can adapt to new tasks and data distributions in a zero-shot manner, making it a versatile and powerful tool for image segmentation tasks.
```python
from ultralytics.vit import SAM
# from ultralytics.vit import MODEL_TYPE
model = SAM("sam_b.pt")
model = SAM('sam_b.pt')
model.info() # display model information
model.predict(...) # predict
model.predict('path/to/image.jpg') # predict
```
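For instance, one possible way to apply the model to a folder of images is the following sketch, which simply reuses the `SAM` API shown above in a loop (the `images/` directory is a placeholder):

```python
from pathlib import Path

from ultralytics.vit import SAM

model = SAM('sam_b.pt')  # load a pretrained SAM base checkpoint

# Run segmentation on every JPEG in a placeholder directory
for image_path in Path('images').glob('*.jpg'):
    model.predict(str(image_path))
```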
# Segment Anything
## About
## Supported Tasks
| Model Type | Pre-trained Weights | Tasks Supported |
@ -34,4 +45,21 @@ model.predict(...) # predict
|------------|--------------------|
| Inference | :heavy_check_mark: |
| Validation | :x: |
| Training | :x: |
| Training | :x: |
# Citations and Acknowledgements
If you use SAM in your research or development work, please cite the following paper:
```bibtex
@misc{kirillov2023segment,
  title={Segment Anything},
  author={Alexander Kirillov and Eric Mintun and Nikhila Ravi and Hanzi Mao and Chloe Rolland and Laura Gustafson and Tete Xiao and Spencer Whitehead and Alexander C. Berg and Wan-Yen Lo and Piotr Dollár and Ross Girshick},
  year={2023},
  eprint={2304.02643},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
We would like to acknowledge Meta AI for creating and maintaining this valuable resource for the computer vision community.

@ -5,9 +5,15 @@ description: Detect objects faster and more accurately using Ultralytics YOLOv5u
# YOLOv5u
## About
## Overview
Anchor-free YOLOv5 models with improved accuracy-speed tradeoff.
YOLOv5u is an updated version of YOLOv5 that incorporates the anchor-free split Ultralytics head used in the YOLOv8 models. It retains the same backbone and neck architecture as YOLOv5 but offers an improved accuracy-speed tradeoff for object detection tasks.
## Key Features
- **Anchor-free Split Ultralytics Head:** YOLOv5u replaces the traditional anchor-based detection head with an anchor-free split Ultralytics head, resulting in improved performance.
- **Optimized Accuracy-Speed Tradeoff:** The updated model offers a better balance between accuracy and speed, making it more suitable for a wider range of applications.
- **Variety of Pre-trained Models:** YOLOv5u offers a range of pre-trained models tailored for various tasks, including Inference, Validation, and Training.
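A usage sketch is shown below; it assumes YOLOv5u weights such as `yolov5nu.pt` are available through the standard Ultralytics `YOLO` API, and the image path is a placeholder:

```python
from ultralytics import YOLO

# Load an anchor-free YOLOv5u nano model (weights name assumed for illustration)
model = YOLO("yolov5nu.pt")

# Run inference on a placeholder image
results = model.predict("path/to/image.jpg")
```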
## Supported Tasks

@ -5,7 +5,16 @@ description: Learn about YOLOv8's pre-trained weights supporting detection, inst
# YOLOv8
## About
## Overview
YOLOv8 is the latest iteration in the YOLO series of real-time object detectors, offering cutting-edge performance in terms of accuracy and speed. Building upon the advancements of previous YOLO versions, YOLOv8 introduces new features and optimizations that make it an ideal choice for various object detection tasks in a wide range of applications.
## Key Features
- **Advanced Backbone and Neck Architectures:** YOLOv8 employs state-of-the-art backbone and neck architectures, resulting in improved feature extraction and object detection performance.
- **Anchor-free Split Ultralytics Head:** YOLOv8 adopts an anchor-free split Ultralytics head, which contributes to better accuracy and a more efficient detection process compared to anchor-based approaches.
- **Optimized Accuracy-Speed Tradeoff:** With a focus on maintaining an optimal balance between accuracy and speed, YOLOv8 is suitable for real-time object detection tasks in diverse application areas.
- **Variety of Pre-trained Models:** YOLOv8 offers a range of pre-trained models to cater to various tasks and performance requirements, making it easier to find the right model for your specific use case.
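As a rough illustration of fine-tuning, the sketch below assumes the standard Ultralytics `train()` API and uses the small `coco8.yaml` demo dataset as a stand-in for a real dataset:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8n detection model
model = YOLO("yolov8n.pt")

# Fine-tune briefly on a small demo dataset (coco8.yaml is a stand-in for a real dataset)
model.train(data="coco8.yaml", epochs=3)
```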
## Supported Tasks