README and Docs updates with A100 TensorRT times (#270)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
single_channel
Glenn Jocher 2 years ago committed by GitHub
parent 216cf2ddb6
commit e18ae9d8e1

@ -121,13 +121,13 @@ Ultralytics [release](https://github.com/ultralytics/ultralytics/releases) on fi
<details open><summary>Detection</summary> <details open><summary>Detection</summary>
| Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU<br>(ms) | Speed<br><sup>T4 GPU<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) | | Model | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
| ------------------------------------------------------------------------------------ | --------------------- | -------------------- | ------------------------- | ---------------------------- | ------------------ | ----------------- | | ------------------------------------------------------------------------------------ | --------------------- | -------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt) | 640 | 37.3 | - | - | 3.2 | 8.7 | | [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt) | 640 | 37.3 | - | 0.99 | 3.2 | 8.7 |
| [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt) | 640 | 44.9 | - | - | 11.2 | 28.6 | | [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt) | 640 | 44.9 | - | 1.20 | 11.2 | 28.6 |
| [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m.pt) | 640 | 50.2 | - | - | 25.9 | 78.9 | | [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m.pt) | 640 | 50.2 | - | 1.83 | 25.9 | 78.9 |
| [YOLOv8l](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8l.pt) | 640 | 52.9 | - | - | 43.7 | 165.2 | | [YOLOv8l](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8l.pt) | 640 | 52.9 | - | 2.39 | 43.7 | 165.2 |
| [YOLOv8x](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8x.pt) | 640 | 53.9 | - | - | 68.2 | 257.8 | | [YOLOv8x](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8x.pt) | 640 | 53.9 | - | 3.53 | 68.2 | 257.8 |
- **mAP<sup>val</sup>** values are for single-model single-scale on [COCO val2017](http://cocodataset.org) dataset. - **mAP<sup>val</sup>** values are for single-model single-scale on [COCO val2017](http://cocodataset.org) dataset.
<br>Reproduce by `yolo mode=val task=detect data=coco.yaml device=0` <br>Reproduce by `yolo mode=val task=detect data=coco.yaml device=0`
@ -138,8 +138,8 @@ Ultralytics [release](https://github.com/ultralytics/ultralytics/releases) on fi
<details><summary>Segmentation</summary> <details><summary>Segmentation</summary>
| Model | size<br><sup>(pixels) | mAP<sup>box<br>50-95 | mAP<sup>mask<br>50-95 | Speed<br><sup>CPU<br>(ms) | Speed<br><sup>T4 GPU<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) | | Model | size<br><sup>(pixels) | mAP<sup>box<br>50-95 | mAP<sup>mask<br>50-95 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) |
| ---------------------------------------------------------------------------------------- | --------------------- | -------------------- | --------------------- | ------------------------- | ---------------------------- | ------------------ | ----------------- | | ---------------------------------------------------------------------------------------- | --------------------- | -------------------- | --------------------- | ------------------------------ | ----------------------------------- | ------------------ | ----------------- |
| [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n-seg.pt) | 640 | 36.7 | 30.5 | - | - | 3.4 | 12.6 | | [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n-seg.pt) | 640 | 36.7 | 30.5 | - | - | 3.4 | 12.6 |
| [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-seg.pt) | 640 | 44.6 | 36.8 | - | - | 11.8 | 42.6 | | [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-seg.pt) | 640 | 44.6 | 36.8 | - | - | 11.8 | 42.6 |
| [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m-seg.pt) | 640 | 49.9 | 40.8 | - | - | 27.3 | 110.2 | | [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m-seg.pt) | 640 | 49.9 | 40.8 | - | - | 27.3 | 110.2 |
@ -155,8 +155,8 @@ Ultralytics [release](https://github.com/ultralytics/ultralytics/releases) on fi
<details><summary>Classification</summary> <details><summary>Classification</summary>
| Model | size<br><sup>(pixels) | acc<br><sup>top1 | acc<br><sup>top5 | Speed<br><sup>CPU<br>(ms) | Speed<br><sup>T4 GPU<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) at 640 | | Model | size<br><sup>(pixels) | acc<br><sup>top1 | acc<br><sup>top5 | Speed<br><sup>CPU ONNX<br>(ms) | Speed<br><sup>A100 TensorRT<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>(B) at 640 |
| ---------------------------------------------------------------------------------------- | --------------------- | ---------------- | ---------------- | ------------------------- | ---------------------------- | ------------------ | ------------------------ | | ---------------------------------------------------------------------------------------- | --------------------- | ---------------- | ---------------- | ------------------------------ | ----------------------------------- | ------------------ | ------------------------ |
| [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n-cls.pt) | 224 | 66.6 | 87.0 | - | - | 2.7 | 4.3 | | [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n-cls.pt) | 224 | 66.6 | 87.0 | - | - | 2.7 | 4.3 |
| [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-cls.pt) | 224 | 72.3 | 91.1 | - | - | 6.4 | 13.5 | | [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-cls.pt) | 224 | 72.3 | 91.1 | - | - | 6.4 | 13.5 |
| [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m-cls.pt) | 224 | 76.4 | 93.2 | - | - | 17.0 | 42.7 | | [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m-cls.pt) | 224 | 76.4 | 93.2 | - | - | 17.0 | 42.7 |

@ -115,13 +115,13 @@ success = YOLO("yolov8n.pt").export(format="onnx") # 将模型导出为 ONNX
<details open><summary>目标检测</summary> <details open><summary>目标检测</summary>
| 模型 | 尺寸<br><sup>(像素) | mAP<sup>val<br>50-95 | 推理速度<br><sup>CPU<br>(ms) | 推理速度<br><sup>T4 GPU<br>(ms) | 参数量<br><sup>(M) | FLOPs<br><sup>(B) | | 模型 | 尺寸<br><sup>(像素) | mAP<sup>val<br>50-95 | 推理速度<br><sup>CPU ONNX<br>(ms) | 推理速度<br><sup>A100 TensorRT<br>(ms) | 参数量<br><sup>(M) | FLOPs<br><sup>(B) |
| ----------------------------------------------------------------------------------------- | --------------- | -------------------- | ------------------------ | --------------------------- | --------------- | ----------------- | | ------------------------------------------------------------------------------------ | --------------- | -------------------- | ----------------------------- | ---------------------------------- | --------------- | ----------------- |
| [YOLOv8n](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8n.pt) | 640 | 37.3 | - | - | 3.2 | 8.7 | | [YOLOv8n](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt) | 640 | 37.3 | - | 0.99 | 3.2 | 8.7 |
| [YOLOv8s](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8s.pt) | 640 | 44.9 | - | - | 11.2 | 28.6 | | [YOLOv8s](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt) | 640 | 44.9 | - | 1.20 | 11.2 | 28.6 |
| [YOLOv8m](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8m.pt) | 640 | 50.2 | - | - | 25.9 | 78.9 | | [YOLOv8m](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m.pt) | 640 | 50.2 | - | 1.83 | 25.9 | 78.9 |
| [YOLOv8l](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8l.pt) | 640 | 52.9 | - | - | 43.7 | 165.2 | | [YOLOv8l](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8l.pt) | 640 | 52.9 | - | 2.39 | 43.7 | 165.2 |
| [YOLOv8x](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8x.pt) | 640 | 53.9 | - | - | 68.2 | 257.8 | | [YOLOv8x](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8x.pt) | 640 | 53.9 | - | 3.53 | 68.2 | 257.8 |
- **mAP<sup>val</sup>** 结果都在 [COCO val2017](http://cocodataset.org) 数据集上,使用单模型单尺度测试得到。 - **mAP<sup>val</sup>** 结果都在 [COCO val2017](http://cocodataset.org) 数据集上,使用单模型单尺度测试得到。
<br>复现命令 `yolo mode=val task=detect data=coco.yaml device=0` <br>复现命令 `yolo mode=val task=detect data=coco.yaml device=0`
@ -132,8 +132,8 @@ success = YOLO("yolov8n.pt").export(format="onnx") # 将模型导出为 ONNX
<details><summary>实例分割</summary> <details><summary>实例分割</summary>
| 模型 | 尺寸<br><sup>(像素) | mAP<sup>box<br>50-95 | mAP<sup>mask<br>50-95 | 推理速度<br><sup>CPU<br>(ms) | 推理速度<br><sup>T4 GPU<br>(ms) | 参数量<br><sup>(M) | FLOPs<br><sup>(B) | | 模型 | 尺寸<br><sup>(像素) | mAP<sup>box<br>50-95 | mAP<sup>mask<br>50-95 | 推理速度<br><sup>CPU ONNX<br>(ms) | 推理速度<br><sup>A100 TensorRT<br>(ms) | 参数量<br><sup>(M) | FLOPs<br><sup>(B) |
| --------------------------------------------------------------------------------------------- | --------------- | -------------------- | --------------------- | ------------------------ | --------------------------- | --------------- | ----------------- | | --------------------------------------------------------------------------------------------- | --------------- | -------------------- | --------------------- | ----------------------------- | ---------------------------------- | --------------- | ----------------- |
| [YOLOv8n](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8n-seg.pt) | 640 | 36.7 | 30.5 | - | - | 3.4 | 12.6 | | [YOLOv8n](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8n-seg.pt) | 640 | 36.7 | 30.5 | - | - | 3.4 | 12.6 |
| [YOLOv8s](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8s-seg.pt) | 640 | 44.6 | 36.8 | - | - | 11.8 | 42.6 | | [YOLOv8s](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8s-seg.pt) | 640 | 44.6 | 36.8 | - | - | 11.8 | 42.6 |
| [YOLOv8m](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8m-seg.pt) | 640 | 49.9 | 40.8 | - | - | 27.3 | 110.2 | | [YOLOv8m](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8m-seg.pt) | 640 | 49.9 | 40.8 | - | - | 27.3 | 110.2 |
@ -149,8 +149,8 @@ success = YOLO("yolov8n.pt").export(format="onnx") # 将模型导出为 ONNX
<details><summary>分类</summary> <details><summary>分类</summary>
| 模型 | 尺寸<br><sup>(像素) | acc<br><sup>top1 | acc<br><sup>top5 | 推理速度<br><sup>CPU<br>(ms) | 推理速度<br><sup>T4 GPU<br>(ms) | 参数量<br><sup>(M) | FLOPs<br><sup>(B) at 640 | | 模型 | 尺寸<br><sup>(像素) | acc<br><sup>top1 | acc<br><sup>top5 | 推理速度<br><sup>CPU ONNX<br>(ms) | 推理速度<br><sup>A100 TensorRT<br>(ms) | 参数量<br><sup>(M) | FLOPs<br><sup>(B) at 640 |
| --------------------------------------------------------------------------------------------- | --------------- | ---------------- | ---------------- | ------------------------ | --------------------------- | --------------- | ------------------------ | | --------------------------------------------------------------------------------------------- | --------------- | ---------------- | ---------------- | ----------------------------- | ---------------------------------- | --------------- | ------------------------ |
| [YOLOv8n](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8n-cls.pt) | 224 | 66.6 | 87.0 | - | - | 2.7 | 4.3 | | [YOLOv8n](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8n-cls.pt) | 224 | 66.6 | 87.0 | - | - | 2.7 | 4.3 |
| [YOLOv8s](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8s-cls.pt) | 224 | 72.3 | 91.1 | - | - | 6.4 | 13.5 | | [YOLOv8s](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8s-cls.pt) | 224 | 72.3 | 91.1 | - | - | 6.4 | 13.5 |
| [YOLOv8m](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8m-cls.pt) | 224 | 76.4 | 93.2 | - | - | 17.0 | 42.7 | | [YOLOv8m](https://github.com/ultralytics/ultralytics/releases/download/v8.0.0/yolov8m-cls.pt) | 224 | 76.4 | 93.2 | - | - | 17.0 | 42.7 |

Binary file not shown (new image, 5.3 KiB).

@ -5,7 +5,7 @@ BaseTrainer contains the generic boilerplate training routine. It can be customi
* `get_model(cfg, weights)` - The function that builds the model to be trained * `get_model(cfg, weights)` - The function that builds the model to be trained
* `get_dataloder()` - The function that builds the dataloader * `get_dataloder()` - The function that builds the dataloader
More details and source code can be found in [`BaseTrainer` Reference](../reference/base_trainer.md) More details and source code can be found in [`BaseTrainer` Reference](reference/base_trainer.md)
## DetectionTrainer ## DetectionTrainer
Here's how you can use the YOLOv8 `DetectionTrainer` and customize it. Here's how you can use the YOLOv8 `DetectionTrainer` and customize it.

@ -1,6 +1,10 @@
<div align="center"> <div align="center">
<a href="https://github.com/ultralytics/ultralytics" target="_blank"> <a href="https://github.com/ultralytics/ultralytics" target="_blank">
<img width="1024" src="https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png"></a> <img width="1024" src="https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/banner-yolov8.png"></a>
<br>
<a href="https://github.com/ultralytics/ultralytics/actions/workflows/ci.yaml"><img src="https://github.com/ultralytics/ultralytics/actions/workflows/ci.yaml/badge.svg" alt="Ultralytics CI"></a>
<a href="https://zenodo.org/badge/latestdoi/264818686"><img src="https://zenodo.org/badge/264818686.svg" alt="YOLOv8 Citation"></a>
<a href="https://hub.docker.com/r/ultralytics/ultralytics"><img src="https://img.shields.io/docker/pulls/ultralytics/ultralytics?logo=docker" alt="Docker Pulls"></a>
<br> <br>
<a href="https://console.paperspace.com/github/ultralytics/ultralytics"><img src="https://assets.paperspace.io/img/gradient-badge.svg" alt="Run on Gradient"/></a> <a href="https://console.paperspace.com/github/ultralytics/ultralytics"><img src="https://assets.paperspace.io/img/gradient-badge.svg" alt="Run on Gradient"/></a>
<a href="https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> <a href="https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

@ -1,25 +1,14 @@
site_name: Ultralytics Docs site_name: Ultralytics Docs
repo_url: https://github.com/ultralytics/ultralytics repo_url: https://github.com/ultralytics/ultralytics
repo_name: Ultralytics edit_uri: https://github.com/ultralytics/ultralytics/tree/main/docs
repo_name: ultralytics/ultralytics
theme: theme:
name: "material" name: "material"
logo: https://github.com/ultralytics/assets/raw/main/logo/Ultralytics-logomark-white.png logo: https://github.com/ultralytics/assets/raw/main/logo/Ultralytics-logomark-white.png
icon: favicon: assets/favicon.ico
repo: fontawesome/brands/github font:
admonition: text: Roboto
note: octicons/tag-16
abstract: octicons/checklist-16
info: octicons/info-16
tip: octicons/squirrel-16
success: octicons/check-16
question: octicons/question-16
warning: octicons/alert-16
failure: octicons/x-circle-16
danger: octicons/zap-16
bug: octicons/bug-16
example: octicons/beaker-16
quote: octicons/quote-16
palette: palette:
# Palette toggle for light mode # Palette toggle for light mode
@ -34,12 +23,16 @@ theme:
icon: material/brightness-4 icon: material/brightness-4
name: Switch to light mode name: Switch to light mode
features: features:
- content.action.edit
- content.code.annotate - content.code.annotate
- content.tooltips - content.tooltips
- search.highlight - search.highlight
- search.share - search.share
- search.suggest - search.suggest
- toc.follow - toc.follow
- navigation.top
- navigation.expand
- navigation.footer
extra_css: extra_css:
- stylesheets/style.css - stylesheets/style.css
@ -72,8 +65,10 @@ markdown_extensions:
- pymdownx.keys - pymdownx.keys
- pymdownx.mark - pymdownx.mark
- pymdownx.tilde - pymdownx.tilde
plugins: plugins:
- mkdocstrings - mkdocstrings
- search
# Primary navigation # Primary navigation
nav: nav:

@ -22,32 +22,31 @@ class AutoBackend(nn.Module):
def __init__(self, weights='yolov8n.pt', device=torch.device('cpu'), dnn=False, data=None, fp16=False, fuse=True): def __init__(self, weights='yolov8n.pt', device=torch.device('cpu'), dnn=False, data=None, fp16=False, fuse=True):
""" """
Ultralytics YOLO MultiBackend class for python inference on various backends MultiBackend class for python inference on various platforms using Ultralytics YOLO.
Args: Args:
weights: the path to the weights file. Defaults to yolov8n.pt weights (str): The path to the weights file. Default: 'yolov8n.pt'
device: The device to run the model on. device (torch.device): The device to run the model on.
dnn: If you want to use OpenCV's DNN module to run the inference, set this to True. Defaults to dnn (bool): Use OpenCV's DNN module for inference if True, defaults to False.
False data (dict): Additional data, optional
data: a dictionary containing the following keys: fp16 (bool): If True, use half precision. Default: False
fp16: If true, will use half precision. Defaults to False fuse (bool): Whether to fuse the model or not. Default: True
fuse: whether to fuse the model or not. Defaults to True
Supported format and their usage: Supported formats and their usage:
| Platform | weights | Platform | Weights Format
|-----------------------|------------------| -----------------------|------------------
| PyTorch | *.pt | PyTorch | *.pt
| TorchScript | *.torchscript | TorchScript | *.torchscript
| ONNX Runtime | *.onnx | ONNX Runtime | *.onnx
| ONNX OpenCV DNN | *.onnx --dnn | ONNX OpenCV DNN | *.onnx --dnn
| OpenVINO | *.xml | OpenVINO | *.xml
| CoreML | *.mlmodel | CoreML | *.mlmodel
| TensorRT | *.engine | TensorRT | *.engine
| TensorFlow SavedModel | *_saved_model | TensorFlow SavedModel | *_saved_model
| TensorFlow GraphDef | *.pb | TensorFlow GraphDef | *.pb
| TensorFlow Lite | *.tflite | TensorFlow Lite | *.tflite
| TensorFlow Edge TPU | *_edgetpu.tflite | TensorFlow Edge TPU | *_edgetpu.tflite
| PaddlePaddle | *_paddle_model | PaddlePaddle | *_paddle_model
""" """
super().__init__() super().__init__()
w = str(weights[0] if isinstance(weights, list) else weights) w = str(weights[0] if isinstance(weights, list) else weights)
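The format table in the docstring above can be read as a suffix lookup. The sketch below is a hypothetical helper (`guess_platform` and `SUFFIX_TO_PLATFORM` are illustrative names, not part of `AutoBackend`) that maps a weights path to the platform names from that table, assuming the backend is chosen purely from the filename suffix and the `dnn` flag.

```python
# Hypothetical helper, not the AutoBackend implementation: it only mirrors
# the "Platform | Weights Format" table from the docstring.
SUFFIX_TO_PLATFORM = [
    ("_edgetpu.tflite", "TensorFlow Edge TPU"),  # checked before plain ".tflite"
    (".tflite", "TensorFlow Lite"),
    (".pt", "PyTorch"),
    (".torchscript", "TorchScript"),
    (".onnx", "ONNX Runtime"),
    (".xml", "OpenVINO"),
    (".mlmodel", "CoreML"),
    (".engine", "TensorRT"),
    ("_saved_model", "TensorFlow SavedModel"),
    (".pb", "TensorFlow GraphDef"),
    ("_paddle_model", "PaddlePaddle"),
]


def guess_platform(weights, dnn=False):
    """Return the platform name for a weights path (assumed suffix-based)."""
    # Same normalisation as the constructor: accept a list or a single path.
    name = str(weights[0] if isinstance(weights, list) else weights).lower()
    for suffix, platform in SUFFIX_TO_PLATFORM:
        if name.endswith(suffix):
            # The table lists "*.onnx --dnn" as the OpenCV DNN variant.
            if suffix == ".onnx" and dnn:
                return "ONNX OpenCV DNN"
            return platform
    raise ValueError(f"unrecognised weights format: {name}")
```

The Edge TPU entry comes first because `_edgetpu.tflite` also ends in `.tflite`; any ordering that checks the longer suffix first would work equally well.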
@ -234,15 +233,16 @@ class AutoBackend(nn.Module):
def forward(self, im, augment=False, visualize=False): def forward(self, im, augment=False, visualize=False):
""" """
Runs inference on the given model Runs inference on the YOLOv8 MultiBackend model.
Args: Args:
im: the image tensor im (torch.tensor): The image tensor to perform inference on.
augment: whether to augment the image. Defaults to False augment (bool): whether to perform data augmentation during inference, defaults to False
visualize: if True, then the network will output the feature maps of the last convolutional layer. visualize (bool): whether to visualize the output predictions, defaults to False
Defaults to False
Returns:
(tuple): Tuple containing the raw output tensor, and the processed output for visualization (if visualize=True)
""" """
# YOLOv5 MultiBackend inference
b, ch, h, w = im.shape # batch, channel, height, width b, ch, h, w = im.shape # batch, channel, height, width
if self.fp16 and im.dtype != torch.float16: if self.fp16 and im.dtype != torch.float16:
im = im.half() # to FP16 im = im.half() # to FP16
@ -325,19 +325,25 @@ class AutoBackend(nn.Module):
def from_numpy(self, x): def from_numpy(self, x):
""" """
`from_numpy` converts a numpy array to a tensor Convert a numpy array to a tensor.
Args: Args:
x: the numpy array to convert x (numpy.ndarray): The array to be converted.
Returns:
(torch.tensor): The converted tensor
""" """
return torch.from_numpy(x).to(self.device) if isinstance(x, np.ndarray) else x return torch.from_numpy(x).to(self.device) if isinstance(x, np.ndarray) else x
def warmup(self, imgsz=(1, 3, 640, 640)): def warmup(self, imgsz=(1, 3, 640, 640)):
""" """
Warmup model by running inference once Warm up the model by running one forward pass with a dummy input.
Args: Args:
imgsz: the size of the image you want to run inference on. imgsz (tuple): The shape of the dummy input tensor in the format (batch_size, channels, height, width)
Returns:
(None): This method runs the forward pass and doesn't return any value

""" """
warmup_types = self.pt, self.jit, self.onnx, self.engine, self.saved_model, self.pb, self.triton, self.nn_module warmup_types = self.pt, self.jit, self.onnx, self.engine, self.saved_model, self.pb, self.triton, self.nn_module
if any(warmup_types) and (self.device.type != 'cpu' or self.triton): if any(warmup_types) and (self.device.type != 'cpu' or self.triton):

@ -17,35 +17,36 @@ from ultralytics.yolo.utils.torch_utils import (fuse_conv_and_bn, initialize_wei
class BaseModel(nn.Module): class BaseModel(nn.Module):
''' """
The BaseModel class is a base class for all the models in the Ultralytics YOLO family. The BaseModel class serves as a base class for all the models in the Ultralytics YOLO family.
''' """
def forward(self, x, profile=False, visualize=False): def forward(self, x, profile=False, visualize=False):
""" """
> `forward` is a wrapper for `_forward_once` that runs the model on a single scale Forward pass of the model on a single scale.
Wrapper for `_forward_once` method.
Args: Args:
x: the input image x (torch.tensor): The input image tensor
profile: whether to profile the model. Defaults to False profile (bool): Whether to profile the model, defaults to False
visualize: if True, will return the intermediate feature maps. Defaults to False visualize (bool): Whether to return the intermediate feature maps, defaults to False
Returns: Returns:
The output of the network. (torch.tensor): The output of the network.
""" """
return self._forward_once(x, profile, visualize) return self._forward_once(x, profile, visualize)
def _forward_once(self, x, profile=False, visualize=False): def _forward_once(self, x, profile=False, visualize=False):
""" """
> Forward pass of the network Perform a forward pass through the network.
Args: Args:
x: input to the model x (torch.tensor): The input tensor to the model
profile: if True, the time taken for each layer will be printed. Defaults to False profile (bool): Print the computation time of each layer if True, defaults to False.
visualize: If True, it will save the feature maps of the model. Defaults to False visualize (bool): Save the feature maps of the model if True, defaults to False
Returns: Returns:
The last layer of the model. (torch.tensor): The last output of the model.
""" """
y, dt = [], [] # outputs y, dt = [], [] # outputs
for m in self.model: for m in self.model:
@ -62,13 +63,15 @@ class BaseModel(nn.Module):
def _profile_one_layer(self, m, x, dt): def _profile_one_layer(self, m, x, dt):
""" """
It takes a model, an input, and a list of times, and it profiles the model on the input, appending Profile the computation time and FLOPs of a single layer of the model on a given input. Appends the results to the provided list.
the time to the list
Args: Args:
m: the model m (nn.Module): The layer to be profiled.
x: the input image x (torch.Tensor): The input data to the layer.
dt: list of time taken for each layer dt (list): A list to store the computation time of the layer.
Returns:
None
""" """
c = m == self.model[-1] # is final layer, copy input as inplace fix c = m == self.model[-1] # is final layer, copy input as inplace fix
o = thop.profile(m, inputs=(x.copy() if c else x,), verbose=False)[0] / 1E9 * 2 if thop else 0 # FLOPs o = thop.profile(m, inputs=(x.copy() if c else x,), verbose=False)[0] / 1E9 * 2 if thop else 0 # FLOPs
@ -84,10 +87,10 @@ class BaseModel(nn.Module):
def fuse(self): def fuse(self):
""" """
> It takes a model and fuses the Conv2d() and BatchNorm2d() layers into a single layer Fuse the `Conv2d()` and `BatchNorm2d()` layers of the model into a single layer, in order to improve the computation efficiency.
Returns: Returns:
The model is being returned. (nn.Module): The fused model is returned.
""" """
LOGGER.info('Fusing layers... ') LOGGER.info('Fusing layers... ')
for m in self.model.modules(): for m in self.model.modules():
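The `fuse` docstring above describes folding `Conv2d()` and `BatchNorm2d()` into a single layer. The arithmetic can be illustrated with a scalar sketch, assuming one input channel, one output channel and a 1x1 kernel so that the convolution reduces to `w * x + b`. The function name `fuse_scalar_conv_bn` and the sample values are illustrative only; the library's own `fuse_conv_and_bn` operates on full tensors.

```python
import math


def fuse_scalar_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into a scalar conv weight and bias."""
    scale = gamma / math.sqrt(var + eps)  # per-channel BatchNorm scale
    fused_w = w * scale
    fused_b = beta + (b - mean) * scale
    return fused_w, fused_b


# Illustrative values (assumptions, not taken from any model).
w, b = 0.5, 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0
x = 2.0

# Unfused path: convolution followed by BatchNorm in inference mode.
unfused = gamma * ((w * x + b) - mean) / math.sqrt(var + 1e-5) + beta

# Fused path: a single multiply-add with the folded parameters.
fused_w, fused_b = fuse_scalar_conv_bn(w, b, gamma, beta, mean, var)
fused = fused_w * x + fused_b
```

Both paths produce the same value up to floating-point rounding, which is why fusing is safe at inference time, when the BatchNorm statistics are fixed.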
@ -103,8 +106,8 @@ class BaseModel(nn.Module):
Prints model information Prints model information
Args: Args:
verbose: if True, prints out the model information. Defaults to False verbose (bool): if True, prints out the model information. Defaults to False
imgsz: the size of the image that the model will be trained on. Defaults to 640 imgsz (int): the size of the image that the model will be trained on. Defaults to 640
""" """
model_info(self, verbose, imgsz) model_info(self, verbose, imgsz)
@ -129,10 +132,10 @@ class BaseModel(nn.Module):
def load(self, weights): def load(self, weights):
""" """
> This function loads the weights of the model from a file This function loads the weights of the model from a file
Args: Args:
weights: The weights to load into the model. weights (str): The weights to load into the model.
""" """
# Force all tasks to implement this function # Force all tasks to implement this function
raise NotImplementedError("This function needs to be implemented by derived classes!") raise NotImplementedError("This function needs to be implemented by derived classes!")

@ -84,6 +84,7 @@ class BaseTrainer:
if overrides is None: if overrides is None:
overrides = {} overrides = {}
self.args = get_config(config, overrides) self.args = get_config(config, overrides)
self.device = utils.torch_utils.select_device(self.args.device, self.args.batch)
self.check_resume() self.check_resume()
self.console = LOGGER self.console = LOGGER
self.validator = None self.validator = None
@ -113,7 +114,6 @@ class BaseTrainer:
print_args(dict(self.args)) print_args(dict(self.args))
# Device # Device
self.device = utils.torch_utils.select_device(self.args.device, self.batch_size)
self.amp = self.device.type != 'cpu' self.amp = self.device.type != 'cpu'
self.scaler = amp.GradScaler(enabled=self.amp) self.scaler = amp.GradScaler(enabled=self.amp)
if self.device.type == 'cpu': if self.device.type == 'cpu':
@ -164,7 +164,15 @@ class BaseTrainer:
callback(self) callback(self)
def train(self): def train(self):
# Allow device='', device=None on Multi-GPU systems to default to device=0
if isinstance(self.args.device, int) or self.args.device: # i.e. device=0 or device=[0,1,2,3]
world_size = torch.cuda.device_count() world_size = torch.cuda.device_count()
elif torch.cuda.is_available(): # i.e. device=None or device=''
world_size = 1 # default to device 0
else: # i.e. device='cpu' or 'mps'
world_size = 0
# Run subprocess if DDP training, else train normally
if world_size > 1 and "LOCAL_RANK" not in os.environ: if world_size > 1 and "LOCAL_RANK" not in os.environ:
command = generate_ddp_command(world_size, self) command = generate_ddp_command(world_size, self)
try: try:
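The new branching in `train()` can be isolated as a pure function for reasoning about each case. This is a minimal sketch: `resolve_world_size` is a hypothetical name, and the CUDA availability and device count are passed in as plain arguments (standing in for `torch.cuda.is_available()` and `torch.cuda.device_count()`) so the logic can be run without a GPU.

```python
def resolve_world_size(device, cuda_available, cuda_device_count):
    """Mirror the three branches added to BaseTrainer.train() in this hunk."""
    if isinstance(device, int) or device:  # e.g. device=0 or device=[0, 1, 2, 3]
        return cuda_device_count
    elif cuda_available:  # device=None or device=''
        return 1  # default to device 0
    else:
        return 0


# On a hypothetical 4-GPU machine:
explicit = resolve_world_size(0, True, 4)       # explicit device index
unset = resolve_world_size("", True, 4)         # empty string falls through
no_cuda = resolve_world_size(None, False, 0)    # no CUDA at all
cpu_string = resolve_world_size("cpu", True, 4) # non-empty string is truthy
```

Under plain Python truthiness, a non-empty string such as `'cpu'` satisfies the first condition, so in this sketch it returns the GPU count rather than reaching the final branch. Whether the full trainer handles that case elsewhere is not visible in the hunk.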

@ -1,5 +1,3 @@
# Ultralytics YOLO 🚀, GPL-3.0 license
import contextlib import contextlib
import math import math
import re import re
@ -50,15 +48,15 @@ def coco80_to_coco91_class(): # converts 80-index (val2014) to 91-index (paper)
def segment2box(segment, width=640, height=640): def segment2box(segment, width=640, height=640):
""" """
> Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to
(xyxy) (xyxy)
Args: Args:
segment: the segment label segment (torch.tensor): the segment label
width: the width of the image. Defaults to 640 width (int): the width of the image. Defaults to 640
height: The height of the image. Defaults to 640 height (int): The height of the image. Defaults to 640
Returns: Returns:
the minimum and maximum x and y values of the segment. (np.array): the minimum and maximum x and y values of the segment.
""" """
# Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy) # Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)
x, y = segment.T # segment xy x, y = segment.T # segment xy
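The `segment2box` docstring describes converting a polygon to an `xyxy` box while keeping only points inside the image. A pure-Python sketch of that idea follows; `segment_to_box` is an illustrative name, the input is assumed to be a list of `(x, y)` pairs rather than an array, and the all-zero fallback for an empty result is an assumption.

```python
def segment_to_box(points, width=640, height=640):
    """Return (x_min, y_min, x_max, y_max) over the points inside the image."""
    # Inside-image constraint: drop points outside [0, width] x [0, height].
    inside = [(x, y) for x, y in points if 0 <= x <= width and 0 <= y <= height]
    if not inside:
        return (0.0, 0.0, 0.0, 0.0)  # assumed fallback when nothing is inside
    xs = [x for x, _ in inside]
    ys = [y for _, y in inside]
    return (min(xs), min(ys), max(xs), max(ys))


# Two of the four points fall outside a 640x640 image and are ignored.
box = segment_to_box([(-5, 10), (20, 30), (50, 700), (60, 40)])
```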
@ -69,17 +67,16 @@ def segment2box(segment, width=640, height=640):
def scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None): def scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None):
""" """
> Rescale boxes (xyxy) from img1_shape to img0_shape Rescales bounding boxes (in the format of xyxy) from the shape of the image they were originally specified in (img1_shape) to the shape of a different image (img0_shape).
Args: Args:
img1_shape: The shape of the image that the bounding boxes are for. img1_shape (tuple): The shape of the image that the bounding boxes are for, in the format of (height, width).
boxes: the bounding boxes of the objects in the image boxes (torch.tensor): the bounding boxes of the objects in the image, in the format of (x1, y1, x2, y2)
img0_shape: the shape of the original image img0_shape (tuple): the shape of the target image, in the format of (height, width).
ratio_pad: a tuple of (ratio, pad) ratio_pad (tuple): a tuple of (ratio, pad) for scaling the boxes. If not provided, the ratio and pad will be calculated based on the size difference between the two images.
Returns: Returns:
The boxes are being returned. boxes (torch.tensor): The scaled bounding boxes, in the format of (x1, y1, x2, y2)
""" """
#
if ratio_pad is None: # calculate from img0_shape if ratio_pad is None: # calculate from img0_shape
gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1]) # gain = old / new gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1]) # gain = old / new
pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2 # wh padding pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2 # wh padding
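The `gain` and `pad` lines above can be traced with a single box. The sketch below reuses those two formulas as shown in the hunk; the step that subtracts the padding and divides by the gain is an assumption about the part of `scale_boxes` not visible here, and `scale_box` is an illustrative name for a one-box, pure-Python version.

```python
def scale_box(img1_shape, box, img0_shape):
    """Map one xyxy box from img1_shape (h, w) back to img0_shape (h, w)."""
    # Same formulas as in the hunk: gain = old / new, pad is (w, h) padding.
    gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
    pad_w = (img1_shape[1] - img0_shape[1] * gain) / 2
    pad_h = (img1_shape[0] - img0_shape[0] * gain) / 2
    x1, y1, x2, y2 = box
    # Assumed remainder of the routine: remove padding, then undo the resize.
    return ((x1 - pad_w) / gain, (y1 - pad_h) / gain,
            (x2 - pad_w) / gain, (y2 - pad_h) / gain)


# A 960x1280 image letterboxed into 640x640: gain is 0.5 and the
# top/bottom padding is 80 px, so y coordinates shift before scaling.
scaled = scale_box((640, 640), (100, 180, 200, 280), (960, 1280))
```

In this example the horizontal padding is zero because the width is the limiting dimension, so only the y coordinates are shifted.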
@ -113,7 +110,7 @@ def non_max_suppression(
nm=0, # number of masks nm=0, # number of masks
): ):
""" """
> Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box. Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.
Arguments: Arguments:
prediction (torch.Tensor): A tensor of shape (batch_size, num_boxes, num_classes + 4 + num_masks) prediction (torch.Tensor): A tensor of shape (batch_size, num_boxes, num_classes + 4 + num_masks)
@ -134,7 +131,7 @@ def non_max_suppression(
nm (int): The number of masks output by the model. nm (int): The number of masks output by the model.
Returns: Returns:
List[torch.Tensor]: A list of length batch_size, where each element is a tensor of (List[torch.Tensor]): A list of length batch_size, where each element is a tensor of
shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns
(x1, y1, x2, y2, confidence, class, mask1, mask2, ...). (x1, y1, x2, y2, confidence, class, mask1, mask2, ...).
""" """
@ -231,12 +228,12 @@ def non_max_suppression(
def clip_boxes(boxes, shape): def clip_boxes(boxes, shape):
""" """
> It takes a list of bounding boxes and a shape (height, width) and clips the bounding boxes to the It takes a list of bounding boxes and a shape (height, width) and clips the bounding boxes to the
shape shape
Args: Args:
boxes: the bounding boxes to clip boxes (torch.tensor): the bounding boxes to clip
shape: the shape of the image shape (tuple): the shape of the image
""" """
if isinstance(boxes, torch.Tensor): # faster individually if isinstance(boxes, torch.Tensor): # faster individually
boxes[..., 0].clamp_(0, shape[1]) # x1 boxes[..., 0].clamp_(0, shape[1]) # x1
@ -262,16 +259,16 @@ def clip_coords(boxes, shape):
def scale_image(im1_shape, masks, im0_shape, ratio_pad=None): def scale_image(im1_shape, masks, im0_shape, ratio_pad=None):
""" """
> It takes a mask, and resizes it to the original image size Takes a mask, and resizes it to the original image size
Args: Args:
im1_shape: model input shape, [h, w] im1_shape (tuple): model input shape, [h, w]
masks: [h, w, num] masks (torch.tensor): [h, w, num]
im0_shape: the original image shape im0_shape (tuple): the original image shape
ratio_pad: the ratio of the padding to the original image. ratio_pad (tuple): the ratio of the padding to the original image.
Returns: Returns:
The masks are being returned. masks (torch.tensor): The masks that are being returned.
""" """
# Rescale coordinates (xyxy) from im1_shape to im0_shape # Rescale coordinates (xyxy) from im1_shape to im0_shape
if ratio_pad is None: # calculate from im0_shape if ratio_pad is None: # calculate from im0_shape
@ -297,14 +294,12 @@ def scale_image(im1_shape, masks, im0_shape, ratio_pad=None):
def xyxy2xywh(x): def xyxy2xywh(x):
""" """
> It takes a list of bounding boxes, and converts them from the format [x1, y1, x2, y2] to [x, y, w, Convert bounding box coordinates from (x1, y1, x2, y2) format to (x, y, width, height) format.
h] where xy1=top-left, xy2=bottom-right
Args: Args:
x: the input tensor x (np.ndarray) or (torch.Tensor): The input tensor containing the bounding box coordinates in (x1, y1, x2, y2) format.
Returns: Returns:
the center of the box, the width and the height of the box. y (numpy.ndarray) or (torch.Tensor): The bounding box coordinates in (x, y, width, height) format.
""" """
y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x) y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
y[..., 0] = (x[..., 0] + x[..., 2]) / 2 # x center y[..., 0] = (x[..., 0] + x[..., 2]) / 2 # x center
@ -316,13 +311,12 @@ def xyxy2xywh(x):
def xywh2xyxy(x): def xywh2xyxy(x):
""" """
> It converts the bounding box from x,y,w,h to x1,y1,x2,y2 where xy1=top-left, xy2=bottom-right Convert bounding box coordinates from (x, y, width, height) format to (x1, y1, x2, y2) format where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.
Args: Args:
x: the input tensor x (np.ndarray) or (torch.Tensor): The input tensor containing the bounding box coordinates in (x, y, width, height) format.
Returns: Returns:
the top left and bottom right coordinates of the bounding box. y (numpy.ndarray) or (torch.Tensor): The bounding box coordinates in (x1, y1, x2, y2) format.
""" """
y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x) y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
y[..., 0] = x[..., 0] - x[..., 2] / 2 # top left x y[..., 0] = x[..., 0] - x[..., 2] / 2 # top left x
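Note: the hunks for `xyxy2xywh` and `xywh2xyxy` show only the first assignment of each function. The sketch below spells out the full pair of conversions for a single box, assuming the usual center/size definitions given in the new docstrings. It uses plain tuples rather than the array or tensor inputs of the real functions, and the names `xyxy_to_xywh` and `xywh_to_xyxy` are illustrative only.

```python
def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) corners -> (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def xywh_to_xyxy(box):
    """(x_center, y_center, width, height) -> (x1, y1, x2, y2) corners."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


center_box = xyxy_to_xywh((10, 20, 50, 80))
print(center_box)                # (30.0, 50.0, 40, 60)
print(xywh_to_xyxy(center_box))  # (10.0, 20.0, 50.0, 80.0)
```

The two functions are inverses of each other, so a round trip returns the original corners.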
@ -334,17 +328,16 @@ def xywh2xyxy(x):
def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0): def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
""" """
> It converts the normalized coordinates to the actual coordinates [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right Convert normalized bounding box coordinates to pixel coordinates.
Args: Args:
x: the bounding box coordinates x (np.ndarray) or (torch.Tensor): The bounding box coordinates.
w: width of the image. Defaults to 640 w (int): Width of the image. Defaults to 640
h: height of the image. Defaults to 640 h (int): Height of the image. Defaults to 640
padw: padding width. Defaults to 0 padw (int): Padding width. Defaults to 0
padh: height of the padding. Defaults to 0 padh (int): Padding height. Defaults to 0
Returns: Returns:
the xyxy coordinates of the bounding box. y (numpy.ndarray) or (torch.Tensor): The coordinates of the bounding box in the format [x1, y1, x2, y2] where x1,y1 is the top-left corner, x2,y2 is the bottom-right corner of the bounding box.
""" """
y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x) y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
y[..., 0] = w * (x[..., 0] - x[..., 2] / 2) + padw # top left x y[..., 0] = w * (x[..., 0] - x[..., 2] / 2) + padw # top left x
@ -356,18 +349,16 @@ def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
def xyxy2xywhn(x, w=640, h=640, clip=False, eps=0.0): def xyxy2xywhn(x, w=640, h=640, clip=False, eps=0.0):
""" """
> It takes in a list of bounding boxes, and returns a list of bounding boxes, but with the x and y Convert bounding box coordinates from (x1, y1, x2, y2) format to (x, y, width, height, normalized) format. x, y, width and height are normalized to image dimensions
coordinates normalized to the width and height of the image
Args: Args:
x: the bounding box coordinates x (np.ndarray) or (torch.Tensor): The input tensor containing the bounding box coordinates in (x1, y1, x2, y2) format.
w: width of the image. Defaults to 640 w (int): The width of the image. Defaults to 640
h: height of the image. Defaults to 640 h (int): The height of the image. Defaults to 640
clip: If True, the boxes will be clipped to the image boundaries. Defaults to False clip (bool): If True, the boxes will be clipped to the image boundaries. Defaults to False
eps: the minimum value of the box's width and height. eps (float): The minimum value of the box's width and height. Defaults to 0.0
Returns: Returns:
the xywhn format of the bounding boxes. y (numpy.ndarray) or (torch.Tensor): The bounding box coordinates in (x, y, width, height, normalized) format
""" """
if clip: if clip:
clip_boxes(x, (h - eps, w - eps)) # warning: inplace clip clip_boxes(x, (h - eps, w - eps)) # warning: inplace clip
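Note: a small sketch of what the `xyxy2xywhn` docstring describes, for a single box held in a tuple. It assumes clipping clamps x to `[0, w - eps]` and y to `[0, h - eps]`, matching the `(h - eps, w - eps)` shape passed to `clip_boxes` above. Unlike the real function, which clips its input in place, this hypothetical `xyxy_to_xywhn` returns a new tuple.

```python
def xyxy_to_xywhn(box, w=640, h=640, clip=False, eps=0.0):
    """(x1, y1, x2, y2) in pixels -> (x, y, width, height) normalized by image size."""
    x1, y1, x2, y2 = box
    if clip:
        # Clamp corners to the image bounds before normalizing
        x1, x2 = (min(max(v, 0), w - eps) for v in (x1, x2))
        y1, y2 = (min(max(v, 0), h - eps) for v in (y1, y2))
    return ((x1 + x2) / 2 / w, (y1 + y2) / 2 / h, (x2 - x1) / w, (y2 - y1) / h)


print(xyxy_to_xywhn((160, 120, 480, 360), w=640, h=480))           # (0.5, 0.5, 0.5, 0.5)
print(xyxy_to_xywhn((-10, 0, 700, 240), w=640, h=480, clip=True))  # (0.5, 0.25, 1.0, 0.5)
```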
@ -381,17 +372,16 @@ def xyxy2xywhn(x, w=640, h=640, clip=False, eps=0.0):
def xyn2xy(x, w=640, h=640, padw=0, padh=0): def xyn2xy(x, w=640, h=640, padw=0, padh=0):
""" """
> It converts normalized segments into pixel segments of shape (n,2) Convert normalized coordinates to pixel coordinates of shape (n,2)
Args: Args:
x: the normalized coordinates of the bounding box x (numpy.ndarray) or (torch.Tensor): The input tensor of normalized bounding box coordinates
w: width of the image. Defaults to 640 w (int): The width of the image. Defaults to 640
h: height of the image. Defaults to 640 h (int): The height of the image. Defaults to 640
padw: padding width. Defaults to 0 padw (int): The width of the padding. Defaults to 0
padh: padding height. Defaults to 0 padh (int): The height of the padding. Defaults to 0
Returns: Returns:
the x and y coordinates of the top left corner of the bounding box. y (numpy.ndarray) or (torch.Tensor): The x and y coordinates of the top left corner of the bounding box
""" """
y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x) y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
y[..., 0] = w * x[..., 0] + padw # top left x y[..., 0] = w * x[..., 0] + padw # top left x
@ -401,13 +391,12 @@ def xyn2xy(x, w=640, h=640, padw=0, padh=0):
def xywh2ltwh(x): def xywh2ltwh(x):
""" """
> It converts the bounding box from [x, y, w, h] to [x1, y1, w, h] where xy1=top-left Convert the bounding box format from [x, y, w, h] to [x1, y1, w, h], where x1, y1 are the top-left coordinates.
Args: Args:
x: the x coordinate of the center of the bounding box x (numpy.ndarray) or (torch.Tensor): The input tensor with the bounding box coordinates in the xywh format
Returns: Returns:
the top left x and y coordinates of the bounding box. y (numpy.ndarray) or (torch.Tensor): The bounding box coordinates in the xyltwh format
""" """
y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x) y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x
@ -417,13 +406,12 @@ def xywh2ltwh(x):
def xyxy2ltwh(x): def xyxy2ltwh(x):
""" """
> Convert nx4 boxes from [x1, y1, x2, y2] to [x1, y1, w, h] where xy1=top-left, xy2=bottom-right Convert nx4 bounding boxes from [x1, y1, x2, y2] to [x1, y1, w, h], where xy1=top-left, xy2=bottom-right
Args: Args:
x: the input tensor x (numpy.ndarray) or (torch.Tensor): The input tensor with the bounding boxes coordinates in the xyxy format
Returns: Returns:
the xyxy2ltwh function. y (numpy.ndarray) or (torch.Tensor): The bounding box coordinates in the xyltwh format.
""" """
y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x) y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
y[:, 2] = x[:, 2] - x[:, 0] # width y[:, 2] = x[:, 2] - x[:, 0] # width
@ -433,10 +421,10 @@ def xyxy2ltwh(x):
def ltwh2xywh(x): def ltwh2xywh(x):
""" """
> Convert nx4 boxes from [x1, y1, w, h] to [x, y, w, h] where xy1=top-left, xy=center Convert nx4 boxes from [x1, y1, w, h] to [x, y, w, h] where xy1=top-left, xy=center
Args: Args:
x: the input tensor x (torch.tensor): the input tensor
""" """
y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x) y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
y[:, 0] = x[:, 0] + x[:, 2] / 2 # center x y[:, 0] = x[:, 0] + x[:, 2] / 2 # center x
@ -446,14 +434,13 @@ def ltwh2xywh(x):
def ltwh2xyxy(x): def ltwh2xyxy(x):
""" """
> It converts the bounding box from [x1, y1, w, h] to [x1, y1, x2, y2] where xy1=top-left, It converts the bounding box from [x1, y1, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
xy2=bottom-right
Args: Args:
x: the input image x (numpy.ndarray) or (torch.Tensor): the input image
Returns: Returns:
the xyxy coordinates of the bounding boxes. y (numpy.ndarray) or (torch.Tensor): the xyxy coordinates of the bounding boxes.
""" """
y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x) y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
y[:, 2] = x[:, 2] + x[:, 0] # width y[:, 2] = x[:, 2] + x[:, 0] # width
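Note: the `ltwh` helpers differ from the `xywh` ones only in that the first two values are the top-left corner rather than the center. A minimal single-box sketch, with hypothetical names and plain tuples instead of the nx4 arrays used in the source:

```python
def ltwh_to_xyxy(box):
    """(left, top, width, height) -> (x1, y1, x2, y2)."""
    left, top, w, h = box
    # Adding the width to the left edge gives x2; adding the height to the top edge gives y2
    return (left, top, left + w, top + h)


def xyxy_to_ltwh(box):
    """(x1, y1, x2, y2) -> (left, top, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


print(ltwh_to_xyxy((10, 20, 30, 40)))  # (10, 20, 40, 60)
print(xyxy_to_ltwh((10, 20, 40, 60)))  # (10, 20, 30, 40)
```

In `ltwh2xyxy`, the sum `x[:, 2] + x[:, 0]` shown in the hunk is the right edge x2, as in `left + w` here.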
@ -463,14 +450,13 @@ def ltwh2xyxy(x):
def segments2boxes(segments): def segments2boxes(segments):
""" """
> It converts segment labels to box labels, i.e. (cls, xy1, xy2, ...) to (cls, xywh) It converts segment labels to box labels, i.e. (cls, xy1, xy2, ...) to (cls, xywh)
Args: Args:
segments: list of segments, each segment is a list of points, each point is a list of x, y segments (list): list of segments, each segment is a list of points, each point is a list of x, y coordinates
coordinates
Returns: Returns:
the xywh coordinates of the bounding boxes. (np.array): the xywh coordinates of the bounding boxes.
""" """
boxes = [] boxes = []
for s in segments: for s in segments:
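Note: the loop body of `segments2boxes` is cut off by the hunk. The sketch below shows one plausible reading of the docstring: take the min and max of each polygon's points and express the result as xywh. This is an assumption rather than the code from the commit, and `segment_to_xywh` is a hypothetical name.

```python
def segment_to_xywh(points):
    """Axis-aligned bounding box of a polygon, as (x_center, y_center, width, height)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


segments = [[(1, 2), (5, 2), (5, 8), (1, 8)], [(0, 0), (2, 1), (1, 3)]]
boxes = [segment_to_xywh(s) for s in segments]
print(boxes)  # [(3.0, 5.0, 4, 6), (1.0, 1.5, 2, 3)]
```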
@ -481,15 +467,14 @@ def segments2boxes(segments):
def resample_segments(segments, n=1000): def resample_segments(segments, n=1000):
""" """
> It takes a list of segments (n,2) and returns a list of segments (n,2) where each segment has been It takes a list of segments (n,2) and returns a list of segments (n,2) where each segment has been up-sampled to n points
up-sampled to n points
Args: Args:
segments: a list of (n,2) arrays, where n is the number of points in the segment. segments (list): a list of (n,2) arrays, where n is the number of points in the segment.
n: number of points to resample the segment to. Defaults to 1000 n (int): number of points to resample the segment to. Defaults to 1000
Returns: Returns:
the resampled segments. segments (list): the resampled segments.
""" """
for i, s in enumerate(segments): for i, s in enumerate(segments):
s = np.concatenate((s, s[0:1, :]), axis=0) s = np.concatenate((s, s[0:1, :]), axis=0)
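Note: the visible line closes each polygon by appending its first point. The rest of `resample_segments` is outside the hunk. As a sketch only, the function below closes a polygon the same way and then resamples it to `n` points by linear interpolation over the vertex index. The interpolation scheme is an assumption, the code is pure Python rather than NumPy, and `resample_closed_polygon` is a hypothetical name.

```python
def resample_closed_polygon(points, n):
    """Close the polygon, then return n points evenly spaced in vertex-index space (n >= 2)."""
    closed = list(points) + [points[0]]  # same idea as concatenating s[0:1] onto s
    last = len(closed) - 1
    out = []
    for i in range(n):
        t = i * last / (n - 1)        # fractional vertex index in [0, last]
        lo = min(int(t), last - 1)    # index of the vertex that starts this edge
        frac = t - lo                 # position along the edge, in [0, 1]
        (xa, ya), (xb, yb) = closed[lo], closed[lo + 1]
        out.append((xa + (xb - xa) * frac, ya + (yb - ya) * frac))
    return out


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
resampled = resample_closed_polygon(square, 9)
print(len(resampled), resampled[1], resampled[-1])  # 9 (2.0, 0.0) (0.0, 0.0)
```

Because the polygon is closed first, the last resampled point coincides with the first.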
@ -501,14 +486,14 @@ def resample_segments(segments, n=1000):
def crop_mask(masks, boxes): def crop_mask(masks, boxes):
""" """
> It takes a mask and a bounding box, and returns a mask that is cropped to the bounding box It takes a mask and a bounding box, and returns a mask that is cropped to the bounding box
Args: Args:
masks: [h, w, n] tensor of masks masks (torch.tensor): [h, w, n] tensor of masks
boxes: [n, 4] tensor of bbox coords in relative point form boxes (torch.tensor): [n, 4] tensor of bbox coordinates in relative point form
Returns: Returns:
The masks are being cropped to the bounding box. (torch.tensor): The masks are being cropped to the bounding box.
""" """
n, h, w = masks.shape n, h, w = masks.shape
x1, y1, x2, y2 = torch.chunk(boxes[:, :, None], 4, 1) # x1 shape(1,1,n) x1, y1, x2, y2 = torch.chunk(boxes[:, :, None], 4, 1) # x1 shape(1,1,n)
@ -520,17 +505,17 @@ def crop_mask(masks, boxes):
def process_mask_upsample(protos, masks_in, bboxes, shape): def process_mask_upsample(protos, masks_in, bboxes, shape):
""" """
> It takes the output of the mask head, and applies the mask to the bounding boxes. This produces masks of higher It takes the output of the mask head, and applies the mask to the bounding boxes. This produces masks of higher
quality but is slower. quality but is slower.
Args: Args:
protos: [mask_dim, mask_h, mask_w] protos (torch.tensor): [mask_dim, mask_h, mask_w]
masks_in: [n, mask_dim], n is number of masks after nms masks_in (torch.tensor): [n, mask_dim], n is number of masks after nms
bboxes: [n, 4], n is number of masks after nms bboxes (torch.tensor): [n, 4], n is number of masks after nms
shape: the size of the input image shape (tuple): the size of the input image (h,w)
Returns: Returns:
mask (torch.tensor): The upsampled masks.
""" """
c, mh, mw = protos.shape # CHW c, mh, mw = protos.shape # CHW
masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw) masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw)
@ -541,17 +526,17 @@ def process_mask_upsample(protos, masks_in, bboxes, shape):
def process_mask(protos, masks_in, bboxes, shape, upsample=False): def process_mask(protos, masks_in, bboxes, shape, upsample=False):
""" """
> It takes the output of the mask head, and applies the mask to the bounding boxes. This is faster but produces It takes the output of the mask head, and applies the mask to the bounding boxes. This is faster but produces
downsampled quality of mask downsampled quality of mask
Args: Args:
protos: [mask_dim, mask_h, mask_w] protos (torch.tensor): [mask_dim, mask_h, mask_w]
masks_in: [n, mask_dim], n is number of masks after nms masks_in (torch.tensor): [n, mask_dim], n is number of masks after nms
bboxes: [n, 4], n is number of masks after nms bboxes (torch.tensor): [n, 4], n is number of masks after nms
shape: the size of the input image shape (tuple): the size of the input image (h,w)
Returns: Returns:
mask (torch.tensor): The processed masks.
""" """
c, mh, mw = protos.shape # CHW c, mh, mw = protos.shape # CHW
@ -572,16 +557,16 @@ def process_mask(protos, masks_in, bboxes, shape, upsample=False):
def process_mask_native(protos, masks_in, bboxes, shape): def process_mask_native(protos, masks_in, bboxes, shape):
""" """
> It takes the output of the mask head, and crops it after upsampling to the bounding boxes. It takes the output of the mask head, and crops it after upsampling to the bounding boxes.
Args: Args:
protos: [mask_dim, mask_h, mask_w] protos (torch.tensor): [mask_dim, mask_h, mask_w]
masks_in: [n, mask_dim], n is number of masks after nms masks_in (torch.tensor): [n, mask_dim], n is number of masks after nms
bboxes: [n, 4], n is number of masks after nms bboxes (torch.tensor): [n, 4], n is number of masks after nms
shape: input_image_size, (h, w) shape (tuple): the size of the input image (h,w)
Returns: Returns:
masks: [h, w, n] masks (torch.tensor): The returned masks with dimensions [h, w, n]
""" """
c, mh, mw = protos.shape # CHW c, mh, mw = protos.shape # CHW
masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw) masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw)
@ -598,17 +583,17 @@ def process_mask_native(protos, masks_in, bboxes, shape):
def scale_segments(img1_shape, segments, img0_shape, ratio_pad=None, normalize=False): def scale_segments(img1_shape, segments, img0_shape, ratio_pad=None, normalize=False):
""" """
> Rescale segment coords (xyxy) from img1_shape to img0_shape Rescale segment coordinates (xyxy) from img1_shape to img0_shape
Args: Args:
img1_shape: The shape of the image that the segments are from. img1_shape (tuple): The shape of the image that the segments are from.
segments: the segments to be scaled segments (torch.tensor): the segments to be scaled
img0_shape: the shape of the image that the segmentation is being applied to img0_shape (tuple): the shape of the image that the segmentation is being applied to
ratio_pad: the ratio of the image size to the padded image size. ratio_pad (tuple): the ratio of the image size to the padded image size.
normalize: If True, the coordinates will be normalized to the range [0, 1]. Defaults to False normalize (bool): If True, the coordinates will be normalized to the range [0, 1]. Defaults to False
Returns: Returns:
the segmented image. segments (torch.tensor): the segmented image.
""" """
if ratio_pad is None: # calculate from img0_shape if ratio_pad is None: # calculate from img0_shape
gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1]) # gain = old / new gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1]) # gain = old / new
@ -629,11 +614,11 @@ def scale_segments(img1_shape, segments, img0_shape, ratio_pad=None, normalize=F
def masks2segments(masks, strategy='largest'): def masks2segments(masks, strategy='largest'):
""" """
> It takes a list of masks(n,h,w) and returns a list of segments(n,xy) It takes a list of masks(n,h,w) and returns a list of segments(n,xy)
Args: Args:
masks: the output of the model, which is a tensor of shape (batch_size, 160, 160) masks (torch.tensor): the output of the model, which is a tensor of shape (batch_size, 160, 160)
strategy: 'concat' or 'largest'. Defaults to largest strategy (str): 'concat' or 'largest'. Defaults to largest
Returns: Returns:
segments (List): list of segment masks segments (List): list of segment masks
@ -654,12 +639,12 @@ def masks2segments(masks, strategy='largest'):
def clip_segments(segments, shape): def clip_segments(segments, shape):
""" """
> It takes a list of line segments (x1,y1,x2,y2) and clips them to the image shape (height, width) It takes a list of line segments (x1,y1,x2,y2) and clips them to the image shape (height, width)
Args: Args:
segments: a list of segments, each segment is a list of points, each point is a list of x,y segments (list): a list of segments, each segment is a list of points, each point is a list of x,y
coordinates coordinates
shape: the shape of the image shape (tuple): the shape of the image
""" """
if isinstance(segments, torch.Tensor): # faster individually if isinstance(segments, torch.Tensor): # faster individually
segments[:, 0].clamp_(0, shape[1]) # x segments[:, 0].clamp_(0, shape[1]) # x
@ -670,5 +655,13 @@ def clip_segments(segments, shape):
def clean_str(s): def clean_str(s):
# Cleans a string by replacing special characters with underscore _ """
Cleans a string by replacing special characters with underscore _
Args:
s (str): a string needing special characters replaced
Returns:
(str): a string with special characters replaced by an underscore _
"""
return re.sub(pattern="[|@#!¡·$€%&()=?¿^*;:,¨´><+]", repl="_", string=s) return re.sub(pattern="[|@#!¡·$€%&()=?¿^*;:,¨´><+]", repl="_", string=s)
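Note: a quick usage example for `clean_str`, reproducing the one-line body shown above so that the snippet runs on its own. The input string is made up for illustration.

```python
import re


def clean_str(s):
    # Same pattern as in the diff: each listed special character becomes "_"
    return re.sub(pattern="[|@#!¡·$€%&()=?¿^*;:,¨´><+]", repl="_", string=s)


print(clean_str("yolov8n@640(fp16)"))  # yolov8n_640_fp16_
```

Characters outside the bracketed set, such as letters, digits, dots and hyphens, pass through unchanged.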
