ultralytics 8.0.134
add MobileSAM support (#3474)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Laughing <61612323+Laughing-q@users.noreply.github.com>
Co-authored-by: Laughing-q <1185102784@qq.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@@ -17,6 +17,7 @@ In this documentation, we provide information on four major models:
5. [YOLOv7](./yolov7.md): Updated YOLO models released in 2022 by the authors of YOLOv4.
6. [YOLOv8](./yolov8.md): The latest version of the YOLO family, featuring enhanced capabilities such as instance segmentation, pose/keypoints estimation, and classification.
7. [Segment Anything Model (SAM)](./sam.md): Meta's Segment Anything Model (SAM).
8. [Mobile Segment Anything Model (MobileSAM)](./mobile-sam.md): MobileSAM for mobile applications by Kyung Hee University.
9. [Fast Segment Anything Model (FastSAM)](./fast-sam.md): FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
10. [YOLO-NAS](./yolo-nas.md): YOLO Neural Architecture Search (NAS) Models.
11. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
@@ -44,4 +45,4 @@ model.info()  # display model information
model.train(data="coco128.yaml", epochs=100)  # train the model
```

For more details on each model, their supported tasks, modes, and performance, please visit their respective documentation pages linked above.
docs/models/mobile-sam.md (new file, 99 lines)
@@ -0,0 +1,99 @@
---
comments: true
description: MobileSAM is a lightweight adaptation of the Segment Anything Model (SAM) designed for mobile applications. It maintains the full functionality of the original SAM while significantly improving speed, making it suitable for CPU-only edge devices such as mobile phones.
keywords: MobileSAM, Faster Segment Anything, Segment Anything, Segment Anything Model, SAM, Meta SAM, image segmentation, promptable segmentation, zero-shot performance, SA-1B dataset, advanced architecture, auto-annotation, Ultralytics, pre-trained models, SAM base, SAM large, instance segmentation, computer vision, AI, artificial intelligence, machine learning, data annotation, segmentation masks, detection model, YOLO detection model, bibtex, Meta AI
---

![MobileSAM Logo](https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/assets/logo2.png)

# Faster Segment Anything (MobileSAM)

The MobileSAM paper is now available on [ResearchGate](https://www.researchgate.net/publication/371851844_Faster_Segment_Anything_Towards_Lightweight_SAM_for_Mobile_Applications) and [arXiv](https://arxiv.org/pdf/2306.14289.pdf). The most recent version will initially appear on ResearchGate due to the delayed content update on arXiv.

A demonstration of MobileSAM running on a CPU can be accessed at this [demo link](https://huggingface.co/spaces/dhkim2810/MobileSAM). On a Mac i5 CPU, inference takes approximately 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs make the response slower, but it continues to function effectively.

MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [SegmentAnythingin3D](https://github.com/Jumpat/SegmentAnythingin3D).

MobileSAM was trained on a single GPU with a 100k-image dataset (1% of the original images) in less than a day. The code for this training will be made available in the future.

## Adapting from SAM to MobileSAM

Since MobileSAM retains the same pipeline as the original SAM, we have incorporated the original's pre-processing, post-processing, and all other interfaces. Consequently, those currently using the original SAM can transition to MobileSAM with minimal effort.

MobileSAM performs comparably to the original SAM and retains the same pipeline except for a change in the image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M parameters) with a smaller Tiny-ViT encoder (5M parameters). On a single GPU, MobileSAM operates at about 12ms per image: 8ms on the image encoder and 4ms on the mask decoder.

The following table provides a comparison of the ViT-based image encoders:

| Image Encoder | Original SAM | MobileSAM |
|---------------|--------------|-----------|
| Parameters    | 611M         | 5M        |
| Speed         | 452ms        | 8ms       |

Both the original SAM and MobileSAM utilize the same prompt-guided mask decoder:

| Mask Decoder | Original SAM | MobileSAM |
|--------------|--------------|-----------|
| Parameters   | 3.876M       | 3.876M    |
| Speed        | 4ms          | 4ms       |

Here is the comparison of the whole pipeline:

| Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
|--------------------------|--------------|-----------|
| Parameters               | 615M         | 9.66M     |
| Speed                    | 456ms        | 12ms      |
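
To sanity-check these parameter counts on your own machine, here is a minimal sketch using the Ultralytics `SAM` class and the `info()` call demonstrated later on this page (the exact summary format may vary between versions):

```python
from ultralytics import SAM

# Load the MobileSAM weights (see Model Download below) and print a model
# summary, which includes the total parameter count
model = SAM('mobile_sam.pt')
model.info()
```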
|
||||
|
||||
The performance of MobileSAM and the original SAM are demonstrated using both a point and a box as prompts.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
With its superior performance, MobileSAM is approximately 5 times smaller and 7 times faster than the current FastSAM. More details are available at the [MobileSAM project page](https://github.com/ChaoningZhang/MobileSAM).
|
||||
|
||||
## Testing MobileSAM in Ultralytics
|
||||
|
||||
Just like the original SAM, we offer a straightforward testing method in Ultralytics, including modes for both Point and Box prompts.
|
||||
|
||||
### Model Download
|
||||
|
||||
You can download the model [here](https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/mobile_sam.pt).
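
If you prefer fetching the weights from a script, below is a minimal sketch using only the Python standard library; the raw-download URL is an assumption derived from the repository blob link above:

```python
from urllib.request import urlretrieve

# Assumed raw URL for the weights hosted in the MobileSAM repository
url = 'https://github.com/ChaoningZhang/MobileSAM/raw/master/weights/mobile_sam.pt'
urlretrieve(url, 'mobile_sam.pt')  # save the checkpoint to the working directory
```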

### Point Prompt

```python
from ultralytics import SAM

# Load the model
model = SAM('mobile_sam.pt')

# Predict a segment based on a point prompt
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```

### Box Prompt

```python
from ultralytics import SAM

# Load the model
model = SAM('mobile_sam.pt')

# Predict a segment based on a box prompt
model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])
```

We have implemented `MobileSAM` and `SAM` using the same API. For more usage information, please see the [SAM page](./sam.md).
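
Because both classes expose the same interface, switching between the two models is just a matter of loading different weights; a short sketch (assuming the `sam_b.pt` weights referenced on the SAM page):

```python
from ultralytics import SAM

# The identical point-prompt call works for both models; only the weights differ
for weights in ('sam_b.pt', 'mobile_sam.pt'):
    model = SAM(weights)
    model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```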

### Citing MobileSAM

If you find MobileSAM useful in your research or development work, please consider citing our paper:

```bibtex
@article{mobile_sam,
  title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
  author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
  journal={arXiv preprint arXiv:2306.14289},
  year={2023}
}
```
@@ -30,9 +30,11 @@ For an in-depth look at the Segment Anything Model and the SA-1B dataset, please

The Segment Anything Model can be employed for a multitude of downstream tasks that go beyond its training data. This includes edge detection, object proposal generation, instance segmentation, and preliminary text-to-mask prediction. With prompt engineering, SAM can swiftly adapt to new tasks and data distributions in a zero-shot manner, establishing it as a versatile and potent tool for all your image segmentation needs.

### SAM prediction example

!!! example "Segment with prompts"

    Segment the image with given prompts.

    === "Python"
@@ -45,7 +47,29 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t

        # Display model information (optional)
        model.info()

        # Run inference with bboxes prompt
        model('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])

        # Run inference with points prompt
        model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
        ```

!!! example "Segment everything"

    Segment the whole image.

    === "Python"

        ```python
        from ultralytics import SAM

        # Load a model
        model = SAM('sam_b.pt')

        # Display model information (optional)
        model.info()

        # Run inference
        model('path/to/image.jpg')
        ```
    === "CLI"
@@ -55,6 +79,48 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t

        yolo predict model=sam_b.pt source=path/to/image.jpg
        ```

- The logic here is to segment the whole image if you don't pass any prompts (bboxes/points/masks).

!!! example "SAMPredictor example"

    This way you can set the image once and run prompt inference multiple times without running the image encoder each time.

    === "Prompt inference"

        ```python
        import cv2

        from ultralytics.vit.sam import Predictor as SAMPredictor

        # Create SAMPredictor
        overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt")
        predictor = SAMPredictor(overrides=overrides)

        # Set image
        predictor.set_image("ultralytics/assets/zidane.jpg")  # set with image file
        predictor.set_image(cv2.imread("ultralytics/assets/zidane.jpg"))  # set with np.ndarray
        results = predictor(bboxes=[439, 437, 524, 709])
        results = predictor(points=[900, 370], labels=[1])

        # Reset image
        predictor.reset_image()
        ```

    Segment everything with additional args.

    === "Segment everything"

        ```python
        from ultralytics.vit.sam import Predictor as SAMPredictor

        # Create SAMPredictor
        overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt")
        predictor = SAMPredictor(overrides=overrides)

        # Segment with additional args
        results = predictor(source="ultralytics/assets/zidane.jpg", crop_n_layers=1, points_stride=64)
        ```

- For more additional args for `Segment everything`, see the [`Predictor/generate` Reference](../reference/vit/sam/predict.md).

## Available Models and Supported Tasks

| Model Type | Pre-trained Weights | Tasks Supported |
@@ -76,21 +142,33 @@ Here we compare Meta's smallest SAM model, SAM-b, with Ultralytics smallest segm

| Model                                          | Size                       | Parameters             | Speed (CPU)                |
|------------------------------------------------|----------------------------|------------------------|----------------------------|
| Meta's SAM-b                                   | 358 MB                     | 94.7 M                 | 51096 ms/im                |
| [MobileSAM](mobile-sam.md)                     | 40.7 MB                    | 10.1 M                 | 46122 ms/im                |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone  | 23.7 MB                    | 11.8 M                 | 115 ms/im                  |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7 MB** (53.4x smaller) | **3.4 M** (27.9x less) | **59 ms/im** (866x faster) |

This comparison shows the order-of-magnitude differences in model sizes and speeds between models. Whereas SAM presents unique capabilities for automatic segmenting, it is not a direct competitor to YOLOv8 segment models, which are smaller, faster and more efficient.

Tests run on a 2023 Apple M2 MacBook with 16GB of RAM. To reproduce this test:

```python
from ultralytics import FastSAM, SAM, YOLO

# Profile SAM-b
model = SAM('sam_b.pt')
model.info()
model('ultralytics/assets')

# Profile MobileSAM
model = SAM('mobile_sam.pt')
model.info()
model('ultralytics/assets')

# Profile FastSAM-s
model = FastSAM('FastSAM-s.pt')
model.info()
model('ultralytics/assets')

# Profile YOLOv8n-seg
model = YOLO('yolov8n-seg.pt')
model.info()
@@ -140,4 +218,4 @@ If you find SAM useful in your research or development work, please consider cit

We would like to express our gratitude to Meta AI for creating and maintaining this valuable resource for the computer vision community.

*keywords: Segment Anything, Segment Anything Model, SAM, Meta SAM, image segmentation, promptable segmentation, zero-shot performance, SA-1B dataset, advanced architecture, auto-annotation, Ultralytics, pre-trained models, SAM base, SAM large, instance segmentation, computer vision, AI, artificial intelligence, machine learning, data annotation, segmentation masks, detection model, YOLO detection model, bibtex, Meta AI.*
docs/reference/vit/sam/autosize.md (deleted)
@@ -1,9 +0,0 @@
---
description: Learn how to use the ResizeLongestSide module in Ultralytics YOLO for automatic image resizing. Resize your images with ease.
keywords: ResizeLongestSide, Ultralytics YOLO, automatic image resizing, image resizing
---

## ResizeLongestSide
---
### ::: ultralytics.vit.sam.autosize.ResizeLongestSide
<br><br>
@@ -18,6 +18,11 @@ keywords: SAM, VIT, computer vision models, build SAM models, build VIT models,
### ::: ultralytics.vit.sam.build.build_sam_vit_b
<br><br>

## build_mobile_sam
---
### ::: ultralytics.vit.sam.build.build_mobile_sam
<br><br>

## _build_sam
---
### ::: ultralytics.vit.sam.build._build_sam
docs/reference/vit/sam/modules/mask_generator.md (deleted)
@@ -1,9 +0,0 @@
---
description: Learn about the SamAutomaticMaskGenerator module in Ultralytics YOLO, an automatic mask generator for image segmentation.
keywords: SamAutomaticMaskGenerator, Ultralytics YOLO, automatic mask generator, image segmentation
---

## SamAutomaticMaskGenerator
---
### ::: ultralytics.vit.sam.modules.mask_generator.SamAutomaticMaskGenerator
<br><br>
docs/reference/vit/sam/modules/prompt_predictor.md (deleted)
@@ -1,9 +0,0 @@
---
description: Learn about PromptPredictor - a module in Ultralytics VIT SAM that predicts image captions based on prompts. Get started today!
keywords: PromptPredictor, Ultralytics, YOLO, VIT SAM, image captioning, deep learning, computer vision
---

## PromptPredictor
---
### ::: ultralytics.vit.sam.modules.prompt_predictor.PromptPredictor
<br><br>
docs/reference/vit/sam/modules/tiny_encoder.md (new file, 59 lines)
@@ -0,0 +1,59 @@
---
description: Learn about the Conv2d_BN, MBConv, ConvLayer, Attention, BasicLayer, and TinyViT modules.
keywords: Conv2d_BN, MBConv, ConvLayer, Attention, BasicLayer, TinyViT
---

## Conv2d_BN
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.Conv2d_BN
<br><br>

## PatchEmbed
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.PatchEmbed
<br><br>

## MBConv
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.MBConv
<br><br>

## PatchMerging
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.PatchMerging
<br><br>

## ConvLayer
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.ConvLayer
<br><br>

## Mlp
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.Mlp
<br><br>

## Attention
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.Attention
<br><br>

## TinyViTBlock
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.TinyViTBlock
<br><br>

## BasicLayer
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.BasicLayer
<br><br>

## LayerNorm2d
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.LayerNorm2d
<br><br>

## TinyViT
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.TinyViT
<br><br>
docs/reference/yolo/fastsam/model.md (new file, 9 lines)
@@ -0,0 +1,9 @@
---
description: Learn how to use FastSAM in Ultralytics YOLO to improve object detection accuracy and speed.
keywords: FastSAM, object detection, accuracy, speed, Ultralytics YOLO
---

## FastSAM
---
### ::: ultralytics.yolo.fastsam.model.FastSAM
<br><br>
docs/reference/yolo/fastsam/predict.md (new file, 9 lines)
@@ -0,0 +1,9 @@
---
description: FastSAMPredictor API reference and usage guide for the Ultralytics YOLO object detection library.
keywords: FastSAMPredictor, API, reference, usage, guide, Ultralytics, YOLO, object detection, library
---

## FastSAMPredictor
---
### ::: ultralytics.yolo.fastsam.predict.FastSAMPredictor
<br><br>
docs/reference/yolo/fastsam/prompt.md (new file, 9 lines)
@@ -0,0 +1,9 @@
---
description: Learn how to use FastSAMPrompt in Ultralytics YOLO for fast and efficient object detection and tracking.
keywords: FastSAMPrompt, Ultralytics YOLO, object detection, tracking, fast, efficient
---

## FastSAMPrompt
---
### ::: ultralytics.yolo.fastsam.prompt.FastSAMPrompt
<br><br>
docs/reference/yolo/fastsam/utils.md (new file, 14 lines)
@@ -0,0 +1,14 @@
---
description: Learn how to adjust bounding boxes to the image border in the Ultralytics YOLO framework. Improve object detection accuracy by accounting for image borders.
keywords: adjust_bboxes_to_image_border, Ultralytics YOLO, object detection, bounding boxes, image border
---

## adjust_bboxes_to_image_border
---
### ::: ultralytics.yolo.fastsam.utils.adjust_bboxes_to_image_border
<br><br>

## bbox_iou
---
### ::: ultralytics.yolo.fastsam.utils.bbox_iou
<br><br>
docs/reference/yolo/fastsam/val.md (new file, 9 lines)
@@ -0,0 +1,9 @@
---
description: Learn about the FastSAMValidator module in Ultralytics YOLO. Validate and evaluate Segment Anything Model (SAM) datasets for object detection models with ease.
keywords: FastSAMValidator, Ultralytics YOLO, SAM datasets, object detection, validation, evaluation
---

## FastSAMValidator
---
### ::: ultralytics.yolo.fastsam.val.FastSAMValidator
<br><br>
@@ -123,6 +123,11 @@ keywords: Ultralytics, YOLO, Utils Ops, Functions, coco80_to_coco91_class, scale
### ::: ultralytics.yolo.utils.ops.process_mask_native
<br><br>

## scale_masks
---
### ::: ultralytics.yolo.utils.ops.scale_masks
<br><br>

## scale_coords
---
### ::: ultralytics.yolo.utils.ops.scale_coords