ultralytics 8.0.134 add MobileSAM support (#3474)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Laughing <61612323+Laughing-q@users.noreply.github.com>
Co-authored-by: Laughing-q <1185102784@qq.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Chaoning Zhang
2023-07-13 20:25:56 +08:00
committed by GitHub
parent c55a98ab8e
commit 201e69e4e4
32 changed files with 1472 additions and 841 deletions


@ -17,6 +17,7 @@ In this documentation, we provide information on four major models:
5. [YOLOv7](./yolov7.md): Updated YOLO models released in 2022 by the authors of YOLOv4.
6. [YOLOv8](./yolov8.md): The latest version of the YOLO family, featuring enhanced capabilities such as instance segmentation, pose/keypoints estimation, and classification.
7. [Segment Anything Model (SAM)](./sam.md): Meta's Segment Anything Model (SAM).
7. [Mobile Segment Anything Model (MobileSAM)](./mobile-sam.md): MobileSAM for mobile applications by Kyung Hee University.
8. [Fast Segment Anything Model (FastSAM)](./fast-sam.md): FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
9. [YOLO-NAS](./yolo-nas.md): YOLO Neural Architecture Search (NAS) Models.
10. [Realtime Detection Transformers (RT-DETR)](./rtdetr.md): Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
@ -44,4 +45,4 @@ model.info() # display model information
model.train(data="coco128.yaml", epochs=100) # train the model
```
For more details on each model, their supported tasks, modes, and performance, please visit their respective documentation pages linked above.

docs/models/mobile-sam.md Normal file

@ -0,0 +1,99 @@
---
comments: true
description: MobileSAM is a lightweight adaptation of the Segment Anything Model (SAM) designed for mobile applications. It maintains the full functionality of the original SAM while significantly improving speed, making it suitable for CPU-only edge devices, such as mobile phones.
keywords: MobileSAM, Faster Segment Anything, Segment Anything, Segment Anything Model, SAM, Meta SAM, image segmentation, promptable segmentation, zero-shot performance, SA-1B dataset, advanced architecture, auto-annotation, Ultralytics, pre-trained models, SAM base, SAM large, instance segmentation, computer vision, AI, artificial intelligence, machine learning, data annotation, segmentation masks, detection model, YOLO detection model, bibtex, Meta AI
---
![MobileSAM Logo](https://github.com/ChaoningZhang/MobileSAM/blob/master/assets/logo2.png?raw=true)
# Faster Segment Anything (MobileSAM)
The MobileSAM paper is now available on [ResearchGate](https://www.researchgate.net/publication/371851844_Faster_Segment_Anything_Towards_Lightweight_SAM_for_Mobile_Applications) and [arXiv](https://arxiv.org/pdf/2306.14289.pdf). The most recent version appears on ResearchGate first, since content updates on arXiv are delayed.
A demonstration of MobileSAM running on a CPU can be accessed at this [demo link](https://huggingface.co/spaces/dhkim2810/MobileSAM). Inference on a Mac i5 CPU takes approximately 3 seconds. On the Hugging Face demo, the interface and lower-performance CPUs lead to a slower response, but it continues to function effectively.
MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [SegmentAnythingin3D](https://github.com/Jumpat/SegmentAnythingin3D).
MobileSAM is trained on a single GPU with a 100k-image dataset (1% of the original images) in less than a day. The code for this training will be made available in the future.
## Adapting from SAM to MobileSAM
Since MobileSAM retains the same pipeline as the original SAM, we have incorporated the original's pre-processing, post-processing, and all other interfaces. Consequently, those currently using the original SAM can transition to MobileSAM with minimal effort.
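Within Ultralytics, for example, the switch is just a change of weights file; a minimal sketch using the `SAM` API shown in the sections below:
```python
from ultralytics import SAM

# The same Ultralytics SAM class loads either checkpoint
model = SAM('sam_b.pt')       # original SAM (ViT-B weights)
model = SAM('mobile_sam.pt')  # MobileSAM as a drop-in replacement

# The prompt interface is identical in both cases
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```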
MobileSAM performs comparably to the original SAM and retains the same pipeline except for a change in the image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M parameters) with a smaller Tiny-ViT (5M parameters). On a single GPU, MobileSAM operates at about 12ms per image: 8ms on the image encoder and 4ms on the mask decoder.
The following table provides a comparison of ViT-based image encoders:
| Image Encoder | Original SAM | MobileSAM |
|---------------|--------------|-----------|
| Parameters | 611M | 5M |
| Speed | 452ms | 8ms |
Both the original SAM and MobileSAM utilize the same prompt-guided mask decoder:
| Mask Decoder | Original SAM | MobileSAM |
|--------------|--------------|-----------|
| Parameters | 3.876M | 3.876M |
| Speed | 4ms | 4ms |
Here is the comparison of the whole pipeline:
| Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
|--------------------------|--------------|-----------|
| Parameters | 615M | 9.66M |
| Speed | 456ms | 12ms |
The performance of MobileSAM and the original SAM is demonstrated using both a point and a box as prompts.
![Image with Point as Prompt](https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/assets/mask_box.jpg?raw=true)
![Image with Box as Prompt](https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/assets/mask_box.jpg?raw=true)
With its superior performance, MobileSAM is approximately 5 times smaller and 7 times faster than the current FastSAM. More details are available at the [MobileSAM project page](https://github.com/ChaoningZhang/MobileSAM).
## Testing MobileSAM in Ultralytics
Just like the original SAM, we offer a straightforward testing method in Ultralytics, including modes for both Point and Box prompts.
### Model Download
You can download the model [here](https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/mobile_sam.pt).
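If you prefer to fetch the checkpoint from a script rather than the browser, here is a minimal sketch, assuming GitHub's usual raw-file URL for the weights linked above:
```python
from urllib.request import urlretrieve

# Assumed raw-download URL derived from the repository link above
url = 'https://github.com/ChaoningZhang/MobileSAM/raw/master/weights/mobile_sam.pt'
urlretrieve(url, 'mobile_sam.pt')  # saves the checkpoint to the current directory
```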
### Point Prompt
```python
from ultralytics import SAM
# Load the model
model = SAM('mobile_sam.pt')
# Predict a segment based on a point prompt
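# labels=[1] marks the point as a foreground prompt (0 would mark background)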
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```
### Box Prompt
```python
from ultralytics import SAM
# Load the model
model = SAM('mobile_sam.pt')
# Predict a segment based on a box prompt
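# The box prompt is given as [x_min, y_min, x_max, y_max] pixel coordinates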
model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])
```
We have implemented `MobileSAM` and `SAM` using the same API. For more usage information, please see the [SAM page](./sam.md).
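Since both return standard Ultralytics `Results` objects, downstream handling is the same for either model; a minimal sketch of reading back the predicted masks (attribute names follow the Results API, shapes are illustrative):
```python
from ultralytics import SAM

model = SAM('mobile_sam.pt')
results = model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])

# Each entry is an Ultralytics Results object; `masks` holds the predicted segments
for r in results:
    if r.masks is not None:
        print(r.masks.data.shape)  # (num_masks, H, W) tensor of binary masks
```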
### Citing MobileSAM
If you find MobileSAM useful in your research or development work, please consider citing our paper:
```bibtex
@article{mobile_sam,
title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
journal={arXiv preprint arXiv:2306.14289},
year={2023}
}
```


@ -30,9 +30,11 @@ For an in-depth look at the Segment Anything Model and the SA-1B dataset, please
The Segment Anything Model can be employed for a multitude of downstream tasks that go beyond its training data. This includes edge detection, object proposal generation, instance segmentation, and preliminary text-to-mask prediction. With prompt engineering, SAM can swiftly adapt to new tasks and data distributions in a zero-shot manner, establishing it as a versatile and potent tool for all your image segmentation needs.
!!! example "SAM prediction example"
### SAM prediction example
The device is determined automatically: if a GPU is available it will be used, otherwise inference will run on the CPU.
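To pin inference to a particular device instead, here is a minimal sketch, assuming the standard Ultralytics `device` predict argument is also honoured by the SAM models:
```python
from ultralytics import SAM

# Load a model
model = SAM('sam_b.pt')

# Assumption: the common Ultralytics `device` override applies here,
# forcing CPU inference even when a GPU is present
model('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709], device='cpu')
```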
!!! example "Segment with prompts"
Segment image with given prompts.
=== "Python"
@ -45,7 +47,29 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t
# Display model information (optional)
model.info()
# Run inference with bboxes prompt
model('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])
# Run inference with points prompt
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```
!!! example "Segment everything"
Segment the whole image.
=== "Python"
```python
from ultralytics import SAM
# Load a model
model = SAM('sam_b.pt')
# Display model information (optional)
model.info()
# Run inference
model('path/to/image.jpg')
```
=== "CLI"
@ -55,6 +79,48 @@ The Segment Anything Model can be employed for a multitude of downstream tasks t
yolo predict model=sam_b.pt source=path/to/image.jpg
```
- The logic here is to segment the whole image if you don't pass any prompts (bboxes/points/masks).
!!! example "SAMPredictor example"
This way you can set the image once and run prompt inference multiple times without re-running the image encoder.
=== "Prompt inference"
```python
import cv2

from ultralytics.vit.sam import Predictor as SAMPredictor
# Create SAMPredictor
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt")
predictor = SAMPredictor(overrides=overrides)
# Set image
predictor.set_image("ultralytics/assets/zidane.jpg") # set with image file
predictor.set_image(cv2.imread("ultralytics/assets/zidane.jpg")) # set with np.ndarray
results = predictor(bboxes=[439, 437, 524, 709])
results = predictor(points=[900, 370], labels=[1])
# Reset image
predictor.reset_image()
```
Segment everything with additional args.
=== "Segment everything"
```python
from ultralytics.vit.sam import Predictor as SAMPredictor
# Create SAMPredictor
overrides = dict(conf=0.25, task='segment', mode='predict', imgsz=1024, model="mobile_sam.pt")
predictor = SAMPredictor(overrides=overrides)
# segment with additional args
results = predictor(source="ultralytics/assets/zidane.jpg", crop_n_layers=1, points_stride=64)
```
- For more of the additional args available with `Segment everything`, see the [`Predictor/generate` Reference](../reference/vit/sam/predict.md).
## Available Models and Supported Tasks
| Model Type | Pre-trained Weights | Tasks Supported |
@ -76,21 +142,33 @@ Here we compare Meta's smallest SAM model, SAM-b, with Ultralytics smallest segm
| Model                                          | Size                       | Parameters             | Speed (CPU)                |
|------------------------------------------------|----------------------------|------------------------|----------------------------|
| Meta's SAM-b                                   | 358 MB                     | 94.7 M                 | 51096 ms/im                |
| [MobileSAM](mobile-sam.md)                     | 40.7 MB                    | 10.1 M                 | 46122 ms/im                |
| [FastSAM-s](fast-sam.md) with YOLOv8 backbone  | 23.7 MB                    | 11.8 M                 | 115 ms/im                  |
| Ultralytics [YOLOv8n-seg](../tasks/segment.md) | **6.7 MB** (53.4x smaller) | **3.4 M** (27.9x less) | **59 ms/im** (866x faster) |
This comparison shows the order-of-magnitude differences in model size and speed between the models. Whereas SAM presents unique capabilities for automatic segmenting, it is not a direct competitor to YOLOv8 segment models, which are smaller, faster and more efficient.
Tests run on a 2023 Apple M2 MacBook with 16GB of RAM. To reproduce this test:
```python
from ultralytics import FastSAM, SAM, YOLO
# Profile SAM-b
model = SAM('sam_b.pt')
model.info()
model('ultralytics/assets')
# Profile MobileSAM
model = SAM('mobile_sam.pt')
model.info()
model('ultralytics/assets')
# Profile FastSAM-s
model = FastSAM('FastSAM-s.pt')
model.info()
model('ultralytics/assets')
# Profile YOLOv8n-seg
model = YOLO('yolov8n-seg.pt')
model.info()
@ -140,4 +218,4 @@ If you find SAM useful in your research or development work, please consider cit
We would like to express our gratitude to Meta AI for creating and maintaining this valuable resource for the computer vision community.
*keywords: Segment Anything, Segment Anything Model, SAM, Meta SAM, image segmentation, promptable segmentation, zero-shot performance, SA-1B dataset, advanced architecture, auto-annotation, Ultralytics, pre-trained models, SAM base, SAM large, instance segmentation, computer vision, AI, artificial intelligence, machine learning, data annotation, segmentation masks, detection model, YOLO detection model, bibtex, Meta AI.*


@ -1,9 +0,0 @@
---
description: Learn how to use the ResizeLongestSide module in Ultralytics YOLO for automatic image resizing. Resize your images with ease.
keywords: ResizeLongestSide, Ultralytics YOLO, automatic image resizing, image resizing
---
## ResizeLongestSide
---
### ::: ultralytics.vit.sam.autosize.ResizeLongestSide
<br><br>


@ -18,6 +18,11 @@ keywords: SAM, VIT, computer vision models, build SAM models, build VIT models,
### ::: ultralytics.vit.sam.build.build_sam_vit_b
<br><br>
## build_mobile_sam
---
### ::: ultralytics.vit.sam.build.build_mobile_sam
<br><br>
## _build_sam
---
### ::: ultralytics.vit.sam.build._build_sam


@ -1,9 +0,0 @@
---
description: Learn about the SamAutomaticMaskGenerator module in Ultralytics YOLO, an automatic mask generator for image segmentation.
keywords: SamAutomaticMaskGenerator, Ultralytics YOLO, automatic mask generator, image segmentation
---
## SamAutomaticMaskGenerator
---
### ::: ultralytics.vit.sam.modules.mask_generator.SamAutomaticMaskGenerator
<br><br>


@ -1,9 +0,0 @@
---
description: Learn about PromptPredictor - a module in Ultralytics VIT SAM that predicts image captions based on prompts. Get started today!.
keywords: PromptPredictor, Ultralytics, YOLO, VIT SAM, image captioning, deep learning, computer vision
---
## PromptPredictor
---
### ::: ultralytics.vit.sam.modules.prompt_predictor.PromptPredictor
<br><br>


@ -0,0 +1,59 @@
---
description: Learn about the Conv2d_BN, MBConv, ConvLayer, Attention, BasicLayer, and TinyViT modules.
keywords: Conv2d_BN, MBConv, ConvLayer, Attention, BasicLayer, TinyViT
---
## Conv2d_BN
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.Conv2d_BN
<br><br>
## PatchEmbed
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.PatchEmbed
<br><br>
## MBConv
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.MBConv
<br><br>
## PatchMerging
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.PatchMerging
<br><br>
## ConvLayer
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.ConvLayer
<br><br>
## Mlp
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.Mlp
<br><br>
## Attention
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.Attention
<br><br>
## TinyViTBlock
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.TinyViTBlock
<br><br>
## BasicLayer
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.BasicLayer
<br><br>
## LayerNorm2d
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.LayerNorm2d
<br><br>
## TinyViT
---
### ::: ultralytics.vit.sam.modules.tiny_encoder.TinyViT
<br><br>


@ -0,0 +1,9 @@
---
description: Learn how to use FastSAM in Ultralytics YOLO to improve object detection accuracy and speed.
keywords: FastSAM, object detection, accuracy, speed, Ultralytics YOLO
---
## FastSAM
---
### ::: ultralytics.yolo.fastsam.model.FastSAM
<br><br>


@ -0,0 +1,9 @@
---
description: FastSAMPredictor API reference and usage guide for the Ultralytics YOLO object detection library.
keywords: FastSAMPredictor, API, reference, usage, guide, Ultralytics, YOLO, object detection, library
---
## FastSAMPredictor
---
### ::: ultralytics.yolo.fastsam.predict.FastSAMPredictor
<br><br>


@ -0,0 +1,9 @@
---
description: Learn how to use FastSAMPrompt in Ultralytics YOLO for fast and efficient object detection and tracking.
keywords: FastSAMPrompt, Ultralytics YOLO, object detection, tracking, fast, efficient
---
## FastSAMPrompt
---
### ::: ultralytics.yolo.fastsam.prompt.FastSAMPrompt
<br><br>


@ -0,0 +1,14 @@
---
description: Learn how to adjust bounding boxes to the image border in Ultralytics YOLO framework. Improve object detection accuracy by accounting for image borders.
keywords: adjust_bboxes_to_image_border, Ultralytics YOLO, object detection, bounding boxes, image border
---
## adjust_bboxes_to_image_border
---
### ::: ultralytics.yolo.fastsam.utils.adjust_bboxes_to_image_border
<br><br>
## bbox_iou
---
### ::: ultralytics.yolo.fastsam.utils.bbox_iou
<br><br>


@ -0,0 +1,9 @@
---
description: Learn about the FastSAMValidator module in Ultralytics YOLO. Validate and evaluate Segment Anything Model (SAM) datasets for object detection models with ease.
keywords: FastSAMValidator, Ultralytics YOLO, SAM datasets, object detection, validation, evaluation
---
## FastSAMValidator
---
### ::: ultralytics.yolo.fastsam.val.FastSAMValidator
<br><br>


@ -123,6 +123,11 @@ keywords: Ultralytics, YOLO, Utils Ops, Functions, coco80_to_coco91_class, scale
### ::: ultralytics.yolo.utils.ops.process_mask_native
<br><br>
## scale_masks
---
### ::: ultralytics.yolo.utils.ops.scale_masks
<br><br>
## scale_coords
---
### ::: ultralytics.yolo.utils.ops.scale_coords