
<img width="1024" src="https://github.com/ultralytics/assets/raw/main/yolov8/banner-integrations.png">

**Train mode** is used for training a YOLOv8 model on a custom dataset. In this mode, the model is trained using the
specified dataset and hyperparameters. The training process involves optimizing the model's parameters so that it can
accurately predict the classes and locations of objects in an image.

!!! tip "Tip"

    * YOLOv8 datasets such as COCO, VOC, ImageNet and many others download automatically on first use, e.g. `yolo train data=coco.yaml`

## Usage Examples

Train YOLOv8n on the COCO128 dataset for 100 epochs at image size 640. See the Arguments section below for a full list
of training arguments.

!!! example ""

    === "Python"
    
        ```python
        from ultralytics import YOLO
        
        # Load a model (the three lines below are alternatives; keep the one you need)
        model = YOLO('yolov8n.yaml')  # build a new model from YAML
        model = YOLO('yolov8n.pt')  # load a pretrained model (recommended for training)
        model = YOLO('yolov8n.yaml').load('yolov8n.pt')  # build from YAML and transfer weights
        
        # Train the model
        model.train(data='coco128.yaml', epochs=100, imgsz=640)
        ```
    === "CLI"
    
        ```bash
        # Build a new model from YAML and start training from scratch
        yolo detect train data=coco128.yaml model=yolov8n.yaml epochs=100 imgsz=640

        # Start training from a pretrained *.pt model
        yolo detect train data=coco128.yaml model=yolov8n.pt epochs=100 imgsz=640

        # Build a new model from YAML, transfer pretrained weights to it and start training
        yolo detect train data=coco128.yaml model=yolov8n.yaml pretrained=yolov8n.pt epochs=100 imgsz=640
        ```
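
The Python and CLI examples above take the same `key=value` arguments. As a minimal sketch of that correspondence, the hypothetical helper `to_cli` below (not part of the `ultralytics` package) formats a dictionary of training arguments as the equivalent `yolo` command string. It only builds the string and does not run any training.

```python
def to_cli(task: str, mode: str, **overrides) -> str:
    """Format overrides as a `yolo TASK MODE key=value ...` command string."""
    # Hypothetical helper for illustration; it is not an ultralytics API.
    args = " ".join(f"{key}={value}" for key, value in overrides.items())
    return f"yolo {task} {mode} {args}".strip()


# Same arguments as the "pretrained *.pt model" example above
train_args = dict(data="coco128.yaml", model="yolov8n.pt", epochs=100, imgsz=640)
command = to_cli("detect", "train", **train_args)
print(command)
```

Keeping the arguments in one dictionary makes it easier to reuse the same settings from a script and from the shell.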

## Arguments

Training settings for YOLO models are the hyperparameters and configuration options used to train the model on a
dataset. These settings affect the model's training speed and final accuracy. Common settings include the batch size,
learning rate, momentum, and weight decay. The choice of optimizer, the choice of loss function, and the size and
composition of the training dataset also affect the training process. Tune and experiment with these settings to
achieve the best possible performance for a given task.
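
When experimenting, it can help to keep the values you change separate from the defaults. The sketch below is one possible way to do that, under stated assumptions: `DEFAULTS` holds only a handful of values copied from the table below, and `merge_overrides` is a hypothetical helper (not an `ultralytics` function) that rejects misspelled keys before a long training run starts.

```python
# A few defaults copied from the arguments table (not the full list)
DEFAULTS = {
    "epochs": 100,
    "batch": 16,
    "imgsz": 640,
    "optimizer": "SGD",
    "lr0": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
}


def merge_overrides(overrides: dict) -> dict:
    """Return DEFAULTS updated with overrides, rejecting unknown keys."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown training arguments: {unknown}")
    return {**DEFAULTS, **overrides}


settings = merge_overrides({"batch": 32, "lr0": 0.005})
print(settings)

# With ultralytics installed, the result could be passed on as keyword arguments:
# model.train(data='coco128.yaml', **settings)
```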

| Key               | Value    | Description                                                                 |
|-------------------|----------|-----------------------------------------------------------------------------|
| `model`           | `None`   | path to model file, e.g. yolov8n.pt, yolov8n.yaml                           |
| `data`            | `None`   | path to data file, e.g. coco128.yaml                                        |
| `epochs`          | `100`    | number of epochs to train for                                               |
| `patience`        | `50`     | epochs without observable improvement before training stops early           |
| `batch`           | `16`     | number of images per batch (-1 for AutoBatch)                               |
| `imgsz`           | `640`    | size of input images as integer or w,h                                      |
| `save`            | `True`   | save train checkpoints and predict results                                  |
| `save_period`     | `-1`     | save checkpoint every x epochs (disabled if < 1)                            |
| `cache`           | `False`  | True/ram, disk or False. Use cache for data loading                         |
| `device`          | `None`   | device to run on, e.g. cuda device=0 or device=0,1,2,3 or device=cpu        |
| `workers`         | `8`      | number of worker threads for data loading (per RANK if DDP)                 |
| `project`         | `None`   | project name                                                                |
| `name`            | `None`   | experiment name                                                             |
| `exist_ok`        | `False`  | whether to overwrite existing experiment                                    |
| `pretrained`      | `False`  | whether to use a pretrained model                                           |
| `optimizer`       | `'SGD'`  | optimizer to use, choices=['SGD', 'Adam', 'AdamW', 'RMSProp']               |
| `verbose`         | `False`  | whether to print verbose output                                             |
| `seed`            | `0`      | random seed for reproducibility                                             |
| `deterministic`   | `True`   | whether to enable deterministic mode                                        |
| `single_cls`      | `False`  | train multi-class data as single-class                                      |
| `image_weights`   | `False`  | use weighted image selection for training                                   |
| `rect`            | `False`  | rectangular training with each batch collated for minimum padding           |
| `cos_lr`          | `False`  | use cosine learning rate scheduler                                          |
| `close_mosaic`    | `10`     | disable mosaic augmentation for final 10 epochs                             |
| `resume`          | `False`  | resume training from last checkpoint                                        |
| `amp`             | `True`   | Automatic Mixed Precision (AMP) training, choices=[True, False]             |
| `lr0`             | `0.01`   | initial learning rate (e.g. SGD=1E-2, Adam=1E-3)                            |
| `lrf`             | `0.01`   | final learning rate (lr0 * lrf)                                             |
| `momentum`        | `0.937`  | SGD momentum/Adam beta1                                                     |
| `weight_decay`    | `0.0005` | optimizer weight decay 5e-4                                                 |
| `warmup_epochs`   | `3.0`    | warmup epochs (fractions ok)                                                |
| `warmup_momentum` | `0.8`    | warmup initial momentum                                                     |
| `warmup_bias_lr`  | `0.1`    | warmup initial bias lr                                                      |
| `box`             | `7.5`    | box loss gain                                                               |
| `cls`             | `0.5`    | cls loss gain (scale with pixels)                                           |
| `dfl`             | `1.5`    | dfl loss gain                                                               |
| `pose`            | `12.0`   | pose loss gain (pose-only)                                                  |
| `kobj`            | `2.0`    | keypoint obj loss gain (pose-only)                                          |
| `fl_gamma`        | `0.0`    | focal loss gamma (efficientDet default gamma=1.5)                           |
| `label_smoothing` | `0.0`    | label smoothing (fraction)                                                  |
| `nbs`             | `64`     | nominal batch size                                                          |
| `overlap_mask`    | `True`   | masks should overlap during training (segment train only)                   |
| `mask_ratio`      | `4`      | mask downsample ratio (segment train only)                                  |
| `dropout`         | `0.0`    | use dropout regularization (classify train only)                            |
| `val`             | `True`   | validate/test during training                                               |
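
The `lr0`, `lrf` and `cos_lr` rows interact: the table gives the final learning rate as `lr0 * lrf`, and `cos_lr` selects a cosine schedule. The sketch below illustrates that relationship under stated assumptions. It is not the trainer's implementation: it assumes the rate is `lr0` times a multiplier that moves from 1 to `lrf` over training (linearly, or along a half cosine when `cos_lr=True`), and it ignores warmup. The function `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch: int, epochs: int, lr0: float = 0.01, lrf: float = 0.01,
                cos_lr: bool = False) -> float:
    """Return the learning rate at `epoch`, decaying from lr0 to lr0 * lrf."""
    progress = epoch / epochs  # 0.0 at the start, 1.0 at the end
    if cos_lr:
        # Assumed cosine interpolation of the multiplier from 1.0 down to lrf
        factor = 1.0 + (lrf - 1.0) * (1.0 - math.cos(math.pi * progress)) / 2.0
    else:
        # Assumed linear interpolation of the multiplier from 1.0 down to lrf
        factor = (1.0 - progress) * (1.0 - lrf) + lrf
    return lr0 * factor


for epoch in (0, 50, 100):
    linear = lr_at_epoch(epoch, epochs=100)
    cosine = lr_at_epoch(epoch, epochs=100, cos_lr=True)
    print(f"epoch {epoch:3d}: linear={linear:.6f} cosine={cosine:.6f}")
```

With the default values, both variants start at `lr0 = 0.01` and end at `lr0 * lrf = 0.0001`. They differ only in how quickly the rate falls in between.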