Object detection


Object detection is the task of locating objects of certain categories in an image: each detected instance is assigned both a bounding box giving its absolute location and a predefined category label. In recent years, a number of deep learning models have emerged for this task, with state-of-the-art approaches typically built in two stages. First, a region-proposal stage generates candidate regions with high recall, so that every object in the image is covered by at least one proposal. The second stage consists of a classification model, usually a CNN, that assigns a category to each proposed region (instance).
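
For a concrete sense of the interface such models expose, the sketch below (our illustration, not part of the fine-tuning example that follows) runs torchvision's Faster R-CNN, a two-stage detector with a region-proposal network followed by a per-region classification head, on a dummy image:

import torch
import torchvision

# Faster R-CNN: stage 1 proposes regions (RPN), stage 2 classifies each region
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)  # dummy float image in [0, 1]

with torch.no_grad():
    predictions = model([image])  # one dict per input image

# each dict holds 'boxes' (N, 4) in xyxy format, 'labels' (N,) and confidence 'scores' (N,)
print(predictions[0]["boxes"].shape, predictions[0]["scores"].shape)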

Learn more: https://paperswithcode.com/task/object-detection

Finetuning

To customize the model with your own data, you can use our Training API (experimental) to fine-tune it.

We provide an ObjectDetectionTrainer with a default training loop for object detection problems. You can use this API with the models provided by Kornia or with models from existing libraries in the PyTorch ecosystem, such as torchvision.

Create the dataloaders and transforms:

import torch
import torchvision
import torchvision.transforms as T

import kornia as K
from hydra.utils import to_absolute_path
from kornia.x import Configuration, ModelCheckpoint, ObjectDetectionTrainer


def my_app(config: Configuration) -> None:

    def collate_fn(data):
        # split a batch of (image, target) tuples into a list of images
        # and a list of targets: [(A, B), ...] => [A, ...], [B, ...]
        return [d[0] for d in data], [d[1] for d in data]

    # create the datasets
    train_dataset = torchvision.datasets.WIDERFace(
        root=to_absolute_path(config.data_path), transform=T.ToTensor(), split='train', download=False
    )

    valid_dataset = torchvision.datasets.WIDERFace(
        root=to_absolute_path(config.data_path), transform=T.ToTensor(), split='val', download=False
    )

    # create the dataloaders
    train_dataloader = torch.utils.data.DataLoader(
        train_dataset, batch_size=config.batch_size, shuffle=True, num_workers=0, pin_memory=True, collate_fn=collate_fn
    )

    valid_dataloader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=config.batch_size, shuffle=False, num_workers=0, pin_memory=True, collate_fn=collate_fn
    )
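
Each WIDERFace sample is an (image, target) pair where target is a dict of raw annotations; its 'bbox' entry holds the face boxes in xywh format, which is why the preprocessing callback below converts them to xyxy. A quick, purely illustrative way to inspect one sample:

    # sanity check on the raw sample format (illustrative only)
    image, target = train_dataset[0]
    print(image.shape)           # a (3, H, W) float tensor in [0, 1]
    print(target['bbox'].shape)  # (num_faces, 4) boxes in xywh format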

Define your model, loss, optimizer and scheduler:


    # create the model
    model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)

    # no standalone loss function is needed: torchvision detection models
    # compute their losses internally (see loss_computed_by_model=True below)
    criterion = None

    # instantiate the optimizer and a cosine schedule over the total number of iterations
    optimizer = torch.optim.AdamW(model.parameters(), lr=config.lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, config.num_epochs * len(train_dataloader))
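
Note that criterion is None because torchvision detection models compute their own losses: in training mode the model consumes both images and targets and returns a dict of loss tensors rather than predictions, which is also why loss_computed_by_model=True is passed to the trainer below. A minimal sketch of that behavior (illustrative only):

# illustrative only: detection models switch behavior between train and eval mode
images = [torch.rand(3, 256, 256)]
targets = [{"boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]), "labels": torch.tensor([1])}]

model.train()
loss_dict = model(images, targets)   # e.g. classification and bbox_regression losses
total_loss = sum(loss_dict.values())  # what a trainer would backpropagate

model.eval()
predictions = model(images)          # back to returning per-image predictions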

Create your preprocessing and augmentations pipeline:


    # define some augmentations; boxes are transformed together with the images
    _augmentations = K.augmentation.AugmentationSequential(
        K.augmentation.RandomHorizontalFlip(p=0.75),
        K.augmentation.RandomVerticalFlip(p=0.75),
        K.augmentation.RandomAffine(
            degrees=0.0, translate=(0.1, 0.1), scale=(0.9, 1.1)
        ),  # NOTE: XYXY bbox format cannot handle rotated boxes
        data_keys=['input', 'bbox_xyxy'],
    )

    def bbox_xywh_to_xyxy(boxes: torch.Tensor) -> torch.Tensor:
        # convert boxes from (x, y, w, h) to (x1, y1, x2, y2), in place
        boxes[..., 2] = boxes[..., 0] + boxes[..., 2]  # x2 = x + w
        boxes[..., 3] = boxes[..., 1] + boxes[..., 3]  # y2 = y + h
        return boxes

    def preprocess(self, x: dict) -> dict:
        # convert the raw WIDERFace annotations into boxes and labels
        x['target'] = {
            "boxes": [bbox_xywh_to_xyxy(a['bbox'].float()) for a in x['target']],
            # WIDERFace has a single foreground class, so all labels are 1 (0 is background)
            "labels": [torch.tensor([1] * len(a['bbox'])) for a in x['target']],
        }
        return x

    def augmentations(self, sample: dict) -> dict:
        # apply the augmentations image by image, since images may differ in size
        xs, ys, ys2 = [], [], []
        for inp, trg, lab in zip(sample["input"], sample["target"]["boxes"], sample["target"]["labels"]):
            x, y = _augmentations(inp[None], trg[None])
            xs.append(x[0])
            ys.append(y[0])
            ys2.append(lab)
        return {"input": xs, "target": {"boxes": ys, "labels": ys2}}

    def on_before_model(self, sample: dict) -> dict:
        # repack the targets into the per-image list of dicts expected by torchvision models
        return {
            "input": sample["input"],
            "target": [
                {"boxes": v, "labels": l} for v, l in zip(sample["target"]["boxes"], sample["target"]["labels"])
            ],
        }
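
AugmentationSequential samples one random transform and applies it consistently to every tensor passed in, using data_keys to decide how each argument should be warped; a standalone sketch with made-up values:

import torch
import kornia as K

# illustrative only: image and boxes receive the same flip
aug = K.augmentation.AugmentationSequential(
    K.augmentation.RandomHorizontalFlip(p=1.0),
    data_keys=['input', 'bbox_xyxy'],
)

img = torch.rand(1, 3, 100, 100)
boxes = torch.tensor([[[10.0, 20.0, 40.0, 60.0]]])  # (B, N, 4) in xyxy

img_out, boxes_out = aug(img, boxes)
print(boxes_out)  # box x-coordinates are mirrored along with the image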

Finally, instantiate the ObjectDetectionTrainer and execute your training pipeline.


    model_checkpoint = ModelCheckpoint(filepath="./outputs", monitor="map")

    trainer = ObjectDetectionTrainer(
        model,
        train_dataloader,
        valid_dataloader,
        criterion,
        optimizer,
        scheduler,
        config,
        num_classes=81,
        loss_computed_by_model=True,
        callbacks={
            "preprocess": preprocess,
            "augmentations": augmentations,
            "on_before_model": on_before_model,
            "on_checkpoint": model_checkpoint,
        },
    )
    trainer.fit()
    trainer.evaluate()
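
Once fit() has run, the fine-tuned model can be used for inference like any other torchvision detector; a minimal, illustrative sketch:

# illustrative only: run the fine-tuned detector on a new image
model.eval()
with torch.no_grad():
    detections = model([torch.rand(3, 480, 640)])

# keep only confident face detections
keep = detections[0]["scores"] > 0.5
boxes = detections[0]["boxes"][keep]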

See also

Play with the full example here