This demo shows how to use [BLIP](https://arxiv.org/abs/2201.12086) to do conditional or unconditional image captioning.

## Quick Start

```shell
cargo run -r --example blip
```

## BLIP ONNX Models

- [blip-visual-base](https://github.com/jamjamjon/assets/releases/download/v0.0.1/blip-visual-base.onnx)
- [blip-textual-base](https://github.com/jamjamjon/assets/releases/download/v0.0.1/blip-textual-base.onnx)

## Results

```shell
[Unconditional image captioning]: a group of people walking around a bus
[Conditional image captioning]: three man walking in front of a bus
Some(["three man walking in front of a bus"])
```

## TODO

* [ ] Multi-batch inference for image captioning
* [ ] VQA
* [ ] Retrieval
* [ ] TensorRT support for the textual model
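
## Usage Sketch

The sketch below outlines roughly what the example does in code: load the visual and textual ONNX models, encode an image once, then decode a caption with and without a text prompt. It is a minimal, unverified sketch; the `usls` types and methods shown (`Options`, `Blip`, `DataLoader`, `encode_images`, `caption`) and the asset path are assumptions, so check `examples/blip/main.rs` for the actual API.

```rust
// NOTE: the type and method names below are assumptions for illustration;
// consult examples/blip/main.rs for the real usls API and signatures.
use usls::{models::Blip, DataLoader, Options};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the visual and textual sub-models from the ONNX files listed above.
    let visual = Options::default().with_model("blip-visual-base.onnx")?;
    let textual = Options::default().with_model("blip-textual-base.onnx")?;
    let mut model = Blip::new(visual, textual)?;

    // Load an input image (path is illustrative).
    let images = vec![DataLoader::try_read("./assets/bus.jpg")?];

    // Encode the image once; both captioning modes reuse the embeddings.
    let embeddings = model.encode_images(&images)?;

    // Unconditional captioning: no prompt, the model describes the image freely.
    let caption = model.caption(&embeddings, None, true)?;
    println!("[Unconditional image captioning]: {:?}", caption);

    // Conditional captioning: a prompt prefix steers the generated caption.
    let caption = model.caption(&embeddings, Some("three man"), true)?;
    println!("[Conditional image captioning]: {:?}", caption);

    Ok(())
}
```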