This demo shows how to use [BLIP](https://arxiv.org/abs/2201.12086) for conditional and unconditional image captioning.
## Quick Start
```shell
cargo run -r --example blip
```
## Or you can run it manually
### 1. Download the BLIP ONNX Models
* [blip-visual-base](https://github.com/jamjamjon/assets/releases/download/v0.0.1/blip-visual-base.onnx)
* [blip-textual-base](https://github.com/jamjamjon/assets/releases/download/v0.0.1/blip-textual-base.onnx)
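
For example, both files can be fetched from the links above with `wget` (the local file names simply follow the URLs; save them wherever you like):
```shell
wget https://github.com/jamjamjon/assets/releases/download/v0.0.1/blip-visual-base.onnx
wget https://github.com/jamjamjon/assets/releases/download/v0.0.1/blip-textual-base.onnx
```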
### 2. Specify the ONNX model path in `main.rs`
```rust
// visual
let options_visual = Options::default()
    .with_model("VISUAL_MODEL") // <= modify this
    .with_profile(false);

// textual
let options_textual = Options::default()
    .with_model("TEXTUAL_MODEL") // <= modify this
    .with_profile(false);
```
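
For example, assuming the two ONNX files were saved next to the project root under the names from the download step (adjust the paths to wherever you actually stored them), the options might look like:
```rust
// Paths below are an assumption; point them at your own download location.
let options_visual = Options::default()
    .with_model("blip-visual-base.onnx")
    .with_profile(false);

let options_textual = Options::default()
    .with_model("blip-textual-base.onnx")
    .with_profile(false);
```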
### 3. Then, run
```bash
cargo run -r --example blip
```
## Results
```shell
[Unconditional image captioning]: a group of people walking around a bus
[Conditional image captioning]: three man walking in front of a bus
```
## TODO
* [ ] VQA
* [ ] Retrieval
* [ ] TensorRT support for textual model