I'll discuss ColPali, a model that extends PaliGemma-3B to generate ColBERT-style multi-vector representations for both text and images. I'll explore how ColPali improves multimodal retrieval and search across text and image data, and look at its recent benchmark results on vision retrieval tasks. The session will also cover the key concepts behind ColBERT-style multi-vector representations and how they affect retrieval performance.
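To give a feel for what "ColBERT-style multi-vector" scoring means, here is a minimal sketch of late-interaction (MaxSim) scoring in plain Python. It is not ColPali's actual code: the function names (`maxsim_score`, `rank`) and the toy 2-dimensional embeddings are made up for illustration, and a real model would produce many higher-dimensional vectors per query token and per image patch.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector, keep its best
    (maximum) dot product against all document vectors, then sum."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: Sequence[Vector], docs: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return document names ordered from highest to lowest MaxSim score."""
    scores = {name: maxsim_score(query_vecs, vecs) for name, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two query-token vectors and two "pages",
# each page represented by several patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "page_b": [[0.5, 0.5], [0.5, 0.5]],
}

ranking = rank(query, pages)
print(ranking)
```

The point of the sketch is that each query vector is matched against its single best document vector, so a page is ranked by how well its individual parts answer the individual parts of the query, rather than by one pooled embedding.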
GitHub: https://github.com/samvardhan777/Colipali
Slides (PDF): https://github.com/samvardhan777/Colipali/blob/main/ppt/ColiPali%20Model%20OCR.pdf