Talk
Intermediate

Exploring the Multimodal LLM Inference Capabilities of PaliGemma with Keras

Rejected

PaliGemma is a recently announced, versatile and lightweight vision-language model (VLM) inspired by PaLI-3 and built on open components such as the SigLIP vision model and the Gemma language model. It takes both an image and text as input and generates text as output, supporting multiple languages.


PaliGemma is designed as a model for transfer to a wide range of vision-language tasks such as image and short-video captioning, visual question answering, text reading, object detection, and object segmentation. This session explores the multimodal capabilities of PaliGemma and covers how you can use it with Keras to set up a simple model that infers information about supplied images and answers questions about them.
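As a taste of what the session will show, here is a minimal inference sketch using the KerasNLP PaliGemma API. The preset name `pali_gemma_3b_mix_224`, the 224x224 input size, and the `answer en <question>` prompt prefix follow the published KerasNLP documentation, but treat this as an illustrative sketch rather than the session's exact code; downloading the preset requires accepting the model license.

```python
# Sketch: visual question answering with PaliGemma via KerasNLP.
# Assumes keras-nlp with the PaliGemma preset available; treat names
# outside the documented API as illustrative.

def build_prompt(question: str, lang: str = "en") -> str:
    """PaliGemma expects task-prefixed prompts, e.g. 'answer en <question>\\n'."""
    return f"answer {lang} {question}\n"

def answer_about_image(image_path: str, question: str) -> str:
    # Imported lazily so the prompt helper can be used (and tested)
    # without the heavy dependencies installed.
    import keras
    import keras_nlp

    model = keras_nlp.models.PaliGemmaCausalLM.from_preset(
        "pali_gemma_3b_mix_224"
    )
    # The mix_224 preset expects 224x224 RGB input.
    image = keras.utils.img_to_array(
        keras.utils.load_img(image_path, target_size=(224, 224))
    )
    return model.generate(
        inputs={"images": image, "prompts": build_prompt(question)}
    )

# Usage (downloads the ~3B-parameter weights on first call):
# print(answer_about_image("cow.jpg", "where is the cow standing?"))
```

The same `generate` call handles other tasks by swapping the prompt prefix, for example `caption en` for captioning or `detect <object>` for detection.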


Later, we will also explore how you can fine-tune PaliGemma with JAX.

FOSS

Shivay Lamba
TensorFlowJS SIG & WG Lead

Approvability: 0 %
Approvals: 0
Rejections: 1
Not sure: 0
Reviewer #1 (Rejected): Seems like a demo.