Create a multimodal RAG system that processes both images and text for document understanding with table extraction.
Prepare sample documents of each type. Configure vision and text models with API keys. Process documents through the pipeline and test with queries that span text, table, and image content.
Initial release
Sign in and download this prompt to leave a review.