Unified AI Multi-modal Chatbot

P Chiranjeevi; Nagalaxmi Kalluri; Sai Saket Gurubhagavatula; Abhishek Kuncham; Mohammed Sami

doi:10.30574/wjaets.2025.15.2.0513

P Chiranjeevi, Nagalaxmi Kalluri, Sai Saket Gurubhagavatula *, Abhishek Kuncham and Mohammed Sami

Department of CSE (Data Science), ACE Engineering College, Telangana, India.

Research Article

World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 089-097

Publication history

Received on 18 March 2025; revised on 29 April 2025; accepted on 01 May 2025

Abstract

Digital information now arrives in many formats, including documents, images, and videos, and making sense of it across those formats remains a challenge. This project proposes a unified chatbot system that understands and interacts with content from multiple sources using a multi-modal Retrieval-Augmented Generation (RAG) approach powered by Google’s Gemini-1.5 model. Users can upload PDFs, Word documents, CSV files, images containing text, and YouTube links. The system extracts key information using techniques such as optical character recognition (OCR) and video transcription, then lets users ask questions directly about the content. Its main strength is the ability to merge different types of input and generate accurate, context-aware answers. The interface is built with Streamlit and offers an interactive user experience with real-time previews, downloadable notes, chat history, and multilingual support. The project reflects the growing need for AI systems that are intelligent, flexible, and able to interpret information in whatever form it takes.
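The retrieval step that the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the `Chunk` type, the source labels, and the token-overlap scoring are hypothetical stand-ins (a real RAG system would more likely use embedding similarity), and the call to the language model is deliberately left out. The sketch only shows how text extracted from different input types might be pooled, ranked against a question, and assembled into a prompt.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical label, e.g. "notes.pdf" or "video"
    text: str    # text already extracted by parsing, OCR, or transcription


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, chunk: Chunk) -> int:
    """Token overlap between query and chunk (assumed stand-in for embeddings)."""
    q = Counter(tokenize(query))
    c = Counter(tokenize(chunk.text))
    return sum(min(q[t], c[t]) for t in q)


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return up to k chunks with non-zero overlap, best match first."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return [ch for ch in ranked[:k] if score(query, ch) > 0]


def build_prompt(query: str, context: list[Chunk]) -> str:
    """Combine retrieved chunks from any source into one grounded prompt."""
    lines = [f"[{ch.source}] {ch.text}" for ch in context]
    return (
        "Answer using only this context:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    pool = [
        Chunk("notes.pdf", "The invoice total is 420 dollars."),
        Chunk("scan.png", "Meeting moved to Friday."),
        Chunk("video", "The speaker explains gradient descent."),
    ]
    question = "What is the invoice total?"
    print(build_prompt(question, retrieve(question, pool)))
```

Pooling every source into one list of text chunks is what allows a single question to draw on a PDF, a scanned image, and a video transcript together; the resulting prompt would then be passed to the generation model.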

Keywords

Multi-modal Retrieval-Augmented Generation; Gemini-1.5 Language Model; Document and Image Processing; YouTube Transcript Summarization

Download Article PDF

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-0513.pdf

How to cite this article

P Chiranjeevi, Nagalaxmi Kalluri, Sai Saket Gurubhagavatula, Abhishek Kuncham and Mohammed Sami. Unified AI Multi-modal Chatbot. World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 089-097. Article DOI: https://doi.org/10.30574/wjaets.2025.15.2.0513.
