World Journal of Advanced Engineering Technology and Sciences
International, Peer reviewed, Referred, Open access | ISSN Approved Journal


ISSN: 2582-8266 (Online)

Unified AI Multi-modal Chatbot


P Chiranjeevi, Nagalaxmi Kalluri, Sai Saket Gurubhagavatula *, Abhishek Kuncham and Mohammed Sami

Department of CSE (Data Science), ACE Engineering College, Telangana, India.

Research Article

World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 089-097

Article DOI: 10.30574/wjaets.2025.15.2.0513

DOI url: https://doi.org/10.30574/wjaets.2025.15.2.0513

Received on 18 March 2025; revised on 29 April 2025; accepted on 01 May 2025

Digital information now arrives in many formats, including documents, images, and videos, yet making sense of it all in one place remains a challenge. This project proposes a unified chatbot system that understands and interacts with content from multiple sources using a multi-modal Retrieval-Augmented Generation (RAG) approach powered by Google’s Gemini-1.5 model. Users can upload PDFs, Word documents, CSV files, images containing text, and YouTube links. The system extracts key information using techniques such as OCR and video transcription, and lets users ask questions directly about the content. Its main strength is the ability to merge different types of input and generate accurate, context-aware answers. The interface is built with Streamlit and offers an interactive user experience with real-time previews, downloadable notes, chat history, and multilingual support. The project reflects the growing need for AI systems that are intelligent, flexible, and capable of understanding information in all its forms, much as humans do.
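The retrieval step of a multi-modal RAG pipeline can be illustrated with a minimal sketch. The abstract does not say how the system indexes or ranks extracted content, so everything here is an assumption made for illustration: the `Chunk` type, the source labels, the bag-of-words cosine scoring (a stand-in for whatever embedding-based retrieval the authors may use), and the prompt wording. The call to the Gemini-1.5 model is deliberately left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    """A piece of text extracted from one uploaded source (hypothetical type)."""
    source: str  # illustrative labels such as "pdf", "image-ocr", "youtube"
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the query, regardless of source type."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(query_vec, Counter(tokenize(c.text))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, chunks: list[Chunk], k: int = 2) -> str:
    """Assemble a grounded prompt from the retrieved chunks (wording is assumed)."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Text as it might look after PDF parsing, OCR, and video transcription.
    corpus = [
        Chunk("pdf", "The invoice total is 420 dollars."),
        Chunk("image-ocr", "Meeting room B is on the third floor."),
        Chunk("youtube", "The speaker explains gradient descent step size."),
    ]
    print(build_prompt("What is the invoice total?", corpus, k=1))
```

Because every source is reduced to text chunks before ranking, a single query can draw on a document, an OCR result, and a video transcript at once, which is the merging behaviour the abstract describes.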

Multi-modal Retrieval-Augmented Generation; Gemini-1.5 Language Model; Document and Image Processing; YouTube Transcript Summarization

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-0513.pdf


P Chiranjeevi, Nagalaxmi Kalluri, Sai Saket Gurubhagavatula, Abhishek Kuncham and Mohammed Sami. Unified AI Multi-modal Chatbot. World Journal of Advanced Engineering Technology and Sciences, 2025, 15(02), 089-097. Article DOI: https://doi.org/10.30574/wjaets.2025.15.2.0513.



Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.

