World Journal of Advanced Engineering Technology and Sciences
International, Peer-reviewed, Refereed, Open access | ISSN Approved Journal

ISSN: 2582-8266 (Online)

Innovations in visual language models for robotic interaction and contextual awareness: Progress, pitfalls and perspectives

Prashant Anand Srivastava *

Senior Software Engineer at Amazon Lab126, CA.

Review Article

World Journal of Advanced Engineering Technology and Sciences, 2025, 15(01), 1145-1152

Article DOI: 10.30574/wjaets.2025.15.1.0311

DOI url: https://doi.org/10.30574/wjaets.2025.15.1.0311

Received on 03 March 2025; revised on 08 April 2025; accepted on 11 April 2025

Vision-Language Models (VLMs) promise to bridge visual perception and natural language for intuitive robotic interaction, yet their real-world robustness remains underexplored. In this paper, we quantitatively evaluate state-of-the-art VLM performance, showing that VLM-RT achieves 96.8% reasoning accuracy at 18.2 FPS but degrades sharply under variable lighting (accuracy falls from 94.3% to 37.8%) and exhibits a 48.4-point recognition gap between Western and East Asian objects. We introduce a concise failure-mode analysis that links these deficits to three root causes (environmental variability, distributional bias, and multimodal misalignment), and we map each to practical mitigation strategies. Building on this foundation, we propose a prioritized research roadmap covering human-in-the-loop systems, continual learning, and embodied intelligence, and we define standardized metrics for fairness, privacy containment, and safety verification. Together, these contributions offer actionable benchmarks to guide the development of robust, trustworthy VLM-powered robots.

Multimodal Representation; Zero-Shot Generalization; Embodied Cognition; Distributional Bias; Human-Robot Collaboration
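
The headline figures in the abstract can be put on a common footing with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes the 94.3% figure is the baseline for the variable-lighting comparison and that 18.2 FPS is a sustained throughput.

```python
# Figures quoted in the abstract; all helper names here are hypothetical.
BASELINE_ACC = 94.3   # accuracy (%) before lighting variation (assumed baseline)
DEGRADED_ACC = 37.8   # accuracy (%) under variable lighting
FPS = 18.2            # reported throughput in frames per second


def absolute_drop(before: float, after: float) -> float:
    """Accuracy lost, in percentage points."""
    return before - after


def relative_drop(before: float, after: float) -> float:
    """Accuracy lost as a percentage of the starting accuracy."""
    return (before - after) / before * 100.0


def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"absolute drop: {absolute_drop(BASELINE_ACC, DEGRADED_ACC):.1f} points")
    print(f"relative drop: {relative_drop(BASELINE_ACC, DEGRADED_ACC):.1f} %")
    print(f"frame budget:  {frame_budget_ms(FPS):.1f} ms")
```

Under these assumptions the lighting result corresponds to a loss of 56.5 percentage points, roughly 59.9% of the baseline accuracy, and the reported throughput leaves about 54.9 ms per frame.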

https://wjaets.com/sites/default/files/fulltext_pdf/WJAETS-2025-0311.pdf

Prashant Anand Srivastava. Innovations in visual language models for robotic interaction and contextual awareness: Progress, pitfalls and perspectives. World Journal of Advanced Engineering Technology and Sciences, 2025, 15(01), 1145-1152. Article DOI: https://doi.org/10.30574/wjaets.2025.15.1.0311.


Copyright © Author(s). All rights reserved. This article is published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and source, a link to the license is provided, and any changes made are indicated.

