VideoAgent: Self-Improving Video Generation
A. Soni*, S. Venkataraman*, A. Chandra*, S. Fischmeister, P. Liang, Bo Dai, S. Yang, “VideoAgent: Self-Improving Video Generation.” Spotlight presentation at the RL Beyond Rewards Workshop, RLC 2025.
paper / page / code
Real-World Offline Reinforcement Learning from Vision Language Model Feedback
S. Venkataraman*, Y. Wang*, Z. Wang, Z. Erickson, D. Held, “Real-World Offline Reinforcement Learning from Vision Language Model Feedback.” Accepted at IROS 2025 and the LangRob Workshop at CoRL 2024.
paper / page / code / video
DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning
S. Mani, S. Venkataraman, A. Chandra, A. Rizvi, Y. Sirvi, S. Bhattacharya, A. Hazra, “DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning.” Gold-winning submission to the Train Offline, Test Online Workshop Competition at NeurIPS 2023.
paper
Research
I'm interested in reinforcement learning, robot learning and trajectory generation.
Bachelor’s Thesis Project
Prof. Aritra Hazra and Prof. Naveen Kumar Garg, IIT Kharagpur
I reproduced the AVDC results, highlighting the impact of shared image-space representations on multi-task learning. I also explored unsupervised pretraining methods for learning task-agnostic representations for multi-task reinforcement learning (RL).
Robotics Institute Summer Scholar
Prof. David Held and Prof. Zackory Erickson, Robotics Institute, Carnegie Mellon University
paper / page / code / video
I developed a novel reinforcement learning (RL) system that uses vision-language models (VLMs) to generate reward functions from offline, unlabelled datasets. In extensive experiments across a diverse range of manipulation tasks on random datasets, the method increased success rates by more than 40% and produced rewards aligned with ground-truth task progress. I also built a parallelized, modular pipeline for processing high-dimensional point cloud data and deployed it on a real-world assistive dressing task, achieving a state-of-the-art result of 0.82 in out-of-distribution settings.
Undergraduate Research Intern
Dr. Sherry Yang, Senior Research Scientist, DeepMind
paper / page / code
I developed VideoAgent, a system that self-refines robotic video plans using feedback from pretrained vision-language models (VLMs), reducing hallucinations and improving task success in both simulated and real-world environments. Combining iterative VLM-feedback refinement, online interaction, and replanning raised the success rate in MetaWorld from 43.1% to 50% and in iTHOR from 31.3% to 34.2%. In human evaluations of real-world video plans generated on the BridgeV2 dataset, VideoAgent also achieved a 22% increase in task acceptance, reflecting improved visual quality, temporal consistency, and factual accuracy.
Undergraduate Researcher
Prof. Debashish Chakravarty, AGV Group, Indian Institute of Technology, Kharagpur
I implemented and tested several CNN architectures, including ENet, UNet, and VGGNet, on standard datasets. I also processed point cloud and odometry data from CARLA into PCD files with Open3D, implemented KD-tree search for local mapping, and used the Iterative Closest Point (ICP) algorithm to improve localization by 20 cm. Additionally, I led two teams in MLRC-TMLR submissions, one on action noise for exploration and one on in-sample learning.
Undergraduate Research Intern
Prof. B. Ravindran, RBCDSAI, Indian Institute of Technology, Madras
I worked on sim2real transfer, focusing on adversarial robustness, domain generalization, and domain adaptation. For CARLA, I developed a visual domain adaptation approach that uses Stable Diffusion models conditioned on simulator images, combined with uniform domain randomization, to learn a control policy that generalizes across visual domains.
Undergraduate Research Intern
Prof. V. Kamakoti and Prof. Chester Rebeiro, Indian Institute of Technology Madras
I encoded text semantics from HTTP headers using N-grams and the χ² statistic to classify Android malware. I trained an SVM classifier and ran ablation studies over different kernels, achieving an accuracy of 92%.
Projects
These include coursework, side projects, and personal research work.
Scalable Multi Agent Robot Swarm Navigation in Dynamic Environment
Captain | Inter IIT Tech Meet 13.0
I used Active-SLAM and YOLO-NAS for unsupervised exploration and goal detection in dynamic environments. To optimize task-agent matching, I constructed a sparse bipartite graph and applied the Hungarian algorithm. I also implemented Deep Q-Learning with hindsight experience replay for high-level, map-agnostic path planning. For local path execution, I developed an NMPC-based controller that avoids both static and dynamic obstacles.
Train Offline, Test Online: A Real Robot Learning Benchmark Challenge (Gold)
website / paper / report / code
I developed DiffClone to solve the robotic pouring and scooping tasks in an offline reinforcement learning setting with sparse rewards. The system combines a ResNet50 visual encoder, fine-tuned with Momentum Contrast (MoCo) to produce robust representations, with a DDPM-based behavioral cloning agent for precise action prediction in complex, multi-modal environments. In simulation, DiffClone achieved a 92% success rate and a mean reward of 51 on pouring, surpassing existing benchmark results.
Improving Domain-Specific QA using the SQuAD 2.0 dataset (Event Silver)
Inter IIT Tech Meet 11.0
paper / report / code
I developed a closed-domain QA system on SQuAD-like datasets and applied generative augmentations using T5 and GPT-3. I also introduced sentence-level improvements to Facebook’s DrQA retriever, reducing retriever latency by 3x. To handle domain adaptation, I used the FedAvg algorithm, Reptile meta-learning, and an incremental replay mechanism. An ELECTRA-BERT ensemble achieved an F1 score of 0.85, and caching, ONNX, and quantization sped up runtime by 2.65x.
Exploration Strategies in Reinforcement Learning: A Comparative Analysis
AGV
I conducted a comparative analysis of colored-noise exploration strategies and showed that pink noise improves exploration by balancing local and global exploration, outperforming conventional white noise and Ornstein-Uhlenbeck (OU) noise. I also developed a novel spatio-temporal noise strategy that outperformed pink noise in 7 of 8 environments.
In-Sample Softmax Analysis in Offline RL across Diverse Environments
AGV
I extended INAC to sub-optimal and imbalanced datasets, showing that it adapts effectively to varied data distributions without additional environment interaction. I also showed that adding Behavioral Cloning regularization improves training stability while maintaining mean reward, and that INAC learns near-perfect policies in expert settings.
Positions of Responsibility
Member
Data Team led by Rachel Burcin
Curated data on diversity, inclusion, and equity in graduate programs and presented the findings to various stakeholders.
Executive Head
Autonomous Ground Vehicle Research Group
Mentored a team of 25 students in RL and vision, leading to successful participation in more than three international competitions. Led the drafting of a budget exceeding Rs. 50 lakhs to procure GPUs, robots, sensors, and other essential equipment.