Sreyas Venkataraman

I am a senior undergraduate in the Department of Mathematics at the Indian Institute of Technology, Kharagpur, pursuing a Bachelor of Science in Mathematics and Computing. My research interests lie in reinforcement learning, robot learning, and utilising multimodal data to build adaptive and robust robotic systems.

I have been an integral part of the Autonomous Ground Vehicle (AGV) research group at IIT Kharagpur, led by Professor Debashish Chakravarty, where I worked on SLAM and reinforcement learning for F1-Tenth-scale autonomous cars (MuSHR). Additionally, I collaborated with Professor Aritra Hazra on offline reinforcement learning and meta-RL. I also gained valuable experience during my internship with Professor Balaraman Ravindran at IIT Madras, where I worked on sim-to-real transfer methods for robotics.

I was fortunate to spend a summer as a RISS scholar at the Robotics Institute, Carnegie Mellon University, working with Yufei Wang under the guidance of Professor David Held and Professor Zackory Erickson. Together, we developed innovative methods for reward learning from multimodal data and applied these techniques to real-world robotic tasks. I also collaborated with Dr. Sherry Yang and Professor Bo Dai at DeepMind on improving video generation for robotic planning, focusing on task fidelity and robustness.

Email  /  GitHub  /  Google Scholar  /  LinkedIn


Publications

VideoAgent: Self-Improving Video Generation


A. Soni*, S. Venkataraman*, A. Chandra*, S. Fischmeister, P. Liang, B. Dai, S. Yang, "VideoAgent: Self-Improving Video Generation", in submission at a top conference
paper / page / code /

Real-World Offline Reinforcement Learning from Vision Language Model Feedback


S. Venkataraman*, Y. Wang*, Z. Wang, Z. Erickson, D. Held, "Real-World Offline Reinforcement Learning from Vision Language Model Feedback", accepted at the LangRob Workshop @ CoRL 2024; in submission at ICRA 2025.
paper / page / code / video /

DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning


S. Mani, S. Venkataraman, A. Chandra, A. Rizvi, Y. Sirvi, S. Bhattacharya, A. Hazra, "DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning", gold-winning submission at the Train Offline, Test Online Workshop Competition at NeurIPS 2023.
paper /

Research

I'm interested in reinforcement learning, robot learning, and trajectory generation.

Bachelor’s Thesis Project


Prof. Aritra Hazra and Prof. Naveen Kumar Garg, IIT Kharagpur

I reproduced AVDC results, highlighting the impact of shared image-space representations for multi-task learning. Additionally, I explored unsupervised pretraining methods to enhance task-agnostic representations for multi-task reinforcement learning (RL).

Robotics Institute Summer Scholar


Prof. David Held and Prof. Zackory Erickson, Robotics Institute, Carnegie Mellon University
paper / page / code / video /

I developed a novel reinforcement learning (RL) system that uses vision-language models (VLMs) to generate reward functions from offline, unlabelled datasets. In extensive experiments, the method improved success rates by over 40% and aligned with ground-truth task progress across a diverse range of manipulation tasks on randomly collected datasets. I also built a parallelized, modular pipeline for processing high-dimensional point-cloud data and deployed the framework on a real-world assistive dressing task, achieving a state-of-the-art result of 0.82 in out-of-distribution settings.

Undergraduate Research Intern


Dr. Sherry Yang, Senior Research Scientist, DeepMind
paper / page / code /

I developed VideoAgent, a system that self-refines robotic video plans using feedback from pretrained vision-language models (VLMs), reducing hallucinations and improving task success in both simulated and real-world environments. Iterative VLM feedback refinement, online interaction, and replanning raised success rates from 43.1% to 50% in MetaWorld and from 31.3% to 34.2% in iTHOR. In human evaluations on real-world video plans generated from the BridgeV2 dataset, VideoAgent also achieved a 22% increase in task acceptance, with improved visual quality, temporal consistency, and factual accuracy.

Undergraduate Researcher


Prof. Debashish Chakravarty, AGV Research Group, Indian Institute of Technology, Kharagpur

I implemented and tested various CNN architectures, including ENet, UNet, and VGGNet, on standard datasets. I processed point-cloud and odometry data from CARLA to generate PCD files in Open3D, implemented KD-tree search for local mapping, and used the ICP algorithm to achieve a 20 cm localization improvement. Additionally, I headed two teams for MLRC-TMLR submissions on action noise for exploration and in-sample learning.
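For illustration, here is a rough sketch of the KD-tree and ICP step using Open3D's registration API; the file names and the 0.2 m correspondence threshold are assumptions, not the pipeline's actual values.

```python
# Minimal sketch of KD-tree lookup + ICP pose refinement with Open3D.
import numpy as np
import open3d as o3d

source = o3d.io.read_point_cloud("scan_t.pcd")      # current scan (hypothetical file)
target = o3d.io.read_point_cloud("local_map.pcd")   # accumulated local map (hypothetical file)

# KD-tree over the map for nearest-neighbour queries during local mapping.
kdtree = o3d.geometry.KDTreeFlann(target)
_, idx, _ = kdtree.search_knn_vector_3d(source.points[0], knn=1)

# Point-to-point ICP refines the odometry-based initial pose estimate.
result = o3d.pipelines.registration.registration_icp(
    source, target, 0.2, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print("fitness:", result.fitness)
print(result.transformation)    # refined 4x4 pose
```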

Undergraduate Research Intern


Prof. Balaraman Ravindran, RBCDSAI, Indian Institute of Technology, Madras

I worked on sim-to-real transfer, focusing on adversarial robustness, domain generalization, and domain adaptation. I developed visual domain adaptation for CARLA using stable-diffusion models conditioned on simulator images, combined with uniform domain randomization, to learn a generalized control policy.

Undergraduate Research Intern


Prof. V. Kamakoti and Prof. Chester Rebeiro, Indian Institute of Technology, Madras

I encoded text semantics from HTTP headers using N-grams and the χ² statistic to classify Android malware. I trained an SVM classifier and performed ablation studies with different kernels, achieving an accuracy of 92%.
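As a minimal sketch of that pipeline using scikit-learn: the character-level N-grams, the k=20 feature budget, and the toy header strings and labels below are illustrative assumptions, not the actual study setup.

```python
# N-gram features -> chi-squared feature selection -> SVM classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

headers = [  # hypothetical HTTP header strings
    "POST /gate.php HTTP/1.1 User-Agent: evilbot/1.0",
    "GET /index.html HTTP/1.1 User-Agent: Mozilla/5.0",
    "POST /upload.bin HTTP/1.1 User-Agent: evilbot/1.0",
    "GET /news HTTP/1.1 User-Agent: Mozilla/5.0",
]
labels = [1, 0, 1, 0]  # 1 = malware traffic, 0 = benign

pipe = Pipeline([
    ("ngrams", CountVectorizer(analyzer="char", ngram_range=(2, 4))),  # N-gram features
    ("chi2", SelectKBest(chi2, k=20)),                                 # chi-squared selection
    ("svm", SVC(kernel="rbf")),                                        # kernel varied in ablations
])
pipe.fit(headers, labels)
print(pipe.predict(["POST /gate.php HTTP/1.1 User-Agent: evilbot/1.0"]))
```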




Projects

These include coursework, side projects, and personal research work.

Scalable Multi-Agent Robot Swarm Navigation in Dynamic Environments


Captain | Inter IIT Tech Meet 13.0

I utilized Active-SLAM and YOLO-NAS for unsupervised exploration and goal detection in dynamic environments. To optimize task-agent matching, I constructed a sparse bipartite graph and applied the Hungarian algorithm. Additionally, I implemented Deep Q-Learning with hindsight experience replay for high-level, map-agnostic path planning. For local path execution, I developed an NMPC-based controller that enabled static and dynamic obstacle avoidance.
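To illustrate the task-agent matching step, here is a minimal sketch using SciPy's dense assignment solver; the actual system built a sparse bipartite graph, and the cost values below are made up.

```python
# Optimal robot-to-goal assignment via the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j] = estimated travel cost for robot i to reach goal j (illustrative)
cost = np.array([
    [4.0, 9.0, 3.5],
    [2.0, 6.5, 8.0],
    [7.0, 1.5, 5.0],
])

robots, goals = linear_sum_assignment(cost)   # minimum-cost perfect matching
for r, g in zip(robots, goals):
    print(f"robot {r} -> goal {g} (cost {cost[r, g]:.1f})")
print("total cost:", cost[robots, goals].sum())
```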

Train Offline, Test Online: A Real Robot Learning Benchmark Challenge (Gold)


website / paper / report / code /

I developed DiffClone to solve the robotic control task of pouring and scooping in an offline reinforcement learning setup with sparse rewards. The system utilized a Momentum Contrast fine-tuned ResNet50 visual encoder to generate robust representations, combined with a DDPM-based behavioral cloning agent for precise action prediction in complex multi-modal environments. In simulation, DiffClone achieved a 92% success rate and a mean reward of 51 for pouring, surpassing existing benchmarks.

Improving Domain-Specific QA using the SQuAD 2.0 dataset (Event Silver)


Inter IIT Tech Meet 11.0
paper / report / code /

I developed a closed-domain QA system on SQuAD-like datasets and applied generative augmentations using T5 and GPT-3. I introduced a sentence-level improvement to Facebook's DrQA retriever, reducing retriever latency 3x, and used the FedAvg algorithm, Reptile meta-learning, and an incremental replay mechanism to handle domain adaptation. Leveraging an Electra-BERT ensemble, I achieved an F1 score of 0.85 and improved runtime 2.65x through caching, ONNX, and quantization techniques.

Exploration Strategies in Reinforcement Learning: A Comparative Analysis


AGV

I conducted a comparative analysis of coloured-noise exploration strategies and demonstrated that pink noise enhances exploration by effectively balancing local and global strategies, outperforming traditional white and Ornstein-Uhlenbeck (OU) noise. Additionally, I developed a novel spatio-temporal noise strategy, which outperformed pink noise in 7 out of 8 environments.
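As a rough sketch of what "coloured" action noise means here: Gaussian noise shaped to a 1/f^beta power spectrum via FFT filtering, with beta=0 giving white noise and beta=1 pink noise. The construction and normalization below are illustrative assumptions, not the exact method used in the study.

```python
# Sample temporally correlated noise with power spectrum ~ 1/f^beta.
import numpy as np

def colored_noise(beta: float, steps: int, rng=None):
    """Return `steps` samples of Gaussian noise with spectrum ~ 1/f^beta."""
    rng = rng or np.random.default_rng()
    freqs = np.fft.rfftfreq(steps)
    freqs[0] = freqs[1]                          # avoid division by zero at DC
    amplitude = freqs ** (-beta / 2.0)           # power ~ amplitude^2 ~ 1/f^beta
    phases = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
    signal = np.fft.irfft(amplitude * phases, n=steps)
    return signal / signal.std()                 # unit variance before action scaling

white = colored_noise(0.0, 1000)   # beta=0: uncorrelated exploration noise
pink  = colored_noise(1.0, 1000)   # beta=1: balances local and global exploration
red   = colored_noise(2.0, 1000)   # beta=2: strongly correlated, OU-like
```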

In-Sample Softmax Analysis in Offline RL across Diverse Environments


AGV

I extended INAC’s application to sub-optimal and imbalanced datasets, demonstrating its adaptability and effectiveness in handling various data distributions without the need for additional environmental interaction. I also showcased enhanced training stability and maintained mean reward performance by integrating Behavioral Cloning regularization, highlighting INAC’s ability to learn near-perfect policies in expert settings.

Positions of Responsibility

Member


Data Team led by Rachel Burcin

Curated data on diversity, inclusion, and equity in graduate programs and presented the findings to various stakeholders.

Executive Head


Autonomous Ground Vehicle Research Group

Mentored a team of 25 students in RL and vision, leading to successful participation in more than three international competitions. Spearheaded the drafting of a budget exceeding Rs. 50 lakhs for procuring GPUs, robots, sensors, and other essential equipment.


Design and source code from Leonid Keselman's website