Google DeepMind - Research Intern
During the summer of 2024, I interned at Google DeepMind as part of the Academy program. My project was to implement Proximal Policy Optimization (PPO) in Dopamine, a research framework for reinforcement learning, where I reproduced benchmark results for the HalfCheetah, Reacher, and Swimmer environments in MuJoCo.
I implemented actor-critic neural networks in JAX for both continuous and discrete action spaces. I also wrote the agent training loop in JAX, using Generalized Advantage Estimation (GAE) and minibatch updates to make learning more efficient. In addition, I added preprocessing of environment observations, which significantly improved agent behavior and performance.
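For readers unfamiliar with Generalized Advantage Estimation, the following is a minimal JAX sketch of the idea, not the code written during the internship. The function name compute_gae, its argument names, and the default values of gamma and lam are illustrative assumptions. It assumes a single trajectory of length T stored as 1-D float arrays, with last_value as the critic's estimate for the state after the final step.

```python
import jax
import jax.numpy as jnp


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Illustrative GAE over one trajectory (hypothetical helper).

    rewards, values, dones: float arrays of shape (T,).
    last_value: scalar value estimate used to bootstrap after step T-1.
    Returns (advantages, returns), each of shape (T,).
    """
    values = jnp.asarray(values, dtype=jnp.float32)
    rewards = jnp.asarray(rewards, dtype=jnp.float32)
    not_done = 1.0 - jnp.asarray(dones, dtype=jnp.float32)

    # V(s_{t+1}) for every step; the final entry is the bootstrap value.
    bootstrap = jnp.reshape(jnp.asarray(last_value, dtype=jnp.float32), (1,))
    next_values = jnp.concatenate([values[1:], bootstrap])

    # One-step TD errors; the bootstrap term is masked at episode ends.
    deltas = rewards + gamma * next_values * not_done - values

    def backward_step(next_advantage, inputs):
        delta, mask = inputs
        advantage = delta + gamma * lam * mask * next_advantage
        return advantage, advantage

    # Scan from the last time step to the first; outputs keep time order.
    init = jnp.zeros((), dtype=jnp.float32)
    _, advantages = jax.lax.scan(
        backward_step, init, (deltas, not_done), reverse=True
    )
    returns = advantages + values  # regression targets for the critic
    return advantages, returns
```

In a PPO update, advantages like these would typically be normalized and then split into shuffled minibatches for several epochs of gradient steps; the exact scheme used in the project is not described here.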
Using Google's compute infrastructure, I ran extensive experiments, including large-scale parameter sweeps, on both GPUs and TPUs. These experiments validated my implementation and showed how performance varied across environment suites such as MuJoCo and Atari.
This internship introduced me to reinforcement learning and JAX and gave me hands-on experience with large-scale experiments. It also gave me the opportunity to quickly apply what I learned and contribute directly to research tooling that supports AI development.