Reinforcement Learning for Tank Battalion
This project showcases the development of a Reinforcement Learning (RL) agent capable of playing the classic Battle City arcade game. The agent was trained using Proximal Policy Optimization (PPO), an advanced RL algorithm, through Stable Baselines 3 and the Gymnasium API (the Farama Foundation's maintained fork of OpenAI Gym). The game code was adapted from Raitis' repository, and an A* algorithm-based agent was adapted from Wong Mun Hou's repository to assist in benchmarking and training the RL agent.
Project Overview
The objective of the project is to develop an AI that can proficiently play Battle City, a 2D shooter game where the player’s goal is to protect a base while eliminating enemy tanks. To accomplish this, a custom RL environment was created by integrating the game code with Gymnasium, allowing the RL agent to learn and improve through interaction with the game.
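As a rough illustration of that integration, the sketch below shows the shape such a wrapper typically takes under the Gymnasium API. The class name and the game-side calls (`BattleCityGame`, `new_level`, `step`, `render_grayscale`) are hypothetical stand-ins, not the project's actual names.

```python
# Minimal sketch of a Gymnasium wrapper around the adapted game code.
# BattleCityGame and its methods are hypothetical stand-ins.
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class BattleCityEnv(gym.Env):
    """Exposes the Battle City game through the Gymnasium interface."""

    def __init__(self):
        super().__init__()
        # Discrete actions: no-op, up, down, left, right, fire (illustrative)
        self.action_space = spaces.Discrete(6)
        # Observation: an 84x84 grayscale rendering of the play field
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(84, 84, 1), dtype=np.uint8
        )
        self.game = BattleCityGame()  # hypothetical adapted game object

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.game.new_level()                        # hypothetical
        return self._obs(), {}

    def step(self, action):
        reward, terminated = self.game.step(action)  # hypothetical
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return self.game.render_grayscale()          # hypothetical
```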
The PPO algorithm was selected for its balance between simplicity and performance, making it ideal for complex environments like Battle City. By incorporating reward policies that encourage the agent to defend the base, eliminate enemy tanks, and navigate efficiently, the agent gradually learned to improve its strategy and performance.
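As a concrete (and purely illustrative) example of such a reward policy, one might map per-step game events to a scalar like this; the event names and weights below are assumptions, not the values used in this project.

```python
# Illustrative reward shaping: reward offense, penalize losses, and add a
# small step cost so the agent learns to navigate efficiently.
def compute_reward(events: dict) -> float:
    reward = 0.0
    reward += 1.0 * events.get("enemy_destroyed", 0)  # offense
    if events.get("base_destroyed"):                  # defense failure
        reward -= 5.0
    if events.get("player_destroyed"):                # agent lost a life
        reward -= 1.0
    reward -= 0.01                                    # per-step cost
    return reward
```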
Key Features
- RL Algorithm: Trained using the PPO implementation from Stable Baselines 3, known for its training stability and ease of use in both discrete and continuous control tasks (a minimal training sketch follows this list).
- Game Adaptation: The classic Battle City game was modified and adapted to a reinforcement learning framework using Gymnasium. The game’s state, actions, and rewards were customized to enable efficient RL training.
- A* Algorithm: An A*-based agent was implemented and used to benchmark the RL agent's performance, providing a non-RL baseline against which improvements can be measured (see the evaluation sketch after this list).
- Pretrained Model: A pretrained model of the RL agent, capable of completing several levels of the game, is available for download here. The model demonstrates solid performance, although there is room for further improvement (loading it is shown in the sketch below).
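As referenced in the list above, a minimal training and loading sketch with Stable Baselines 3's PPO might look like the following. It assumes the `BattleCityEnv` wrapper sketched earlier; the hyperparameters, file name, and number of parallel environments are illustrative, not the settings behind the released model.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Vectorized copies of the (hypothetical) custom environment speed up PPO.
env = make_vec_env(BattleCityEnv, n_envs=4)

model = PPO("CnnPolicy", env, verbose=1)  # CNN policy for image observations
model.learn(total_timesteps=1_000_000)    # illustrative training budget
model.save("ppo_battle_city")             # illustrative file name

# Loading the pretrained model and running it deterministically:
model = PPO.load("ppo_battle_city")
obs = env.reset()
for _ in range(1_000):
    action, _ = model.predict(obs, deterministic=True)
    obs, rewards, dones, infos = env.step(action)
```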
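For the benchmarking mentioned above, a common-ground evaluation loop can score both agents on the same environment. The loop below follows the Gymnasium API; `astar_agent` and its `act()` method are hypothetical names for the adapted pathfinding agent.

```python
import numpy as np

def evaluate(act_fn, env, episodes=20):
    """Run several episodes and return the mean episode return."""
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(act_fn(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))

# Hypothetical usage, comparing the A* baseline to the trained agent:
# astar_score = evaluate(lambda obs: astar_agent.act(obs), BattleCityEnv())
# rl_score = evaluate(lambda obs: model.predict(obs, deterministic=True)[0],
#                     BattleCityEnv())
```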
Challenges and Opportunities
The development of this RL agent involved overcoming several key challenges, such as adapting a classic game into a reinforcement learning environment and crafting a reward policy that encourages the agent to balance offense (destroying enemy tanks) with defense (protecting the base). The agent handles a variety of in-game scenarios, but there is still room for optimization: the environment is complex, and the agent could benefit from longer training, hyperparameter fine-tuning, or further exploration of reward structures.
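As one concrete angle on the hyperparameter fine-tuning mentioned above, Stable Baselines 3 exposes PPO's main knobs directly in the constructor. The values shown are common starting points, not the project's tuned settings.

```python
from stable_baselines3 import PPO

# Knobs commonly tuned for PPO; every value below is illustrative.
model = PPO(
    "CnnPolicy",
    BattleCityEnv(),       # hypothetical environment from the sketch above
    learning_rate=2.5e-4,  # gradient step size
    n_steps=2048,          # rollout length per update
    gamma=0.99,            # discount factor
    ent_coef=0.01,         # entropy bonus to encourage exploration
    clip_range=0.2,        # PPO's clipped surrogate objective range
)
```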
Future Directions
Although the RL agent performs well and completes some levels successfully, there is significant room for enhancement. Future improvements could focus on extending the training duration, experimenting with other RL algorithms such as DQN (sketched below), or fine-tuning the existing reward policy. Users are encouraged to download the pretrained model, experiment with different strategies, and further refine the agent's gameplay.
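Because Stable Baselines 3 algorithms share one interface, trying DQN as suggested above is largely a one-line swap. The sketch below assumes the `BattleCityEnv` wrapper from earlier; the hyperparameters are illustrative.

```python
from stable_baselines3 import DQN

# Same interface as PPO, so only the algorithm class and its knobs change.
model = DQN(
    "CnnPolicy",
    BattleCityEnv(),      # hypothetical environment from the sketch above
    buffer_size=100_000,  # replay buffer size (illustrative)
    verbose=1,
)
model.learn(total_timesteps=500_000)
model.save("dqn_battle_city")
```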
Feel free to try out the code, retrain the model, and improve upon the existing strategies! 🚀