Setting Up Webots with Stable Baselines3 for Reinforcement Learning
Discover how to set up Webots with Stable Baselines3 for reinforcement learning, enabling you to create a robust simulation environment without costly hardware.

Have you ever dreamed of building a robot, only to be deterred by high hardware costs and the risk of damage? You're not alone. For many, physical robots are simply out of reach: a decent mobile robot platform can cost hundreds or even thousands of dollars, breaks down regularly, and takes up more space than most of us have.
But don't let hardware limitations stop you from diving into robotics. You can create amazing projects without an expensive setup. Simulation brings you incredibly close to real-world environments, allowing you to learn, experiment, and prototype effectively. Reinforcement learning (RL) in simulation can be just as engaging as working with physical robots. While understanding concepts like policy gradients, PPO, and SAC is essential, there's something uniquely satisfying about watching a trained agent navigate a simulated world that mirrors reality.
This is where Webots comes into play. It's an industry-grade physics simulator used by researchers and companies worldwide, and it's completely free. In this tutorial, you'll learn how to connect Webots with Stable Baselines3, combining a professional simulator with proven RL algorithms. By the end, you'll have a fully functional simulation environment ready for RL training: no hardware needed, just Python and a bit of curiosity.
What Will You Achieve with This Tutorial?
By the end of this tutorial, you’ll have:
- A working Webots simulation world featuring a robot and a target.
- A Python virtual environment with Stable Baselines3 installed.
- An external controller set up to run RL code from your IDE.
- A verified connection between Python and Webots.
- A foundation for building a Gymnasium environment in the next tutorial.
Your task is to create a robot that learns to navigate toward a target from any starting position. This setup is intentionally simple yet powerful—once you grasp this foundation, you can extend it to more complex scenarios, such as autonomous driving.
What is Reinforcement Learning and How Does Simulation Work?
Reinforcement Learning (RL) is a branch of Artificial Intelligence that trains agents through trial and error. It can be mathematically represented as an optimization problem, where we design closed-loop control policies to maximize accumulated rewards over time. RL has proven successful in various applications, from large language models to robotics and autonomous vehicles.
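Written a bit more formally, the agent learns a policy $\pi$, and a standard way to state the objective is maximizing the expected discounted return:

$$
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t\right]
$$

where $r_t$ is the reward at time step $t$, $\gamma \in [0, 1)$ is a discount factor that trades off immediate against future rewards, and $\tau$ is a trajectory produced by following $\pi$.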
Simulation uses computer software to create virtual environments that mimic real-world physics and dynamics. Instead of testing your RL agent on costly hardware that can break or pose safety risks, you train it in a controlled digital replica. Think of it as a sandbox where your agent can fail countless times without consequences, learning what works before ever interacting with physical hardware.
Webots offers industry-standard, physics-accurate simulation that is completely free and robot-agnostic. Whether you're working with wheeled robots, drones, or manipulator arms, Webots manages the physics engine, sensors, and actuators, allowing you to focus on your RL and control logic.
Stable Baselines3 provides production-ready RL algorithms (PPO, SAC, TD3, etc.) with clean APIs, excellent documentation, and active maintenance. Instead of spending weeks implementing and debugging DDPG, you gain access to reliable, tested implementations. By connecting Webots with Stable Baselines3, you equip yourself with professional-grade tools on both ends.
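To get a feel for the Stable Baselines3 API before wiring it to Webots, here is a minimal sketch that trains PPO on Gymnasium's built-in CartPole-v1 task (used purely as a stand-in, since our Webots environment doesn't exist yet):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Any Gymnasium-compatible environment works; CartPole is a quick sanity check.
env = gym.make("CartPole-v1")

# "MlpPolicy" is a small fully connected network for vector observations.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Roll out the trained policy for one episode.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```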
What Are the Prerequisites for This Tutorial?
To follow this tutorial, you should have:
- Basic Python programming skills.
- Familiarity with RL concepts (agent, environment, reward, policy).
- A curiosity to learn.
What Software Do You Need?
- Python 3.8 or later (I’m using Python 3.12.0).
- Webots R2023b or later.
- Stable Baselines3 and its dependencies.
What Are the Hardware Requirements?
- Any modern computer (Windows, macOS, or Linux).
- At least 4GB of RAM is recommended.
How to Install the Necessary Software
1. Download and install Python from python.org. Be sure to check "Add Python to PATH" during installation. Verify the installation:

```bash
python --version
```

2. Download Webots:
   - Visit https://cyberbotics.com/ and download the package for your operating system.
   - Run the installer and follow the prompts, agreeing to all defaults.
   - Launch Webots to verify the installation.

3. Create a Webots project:
   - Open Webots.
   - Go to File → New → New Project Directory.
   - In the Project Creation Wizard, set the directory name to Webots_SB3_Tutorial, set the world name to robot_navigation, and check "Add a rectangle arena."
   - Click Finish.

Webots will create your project structure and open your new world with a basic arena.
How to Set Up the Virtual Environment
Webots ships with its own Python runtime, so controllers launched from inside Webots won't see the packages you install in a virtual environment. To work around this, we'll set up an external controller: your RL code runs from your own terminal or IDE and connects to the running Webots simulation.
1. Navigate to your project folder:

```bash
cd {path-to-your}/Webots_SB3_Tutorial
```

2. Create the virtual environment:

```bash
python -m venv webots_rl_env
```

3. Activate the environment:
   - On Windows: webots_rl_env\Scripts\activate
   - On macOS/Linux: source webots_rl_env/bin/activate

4. Install the required packages:

```bash
pip install stable-baselines3[extra] gymnasium numpy
```

Verify the installation:

```bash
python -c "import stable_baselines3; print(stable_baselines3.__version__)"
```

5. Set the Webots environment variable. For external controllers to work, Python needs to know the location of your Webots installation. Set this once (a quick sanity check is sketched after this list):
   - Windows PowerShell: $env:WEBOTS_HOME = "C:\Program Files\Webots"
   - Windows CMD: set WEBOTS_HOME=C:\Program Files\Webots
   - macOS/Linux: export WEBOTS_HOME=/Applications/Webots.app

   To make this permanent, add it to your system environment variables or shell profile.
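Here is the quick sanity check, which you can run from your activated environment. It confirms WEBOTS_HOME is set and looks for the Webots Python bindings; the lib/controller/python location is an assumption based on recent Webots releases (R2023b and later), so adjust the path if your installation differs:

```python
import os
import sys

# Fail fast if the environment variable from step 5 is missing.
webots_home = os.environ.get("WEBOTS_HOME")
if webots_home is None:
    raise RuntimeError("WEBOTS_HOME is not set; see step 5 above.")

# Assumed bindings location for recent Webots releases; verify on your install.
bindings = os.path.join(webots_home, "lib", "controller", "python")
if os.path.isdir(bindings):
    sys.path.append(bindings)  # lets `from controller import Supervisor` resolve
    print(f"Webots Python bindings found at: {bindings}")
else:
    print(f"No bindings at {bindings}; check your Webots version and path.")
```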
Your project structure should now look like this:
```text
Webots_SB3_Tutorial/
├── webots_rl_env/          # Your virtual environment
├── controllers/
├── libraries/
├── plugins/
├── worlds/
│   └── robot_navigation.wbt
└── protos/
```
How to Add the RL Components to Your World
Next, we’ll add a robot, a target, and a supervisor to manage the training loop.
1. Add a robot to your world:
   - In Webots, click the Add button (the + icon) above the scene tree.
   - Navigate to PROTO nodes (Webots Projects) → robots → gctronic → e-puck → E-puck (Robot), or search for "E-puck."
   - Click Add.
   - Give the robot a DEF name by selecting the E-puck and entering ROBOT in the DEF field.
   - Set the robot's controller to external by changing its controller field from "e-puck" to <extern>.

2. Add a target:
   - Click the Add button and select Base nodes → Solid.
   - Give the Solid node a DEF name by entering TARGET in the DEF field.
   - Give it a visual appearance by adding a Shape node and configuring its geometry as a Cylinder with dimensions of your choice.
   - Position the target using the translation field.

3. Add a supervisor:
   - Click Add and select Base nodes → Robot.
   - Configure it as a supervisor: set the name to "supervisor_controller", set the supervisor field to TRUE, and set the controller to <extern>.

Once the robot's controller is set to <extern>, your Python process will be able to drive it directly; a short preview is sketched below.
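As a preview of where this is headed (not required for this tutorial), here is a minimal sketch of driving the e-puck from an external controller. The device names left wheel motor and right wheel motor are the standard motor names in Webots' e-puck model:

```python
from controller import Robot  # Webots Python API; requires the WEBOTS_HOME setup above

robot = Robot()
timestep = int(robot.getBasicTimeStep())

# The e-puck's wheel motors, looked up by their standard device names.
left_motor = robot.getDevice("left wheel motor")
right_motor = robot.getDevice("right wheel motor")

# Velocity-control mode: set the target position to infinity, then set a speed.
left_motor.setPosition(float("inf"))
right_motor.setPosition(float("inf"))
left_motor.setVelocity(2.0)   # rad/s; the e-puck tops out around 6.28
right_motor.setVelocity(2.0)

# Keep stepping the simulation so the command stays in effect.
while robot.step(timestep) != -1:
    pass
```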
How to Save Your World
- Go to File → Save World.
- Your scene tree should now contain the E-puck (DEF ROBOT), the Solid (DEF TARGET), and the supervisor Robot node.
How to Verify the Setup
Create a test controller to ensure that everything is connected properly:
1. In your project, create a new folder: controllers/test_supervisor/.
2. Inside that folder, create a file: test_supervisor.py.
3. Add the following code:

```python
from controller import Supervisor

# Initialize the supervisor
supervisor = Supervisor()
timestep = int(supervisor.getBasicTimeStep())

# Test: can we access our nodes?
robot_node = supervisor.getFromDef("ROBOT")
target_node = supervisor.getFromDef("TARGET")

if robot_node and target_node:
    print("✓ Setup successful!")
    print(f"  Robot found at: {robot_node.getPosition()}")
    print(f"  Target found at: {target_node.getPosition()}")

    # Test moving the target
    trans_field = target_node.getField("translation")
    current_pos = trans_field.getSFVec3f()
    print(f"  Target can be moved: {current_pos}")
else:
    print("✗ Setup error!")
    if not robot_node:
        print("  Missing: ROBOT (check DEF name on E-puck)")
    if not target_node:
        print("  Missing: TARGET (check DEF name on Solid)")

# Run one simulation step
supervisor.step(timestep)
print("✓ Simulation step successful!")
```
4. Run the test: in Webots, open your robot_navigation.wbt world, select the Robot (supervisor_controller) node, and change its controller field from <extern> to test_supervisor. Then click the Play button (▶️) in Webots.

You should see the success messages above in the Webots console, confirming the setup. After testing, change the supervisor's controller field back to <extern>.
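With the controller back on <extern>, you can also launch the same script from your own terminal: press Play so the simulation waits for an external controller, activate webots_rl_env, and run test_supervisor.py with the Webots Python bindings importable (see the sanity check earlier). The exact launch mechanics vary by platform and Webots version, so if the connection isn't established, consult the Webots documentation on external controllers.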
Conclusion
Congratulations! You have built a solid foundation for RL training in Webots:
- Installed Webots and set up your Python environment.
- Created a simulation world with a robot and a target.
- Configured your external controller setup and verified communication between Python and Webots.
In the next tutorial, we will build a Gymnasium environment for Webots robot control. We will write the code that bridges Stable Baselines3 and Webots, implementing reset() and step() methods, defining observation and action spaces, and designing a reward function.
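To make that concrete, here is a hypothetical bare-bones skeleton of such an environment. The observation and action shapes below are placeholder assumptions for illustration, not the final design:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class WebotsNavEnv(gym.Env):
    """Sketch only; shapes, bounds, and rewards are placeholder assumptions."""

    def __init__(self):
        # Hypothetical observation: [robot_x, robot_y, target_x, target_y]
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32
        )
        # Hypothetical action: normalized left/right wheel velocities
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(4, dtype=np.float32)  # would be read from Webots
        return obs, {}

    def step(self, action):
        # Real version: send wheel velocities to Webots, step the simulation,
        # read the new positions, and compute a distance-based reward.
        obs = np.zeros(4, dtype=np.float32)
        reward = 0.0
        terminated = False  # e.g., robot reached the target
        truncated = False   # e.g., episode time limit hit
        return obs, reward, terminated, truncated, {}
```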
Final Thoughts
Interacting with Webots might feel confusing at first, but the best way to learn is through experimentation. Go beyond what we've covered in this tutorial. Explore the pre-made robots available in Webots, and try customizing different nodes to enhance your understanding. You now have a professional-grade simulation setup ready for RL experimentation—no expensive hardware required. Thank you for reading this tutorial. If you encounter any issues during implementation, feel free to leave a comment, and I’ll respond promptly.