Roadmap
Planned additions and improvements for PyBulletFleet. Items are grouped by category; ordering within a group does not imply priority.
Assets
New robot and infrastructure models:
Physics Mobile Robot — Wheeled robot driven by PyBullet physics (motor torques, friction, contact forces)
Physics Mobile Manipulator — Physics-mode mobile manipulator with motor-driven base and arm
Conveyor / Elevator / Mobile Rack — Warehouse infrastructure entities for material handling scenarios
Features
Snapshot & Replay — Full and delta snapshot serialization for logging, replay, and external synchronization (USO integration)
Behavior tree integration — Create agent behavior from behavior trees
SimObject.from_sdf() → List[SimObject] — Factory method that loads an SDF file via p.loadSDF() and wraps each returned body_id in a SimObject. Collision detection and lifecycle management via add_object() are applied automatically. Required for Open-RMF SDF environment loading and official support for pybullet_data SDF models (kiva_shelf, wsg50_gripper, etc.). Currently the catalog demo calls raw p.loadSDF() directly.
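A minimal sketch of the factory pattern described for SimObject.from_sdf(), under stated assumptions: the SimObject class here is a hypothetical stand-in with a single field, and the loader and registration hooks are passed in as callables (load_sdf and add_object are illustrative parameter names, not the project's actual signature). In real use the loader would be p.loadSDF, which returns one body id per model in the file.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SimObject:
    """Hypothetical minimal stand-in; the real class carries far more state."""

    body_id: int

    @classmethod
    def from_sdf(
        cls,
        sdf_path: str,
        load_sdf: Callable[[str], Sequence[int]],
        add_object: Callable[["SimObject"], None],
    ) -> List["SimObject"]:
        # One SDF file can contain several models, so the loader
        # returns a sequence of body ids rather than a single id.
        objects = [cls(body_id=body_id) for body_id in load_sdf(sdf_path)]
        # Register every wrapped body so collision detection and
        # lifecycle management see it (assumed role of add_object).
        for obj in objects:
            add_object(obj)
        return objects
```

Injecting the loader keeps the sketch runnable without PyBullet installed and makes the factory easy to unit-test with a fake loader.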
Interfaces
External communication layers:
ROS 2 — Topic / service / action bridge for ROS 2 ecosystem integration
gRPC — Language-agnostic RPC interface for orchestrators, WMS, and fleet managers
Refactoring
Remove scipy dependency — Currently only scipy.spatial.transform.Rotation is used (9 call sites for quat↔euler, quat↔matrix, relative rotation). Replace with PyBullet utilities + lightweight helpers in geometry.py to eliminate the ~150 MB transitive dependency. Low priority: no runtime performance impact, only install size.
Manual quaternion helpers in geometry.py — Extend the pattern established by SlerpPrecomp / quat_slerp (avoiding scipy, hand-written scalar math) by adding the following helpers to geometry.py. The goal is to eliminate scipy Rotation object creation overhead on hot paths.
Required helper functions:
quat_rotate_vector(q, v) — Rotate vector v by quaternion q (body→world transform). Replaces Rotation.from_quat(q).apply(v)
quat_multiply(q1, q2) — Quaternion product. Replaces (Rotation.from_quat(q1) * Rotation.from_quat(q2)).as_quat()
quat_from_rotvec(rotvec) — Quaternion from rotation vector. Replaces Rotation.from_rotvec(rotvec).as_quat() (used in angular velocity → orientation update)
Primary application sites:
OmniVelocityController._apply_velocity() — body→world velocity transform + quaternion update from angular velocity
tools.body_to_world_velocity_3d() — same as above
Path._calculate_orientation_for_plane() — rotation matrix → quaternion conversion
Path.visualize_waypoints() — quaternion → rotation matrix conversion
Design policy:
Pure Python scalar math (math.sin / math.cos) or small numpy array ops
Same file and style as SlerpPrecomp
Apply incrementally after profiling confirms bottleneck (YAGNI)
Full scipy removal in a separate PR after all call sites are replaced
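The three helpers listed above could look roughly like the following. This is a sketch, not the planned geometry.py code: it assumes scalar-last (x, y, z, w) quaternions, which is the order both PyBullet and scipy's Rotation.from_quat use by default, and it assumes unit-length inputs. The type aliases and the small-angle threshold are illustrative choices.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (x, y, z, w), scalar-last
Vec3 = Tuple[float, float, float]


def quat_multiply(q1: Quat, q2: Quat) -> Quat:
    """Hamilton product q1 * q2 (applies q2 first, then q1)."""
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return (
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    )


def quat_rotate_vector(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q using v' = v + w*t + u x t, t = 2*(u x v)."""
    x, y, z, w = q
    vx, vy, vz = v
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def quat_from_rotvec(rotvec: Vec3) -> Quat:
    """Quaternion for a rotation vector (axis * angle, radians)."""
    rx, ry, rz = rotvec
    angle = math.sqrt(rx * rx + ry * ry + rz * rz)
    if angle < 1e-8:
        # Taylor expansion of sin(angle/2)/angle avoids dividing by ~0.
        scale = 0.5 - angle * angle / 48.0
    else:
        scale = math.sin(0.5 * angle) / angle
    return (scale * rx, scale * ry, scale * rz, math.cos(0.5 * angle))
```

An angular-velocity orientation update would then be along the lines of quat_multiply(quat_from_rotvec(omega_dt), q) for a world-frame rate; which side the increment multiplies on depends on the frame convention the controller uses.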
Performance
Near-term optimizations within the current PyBullet-backed architecture. Goal: reduce per-step cost at 100–1000 agents without a full backend swap (see Long-Term section for that).
Profiling Baseline (measured 2026-04-03)
Benchmark: 500 agents, omnidirectional MoveAction, collision_check_frequency=0 (disabled), physics=False, simple_cube.urdf.
Per-step breakdown (median):
| Component | Time | % of total | What |
|---|---|---|---|
| Python overhead (TPI + slerp + action queue + set_pose logic) | 11.5 ms | 88% | CPython for-loop + object dispatch |
| | 0.7 ms | 5% | PyBullet C API |
| | 0.5 ms | 4% | PyBullet C API |
| Movement detection | 0.3 ms | 3% | Pure Python arithmetic |
| Total | 13.0 ms | 100% | FPS 77 |
Key insight: 88% of step time is pure Python overhead, not C API calls.
Collision at 10 Hz adds only ~0.2 ms at 500 agents — negligible compared to agent_update.
Vectorization micro-benchmark (500 agents):
| Operation | Python for-loop | NumPy vectorized | Speedup |
|---|---|---|---|
| Position compute ( | 1,150 μs | 5.6 μs | 205× |
| TPI-like trapezoidal profile | (per-agent) | 33 μs | — |
| Per-agent cost | 23 μs/agent | ~0.01 μs/agent | — |
Two-Phase Step: Decouple Computation from PyBullet C API
Current step_once() iterates agents one-by-one, each calling controller.compute() (Python/NumPy) → set_pose_raw() (PyBullet C API + AABB update + spatial grid) interleaved. This prevents vectorization and adds per-agent Python↔C crossing overhead.
Proposed split:
| Phase | What | Hot path |
|---|---|---|
| Phase 1 — Compute | All controllers compute new poses; no side effects | Pure Python / NumPy |
| Phase 2 — Apply | Tight loop of | PyBullet C calls |
| Phase 3 — Bookkeep | Batch AABB refresh ( | C calls + Python dict |
This requires removing direct pybullet API calls from sim_object.py, agent.py, and controller.py. Instead, these modules produce pose intents (position + orientation tuples), and core_simulation.py flushes them to PyBullet in bulk.
Key changes:
SimObject.set_pose() / set_pose_raw() writes to an internal buffer (cached pose + dirty flag) without calling p.resetBasePositionAndOrientation()
Controller.compute() returns (new_pos, new_orn) or writes to agent’s pending pose buffer
core_simulation.step_once() collects dirty poses → batch resetBasePositionAndOrientation → batch AABB update
Movement detection stays pure Python (already cached-pose-based), unaffected
Attached-object propagation runs after Phase 2 using the buffered parent poses
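The buffered-pose split can be sketched as follows. Everything here is illustrative: BufferedObject is a hypothetical stand-in for SimObject, controllers are plain callables, and apply_pose is injected so the sketch runs without PyBullet (in real use it would be p.resetBasePositionAndOrientation).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]
Pose = Tuple[Vec3, Quat]


@dataclass
class BufferedObject:
    """Hypothetical stand-in for SimObject: cached pose plus dirty flag."""

    body_id: int
    pose: Pose
    dirty: bool = False

    def set_pose(self, pos: Vec3, orn: Quat) -> None:
        # Record the intent only; no backend call happens here.
        if (pos, orn) != self.pose:
            self.pose = (pos, orn)
            self.dirty = True


def step_once(
    objects: List[BufferedObject],
    controllers: Dict[int, Callable[[Pose, float], Pose]],
    apply_pose: Callable[[int, Vec3, Quat], None],
    dt: float,
) -> List[int]:
    # Phase 1 (compute): controllers produce new poses into the buffers.
    for obj in objects:
        new_pos, new_orn = controllers[obj.body_id](obj.pose, dt)
        obj.set_pose(new_pos, new_orn)

    # Phase 2 (apply): one tight loop of backend calls over dirty objects.
    moved = [obj for obj in objects if obj.dirty]
    for obj in moved:
        apply_pose(obj.body_id, obj.pose[0], obj.pose[1])
        obj.dirty = False

    # Phase 3 (bookkeep): the caller refreshes AABBs for these ids in bulk.
    return [obj.body_id for obj in moved]
```

Objects whose controller returns an unchanged pose never reach the backend, so idle agents cost no C API calls in this arrangement.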
Vectorized Agent Update (NumPy Batch)
For the common “N agents on straight-line TPI paths” case, Phase 1 can be further vectorized:
Store all active agents’ forward_start_pos, forward_direction, and TPI parameters in contiguous (N, 3) NumPy arrays
Compute new_positions = start_positions + directions * ratios[:, np.newaxis] in one vectorized call
Slerp batch: pre-compute all (start_quat, target_quat, t_fraction) and batch quat_slerp
Fallback: agents with non-standard controllers (velocity mode, custom callbacks) use the existing per-agent path
This is a “BatchController” or “VectorizedOmniController” that sits alongside the existing Controller ABC.
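A rough NumPy sketch of the batched position and slerp steps. The function names batch_positions and batch_slerp are hypothetical, quaternions are assumed to be unit-length in (x, y, z, w) order, and the near-parallel threshold is an arbitrary choice; the existing quat_slerp may handle edge cases differently.

```python
import numpy as np


def batch_positions(start_positions, directions, ratios):
    """Advance N agents along straight lines in one vectorized call.

    start_positions, directions: (N, 3); ratios: (N,) distance along each path.
    """
    return start_positions + directions * ratios[:, np.newaxis]


def batch_slerp(q0, q1, t):
    """Spherical interpolation of N quaternion pairs.

    q0, q1: (N, 4) unit quaternions; t: (N,) fractions in [0, 1].
    """
    q0 = np.asarray(q0, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    t = np.asarray(t, dtype=float)

    dot = np.sum(q0 * q1, axis=1)
    # Flip q1 where needed so interpolation follows the shorter arc.
    q1 = q1 * np.where(dot < 0.0, -1.0, 1.0)[:, np.newaxis]
    dot = np.clip(np.abs(dot), 0.0, 1.0)

    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    # Nearly identical orientations: fall back to linear weights.
    small = sin_theta < 1e-6
    safe = np.where(small, 1.0, sin_theta)
    w0 = np.where(small, 1.0 - t, np.sin((1.0 - t) * theta) / safe)
    w1 = np.where(small, t, np.sin(t * theta) / safe)

    out = w0[:, np.newaxis] * q0 + w1[:, np.newaxis] * q1
    return out / np.linalg.norm(out, axis=1, keepdims=True)
```

Both functions take whole arrays, so the per-agent Python loop disappears for agents on the standard path; agents on the fallback path would simply be excluded from the arrays.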
C++ Extensions for Hot-Path Functions
Profile-guided candidates for C++ (via pybind11) or Cython acceleration:
| Function | Current | Why C++ helps |
|---|---|---|
| TwoPointInterpolation | Pure Python | Called N× per step; tight numerical loop ideal for native code |
| | Python scalar math in | N× per step; SIMD-friendly |
| Spatial hash broad-phase | Python dict + set ops in | Dict overhead at 1000+ objects; Rust/C++ hash map faster |
| AABB overlap test | Python comparisons in | Tight inner loop; autovectorizable in C++ |
| | PyBullet C API (already native) | Already fast; not a candidate |
Priority: TPI and slerp first (highest call frequency), then spatial hash (scales with agent count²).
Deferred AABB Update
Currently set_pose() calls p.getAABB() and updates the spatial grid per object, immediately. For kinematic-only mode:
Defer all AABB updates to a single p.performCollisionDetection() call after Phase 2
Batch p.getAABB() for all moved objects at once
Rebuild spatial grid once per step instead of incrementally per-object
This removes N p.getAABB() C-API round-trips from the set_pose hot path.
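The once-per-step grid rebuild might be sketched like this. The uniform-cell hashing, the function names, and the aabb_cache dictionary are assumptions for illustration, not the project's spatial grid; get_aabb is injected and would be p.getAABB in real use, which returns an (aabbMin, aabbMax) pair.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Vec3 = Tuple[float, float, float]
AABB = Tuple[Vec3, Vec3]  # (min corner, max corner)
Cell = Tuple[int, int, int]


def cells_for_aabb(aabb: AABB, cell_size: float) -> List[Cell]:
    """All grid cells that an axis-aligned box overlaps."""
    lo = [math.floor(c / cell_size) for c in aabb[0]]
    hi = [math.floor(c / cell_size) for c in aabb[1]]
    return [
        (i, j, k)
        for i in range(lo[0], hi[0] + 1)
        for j in range(lo[1], hi[1] + 1)
        for k in range(lo[2], hi[2] + 1)
    ]


def rebuild_grid(aabbs: Dict[int, AABB], cell_size: float) -> Dict[Cell, Set[int]]:
    """Build the whole spatial hash from scratch in a single pass."""
    grid: Dict[Cell, Set[int]] = defaultdict(set)
    for body_id, aabb in aabbs.items():
        for cell in cells_for_aabb(aabb, cell_size):
            grid[cell].add(body_id)
    return dict(grid)


def refresh_after_apply(
    aabb_cache: Dict[int, AABB],
    moved_ids: Iterable[int],
    get_aabb: Callable[[int], AABB],
    cell_size: float,
) -> Dict[Cell, Set[int]]:
    # Query AABBs only for objects that moved this step, all in one loop
    # after the pose writes, then rebuild the grid once.
    for body_id in moved_ids:
        aabb_cache[body_id] = get_aabb(body_id)
    return rebuild_grid(aabb_cache, cell_size)
```

Whether a full rebuild beats incremental updates depends on how many objects move per step; with most agents moving every step, a single rebuild avoids per-object remove-and-reinsert bookkeeping.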
Summary: Expected Impact (500 agents, measured baseline)
| Optimization | Estimated step time | Estimated FPS | Speedup vs current | Effort |
|---|---|---|---|---|
| Current | 13.0 ms | 77 | 1.0× | — |
| NumPy vectorized controller | ~1.2 ms | ~850 | ~11× | Medium |
| + C++ TPI/slerp extensions | ~1.0 ms | ~1,000 | ~13× | Medium |
| Theoretical floor (C API only) | 0.7 ms | ~1,400 | ~18× | — |
Scaling by agent count (current → vectorized estimate):
| Agents | Current FPS | Est. vectorized FPS | Speedup |
|---|---|---|---|
| 100 | 225 | ~2,000+ | ~9× |
| 500 | 77 | ~850 | ~11× |
| 1,000 | 27 | ~450 | ~17× |
Note: Collision detection (10 Hz spatial hash) adds <1% overhead at 500 agents. C++ spatial hash becomes relevant at 1,000+ agents with higher collision frequencies.
Relation to Long-Term Backend Abstraction: The two-phase split and the “remove direct pybullet calls” refactoring are natural stepping stones toward the SimBackend ABC (Phase 1 of Long-Term). The buffered-pose pattern becomes the write side of SimBackend.set_positions_batch().
Environments
Simulation environment assets (warehouse floors, factory layouts, etc.):
pybullet-fleet-environments package — Manage environment assets in a separate repository, installable via pip install pybullet-fleet-environments for on-demand retrieval. Keep PyBulletFleet core lightweight by not bundling meshes.
AWS RoboMaker Small Warehouse (MIT-0): DAE→OBJ converted meshes + URDF wrappers
Open-RMF rmf_demos maps (office, hotel, clinic, airport, campus): OBJ mesh export
pybullet_data bundled environments (kiva_shelf, samurai, stadium) wrappers
Original license clearly noted per environment
resolve_environment() API — Name resolution similar to resolve_urdf() for loading environments. Shows install hints when not installed.
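One possible shape for resolve_environment(), shown as a sketch only. The import name pybullet_fleet_environments, the one-directory-per-environment layout, the search_root parameter, and the exception class are all assumptions; the real resolve_urdf() may work differently.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# Assumed import name for the pip package pybullet-fleet-environments.
PACKAGE = "pybullet_fleet_environments"


class EnvironmentNotFoundError(LookupError):
    """Raised when an environment name cannot be resolved (hypothetical)."""


def resolve_environment(name: str, search_root: Optional[Path] = None) -> Path:
    """Map an environment name to its asset directory."""
    if search_root is None:
        # Locate the optional package without importing it.
        spec = importlib.util.find_spec(PACKAGE)
        if spec is None or spec.origin is None:
            raise EnvironmentNotFoundError(
                f"Environment '{name}' requires the environments package. "
                "Install it with: pip install pybullet-fleet-environments"
            )
        search_root = Path(spec.origin).parent

    candidate = search_root / name
    if not candidate.is_dir():
        raise EnvironmentNotFoundError(
            f"Unknown environment '{name}' under {search_root}"
        )
    return candidate
```

Checking with find_spec rather than a bare import keeps the core package free of a hard dependency while still letting the error message tell the user exactly what to install.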
CI / DevOps
GitHub Actions refactoring — Streamlined CI pipeline
Automated performance tracking — Run time / memory benchmarks in CI, auto-update results in documentation, and alert on significant performance regressions
Long-Term: Backend Abstraction & Beyond PyBullet
Current architecture is tightly coupled to PyBullet (body_id, per-entity FFI calls). At 1000 agents PyBullet’s per-call overhead dominates step time (88% of 40.9 ms). The following items explore decoupling from PyBullet to unlock 10–100× performance gains while keeping the Python user API unchanged.
Phase 1: SimBackend ABC + Numpy Pure Kinematic Backend
SimBackend ABC — Abstract interface (set_positions_batch, detect_collisions, load_model, step_physics) that Agent and SimObject program against instead of raw pybullet calls
NumpyBackend (default) — Contiguous numpy arrays for positions/orientations, scipy.spatial.cKDTree for collision. No physics engine dependency. Expected: 1000 agents in ~2–5 ms (RTF 20–50×), 5000+ agents at RTF > 1.0
PyBulletBackend (compat) — Wraps existing PyBullet calls behind SimBackend ABC for backward compatibility and physics-mode users
URDF parsing without PyBullet — Use yourdfpy or similar to parse URDF into internal model data, removing the last hard dependency on PyBullet for kinematic-only mode
Visualization decoupling — Replace p.GUI with Rerun, Open3D, or RViz for rendering. Backend-agnostic scene display
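To make the interface idea concrete, here is a small sketch of what the ABC and a toy kinematic backend could look like. The four method names come from the list above; their signatures, the ListKinematicBackend class, and its bounding-sphere O(n²) collision check are assumptions chosen to keep the example dependency-free, not the planned NumpyBackend design.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class SimBackend(ABC):
    """Interface that Agent / SimObject would program against (signatures assumed)."""

    @abstractmethod
    def load_model(self, path: str, position: Vec3) -> int: ...

    @abstractmethod
    def set_positions_batch(self, ids: Sequence[int], positions: Sequence[Vec3]) -> None: ...

    @abstractmethod
    def detect_collisions(self) -> List[Tuple[int, int]]: ...

    @abstractmethod
    def step_physics(self, dt: float) -> None: ...


class ListKinematicBackend(SimBackend):
    """Toy kinematic backend: every model is a sphere of fixed radius."""

    def __init__(self, radius: float = 0.5) -> None:
        self._positions: List[Vec3] = []
        self._radius = radius

    def load_model(self, path: str, position: Vec3) -> int:
        # The model file is ignored here; only the position is stored.
        self._positions.append(position)
        return len(self._positions) - 1

    def set_positions_batch(self, ids: Sequence[int], positions: Sequence[Vec3]) -> None:
        for body_id, pos in zip(ids, positions):
            self._positions[body_id] = pos

    def detect_collisions(self) -> List[Tuple[int, int]]:
        # Brute-force pair check; a real backend would use a broad-phase.
        limit = (2.0 * self._radius) ** 2
        hits = []
        for a in range(len(self._positions)):
            for b in range(a + 1, len(self._positions)):
                d2 = sum((p - q) ** 2 for p, q in zip(self._positions[a], self._positions[b]))
                if d2 <= limit:
                    hits.append((a, b))
        return hits

    def step_physics(self, dt: float) -> None:
        pass  # Kinematic-only: nothing to integrate.
```

A PyBulletBackend would implement the same four methods by delegating to the existing PyBullet calls, so user code that holds a SimBackend reference would not need to change when the backend does.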
Phase 2: Native Backend (Rust/C++ via PyO3/pybind11)
Only justified if Phase 1 numpy performance is insufficient (e.g., 5000+ agents at 240 Hz).
Rust kinematic core — Position update, yaw integration, AABB collision in Rust. Exposed to Python via PyO3. Expected: further 3–10× over numpy (RTF 100–300× for 1000 agents)
Batch collision in Rust — Sweep-and-prune or spatial hash for O(n log n) broad-phase, replacing Python KDTree
Zero-copy interop — Share numpy arrays directly with Rust via buffer protocol (no serialization)
Phase 3: GPU Backend (optional)
For 10,000+ agent scenarios or RL training workloads.
MuJoCo MJX backend — JAX-accelerated batch physics on GPU. URDF via MJCF conversion
JAX pure kinematic — jax.numpy drop-in for NumpyBackend, JIT-compiled, GPU-parallel
Phase 4: ECS Architecture (v2.0 candidate)
Considered when entity diversity explodes beyond what Agent/SimObject OOP can express cleanly (e.g., drones + ground robots + conveyors + elevators + dynamic obstacles all in one scene with distinct component sets).
Component-based entity model — Replace Agent/SimObject inheritance with composable components (Transform, Collision, Kinematics, JointState, ActionQueue, etc.)
System pipeline — Each system (MovementSystem, CollisionSystem, ActionSystem) operates on component arrays. Maps naturally from current Controller ABC → System
Rust ECS runtime — Leverage bevy_ecs or hecs for cache-friendly SoA memory layout. Python API via PyO3 for user-facing logic
Migration path — Controller ABC → System, EventBus → ECS Events, Registry → ECS resource/archetype queries. Plugin Architecture phases are designed as stepping stones toward this
Note: At this point the project may evolve beyond “PyBulletFleet” in name, as PyBullet would be just one optional backend among several, and the core would no longer depend on it. A rename (e.g., FleetSim, KinematicFleet) may be appropriate.