# Roadmap

Planned additions and improvements for PyBulletFleet. Items are grouped by category; ordering within a group does not imply priority.

## Assets

New robot and infrastructure models:

- **Physics Mobile Robot** — Wheeled robot driven by PyBullet physics (motor torques, friction, contact forces)
- **Physics Mobile Manipulator** — Physics-mode mobile manipulator with motor-driven base and arm
- **Conveyor / Elevator / Mobile Rack** — Warehouse infrastructure entities for material handling scenarios

## Features

- **Snapshot & Replay** — Full and delta snapshot serialization for logging, replay, and external synchronization ([USO](https://github.com/yuokamoto/Unified-Simulation-Orchestrator) integration)
- **Behavior tree integration** — Create agent behavior from behavior trees
- **`SimObject.from_sdf()` → `List[SimObject]`** — Factory method that loads an SDF file via `p.loadSDF()` and wraps each returned body_id in a `SimObject`. Collision detection and lifecycle management via `add_object()` are applied automatically. Required for Open-RMF SDF environment loading and official support for pybullet_data SDF models (kiva_shelf, wsg50_gripper, etc.). Currently the catalog demo calls raw `p.loadSDF()` directly.

## Interfaces

External communication layers:

- **ROS 2** — Topic / service / action bridge for ROS 2 ecosystem integration
- **gRPC** — Language-agnostic RPC interface for orchestrators, WMS, and fleet managers

## Refactoring

- **Remove scipy dependency** — Currently only `scipy.spatial.transform.Rotation` is used (9 call sites for quat↔euler, quat↔matrix, relative rotation). Replace with PyBullet utilities + lightweight helpers in `geometry.py` to eliminate the ~150 MB transitive dependency. Low priority: no runtime performance impact, only install size.
- **Manual quaternion helpers in `geometry.py`** — Extend the pattern established by `SlerpPrecomp` / `quat_slerp` (avoiding scipy, hand-written scalar math) by adding the following helpers to `geometry.py`. The goal is to eliminate scipy `Rotation` object creation overhead on hot paths.

  Required helper functions:
  - `quat_rotate_vector(q, v)` — Rotate vector `v` by quaternion `q` (body→world transform). Replaces `Rotation.from_quat(q).apply(v)`
  - `quat_multiply(q1, q2)` — Quaternion product. Replaces `(Rotation.from_quat(q1) * Rotation.from_quat(q2)).as_quat()`
  - `quat_from_rotvec(rotvec)` — Quaternion from rotation vector. Replaces `Rotation.from_rotvec(rotvec).as_quat()` (used in angular velocity → orientation update)

  Primary application sites:
  - `OmniVelocityController._apply_velocity()` — body→world velocity transform + quaternion update from angular velocity
  - `tools.body_to_world_velocity_3d()` — same as above
  - `Path._calculate_orientation_for_plane()` — rotation matrix → quaternion conversion
  - `Path.visualize_waypoints()` — quaternion → rotation matrix conversion

  Design policy:
  - Pure Python scalar math (`math.sin`/`math.cos`) or small numpy array ops
  - Same file and style as `SlerpPrecomp`
  - Apply incrementally after profiling confirms bottleneck (YAGNI)
  - Full scipy removal in a separate PR after all call sites are replaced

## Performance

Near-term optimizations within the current PyBullet-backed architecture. Goal: reduce per-step cost at 100–1000 agents without a full backend swap (see Long-Term section for that).

### Profiling Baseline (measured 2026-04-03)

Benchmark: 500 agents, omnidirectional MoveAction, `collision_check_frequency=0` (disabled), `physics=False`, `simple_cube.urdf`.

**Per-step breakdown (median):**

| Component | Time | % of total | What |
|---|---|---|---|
| Python overhead (TPI + slerp + action queue + set_pose logic) | 11.5 ms | 88% | CPython for-loop + object dispatch |
| `p.resetBasePositionAndOrientation` | 0.7 ms | 5% | PyBullet C API |
| `p.getAABB` | 0.5 ms | 4% | PyBullet C API |
| Movement detection | 0.3 ms | 3% | Pure Python arithmetic |
| **Total** | **13.0 ms** | 100% | **FPS 77** |

Key insight: **88% of step time is Pure Python overhead**, not C API calls. Collision at 10 Hz adds only ~0.2 ms at 500 agents — negligible compared to agent_update.

**Vectorization micro-benchmark (500 agents):**

| Operation | Python for-loop | NumPy vectorized | Speedup |
|---|---|---|---|
| Position compute (`start + dir × ratio`) | 1,150 μs | 5.6 μs | **205×** |
| TPI-like trapezoidal profile | (per-agent) | 33 μs | — |
| Per-agent cost | 23 μs/agent | ~0.01 μs/agent | — |

### Two-Phase Step: Decouple Computation from PyBullet C API

Current `step_once()` iterates agents one-by-one, each calling `controller.compute()` (Python/NumPy) → `set_pose_raw()` (PyBullet C API + AABB update + spatial grid) interleaved. This prevents vectorization and adds per-agent Python↔C crossing overhead.

**Proposed split:**

| Phase | What | Hot path |
|-------|------|----------|
| **Phase 1 — Compute** | All controllers compute new poses; no side effects | Pure Python / NumPy |
| **Phase 2 — Apply** | Tight loop of `p.resetBasePositionAndOrientation()` only | PyBullet C calls |
| **Phase 3 — Bookkeep** | Batch AABB refresh (`p.performCollisionDetection()` + `p.getAABB()`) and spatial grid update | C calls + Python dict |

This requires **removing direct `pybullet` API calls from `sim_object.py`, `agent.py`, and `controller.py`**. Instead, these modules produce *pose intents* (position + orientation tuples), and `core_simulation.py` flushes them to PyBullet in bulk.
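The pose-intent idea can be illustrated with a small buffer that records writes and flushes them in one pass. This is a minimal sketch under stated assumptions: `PoseBuffer`, `set_pose`, and `flush` are hypothetical names for illustration, not existing PyBulletFleet API, and the backend write is passed in as a callable so the example runs without PyBullet installed.

```python
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (x, y, z, w)


class PoseBuffer:
    """Hypothetical pose-intent buffer: record now, write to the backend later."""

    def __init__(self) -> None:
        # body_id -> latest pending (position, orientation)
        self._pending: Dict[int, Tuple[Vec3, Quat]] = {}

    def set_pose(self, body_id: int, position, orientation) -> None:
        # Compute phase: store the intent only. A later write for the same
        # body overwrites the earlier one, so each body is flushed once.
        self._pending[body_id] = (tuple(position), tuple(orientation))

    def flush(self, apply_pose: Callable[[int, Vec3, Quat], None]) -> List[int]:
        # Apply phase: one tight loop of backend writes.
        # In the simulator, apply_pose would be
        # p.resetBasePositionAndOrientation (assumption for illustration).
        moved = list(self._pending)
        for body_id, (pos, orn) in self._pending.items():
            apply_pose(body_id, pos, orn)
        self._pending.clear()
        # The returned ids are the bodies whose AABBs need refreshing
        # in the bookkeeping phase.
        return moved


if __name__ == "__main__":
    buf = PoseBuffer()
    buf.set_pose(7, (0.0, 0.0, 0.1), (0.0, 0.0, 0.0, 1.0))
    buf.set_pose(7, (1.0, 0.0, 0.1), (0.0, 0.0, 0.0, 1.0))  # overwrites
    print(buf.flush(lambda b, pos, orn: print("apply", b, pos, orn)))
```

Keeping the backend call behind a callable is one possible design; it lets the same buffer be exercised in tests with a recording function and later pointed at whichever backend write is in use.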

Key changes:

- `SimObject.set_pose()` / `set_pose_raw()` writes to an internal buffer (cached pose + dirty flag) without calling `p.resetBasePositionAndOrientation()`
- `Controller.compute()` returns `(new_pos, new_orn)` or writes to agent's pending pose buffer
- `core_simulation.step_once()` collects dirty poses → batch `resetBasePositionAndOrientation` → batch AABB update
- Movement detection stays pure Python (already cached-pose-based), unaffected
- Attached-object propagation runs after Phase 2 using the buffered parent poses

### Vectorized Agent Update (NumPy Batch)

For the common "N agents on straight-line TPI paths" case, Phase 1 can be further vectorized:

- Store all active agents' `forward_start_pos`, `forward_direction`, and TPI parameters in contiguous `(N, 3)` NumPy arrays
- Compute `new_positions = start_positions + directions * ratios[:, np.newaxis]` in one vectorized call
- Slerp batch: pre-compute all `(start_quat, target_quat, t_fraction)` and batch `quat_slerp`
- Fallback: agents with non-standard controllers (velocity mode, custom callbacks) use the existing per-agent path

This is a "BatchController" or "VectorizedOmniController" that sits alongside the existing `Controller` ABC.
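The batched position and slerp computations above might look roughly like the following. This is a sketch, not the planned implementation: `batch_positions` and `batch_quat_slerp` are hypothetical names, the array shapes are assumptions, and quaternions are assumed to be unit-length in `(x, y, z, w)` order.

```python
import numpy as np


def batch_positions(start_positions, directions, ratios):
    """Advance N agents along straight lines in one vectorized call.

    start_positions, directions: (N, 3) arrays. ratios: (N,) array of
    per-agent scale factors along each direction (assumed shapes).
    """
    return start_positions + directions * ratios[:, np.newaxis]


def batch_quat_slerp(q0, q1, t, eps=1e-6):
    """Slerp N unit quaternions (x, y, z, w) at per-agent fractions t."""
    dot = np.sum(q0 * q1, axis=1)
    # Flip q1 where the dot product is negative so that interpolation
    # follows the shorter arc.
    sign = np.where(dot < 0.0, -1.0, 1.0)
    q1 = q1 * sign[:, np.newaxis]
    theta = np.arccos(np.clip(np.abs(dot), 0.0, 1.0))
    sin_theta = np.sin(theta)
    # For nearly identical quaternions, use linear weights to avoid 0/0.
    small = sin_theta < eps
    safe_sin = np.where(small, 1.0, sin_theta)
    w0 = np.where(small, 1.0 - t, np.sin((1.0 - t) * theta) / safe_sin)
    w1 = np.where(small, t, np.sin(t * theta) / safe_sin)
    out = w0[:, np.newaxis] * q0 + w1[:, np.newaxis] * q1
    # Renormalize to absorb rounding and the linear fallback.
    return out / np.linalg.norm(out, axis=1, keepdims=True)


if __name__ == "__main__":
    n = 500
    rng = np.random.default_rng(0)
    starts = rng.uniform(-10.0, 10.0, size=(n, 3))
    dirs = np.tile([1.0, 0.0, 0.0], (n, 1))
    new_pos = batch_positions(starts, dirs, np.full(n, 0.25))
    print(new_pos.shape)
```

Agents that do not fit this straight-line pattern would still go through the per-agent fallback path described in the list above.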
### C++ Extensions for Hot-Path Functions

Profile-guided candidates for C++ (via pybind11) or Cython acceleration:

| Function | Current | Why C++ helps |
|----------|---------|---------------|
| **TwoPointInterpolation** | Pure Python `math` | Called N× per step; tight numerical loop ideal for native code |
| **`quat_slerp` / `quat_slerp_precompute`** | Python scalar math in `geometry.py` | N× per step; SIMD-friendly |
| **Spatial hash broad-phase** | Python dict + set ops in `check_collisions()` | Dict overhead at 1000+ objects; Rust/C++ hash map faster |
| **AABB overlap test** | Python comparisons in `_aabb_overlap_2d` | Tight inner loop; autovectorizable in C++ |
| **`getClosestPoints` narrow-phase** | PyBullet C API (already native) | Already fast; not a candidate |

Priority: TPI and slerp first (highest call frequency), then spatial hash (scales with agent count²).

### Deferred AABB Update

Currently `set_pose()` calls `p.getAABB()` and updates the spatial grid **per object, immediately**. For kinematic-only mode:

- Defer all AABB updates to a single `p.performCollisionDetection()` call after Phase 2
- Batch `p.getAABB()` for all moved objects at once
- Rebuild spatial grid once per step instead of incrementally per-object

This removes N `p.getAABB()` C-API round-trips from the set_pose hot path.

### Summary: Expected Impact (500 agents, measured baseline)

| Optimization | Estimated step time | Estimated FPS | Speedup vs current | Effort |
|---|---|---|---|---|
| **Current** | 13.0 ms | 77 | 1.0× | — |
| **NumPy vectorized controller** | ~1.2 ms | ~850 | **~11×** | Medium |
| **+ C++ TPI/slerp extensions** | ~1.0 ms | ~1,000 | **~13×** | Medium |
| **Theoretical floor** (C API only) | 0.7 ms | ~1,400 | ~18× | — |

Scaling by agent count (current → vectorized estimate):

| Agents | Current FPS | Est. vectorized FPS | Speedup |
|---|---|---|---|
| 100 | 225 | ~2,000+ | ~9× |
| 500 | 77 | ~850 | ~11× |
| 1,000 | 27 | ~450 | ~17× |

> **Note:** Collision detection (10 Hz spatial hash) adds <1% overhead at 500 agents. C++ spatial hash becomes relevant at 1,000+ agents with higher collision frequencies.

> **Relation to Long-Term Backend Abstraction:** The two-phase split and the "remove direct pybullet calls" refactoring are natural stepping stones toward the SimBackend ABC (Phase 1 of Long-Term). The buffered-pose pattern becomes the write side of `SimBackend.set_positions_batch()`.

## Environments

Simulation environment assets (warehouse floors, factory layouts, etc.):

- **`pybullet-fleet-environments` package** — Manage environment assets in a separate repository, installable via `pip install pybullet-fleet-environments` for on-demand retrieval. Keep PyBulletFleet core lightweight by not bundling meshes.
  - AWS RoboMaker Small Warehouse (MIT-0): DAE→OBJ converted meshes + URDF wrappers
  - Open-RMF rmf_demos maps (office, hotel, clinic, airport, campus): OBJ mesh export
  - pybullet_data bundled environments (kiva_shelf, samurai, stadium) wrappers
  - Original license clearly noted per environment
- **`resolve_environment()` API** — Name resolution similar to `resolve_urdf()` for loading environments. Shows install hints when not installed.

## CI / DevOps

- **GitHub Actions refactoring** — Streamlined CI pipeline
- **Automated performance tracking** — Run time / memory benchmarks in CI, auto-update results in documentation, and alert on significant performance regressions

## Long-Term: Backend Abstraction & Beyond PyBullet

Current architecture is tightly coupled to PyBullet (`body_id`, per-entity FFI calls). At 1000 agents PyBullet's per-call overhead dominates step time (88% of 40.9 ms). The following items explore decoupling from PyBullet to unlock 10–100× performance gains while keeping the Python user API unchanged.
### Phase 1: SimBackend ABC + Numpy Pure Kinematic Backend

- **SimBackend ABC** — Abstract interface (`set_positions_batch`, `detect_collisions`, `load_model`, `step_physics`) that `Agent` and `SimObject` program against instead of raw `pybullet` calls
- **NumpyBackend (default)** — Contiguous numpy arrays for positions/orientations, `scipy.spatial.cKDTree` for collision. No physics engine dependency. Expected: 1000 agents in ~2–5 ms (RTF 20–50×), 5000+ agents at RTF > 1.0
- **PyBulletBackend (compat)** — Wraps existing PyBullet calls behind SimBackend ABC for backward compatibility and physics-mode users
- **URDF parsing without PyBullet** — Use `yourdfpy` or similar to parse URDF into internal model data, removing the last hard dependency on PyBullet for kinematic-only mode
- **Visualization decoupling** — Replace `p.GUI` with Rerun, Open3D, or RViz for rendering. Backend-agnostic scene display

### Phase 2: Native Backend (Rust/C++ via PyO3/pybind11)

Only justified if Phase 1 numpy performance is insufficient (e.g., 5000+ agents at 240 Hz).

- **Rust kinematic core** — Position update, yaw integration, AABB collision in Rust. Exposed to Python via PyO3. Expected: further 3–10× over numpy (RTF 100–300× for 1000 agents)
- **Batch collision in Rust** — Sweep-and-prune or spatial hash for O(n log n) broad-phase, replacing Python KDTree
- **Zero-copy interop** — Share numpy arrays directly with Rust via buffer protocol (no serialization)

### Phase 3: GPU Backend (optional)

For 10,000+ agent scenarios or RL training workloads.

- **MuJoCo MJX backend** — JAX-accelerated batch physics on GPU. URDF via MJCF conversion
- **JAX pure kinematic** — `jax.numpy` drop-in for NumpyBackend, JIT-compiled, GPU-parallel

### Phase 4: ECS Architecture (v2.0 candidate)

Considered when entity diversity explodes beyond what Agent/SimObject OOP can express cleanly (e.g., drones + ground robots + conveyors + elevators + dynamic obstacles all in one scene with distinct component sets).

- **Component-based entity model** — Replace Agent/SimObject inheritance with composable components (Transform, Collision, Kinematics, JointState, ActionQueue, etc.)
- **System pipeline** — Each system (MovementSystem, CollisionSystem, ActionSystem) operates on component arrays. Maps naturally from current Controller ABC → System
- **Rust ECS runtime** — Leverage `bevy_ecs` or `hecs` for cache-friendly SoA memory layout. Python API via PyO3 for user-facing logic
- **Migration path** — Controller ABC → System, EventBus → ECS Events, Registry → ECS resource/archetype queries. Plugin Architecture phases are designed as stepping stones toward this

> **Note:** At this point the project may evolve beyond "PyBulletFleet" in name, as PyBullet would be just one optional backend among several, and the core would no longer depend on it. A rename (e.g., *FleetSim*, *KinematicFleet*) may be appropriate.