# Roadmap

Planned additions and improvements for PyBulletFleet.
Items are grouped by category; ordering within a group does not imply priority.


## Assets

New robot and infrastructure models:

- **Physics Mobile Robot** — Wheeled robot driven by PyBullet physics (motor torques, friction, contact forces)
- **Physics Mobile Manipulator** — Physics-mode mobile manipulator with motor-driven base and arm
- **Conveyor / Elevator / Mobile Rack** — Warehouse infrastructure entities for material handling scenarios

## Features

- **Snapshot & Replay** — Full and delta snapshot serialization for logging, replay, and external synchronization ([USO](https://github.com/yuokamoto/Unified-Simulation-Orchestrator) integration)
- **Behavior tree integration** — Create agent behavior from behavior trees
- **`SimObject.from_sdf()` → `List[SimObject]`** — Factory method that loads an SDF file via `p.loadSDF()` and wraps each returned body_id in a `SimObject`. Collision detection and lifecycle management via `add_object()` are applied automatically. Required for Open-RMF SDF environment loading and official support for pybullet_data SDF models (kiva_shelf, wsg50_gripper, etc.). Currently the catalog demo calls raw `p.loadSDF()` directly.
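
As an illustration of the `from_sdf()` item above, here is a minimal sketch of how such a factory could be shaped. The `SimObject` class below is a simplified stand-in, not the real class, and the `load_sdf` and `register` parameters are assumptions made for this example: in real use `load_sdf` would be `p.loadSDF` (which returns the unique IDs of the loaded bodies) and `register` would play the role of `add_object()`, whose actual signature is not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class SimObject:
    """Hypothetical minimal stand-in for the real SimObject class."""

    body_id: int

    @classmethod
    def from_sdf(
        cls,
        sdf_path: str,
        load_sdf: Callable[[str], Sequence[int]],
        register: Optional[Callable[["SimObject"], None]] = None,
    ) -> List["SimObject"]:
        """Load an SDF file and wrap every returned body ID.

        load_sdf: loader such as pybullet.loadSDF, injected so the sketch
                  runs without a physics server.
        register: assumed hook standing in for add_object().
        """
        objects = [cls(body_id=int(b)) for b in load_sdf(sdf_path)]
        if register is not None:
            for obj in objects:
                register(obj)
        return objects


# Usage with a fake loader that pretends the file contains two bodies.
registered: List[SimObject] = []
shelves = SimObject.from_sdf(
    "kiva_shelf/model.sdf", lambda path: (0, 1), registered.append
)
```

Injecting the loader keeps the factory testable without a running PyBullet client; the real method would more likely call `p.loadSDF()` directly.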

## Interfaces

External communication layers:

- **ROS 2** — Topic / service / action bridge for ROS 2 ecosystem integration
- **gRPC** — Language-agnostic RPC interface for orchestrators, WMS, and fleet managers

## Refactoring

- **Remove scipy dependency** — Currently only `scipy.spatial.transform.Rotation` is used (9 call sites for quat↔euler, quat↔matrix, relative rotation). Replace with PyBullet utilities + lightweight helpers in `geometry.py` to eliminate the ~150 MB transitive dependency. Low priority: no runtime performance impact, only install size.

- **Manual quaternion helpers in `geometry.py`** — Extend the pattern established by `SlerpPrecomp` / `quat_slerp` (avoiding scipy, hand-written scalar math) by adding the following helpers to `geometry.py`. The goal is to eliminate scipy `Rotation` object creation overhead on hot paths.

  Required helper functions:
  - `quat_rotate_vector(q, v)` — Rotate vector `v` by quaternion `q` (body→world transform). Replaces `Rotation.from_quat(q).apply(v)`
  - `quat_multiply(q1, q2)` — Quaternion product. Replaces `(Rotation.from_quat(q1) * Rotation.from_quat(q2)).as_quat()`
  - `quat_from_rotvec(rotvec)` — Quaternion from rotation vector. Replaces `Rotation.from_rotvec(rotvec).as_quat()` (used in angular velocity → orientation update)

  Primary application sites:
  - `OmniVelocityController._apply_velocity()` — body→world velocity transform + quaternion update from angular velocity
  - `tools.body_to_world_velocity_3d()` — same as above
  - `Path._calculate_orientation_for_plane()` — rotation matrix → quaternion conversion
  - `Path.visualize_waypoints()` — quaternion → rotation matrix conversion

  Design policy:
  - Pure Python scalar math (`math.sin`/`math.cos`) or small numpy array ops
  - Same file and style as `SlerpPrecomp`
  - Apply incrementally after profiling confirms bottleneck (YAGNI)
  - Full scipy removal in a separate PR after all call sites are replaced
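
The three helpers listed above could look roughly like the following pure-Python sketch. It assumes unit quaternions in `(x, y, z, w)` scalar-last order, which is the order both PyBullet and scipy's `Rotation.as_quat()` use by default; the small-angle threshold in `quat_from_rotvec` is an arbitrary choice for this example, not a value taken from `geometry.py`.

```python
import math
from typing import Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (x, y, z, w), scalar-last
Vec3 = Tuple[float, float, float]


def quat_multiply(q1: Sequence[float], q2: Sequence[float]) -> Quat:
    """Hamilton product q1 * q2 (rotation q2 is applied first, then q1)."""
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return (
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    )


def quat_rotate_vector(q: Sequence[float], v: Sequence[float]) -> Vec3:
    """Rotate v by unit quaternion q using v' = v + w*t + u x t, t = 2*(u x v)."""
    x, y, z, w = q
    vx, vy, vz = v
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def quat_from_rotvec(rotvec: Sequence[float]) -> Quat:
    """Quaternion for a rotation vector (axis * angle, angle in radians)."""
    rx, ry, rz = rotvec
    angle = math.sqrt(rx * rx + ry * ry + rz * rz)
    if angle < 1e-8:
        # Taylor expansion of sin(angle / 2) / angle near zero.
        scale = 0.5 - angle * angle / 48.0
    else:
        scale = math.sin(0.5 * angle) / angle
    return (rx * scale, ry * scale, rz * scale, math.cos(0.5 * angle))


# Usage: a 90 degree yaw maps the body x-axis onto the world y-axis.
yaw90 = quat_from_rotvec((0.0, 0.0, math.pi / 2))
world_x = quat_rotate_vector(yaw90, (1.0, 0.0, 0.0))
yaw180 = quat_multiply(yaw90, yaw90)
```

None of the helpers allocate intermediate objects, which is the point of replacing `Rotation` on hot paths; whether that matters in practice is what the profiling step in the design policy is meant to confirm.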

## Performance

Near-term optimizations within the current PyBullet-backed architecture.
Goal: reduce per-step cost at 100–1000 agents without a full backend swap (see Long-Term section for that).

### Profiling Baseline (measured 2026-04-03)

Benchmark: 500 agents, omnidirectional MoveAction, `collision_check_frequency=0` (disabled), `physics=False`, `simple_cube.urdf`.

**Per-step breakdown (median):**

| Component | Time | % of total | What |
|---|---|---|---|
| Python overhead (TwoPointInterpolation [TPI], slerp, action queue, set_pose logic) | 11.5 ms | 88% | CPython for-loop + object dispatch |
| `p.resetBasePositionAndOrientation` | 0.7 ms | 5% | PyBullet C API |
| `p.getAABB` | 0.5 ms | 4% | PyBullet C API |
| Movement detection | 0.3 ms | 3% | Pure Python arithmetic |
| **Total** | **13.0 ms** | 100% | **FPS 77** |

Key insight: **88% of step time is pure-Python overhead**, not C API calls.

Collision checking at 10 Hz adds only ~0.2 ms at 500 agents, which is negligible next to the agent update cost.

**Vectorization micro-benchmark (500 agents):**

| Operation | Python for-loop | NumPy vectorized | Speedup |
|---|---|---|---|
| Position compute (`start + dir × ratio`) | 1,150 μs | 5.6 μs | **205×** |
| TPI-like trapezoidal profile | (per-agent) | 33 μs | — |
| Per-agent cost | 23 μs/agent | ~0.01 μs/agent | — |

### Two-Phase Step: Decouple Computation from PyBullet C API

The current `step_once()` iterates over agents one by one. For each agent it calls `controller.compute()` (Python/NumPy) and then `set_pose_raw()` (PyBullet C API call, AABB update, spatial grid update), so computation and engine calls are interleaved. This prevents vectorization and adds a Python↔C crossing per agent.

**Proposed split** (compute, then apply, followed by a bookkeeping pass):

| Phase | What | Hot path |
|-------|------|----------|
| **Phase 1 — Compute** | All controllers compute new poses; no side effects | Pure Python / NumPy |
| **Phase 2 — Apply** | Tight loop of `p.resetBasePositionAndOrientation()` only | PyBullet C calls |
| **Phase 3 — Bookkeep** | Batch AABB refresh (`p.performCollisionDetection()` + `p.getAABB()`) and spatial grid update | C calls + Python dict |

This requires **removing direct `pybullet` API calls from `sim_object.py`, `agent.py`, and `controller.py`**. Instead, these modules produce *pose intents* (position + orientation tuples), and `core_simulation.py` flushes them to PyBullet in bulk.

Key changes:
- `SimObject.set_pose()` / `set_pose_raw()` writes to an internal buffer (cached pose + dirty flag) without calling `p.resetBasePositionAndOrientation()`
- `Controller.compute()` returns `(new_pos, new_orn)` or writes to agent's pending pose buffer
- `core_simulation.step_once()` collects dirty poses → batch `resetBasePositionAndOrientation` → batch AABB update
- Movement detection stays pure Python (already cached-pose-based), unaffected
- Attached-object propagation runs after Phase 2 using the buffered parent poses
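
The buffered-pose idea in the key changes above can be sketched as follows. `BufferedObject` and `flush_poses` are hypothetical names introduced for this example, and the engine call is injected as `reset_fn` so the sketch runs without PyBullet; in the real code `reset_fn` would be `p.resetBasePositionAndOrientation`, which takes the body ID, position, and orientation in that order.

```python
from typing import Callable, Iterable, List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]


class BufferedObject:
    """Hypothetical stand-in: caches its pose and marks itself dirty."""

    def __init__(self, body_id: int) -> None:
        self.body_id = body_id
        self.position: Vec3 = (0.0, 0.0, 0.0)
        self.orientation: Quat = (0.0, 0.0, 0.0, 1.0)
        self.dirty = False

    def set_pose(self, position: Vec3, orientation: Quat) -> None:
        # Phase 1: record the pose intent only; no engine call happens here.
        self.position = position
        self.orientation = orientation
        self.dirty = True


def flush_poses(
    objects: Iterable[BufferedObject],
    reset_fn: Callable[[int, Vec3, Quat], None],
) -> List[int]:
    """Phase 2: push all dirty poses to the engine in one tight loop.

    Returns the IDs of moved bodies so Phase 3 can refresh only their AABBs.
    """
    moved: List[int] = []
    for obj in objects:
        if obj.dirty:
            reset_fn(obj.body_id, obj.position, obj.orientation)
            obj.dirty = False
            moved.append(obj.body_id)
    return moved


# Usage with a recording fake in place of the PyBullet call.
calls = []
fleet = [BufferedObject(i) for i in range(3)]
fleet[1].set_pose((1.0, 2.0, 0.0), (0.0, 0.0, 0.0, 1.0))
moved_ids = flush_poses(fleet, lambda b, pos, orn: calls.append((b, pos, orn)))
```

Only objects whose pose changed reach the engine, and the returned ID list gives the bookkeeping phase the set of bodies whose AABBs and grid cells need refreshing.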

### Vectorized Agent Update (NumPy Batch)

For the common "N agents on straight-line TPI paths" case, Phase 1 can be further vectorized:

- Store all active agents' `forward_start_pos`, `forward_direction`, and TPI parameters in contiguous `(N, 3)` NumPy arrays
- Compute `new_positions = start_positions + directions * ratios[:, np.newaxis]` in one vectorized call
- Slerp batch: pre-compute all `(start_quat, target_quat, t_fraction)` and batch `quat_slerp`
- Fallback: agents with non-standard controllers (velocity mode, custom callbacks) use the existing per-agent path

This would take the form of a `BatchController` or `VectorizedOmniController` (names tentative) that sits alongside the existing `Controller` ABC.
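
A minimal NumPy sketch of the batched position update and batched slerp described above. The function names `batch_positions` and `batch_quat_slerp` are made up for this example, quaternions are assumed to be unit length in `(x, y, z, w)` order, and the per-agent `ratios` are assumed to come from whatever the TPI profile produces for the current step.

```python
import numpy as np


def batch_positions(start_positions, directions, ratios):
    """Advance N agents along straight lines in one vectorized call.

    start_positions, directions: (N, 3) arrays; ratios: (N,) per-agent
    progress along the direction vector.
    """
    return start_positions + directions * ratios[:, np.newaxis]


def batch_quat_slerp(q0, q1, t):
    """Spherical interpolation for N quaternion pairs, shapes (N, 4) and (N,)."""
    dot = np.sum(q0 * q1, axis=1)
    # Flip q1 where needed so every pair interpolates along the shorter arc.
    q1 = q1 * np.where(dot < 0.0, -1.0, 1.0)[:, np.newaxis]
    theta = np.arccos(np.clip(np.abs(dot), 0.0, 1.0))
    sin_theta = np.sin(theta)
    # For nearly identical orientations fall back to linear weights.
    near = sin_theta < 1e-6
    safe = np.where(near, 1.0, sin_theta)
    w0 = np.where(near, 1.0 - t, np.sin((1.0 - t) * theta) / safe)
    w1 = np.where(near, t, np.sin(t * theta) / safe)
    out = q0 * w0[:, np.newaxis] + q1 * w1[:, np.newaxis]
    return out / np.linalg.norm(out, axis=1, keepdims=True)


# Usage: two agents, one moving along x and one along y.
starts = np.zeros((2, 3))
dirs = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
new_positions = batch_positions(starts, dirs, np.array([0.5, 0.25]))
```

Agents that cannot be expressed this way (velocity mode, custom callbacks) would stay on the per-agent path, as noted in the fallback bullet.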

### C++ Extensions for Hot-Path Functions

Profile-guided candidates for C++ (via pybind11) or Cython acceleration:

| Function | Current | Why C++ helps |
|----------|---------|---------------|
| **TwoPointInterpolation** | Pure Python `math` | Called N× per step; tight numerical loop ideal for native code |
| **`quat_slerp` / `quat_slerp_precompute`** | Python scalar math in `geometry.py` | N× per step; SIMD-friendly |
| **Spatial hash broad-phase** | Python dict + set ops in `check_collisions()` | Dict overhead at 1000+ objects; Rust/C++ hash map faster |
| **AABB overlap test** | Python comparisons in `_aabb_overlap_2d` | Tight inner loop; autovectorizable in C++ |
| **`getClosestPoints` narrow-phase** | PyBullet C API (already native) | Already fast; not a candidate |

Priority: TPI and slerp first (highest call frequency), then spatial hash (scales with agent count²).

### Deferred AABB Update

Currently `set_pose()` calls `p.getAABB()` and updates the spatial grid **per object, immediately**. For kinematic-only mode:

- Defer all AABB updates to a single `p.performCollisionDetection()` call after Phase 2
- Batch `p.getAABB()` for all moved objects at once
- Rebuild spatial grid once per step instead of incrementally per-object

This removes N `p.getAABB()` C-API round-trips from the set_pose hot path.
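
The once-per-step grid rebuild could be sketched like this. The function name, the 2D `(x, y)` cell layout, and the `cell_size` parameter are assumptions for illustration; the input mirrors what `p.getAABB(body_id)` returns, a pair `(aabb_min, aabb_max)` of 3D points, collected for all moved bodies after the apply phase.

```python
from collections import defaultdict
from math import floor
from typing import Dict, Sequence, Set, Tuple

AABB = Tuple[Sequence[float], Sequence[float]]  # (aabb_min, aabb_max)
Cell = Tuple[int, int]


def rebuild_spatial_grid(aabbs: Dict[int, AABB], cell_size: float) -> Dict[Cell, Set[int]]:
    """Build a 2D spatial hash from scratch, once per step.

    Each body is registered in every (x, y) cell that its AABB overlaps.
    """
    grid: Dict[Cell, Set[int]] = defaultdict(set)
    for body_id, (lo, hi) in aabbs.items():
        x0, y0 = floor(lo[0] / cell_size), floor(lo[1] / cell_size)
        x1, y1 = floor(hi[0] / cell_size), floor(hi[1] / cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                grid[(cx, cy)].add(body_id)
    return dict(grid)


# Usage: body 1 straddles the boundary between cells (0, 0) and (1, 0).
grid = rebuild_spatial_grid(
    {
        0: ((0.1, 0.1, 0.0), (0.4, 0.4, 1.0)),
        1: ((0.9, 0.1, 0.0), (1.2, 0.4, 1.0)),
    },
    cell_size=1.0,
)
```

Bodies sharing a cell become broad-phase candidate pairs; a full rebuild trades the incremental bookkeeping for one pass over all objects.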

### Summary: Expected Impact (500 agents, measured baseline)

| Optimization | Estimated step time | Estimated FPS | Speedup vs current | Effort |
|---|---|---|---|---|
| **Current** | 13.0 ms | 77 | 1.0× | — |
| **NumPy vectorized controller** | ~1.2 ms | ~850 | **~11×** | Medium |
| **+ C++ TPI/slerp extensions** | ~1.0 ms | ~1,000 | **~13×** | Medium |
| **Theoretical floor** (C API only) | 0.7 ms | ~1,400 | ~18× | — |

Scaling by agent count (current → vectorized estimate):

| Agents | Current FPS | Est. vectorized FPS | Speedup |
|---|---|---|---|
| 100 | 225 | ~2,000+ | ~9× |
| 500 | 77 | ~850 | ~11× |
| 1,000 | 27 | ~450 | ~17× |

> **Note:** Collision detection (10 Hz spatial hash) adds <1% overhead at 500 agents. C++ spatial hash becomes relevant at 1,000+ agents with higher collision frequencies.

> **Relation to Long-Term Backend Abstraction:** The two-phase split and the "remove direct pybullet calls" refactoring are natural stepping stones toward the SimBackend ABC (Phase 1 of Long-Term). The buffered-pose pattern becomes the write side of `SimBackend.set_positions_batch()`.

## Environments

Simulation environment assets (warehouse floors, factory layouts, etc.):

- **`pybullet-fleet-environments` package** — Manage environment assets in a separate repository, installable via `pip install pybullet-fleet-environments` for on-demand retrieval. Keep PyBulletFleet core lightweight by not bundling meshes.
  - AWS RoboMaker Small Warehouse (MIT-0): DAE→OBJ converted meshes + URDF wrappers
  - Open-RMF rmf_demos maps (office, hotel, clinic, airport, campus): OBJ mesh export
  - pybullet_data bundled environments (kiva_shelf, samurai, stadium) wrappers
  - Original license clearly noted per environment
- **`resolve_environment()` API** — Name resolution for loading environments, similar to `resolve_urdf()`. Shows an install hint when the environments package is not installed.
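
One possible shape for `resolve_environment()` is sketched below. The import name `pybullet_fleet_environments`, the one-directory-per-environment layout, and the environment name `small_warehouse` are assumptions for illustration; only the pip package name comes from the list above.

```python
import importlib.util
from pathlib import Path


def resolve_environment(name: str, package: str = "pybullet_fleet_environments") -> Path:
    """Return the asset directory for an environment, or explain how to install it.

    Assumes (hypothetically) that the assets package stores each environment
    in a subdirectory named after it.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        pip_name = package.replace("_", "-")
        raise ModuleNotFoundError(
            f"Environment '{name}' requires the optional assets package. "
            f"Install it with: pip install {pip_name}"
        )
    env_dir = Path(list(spec.submodule_search_locations)[0]) / name
    if not env_dir.is_dir():
        raise FileNotFoundError(f"Unknown environment '{name}' in package '{package}'")
    return env_dir


# Usage: fall back to an install hint when the assets are missing.
try:
    warehouse_dir = resolve_environment("small_warehouse")
except (ModuleNotFoundError, FileNotFoundError) as exc:
    warehouse_dir = None
    hint = str(exc)
```

Using `importlib.util.find_spec` checks for the package without importing it, so the core stays importable when the assets are absent.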

## CI / DevOps

- **GitHub Actions refactoring** — Streamlined CI pipeline
- **Automated performance tracking** — Run time / memory benchmarks in CI, auto-update results in documentation, and alert on significant performance regressions

## Long-Term: Backend Abstraction & Beyond PyBullet

The current architecture is tightly coupled to PyBullet (`body_id`, per-entity FFI calls). At 1000 agents, per-entity Python and FFI overhead dominates step time (88% of 40.9 ms). The following items explore decoupling from PyBullet to unlock 10–100× performance gains while keeping the Python user API unchanged.

### Phase 1: SimBackend ABC + NumPy Pure Kinematic Backend

- **SimBackend ABC** — Abstract interface (`set_positions_batch`, `detect_collisions`, `load_model`, `step_physics`) that `Agent` and `SimObject` program against instead of raw `pybullet` calls
- **NumpyBackend (default)** — Contiguous numpy arrays for positions/orientations, `scipy.spatial.cKDTree` for collision. No physics engine dependency. Expected: 1000 agents in ~2–5 ms (RTF 20–50×), 5000+ agents at RTF > 1.0
- **PyBulletBackend (compat)** — Wraps existing PyBullet calls behind SimBackend ABC for backward compatibility and physics-mode users
- **URDF parsing without PyBullet** — Use `yourdfpy` or similar to parse URDF into internal model data, removing the last hard dependency on PyBullet for kinematic-only mode
- **Visualization decoupling** — Replace `p.GUI` with Rerun, Open3D, or RViz for rendering. Backend-agnostic scene display
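
A rough sketch of how the `SimBackend` ABC and a kinematic NumPy backend might fit together. The four method names come from the list above; their parameters and return types are assumptions. The collision check here is a brute-force sphere test chosen to keep the example dependency-free; it stands in for the `cKDTree` query proposed above and does not scale the same way.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

import numpy as np


class SimBackend(ABC):
    """Interface sketch; signatures are illustrative assumptions."""

    @abstractmethod
    def load_model(self, path: str, position: Sequence[float]) -> int: ...

    @abstractmethod
    def set_positions_batch(self, ids: Sequence[int], positions) -> None: ...

    @abstractmethod
    def detect_collisions(self, radius: float) -> List[Tuple[int, int]]: ...

    @abstractmethod
    def step_physics(self, dt: float) -> None: ...


class NumpyBackend(SimBackend):
    """Kinematic-only backend keeping all positions in one (N, 3) array."""

    def __init__(self) -> None:
        self._positions = np.zeros((0, 3))

    def load_model(self, path: str, position: Sequence[float] = (0.0, 0.0, 0.0)) -> int:
        # The model file is ignored here; a real backend would parse it.
        self._positions = np.vstack([self._positions, np.asarray(position, dtype=float)])
        return len(self._positions) - 1

    def set_positions_batch(self, ids: Sequence[int], positions) -> None:
        self._positions[np.asarray(ids, dtype=int)] = np.asarray(positions, dtype=float)

    def detect_collisions(self, radius: float) -> List[Tuple[int, int]]:
        # Brute-force O(N^2) test treating every body as a sphere of `radius`.
        diff = self._positions[:, None, :] - self._positions[None, :, :]
        dist2 = np.einsum("ijk,ijk->ij", diff, diff)
        i, j = np.nonzero(np.triu(dist2 < (2.0 * radius) ** 2, k=1))
        return list(zip(i.tolist(), j.tolist()))

    def step_physics(self, dt: float) -> None:
        pass  # Nothing to integrate in kinematic-only mode.


# Usage: two bodies start far apart, then one is moved next to the other.
backend = NumpyBackend()
a = backend.load_model("simple_cube.urdf", (0.0, 0.0, 0.0))
b = backend.load_model("simple_cube.urdf", (5.0, 0.0, 0.0))
before = backend.detect_collisions(radius=0.5)
backend.set_positions_batch([b], [(0.5, 0.0, 0.0)])
after = backend.detect_collisions(radius=0.5)
```

`Agent` and `SimObject` would hold integer handles returned by `load_model()` instead of PyBullet body IDs, so swapping in a `PyBulletBackend` leaves their code unchanged.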

### Phase 2: Native Backend (Rust/C++ via PyO3/pybind11)

Only justified if Phase 1 numpy performance is insufficient (e.g., 5000+ agents at 240 Hz).

- **Rust kinematic core** — Position update, yaw integration, AABB collision in Rust. Exposed to Python via PyO3. Expected: further 3–10× over numpy (RTF 100–300× for 1000 agents)
- **Batch collision in Rust** — Sweep-and-prune or spatial hash for O(n log n) broad-phase, replacing Python KDTree
- **Zero-copy interop** — Share numpy arrays directly with Rust via buffer protocol (no serialization)

### Phase 3: GPU Backend (optional)

For 10,000+ agent scenarios or RL training workloads.

- **MuJoCo MJX backend** — JAX-accelerated batch physics on GPU. URDF via MJCF conversion
- **JAX pure kinematic** — `jax.numpy` drop-in for NumpyBackend, JIT-compiled, GPU-parallel

### Phase 4: ECS Architecture (v2.0 candidate)

Considered when entity diversity explodes beyond what Agent/SimObject OOP can express cleanly (e.g., drones + ground robots + conveyors + elevators + dynamic obstacles all in one scene with distinct component sets).

- **Component-based entity model** — Replace Agent/SimObject inheritance with composable components (Transform, Collision, Kinematics, JointState, ActionQueue, etc.)
- **System pipeline** — Each system (MovementSystem, CollisionSystem, ActionSystem) operates on component arrays. Maps naturally from current Controller ABC → System
- **Rust ECS runtime** — Leverage `bevy_ecs` or `hecs` for cache-friendly SoA memory layout. Python API via PyO3 for user-facing logic
- **Migration path** — Controller ABC → System, EventBus → ECS Events, Registry → ECS resource/archetype queries. Plugin Architecture phases are designed as stepping stones toward this

> **Note:** At this point the project may evolve beyond "PyBulletFleet" in name, as PyBullet would be just one optional backend among several, and the core would no longer depend on it. A rename (e.g., *FleetSim*, *KinematicFleet*) may be appropriate.