# Benchmark Results Reference

Results are updated with each release. To reproduce them,
see [Benchmark Suite](benchmark-suite) and [Profiling Guide](profiling-guide).

---

## Test Environment

| Item | Value |
|------|-------|
| OS | Ubuntu 20.04 (Linux 5.15.0) |
| CPU | Intel Core i7-1185G7 @ 3.00 GHz (4 cores / 8 logical) |
| RAM | 31 GB |
| Python | 3.8.10 |
| Conditions | Headless (`gui=false`), `physics=false`, `timestep=0.1`, 3 repetitions |

---

## Simulation Throughput

**Script:** `make bench-full` → `benchmark/run_benchmark.py --sweep 100 500 1000`
**Config:** `benchmark/configs/general.yaml` — `collision_check_frequency=null` (every step), 50% agents moving
**Last measured:** 2026-04-08

| Agents | Step Time (ms) | RTF   | Spawn Time | Memory Delta |
|--------|----------------|-------|------------|--------------|
| 100    | 2.17 ± 0.10    | 46.1× | 26 ms      | −24.8 MB     |
| 250    | 6.45 ± 0.30    | 15.5× | 65 ms      | −19.7 MB †   |
| 500    | 13.21 ± 0.21   | 7.6×  | 137 ms     | −15.3 MB     |
| 1000   | 29.98 ± 0.39   | 3.3×  | 285 ms     | −3.2 MB      |
| 2000   | 94.82 ± 5.81   | 1.1×  | 731 ms     | +29.6 MB †   |

† Row from 2026-03-12; not included in `make bench-full` (100/500/1000 only).

**Source:** `benchmark/results/benchmark_sweep_10.0s.json`
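
The RTF column is consistent with dividing the simulated timestep by the measured wall-clock step time. The sketch below assumes that definition (it is inferred from the table, not taken from the benchmark code) and recomputes the three `make bench-full` rows; the function name is illustrative.

```python
def real_time_factor(timestep_s: float, step_time_ms: float) -> float:
    """Simulated seconds advanced per wall-clock second spent stepping."""
    return (timestep_s * 1000.0) / step_time_ms


TIMESTEP_S = 0.1  # `timestep=0.1` from the Test Environment table

# Mean step times (ms) for the rows produced by `make bench-full`.
step_times_ms = {100: 2.17, 500: 13.21, 1000: 29.98}

rtf = {
    agents: round(real_time_factor(TIMESTEP_S, ms), 1)
    for agents, ms in step_times_ms.items()
}
print(rtf)
```

An RTF above 1 means the simulation runs faster than real time at that agent count.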

**Memory note:** The negative deltas at 1000 agents and below are a Python GC artifact. Above 1000 agents, memory grows roughly linearly at ~15 KB per agent (+29.6 MB at 2000 agents).

**Scalability:** O(n^1.3) — near-linear up to ~500 agents, slightly super-linear above.
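
One way to sanity-check a scaling exponent like this is a least-squares fit of log(step time) against log(agent count). The sketch below applies that to the five rows of the table above; it is an independent illustration, not the method used to derive the figure quoted here, and the helper name is hypothetical.

```python
import math


def fit_power_law_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


agents = [100, 250, 500, 1000, 2000]
step_ms = [2.17, 6.45, 13.21, 29.98, 94.82]

exponent = fit_power_law_exponent(agents, step_ms)
print(f"fitted exponent: {exponent:.2f}")
```

A single exponent hides the change in slope: the fit over all rows is pulled upward by the 2000-agent point, which is where the super-linear growth shows most clearly.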

---

## Step Time Component Breakdown

**Script:** `benchmark/profiling/simulation_profiler.py --test builtin --agents 500 --steps 500`

| Component | Share | Notes |
|-----------|-------|-------|
| Agent Update | ~81% | Dominant cost; much higher for moving than stationary |
| Collision Check | ~18% | Periodic (bursty); minimal cost on non-check steps |
| Monitor Update | ~1% | Near-zero if monitor disabled |
| Step Simulation | 0% | Physics off; up to ~40% with physics on |

> **Note on variance:** Per-step profiler timings show high variance (stdev > mean) because collision checks
> fire in bursts and some steps include PyBullet warmup or GC pauses. The percentages above are
> representative of steady-state operation.
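
A standard deviation larger than the mean is what bursty timings produce. The sketch below uses made-up numbers (a check every 10th step costing 10 ms, 0.05 ms otherwise) purely to illustrate the effect; none of these values come from the profiler.

```python
import statistics

# Hypothetical values chosen only to illustrate bursty per-step timings.
CHECK_EVERY = 10       # a collision check fires on every 10th step
CHECK_COST_MS = 10.0   # cost of a step that runs the check
IDLE_COST_MS = 0.05    # cost of a step that skips it

samples = [
    CHECK_COST_MS if step % CHECK_EVERY == 0 else IDLE_COST_MS
    for step in range(500)
]

mean_ms = statistics.mean(samples)
stdev_ms = statistics.pstdev(samples)
print(f"mean={mean_ms:.3f} ms, stdev={stdev_ms:.3f} ms")
```

With most steps near zero and a few expensive ones, the spread exceeds the average, so shares computed over many steps are more informative than any single sample.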

---

## Agent Update Cost

**Script:** `benchmark/profiling/agent_update.py --agents 500`

### Stationary vs Moving (500 agents)

| State | Total time | Per agent |
|-------|-----------|-----------|
| Stationary | 0.29 ms | 0.57 μs |
| Moving | 61.27 ms | 122.5 μs |
| Ratio | **214×** | |

**→ Design decision:** Benchmarks use 50% moving agents as a representative workload.
Moving agent cost dominates: kinematics, trajectory following, and PyBullet API calls
only execute for agents with an active goal.

### Motion Mode Comparison (500 agents, all moving)

| Mode | Total time | Per agent | Relative |
|------|-----------|-----------|---------|
| OMNIDIRECTIONAL | 9.26 ms | 18.5 μs | 1× (baseline) |
| DIFFERENTIAL | 26.66 ms | 53.3 μs | 2.9× slower |

**→ Optimisation note (2026-03-12):** DIFFERENTIAL was previously ~5× slower than OMNIDIRECTIONAL (90.6 μs/agent).
Replacing scipy `Slerp` with a precomputed quaternion slerp and removing per-tick numpy allocations
reduced the ROTATE phase from ~36 μs to ~5 μs. See [Differential Drive Optimization](#differential-drive-optimization) below.

**→ Design decision:** Benchmarks default to OMNIDIRECTIONAL. DIFFERENTIAL requires
heading alignment computation, which raises update cost to ~2.9× that of OMNIDIRECTIONAL.

---

## Wrapper Layer Overhead

**Script:** `benchmark/profiling/wrapper_overhead.py --n 500 --reps 3`
**Conditions:** 500 objects, process-isolated per layer

| Layer | Spawn time | Update time (get+set pose) | Memory (RSS delta) |
|-------|-----------|---------------------------|-------------------|
| Direct PyBullet (baseline) | 63 ms | 2.21 ms | 5.2 MB |
| SimObject | 106 ms (+68%) | 6.80 ms (3.1×) | 9.2 MB |
| SimObjectManager | 207 ms (+229%) | 10.75 ms (4.9×) | 9.3 MB |
| Agent | 237 ms (+276%) | 10.60 ms (4.8×) | 11.5 MB |
| AgentManager | 255 ms (+305%) | 10.20 ms (4.6×) | 11.6 MB |

**Extrapolated to 10,000 objects (production feasibility, thresholds: spawn < 10 s, update < 150 ms/step, memory < +200 MB):**

| Layer | Spawn | Update/step | Memory | Pass? |
|-------|-------|-------------|--------|-------|
| SimObject | 2.1 s | 136 ms | +81 MB | ✅ all pass |
| Agent | 4.7 s | 212 ms | +127 MB | spawn/mem ✅, update ❌ |

**→ Design decision:** SimObject layer is production-viable at 10,000 scale.
Agent layer exceeds the 150 ms/step update threshold at 10,000 (extrapolated 212 ms),
but comfortably passes at 5,000 or below. For ≥10,000 agents, consider using SimObject
directly or batching update calls.
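
The spawn and update figures in the extrapolation table match a straight linear scaling of the 500-object measurements. The sketch below reproduces that arithmetic and estimates where each layer would cross the 150 ms/step threshold. Linear scaling is an assumption, the helper names are illustrative, and memory is left out because the table's memory column is not a plain ×20 scaling of the RSS deltas.

```python
MEASURED_N = 500            # objects in the wrapper-overhead run
TARGET_N = 10_000
SPAWN_LIMIT_MS = 10_000.0   # threshold: spawn < 10 s
UPDATE_LIMIT_MS = 150.0     # threshold: update < 150 ms/step

# Measured values (ms) at 500 objects, copied from the table above.
measured_ms = {
    "SimObject": {"spawn": 106.0, "update": 6.80},
    "Agent": {"spawn": 237.0, "update": 10.60},
}


def scale_linearly(value_ms, target_n, measured_n=MEASURED_N):
    """Assume cost grows in proportion to object count."""
    return value_ms * target_n / measured_n


def largest_n_within(limit_ms, value_ms, measured_n=MEASURED_N):
    """Largest object count that stays under limit_ms, assuming linear cost."""
    return int(limit_ms / value_ms * measured_n)


report = {}
for layer, m in measured_ms.items():
    spawn = scale_linearly(m["spawn"], TARGET_N)
    update = scale_linearly(m["update"], TARGET_N)
    report[layer] = {
        "spawn_ms": spawn,
        "update_ms": update,
        "spawn_ok": spawn < SPAWN_LIMIT_MS,
        "update_ok": update < UPDATE_LIMIT_MS,
        "max_n_for_update": largest_n_within(UPDATE_LIMIT_MS, m["update"]),
    }
    print(layer, report[layer])
```

Under this assumption the Agent layer reaches the update threshold at roughly 7,000 objects, which is consistent with it passing at 5,000.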

---

## Collision Mode Comparison (DISABLED / NORMAL_2D / NORMAL_3D)

**Script:** `benchmark/experiments/collision_mode_comparison.py --agents 500 --iterations 200`

| Mode | Step time (mean) | Collision time | vs Disabled |
|------|-----------------|---------------|-------------|
| DISABLED | 1.37 ms | — | baseline |
| NORMAL_2D (9 neighbors) | 2.43 ms | 0.21 ms | +77% |
| NORMAL_3D (27 neighbors) | 2.26 ms | 0.39 ms | +64% |

Collision check breakdown (both modes): AABB Filtering accounts for ~97–99% of collision check time.
Spatial Hashing and Contact Points together are <3%.

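**→ Design decision:** NORMAL_2D roughly halves collision-check time relative to NORMAL_3D (0.21 ms vs 0.39 ms),
but total step time differs by only ~7% between the two modes, and in this run NORMAL_2D was not the faster
one (2.43 ms vs 2.26 ms). AABB filtering dominates in both. Use NORMAL_2D for ground robots where Z-axis
neighbors are irrelevant for correctness, not for dramatic performance gains.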

---

## Collision Algorithm Comparison (Spatial Hashing vs Alternatives)

**Script:** `benchmark/experiments/collision_method_comparison.py --agents 100,500 --iterations 100`
**Conditions:** spacing=0.08m (dense), kinematic objects (mass=0)

| Method | 100 agents | 500 agents | Collisions detected | Valid? |
|--------|-----------|-----------|---------------------|--------|
| Spatial Hashing (current) | 8.0 ms | 92.1 ms | ✅ correct | ✅ |
| Brute Force AABB | 8.7 ms | 231.9 ms (2.5×) | ✅ correct | ✅ |
| `getClosestPoints` all-pairs | 23.0 ms | 370.6 ms (4.0×) | ✅ correct | ✅ |
| `getContactPoints()` no-args | 0.06 ms | 0.24 ms | ❌ 0 — invalid | ❌ |
| `getContactPoints(A,B)` pairwise | 12.1 ms | 190.3 ms | ❌ 0 — invalid | ❌ |

**Key finding:** `getContactPoints` variants return 0 collisions for kinematic objects (mass=0).
PyBullet's contact point solver only activates between physics-enabled bodies.
Spatial hashing with `getClosestPoints` is the only valid fast method for kinematic AGV-type robots.

**→ Design decision:** Spatial hashing (O(N) average) is used because:
1. Unlike the `getContactPoints` variants, it detects collisions correctly for kinematic robots.
2. It is 2.5–4× faster than the other valid methods at 500 agents.
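
To make the comparison concrete, here is a minimal pure-Python sketch of a 2D spatial-hash broad phase checked against brute force. It is not the project's implementation: it treats agents as points with a fixed proximity radius, uses no PyBullet calls, and all names are hypothetical. The 3×3 cell scan corresponds to the 9-neighbor lookup of a 2D mode.

```python
import random
from collections import defaultdict
from itertools import combinations


def _dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def brute_force_pairs(points, radius):
    """All index pairs (i, j), i < j, closer than radius. O(N^2)."""
    r2 = radius * radius
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(points), 2)
        if _dist2(a, b) <= r2
    }


def spatial_hash_pairs(points, radius):
    """Same result, but only compares points in adjacent grid cells."""
    cell = radius  # cell size equal to the radius: neighbours lie in a 3x3 block
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)

    r2 = radius * radius
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        # i < j records each pair exactly once.
                        if i < j and _dist2(points[i], points[j]) <= r2:
                            pairs.add((i, j))
    return pairs


rng = random.Random(0)
points = [(rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)) for _ in range(200)]
hashed = spatial_hash_pairs(points, radius=0.1)
brute = brute_force_pairs(points, radius=0.1)
print(len(hashed), hashed == brute)
```

The grid only reduces the number of candidate pairs; each candidate still needs an exact distance or overlap test, which is the role `getClosestPoints` plays in the table above.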

---

## Differential Drive Optimization

**Script:** `benchmark/profiling/differential_breakdown.py`
**Conditions:** 1000 agents, DIFFERENTIAL mode (ROTATE phase)

The differential drive ROTATE phase was the single most expensive per-agent per-tick operation.
Profiling identified two dominant costs, the scipy `Slerp` call and `np.clip` on scalars, plus four smaller ones (P0–P3):

| Operation | Before | After | Speedup |
|-----------|--------|-------|---------|
| Quaternion slerp (scipy Slerp → `_quat_slerp`) | 26.8 μs | 2.3 μs | **11.8×** |
| Scalar clip (`np.clip` → `max/min`) | 6.8 μs | 0.15 μs | **44×** |
| Dead `np.array()` allocations (P0) | ~0.6 μs | 0 μs | eliminated |
| Redundant `np.dot` in slerp (P1) | ~0.3 μs | 0 μs | eliminated |
| Velocity array reallocation (P2) | ~0.3 μs | 0 μs | in-place |
| FORWARD direction recompute (P3) | ~0.5 μs | 0 μs | cached |
| **ROTATE phase total** | **36.1 μs** | **4.9 μs** | **7.3×** |

**End-to-end impact (1000 agents, differential):**

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Per-agent update | 57 μs | 12.4 μs | 4.6× faster |
| DIFFERENTIAL / OMNIDIRECTIONAL ratio | 5.0× | 2.9× | Closer to parity |
| 500-agent ROTATE frame time | 18.0 ms | 2.5 ms | −15.6 ms/frame |

**Why was scipy Slerp slow?**

`scipy.spatial.transform.Slerp` is a generic array-oriented API designed for batch interpolation
(e.g., `slerp(np.linspace(0, 1, 10000))`). When called with a single scalar `t` per agent per tick,
the fixed overhead — input normalisation, dtype promotion, Rotation object wrapping — dominates
the actual trigonometry. The replacement `_quat_slerp` sidesteps this by precomputing
`dot/theta/sin(theta)` once per waypoint transition and using `math.sin`/`math.cos` (C-level scalar
functions) instead of numpy array dispatch.
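
The idea can be sketched as a closure that precomputes the angle terms once and then evaluates each `t` with scalar `math` calls. This is an illustration of the technique, not the project's `_quat_slerp`; the function name, the `(x, y, z, w)` ordering, and the small-angle threshold are assumptions.

```python
import math


def make_scalar_slerp(q0, q1):
    """Return f(t) interpolating unit quaternions q0 -> q1, given as (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to take the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(dot, 1.0)  # guard acos against rounding slightly above 1

    # Precomputed once per (q0, q1) pair rather than on every tick.
    theta = math.acos(dot)
    sin_theta = math.sin(theta)

    def slerp(t):
        if sin_theta < 1e-6:
            # Nearly identical orientations: normalised linear blend.
            blended = tuple((1.0 - t) * a + t * b for a, b in zip(q0, q1))
            norm = math.sqrt(sum(c * c for c in blended))
            return tuple(c / norm for c in blended)
        w0 = math.sin((1.0 - t) * theta) / sin_theta
        w1 = math.sin(t * theta) / sin_theta
        return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

    return slerp


identity = (0.0, 0.0, 0.0, 1.0)
quarter_turn_z = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))

slerp = make_scalar_slerp(identity, quarter_turn_z)
halfway = slerp(0.5)  # expected: a 45 degree rotation about z
print(halfway)
```

Each per-tick call then costs two `math.sin` evaluations and a handful of multiplications, with no array or `Rotation` objects created.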

Absolute timings vary by environment; the ratios above are the meaningful comparison metric.

---

## Collision Config-Based Comparison (Physics ON vs OFF)

**Script:** `benchmark/experiments/collision_methods_config_based.py`
**Conditions:** 100 objects, 500 steps

| Config | Physics | Method | Step time | Collision time |
|--------|---------|--------|-----------|---------------|
| `physics_off_closest.yaml` | OFF | `CLOSEST_POINTS` | 1.51 ms | 1.24 ms |
| `physics_on_contact.yaml` | ON | `CONTACT_POINTS` | 2.64 ms | 1.41 ms |
| `hybrid.yaml` | ON | `HYBRID` | 2.00 ms | 1.27 ms |

Enabling physics raises step time by ~32% (`HYBRID`) to ~75% (`CONTACT_POINTS`) over the physics-off run, because the physics engine is stepped on every simulation step.

**→ Design decision:** Default is `physics=false` (kinematics mode) for maximum throughput.
Use `physics=true` only when rigid-body dynamics are required.

**Source:** `benchmark/results/collision_methods_config_based.txt`

---

## Arm Joint Control Performance

**Script:** `benchmark/profiling/arm_joint_update.py --test scaling`
**Conditions:** arm_robot.urdf (4 revolute joints), fixed-base, JointAction cycling, 100 steps per count

### Physics vs Kinematic Scaling

| Arms | Joints | Physics (ms/step) | Kinematic (ms/step) | Ratio |
|------|--------|--------------------|---------------------|-------|
| 1    | 4      | 0.029              | 0.010               | 2.8×  |
| 5    | 20     | 0.082              | 0.056               | 1.5×  |
| 10   | 40     | 0.152              | 0.090               | 1.7×  |
| 25   | 100    | 0.415              | 0.253               | 1.6×  |
| 50   | 200    | 0.886              | 0.552               | 1.6×  |

Kinematic mode is consistently faster than physics mode for joint control.
The gap comes from skipping `stepSimulation()` — kinematic mode uses `resetJointState()`
with per-step interpolation at URDF velocity limits.
Exact ratios are environment-dependent; the trend (kinematic faster) is consistent.
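
Per-step interpolation at a velocity limit amounts to clamping each joint's movement to `max_velocity * dt`. The sketch below shows that clamp for a single joint; the function name and the numeric values are hypothetical, and in kinematic mode the resulting position would then be written to the simulator (the text above names `resetJointState()` for that step).

```python
def step_joint_toward(current, target, max_velocity, dt):
    """Move current toward target by at most max_velocity * dt."""
    max_delta = max_velocity * dt
    delta = target - current
    delta = max(-max_delta, min(max_delta, delta))  # clamp the step size
    return current + delta


# Hypothetical joint: 2 rad/s limit, 0.1 s timestep, 1 rad to travel.
position = 0.0
positions = []
for _ in range(6):
    position = step_joint_toward(position, target=1.0, max_velocity=2.0, dt=0.1)
    positions.append(position)

print([round(p, 3) for p in positions])
```

The joint advances 0.2 rad per step, reaches the target on the fifth step, and stays there on later steps.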

### Component Breakdown (10 arms)

**Script:** `benchmark/profiling/arm_joint_update.py --test builtin --arms 10`

| Component | Physics | Kinematic |
|-----------|---------|----------|
| agent_update | 27.0% (0.050 ms) | 96.7% (0.087 ms) |
| step_simulation | 69.6% (0.129 ms) | 0.2% (0.000 ms) |
| callbacks | 1.0% | 0.4% |
| **total** | **0.186 ms** | **0.090 ms** |

In physics mode, `stepSimulation()` dominates (~70%). In kinematic mode, the physics engine
is bypassed, so `agent_update` (joint interpolation + `resetJointState`) accounts for nearly all of the step time (~97%).

### Kinematic Joint Cache Optimization

**Problem:** cProfile showed `p.getJointState()` consuming ~36% of kinematic update time
(called per-joint per-step to read current positions before interpolating).

**Solution:** `_kinematic_joint_positions` cache — joint positions stored in a Python dict,
initialized via batch `p.getJointStates()` at agent creation, updated after each `resetJointState()`.
`get_joint_state()` returns cached values for kinematic robots (zero PyBullet calls).

| Metric | Before cache | After cache | Improvement |
|--------|-------------|-------------|-------------|
| 50 arms step time | 0.826 ms | 0.552 ms | **1.5× faster** |
| `p.getJointState` calls/step | 200 (50 arms × 4 joints) | 0 | **eliminated** |

**→ Design decision:** Kinematic joint cache is always active for `mass=0.0` URDF robots.
The cache is invisible to callers — `get_joint_state()` API is unchanged.
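
The caching pattern described above can be sketched without PyBullet by counting reads on a stand-in backend. Everything here is hypothetical (`FakeJointBackend`, `CachedKinematicJoints`, and their methods are not the project's classes). It shows the three pieces: one batch read at creation, cached reads afterwards, and a cache update on every write.

```python
class FakeJointBackend:
    """Stand-in for the simulator; counts reads so the saving is visible."""

    def __init__(self, positions):
        self._positions = list(positions)
        self.read_calls = 0

    def read_all(self):
        self.read_calls += 1
        return list(self._positions)

    def write(self, joint_index, value):
        self._positions[joint_index] = value


class CachedKinematicJoints:
    """Keeps joint positions in a dict so reads never touch the backend."""

    def __init__(self, backend):
        self._backend = backend
        # One batch read at creation seeds the cache.
        self._cache = dict(enumerate(backend.read_all()))

    def get_joint_position(self, joint_index):
        return self._cache[joint_index]  # no backend call

    def set_joint_position(self, joint_index, value):
        self._backend.write(joint_index, value)
        self._cache[joint_index] = value  # keep the cache in sync with the write


backend = FakeJointBackend([0.0, 0.0, 0.0, 0.0])
joints = CachedKinematicJoints(backend)

for _ in range(100):          # 100 simulated steps
    for j in range(4):        # 4 joints, read then write each step
        current = joints.get_joint_position(j)
        joints.set_joint_position(j, current + 0.01)

print(backend.read_calls)
```

This only stays correct while every write goes through the wrapper, which is why the approach suits kinematic robots whose joints are never moved by the physics engine.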

*Data collected 2026-03-15. Absolute timings and ratios are environment-dependent
(CPU, OS, PyBullet version). The qualitative trends — kinematic faster than physics,
cache eliminating per-step PyBullet calls — are expected to hold across environments.
Re-run `benchmark/profiling/arm_joint_update.py` to obtain numbers for your setup.*

---

## See Also

- [Benchmark Suite](benchmark-suite) — How to run benchmarks and reproduce these results
- [Experiment Scripts](experiments) — Collision algorithm comparison scripts
- [Profiling Guide](profiling-guide) — Per-component profiling scripts
- [Optimization Guide](optimization-guide) — Parameter recommendations based on these results