Benchmark Results Reference

Results are kept up-to-date with each release. For how to reproduce them, see Benchmark Suite and Profiling Guide.


Test Environment

Item

Value

OS

Ubuntu 20.04 (Linux 5.15.0)

CPU

Intel Core i7-1185G7 @ 3.00 GHz (4 cores / 8 logical)

RAM

31 GB

Python

3.8.10

Conditions

Headless (gui=false), physics=false, timestep=0.1, 3 repetitions


Simulation Throughput

Script: make bench-fullbenchmark/run_benchmark.py --sweep 100 500 1000 Config: benchmark/configs/general.yamlcollision_check_frequency=null (every step), 50% agents moving Last measured: 2026-04-08

Agents

Step Time (ms)

RTF

Spawn Time

Memory Delta

100

2.17 ± 0.10

46.1×

26 ms

−24.8 MB

250

6.45 ± 0.30

16×

65 ms

−19.7 MB †

500

13.21 ± 0.21

7.6×

137 ms

−15.3 MB

1000

29.98 ± 0.39

3.3×

285 ms

−3.2 MB

2000

94.82 ± 5.81

1.1×

731 ms

+29.6 MB †

† Row from 2026-03-12; not included in make bench-full (100/500/1000 only).

Source: benchmark/results/benchmark_sweep_10.0s.json

Memory note: Negative delta below ~1000 agents is a Python GC artifact. Actual per-agent overhead is ~15 KB above 1000 agents (linear).

Scalability: O(n^1.3) — near-linear up to ~500 agents, slightly super-linear above.


Step Time Component Breakdown

Script: benchmark/profiling/simulation_profiler.py --test builtin --agents 500 --steps 500

Component

Share

Notes

Agent Update

~81%

Dominant cost; much higher for moving than stationary

Collision Check

~18%

Periodic (bursty); minimal cost on non-check steps

Monitor Update

~1%

Near-zero if monitor disabled

Step Simulation

0%

Physics off; up to ~40% with physics on

Note on variance: Profiler timestamps show high variance (stdev > mean) because collision checks fire in bursts and some steps include PyBullet warmup or GC pauses. The percentages above are representative of steady-state operation.


Agent Update Cost

Script: benchmark/profiling/agent_update.py --agents 500

Stationary vs Moving (500 agents)

State

Total time

Per agent

Stationary

0.29 ms

0.57 μs

Moving

61.27 ms

122.5 μs

Ratio

214×

→ Design decision: Benchmarks use 50% moving agents as a representative workload. Moving agent cost dominates: kinematics, trajectory following, and PyBullet API calls only execute for agents with an active goal.

Motion Mode Comparison (500 agents, all moving)

Mode

Total time

Per agent

Relative

OMNIDIRECTIONAL

9.26 ms

18.5 μs

1× (baseline)

DIFFERENTIAL

26.66 ms

53.3 μs

2.9× slower

→ Optimisation note (2026-03-12): DIFFERENTIAL was previously ~5× slower (90.6 μs/agent). Replacing scipy Slerp with a precomputed quaternion slerp and removing per-tick numpy allocations reduced the ROTATE phase from 37 μs to 5 μs. See Differential Drive Optimization below.

→ Design decision: Benchmarks default to OMNIDIRECTIONAL. DIFFERENTIAL requires heading alignment computation which adds ~2.9× update cost.


Wrapper Layer Overhead

Script: benchmark/profiling/wrapper_overhead.py --n 500 --reps 3 Conditions: 500 objects, process-isolated per layer

Layer

Spawn time

Update time (get+set pose)

Memory (RSS delta)

Direct PyBullet (baseline)

63 ms

2.21 ms

5.2 MB

SimObject

106 ms (+68%)

6.80 ms (+3.1×)

9.2 MB

SimObjectManager

207 ms (+229%)

10.75 ms (+4.9×)

9.3 MB

Agent

237 ms (+276%)

10.60 ms (+4.8×)

11.5 MB

AgentManager

255 ms (+305%)

10.20 ms (+4.6×)

11.6 MB

Extrapolated to 10,000 objects (production feasibility, thresholds: spawn < 10 s, update < 150 ms/step, memory < +200 MB):

Layer

Spawn

Update/step

Memory

Pass?

SimObject

2.1 s

136 ms

+81 MB

✅ all pass

Agent

4.7 s

212 ms

+127 MB

spawn/mem ✅, update ❌

→ Design decision: SimObject layer is production-viable at 10,000 scale. Agent layer exceeds the 150 ms/step update threshold at 10,000 (extrapolated 212 ms), but comfortably passes at 5,000 or below. For ≥10,000 agents, consider using SimObject directly or batching update calls.


Collision Mode Comparison (DISABLED / NORMAL_2D / NORMAL_3D)

Script: benchmark/experiments/collision_mode_comparison.py --agents 500 --iterations 200

Mode

Step time (mean)

Collision time

vs Disabled

DISABLED

1.37 ms

baseline

NORMAL_2D (9 neighbors)

2.43 ms

0.21 ms

+77%

NORMAL_3D (27 neighbors)

2.26 ms

0.39 ms

+64%

Collision check breakdown (both modes): AABB Filtering accounts for ~97–99% of collision check time. Spatial Hashing and Contact Points together are <3%.

→ Design decision: NORMAL_2D yields only modest speedup (7%) over NORMAL_3D at this scale because AABB filtering dominates in both. Use NORMAL_2D for ground robots where Z-axis neighbors are irrelevant for correctness, not for dramatic performance gains.


Collision Algorithm Comparison (Spatial Hashing vs Alternatives)

Script: benchmark/experiments/collision_method_comparison.py --agents 100,500 --iterations 100 Conditions: spacing=0.08m (dense), kinematic objects (mass=0)

Method

100 agents

500 agents

Collisions detected

Valid?

Spatial Hashing (current)

8.0 ms

92.1 ms

✅ correct

Brute Force AABB

8.7 ms

231.9 ms (2.5×)

✅ correct

getClosestPoints all-pairs

23.0 ms

370.6 ms (4.0×)

✅ correct

getContactPoints() no-args

0.06 ms

0.24 ms

❌ 0 — invalid

getContactPoints(A,B) pairwise

12.1 ms

190.3 ms

❌ 0 — invalid

Key finding: getContactPoints variants return 0 collisions for kinematic objects (mass=0). PyBullet’s contact point solver only activates between physics-enabled bodies. Spatial hashing with getClosestPoints is the only valid fast method for kinematic AGV-type robots.

→ Design decision: Spatial hashing (O(N) average) is used because:

  1. It is the only algorithmically valid method for kinematic robots.

  2. It is 2.5–4× faster than the valid alternatives at 500 agents.


Differential Drive Optimization

Script: benchmark/profiling/differential_breakdown.py Conditions: 1000 agents, DIFFERENTIAL mode (ROTATE phase)

The differential drive ROTATE phase was the single most expensive per-agent per-tick operation. Profiling identified two root causes:

Operation

Before

After

Speedup

Quaternion slerp (scipy Slerp → _quat_slerp)

26.8 μs

2.3 μs

11.8×

Scalar clip (np.clipmax/min)

6.8 μs

0.15 μs

44×

Dead np.array() allocations (P0)

~0.6 μs

0 μs

eliminated

Redundant np.dot in slerp (P1)

~0.3 μs

0 μs

eliminated

Velocity array reallocation (P2)

~0.3 μs

0 μs

in-place

FORWARD direction recompute (P3)

~0.5 μs

0 μs

cached

ROTATE phase total

36.1 μs

4.9 μs

7.3×

End-to-end impact (1000 agents, differential):

Metric

Before

After

Improvement

Per-agent update

57 μs

12.4 μs

4.6× faster

DIFFERENTIAL / OMNIDIRECTIONAL ratio

5.0×

2.9×

Closer to parity

500-agent ROTATE frame time

18.0 ms

2.5 ms

−15.6 ms/frame

Why was scipy Slerp slow?

scipy.spatial.transform.Slerp is a generic array-oriented API designed for batch interpolation (e.g., slerp(np.linspace(0, 1, 10000))). When called with a single scalar t per agent per tick, the fixed overhead — input normalisation, dtype promotion, Rotation object wrapping — dominates the actual trigonometry. The replacement _quat_slerp sidesteps this by precomputing dot/theta/sin(theta) once per waypoint transition and using math.sin/math.cos (C-level scalar functions) instead of numpy array dispatch.

Absolute timings vary by environment; the ratios above are the meaningful comparison metric.


Collision Config-Based Comparison (Physics ON vs OFF)

Script: benchmark/experiments/collision_methods_config_based.py Conditions: 100 objects, 500 steps

Config

Physics

Method

Step time

Collision time

physics_off_closest.yaml

OFF

CLOSEST_POINTS

1.51 ms

1.24 ms

physics_on_contact.yaml

ON

CONTACT_POINTS

2.64 ms

1.41 ms

hybrid.yaml

ON

HYBRID

2.00 ms

1.27 ms

Physics ON adds ~75% step overhead due to the physics engine stepping each simulation step.

→ Design decision: Default is physics=false (kinematics mode) for maximum throughput. Use physics=true only when rigid-body dynamics are required.

Source: benchmark/results/collision_methods_config_based.txt


Arm Joint Control Performance

Script: benchmark/profiling/arm_joint_update.py --test scaling Conditions: arm_robot.urdf (4 revolute joints), fixed-base, JointAction cycling, 100 steps per count

Physics vs Kinematic Scaling

Arms

Joints

Physics (ms/step)

Kinematic (ms/step)

Ratio

1

4

0.029

0.010

2.8×

5

20

0.082

0.056

1.5×

10

40

0.152

0.090

1.7×

25

100

0.415

0.253

1.6×

50

200

0.886

0.552

1.6×

Kinematic mode is consistently faster than physics mode for joint control. The gap comes from skipping stepSimulation() — kinematic mode uses resetJointState() with per-step interpolation at URDF velocity limits. Exact ratios are environment-dependent; the trend (kinematic faster) is consistent.

Component Breakdown (10 arms)

Script: benchmark/profiling/arm_joint_update.py --test builtin --arms 10

Component

Physics

Kinematic

agent_update

27.0% (0.050 ms)

96.7% (0.087 ms)

step_simulation

69.6% (0.129 ms)

0.2% (0.000 ms)

callbacks

1.0%

0.4%

total

0.186 ms

0.090 ms

In physics mode, stepSimulation() dominates (70%). In kinematic mode, the physics engine is bypassed entirely — agent_update (joint interpolation + resetJointState) is the sole cost.

Kinematic Joint Cache Optimization

Problem: cProfile showed p.getJointState() consuming ~36% of kinematic update time (called per-joint per-step to read current positions before interpolating).

Solution: _kinematic_joint_positions cache — joint positions stored in a Python dict, initialized via batch p.getJointStates() at agent creation, updated after each resetJointState(). get_joint_state() returns cached values for kinematic robots (zero PyBullet calls).

Metric

Before cache

After cache

Improvement

50 arms step time

0.826 ms

0.552 ms

1.5× faster

p.getJointState calls/step

200 (50 arms × 4 joints)

0

eliminated

→ Design decision: Kinematic joint cache is always active for mass=0.0 URDF robots. The cache is invisible to callers — get_joint_state() API is unchanged.

Data collected 2026-03-15. Absolute timings and ratios are environment-dependent (CPU, OS, PyBullet version). The qualitative trends — kinematic faster than physics, cache eliminating per-step PyBullet calls — are expected to hold across environments. Re-run benchmark/profiling/arm_joint_update.py to obtain numbers for your setup.


See Also