How we chose Gryz's on-device AI model

A public benchmark of 5 candidate models on real iPhone hardware. Same prompts, same device, same conditions. Then we published everything.

Why on-device matters

Gryz's privacy guarantee is structural: the model runs on your iPhone, the index lives in local storage, and no network call is ever made for inference. This is not a policy promise — it is architecture.

But on-device AI has hard constraints. An iPhone 15 has 6 GB of RAM in total, and iOS and other apps consume a significant portion of it. That leaves less than 4 GB available for Gryz. Any model whose peak memory exceeds this budget fails by definition.
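To make the memory budget concrete, here is a rough back-of-the-envelope sketch in Python. It estimates only the size of the model weights from a parameter count and a bit width, then compares that to the 4 GB budget stated above. The bit widths (16, 8, 4) are hypothetical quantization levels chosen for illustration, not the settings used in the benchmark, and the estimate ignores the KV cache, activations, and runtime overhead, so real peak RAM will be higher.

```python
# Rough estimate of weight memory only. Assumption: real peak RAM is higher,
# because this ignores the KV cache, activations, and runtime overhead.
RAM_BUDGET_MB = 4_000  # "less than 4 GB available for Gryz" (from the text)


def weight_footprint_mb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in MB (1 MB = 10**6 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e6


def fits_budget(params_billions: float, bits_per_weight: float,
                budget_mb: float = RAM_BUDGET_MB) -> bool:
    """True if the weights alone stay strictly under the RAM budget."""
    return weight_footprint_mb(params_billions, bits_per_weight) < budget_mb


if __name__ == "__main__":
    # Nominal sizes only; the bit widths are hypothetical quantization levels.
    for label, size_b in [("3B model", 3.0), ("3.8B model", 3.8)]:
        for bits in (16, 8, 4):
            mb = weight_footprint_mb(size_b, bits)
            print(f"{label} @ {bits}-bit: {mb:,.0f} MB, "
                  f"fits={fits_budget(size_b, bits)}")
```

Under these assumptions, a 3B model at 16 bits per weight needs about 6,000 MB for weights alone and cannot fit, which is why some form of quantization is effectively required at this size.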

Hard Gates

A model is disqualified if it fails any of these thresholds on an iPhone 15.

First-token latency: < 2,000 ms. The first word must be visible within 2 seconds.

Peak RAM: < 4,000 MB. The model must not trigger memory pressure on a 6 GB iPhone 15.

SHA-256 match: verified. The model file's hash must match the manifest, which guarantees file integrity.

Zero network calls: checked with Charles Proxy. No network activity at all during inference.
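The four gates above can be expressed as a small pass/fail check. The following Python sketch is illustrative only and is not the harness used for the benchmark: the `Measurement` record, its field names, and the `failed_gates` helper are hypothetical, and the network-call count is assumed to be read off the proxy capture by hand. The thresholds are the ones listed above, applied as strict inequalities.

```python
import hashlib
from dataclasses import dataclass

FIRST_TOKEN_LIMIT_MS = 2_000  # gate: first-token latency < 2,000 ms
PEAK_RAM_LIMIT_MB = 4_000     # gate: peak RAM < 4,000 MB


@dataclass
class Measurement:
    """One benchmark run for one model (hypothetical record layout)."""
    first_token_ms: float
    peak_ram_mb: float
    model_sha256: str     # hash computed from the model file on disk
    manifest_sha256: str  # hash published in the manifest
    network_calls: int    # count observed in the proxy capture


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a multi-GB model never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_gates(m: Measurement) -> list[str]:
    """Return the names of the failed gates; an empty list means the run passes."""
    failures = []
    if not m.first_token_ms < FIRST_TOKEN_LIMIT_MS:
        failures.append("first-token latency")
    if not m.peak_ram_mb < PEAK_RAM_LIMIT_MB:
        failures.append("peak RAM")
    if m.model_sha256.lower() != m.manifest_sha256.lower():
        failures.append("SHA-256 match")
    if m.network_calls != 0:
        failures.append("zero network calls")
    return failures


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the check.
    run = Measurement(first_token_ms=2_000, peak_ram_mb=3_200,
                      model_sha256="ab" * 32, manifest_sha256="AB" * 32,
                      network_calls=0)
    print(failed_gates(run))
```

Because the limits are strict, a run that lands exactly on 2,000 ms is treated as a failure. A model is disqualified as soon as the returned list is non-empty.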

The Candidates

Qwen 2.5 3B (Alibaba): Active

Phi-4-mini (Microsoft, 3.8B): Candidate

Llama 3.2 3B

Results

Benchmark in progress

Running on iPhone 16 Pro (A18 Pro)