Persian ASR Triple Threat 🔥 Benchmark

A same-test leaderboard for Persian speech recognition. It covers two real-world test sets, VisualEars6669 (6,669 rows, 10.49 h; a noisy benchmark with clean, far-field and obstructed conditions) and FLEURS-fa Full (4,341 rows of the public Persian FLEURS set), each now scored on three headline metrics plus an essential-content error rate:

WER: word error rate
CER: character error rate
S³ (ShenavaSanj Score): semantic, DHH-weighted error
🔑 Ess Err: essential-content error rate
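WER and CER are both edit-distance rates: the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. The sketch below computes them for a single utterance on raw text. It assumes whitespace tokenization and does not reproduce the leaderboard's Fair normalization or its corpus-level aggregation, which this page does not describe.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # delete reference item r
                          curr[j - 1] + 1,         # insert hypothesis item h
                          prev[j - 1] + (r != h))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edits / reference word count."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent on the raw strings (spaces included)."""
    return 100.0 * edit_distance(reference, hypothesis) / max(len(reference), 1)
```

For example, `wer("a b c", "a x c")` is one substitution over three words, about 33.3%.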

S³ is a Persian-first semantic metric focused on Deaf and hard-of-hearing (DHH) readers: it weights every ASR error by how much the affected word matters for understanding, so dropping a keyword costs far more than dropping a filler word. 🔑 Ess Err is the error rate on only the most meaning-critical words (importance ≥ 0.8), the words a DHH reader can least afford to lose. Avg Fair 🔥 is the Triple Threat: the mean of WER, CER and S³. The Fair metrics ignore punctuation, spaces, half-spaces (ZWNJ) and diacritics, and normalize numbers between spelled-out and digit forms, so a correctly heard «صد» ("hundred") isn't penalized against «۱۰۰». Lower is better for every metric. 🔒 marks closed, paid APIs. S³ is still being validated with Persian DHH annotators.
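A minimal sketch of the Fair normalization described above, assuming a simple pipeline: map Persian and Arabic-Indic digits to ASCII, strip Arabic diacritics and the ZWNJ half-space, turn punctuation into word breaks, and map spelled-out numbers through a lookup table. `fair_tokens`, `fair_chars` and the tiny `SPELLED_NUMBERS` table are hypothetical; this is not the leaderboard's official Persian normalizer, which would also need to parse compound numbers.

```python
import re
import unicodedata

# Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits -> ASCII digits.
DIGITS = {base + k: str(k) for base in (0x06F0, 0x0660) for k in range(10)}
# Arabic diacritics: tanwin, harakat, shadda, sukun (U+064B..U+0652) and superscript alef (U+0670).
DIACRITICS = re.compile("[\u064b-\u0652\u0670]")
ZWNJ = "\u200c"  # Persian half-space (zero-width non-joiner)
# Hypothetical toy table; a real normalizer would also handle compound numbers.
SPELLED_NUMBERS = {"صفر": "0", "یک": "1", "دو": "2", "سه": "3", "ده": "10", "صد": "100", "هزار": "1000"}


def fair_tokens(text: str) -> list[str]:
    """Normalized word tokens for a Fair WER (sketch)."""
    text = unicodedata.normalize("NFC", text).translate(DIGITS)
    text = DIACRITICS.sub("", text)
    text = text.replace(ZWNJ, "")
    # Punctuation (Unicode category P*) becomes a space so it never glues two words together.
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)
    return [SPELLED_NUMBERS.get(tok, tok) for tok in text.split()]


def fair_chars(text: str) -> str:
    """Character view for a Fair CER: the same tokens with all spaces dropped."""
    return "".join(fair_tokens(text))
```

With this sketch, «صد» and «۱۰۰» both normalize to `["100"]`, and a word written with a half-space or with a full space yields the same `fair_chars` string.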

🔥 Best Avg Fair (Triple Threat): 🥇 gemini-3.5-flash · 5.63 (mean of WER, CER and S³)
🧠 Best S³ (semantic, DHH-weighted): 🥇 gemini-3.5-flash · 7.85 (lower = fewer meaning-critical errors)
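The exact S³ formula is not given on this page, so the sketch below is only a hypothetical illustration of the idea behind S³ and Ess Err. Each reference word carries an importance weight in [0, 1] (assumed to come from elsewhere); a reference word that is substituted or deleted in a minimum-edit alignment costs its weight; the S³-style score is missed weight over total weight, and Ess Err is the share of words with importance ≥ 0.8 that are missed. Insertions are ignored for simplicity. `triple_threat` is the Avg Fair mean exactly as this page defines it.

```python
def matched_reference_positions(ref: list[str], hyp: list[str]) -> set[int]:
    """Indices of reference words matched exactly in a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                               # deletion
                          d[i][j - 1] + 1,                               # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))  # match/substitution
    # Walk back from the bottom-right cell, recording exact matches.
    hits, i, j = set(), n, m
    while i > 0 and j > 0:
        if ref[i - 1] == hyp[j - 1] and d[i][j] == d[i - 1][j - 1]:
            hits.add(i - 1)
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j - 1] + 1:  # substitution
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:      # reference word deleted
            i -= 1
        else:                                 # hypothesis word inserted
            j -= 1
    return hits


def weighted_error(ref, hyp, importance) -> float:
    """Hypothetical S³-style score: % of total importance carried by missed words."""
    hits = matched_reference_positions(ref, hyp)
    total = sum(importance)
    missed = sum(w for k, w in enumerate(importance) if k not in hits)
    return 100.0 * missed / total if total else 0.0


def essential_error(ref, hyp, importance, threshold=0.8) -> float:
    """Ess Err sketch: % of words with importance >= threshold that are missed."""
    hits = matched_reference_positions(ref, hyp)
    essential = [k for k, w in enumerate(importance) if w >= threshold]
    if not essential:
        return 0.0
    return 100.0 * sum(k not in hits for k in essential) / len(essential)


def triple_threat(wer: float, cer: float, s3: float) -> float:
    """Avg Fair: the mean of WER, CER and S³."""
    return (wer + cer + s3) / 3
```

With made-up weights `[0.2, 0.6, 0.1, 1.0, 0.9]` for `["i", "go", "to", "hospital", "tomorrow"]`, dropping «hospital» costs about 35.7 points of weighted error and 50% Ess Err, while dropping the filler «to» costs about 3.6 and 0%.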

Sortable leaderboard — sorted by Avg Fair (Triple Threat)

1 · 0.115 · 10.65 · 12.12 · 10.06 · 12.11 · 16.04 · 11.64 · 10.22 · 1916.5 · Closed browser API / Google Web Speech · complete
Google Chrome Web Speech API capture at locale fa-IR, using BlackHole virtual audio routing. Full Golden6669 run: the original ws6669_* capture overlaid with the wsretry78_* retry rows; of the 78 retried rows, 41 blank results were recovered as text and 37 remain genuinely blank. Scored with the official Persian normalizer on all 6,669 clips. Online-only browser/cloud recognition; no local weights. Decode: 1×RT (real-time streaming), since results finalize during playback; measured wall time per clip is ~7.3 s on the 6669 set and ~13.6 s on FLEURS.
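The retry overlay in these run notes can be pictured as a keyed merge: retry rows replace the original rows for the same clip, and the recovery counts come from comparing blanks before and after. The sketch below assumes transcripts are held as hypothetical `clip_id -> text` dictionaries; the actual ws6669_* / wsretry78_* data layout is not described here.

```python
def overlay_retries(original: dict[str, str],
                    retries: dict[str, str]) -> tuple[dict[str, str], int, int]:
    """Overlay retry transcripts onto the original capture, keyed by clip ID.

    Returns the merged transcripts, the number of blank-to-text recoveries,
    and the number of retried clips whose result is still blank.
    """
    merged = dict(original)
    recovered = still_blank = 0
    for clip_id, text in retries.items():
        was_blank = not original.get(clip_id, "").strip()
        if was_blank and text.strip():
            recovered += 1
        if not text.strip():
            still_blank += 1
        merged[clip_id] = text  # the retry row always replaces the original
    return merged, recovered, still_blank
```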