Persian ASR Triple Threat 🔥 Benchmark
A same-test leaderboard for Persian speech recognition. Two real-world sets, VisualEars6669 (a noisy benchmark of 6,669 rows / 10.49 h with clean, far-field and obstructed conditions) and FLEURS-fa Full (the 4,341-row public Persian FLEURS set), are now scored three ways:
S³ is a Persian-first semantic metric focused on Deaf/hard-of-hearing (DHH) readers: it weights every ASR error by how much that word matters for understanding, so dropping a keyword costs far more than dropping a filler. 🔑 Ess Err is the error rate on only the most meaning-critical words (importance ≥ 0.8), the words a DHH reader can least afford to lose. Avg Fair 🔥 is the Triple Threat: the mean of WER, CER and S³. Fair metrics ignore punctuation, spaces, half-spaces and diacritics, and normalize numbers (spelled-out ↔ digits) so that a correctly heard «صد» isn't penalized against «۱۰۰». Lower is better. 🔒 marks closed, paid APIs. S³ is undergoing validation with Persian DHH annotators.
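To make the definitions above concrete, here is a minimal Python sketch of how a "fair" normalization, an importance-weighted error and an essential-word error could be computed. It is not the leaderboard's official normalizer or the actual S³ formula, neither of which is given here. The function names, the tiny `NUMBER_WORDS` table, the per-word `importance` scores and the choice to ignore inserted words are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
import unicodedata

ZWNJ = "\u200c"  # the Persian half-space (zero-width non-joiner)
# Map Persian (U+06F0..) and Arabic-Indic (U+0660..) digits to ASCII digits.
DIGITS = {base + i: str(i) for base in (0x06F0, 0x0660) for i in range(10)}
# Tiny illustrative table (assumption); a real normalizer needs full number parsing.
NUMBER_WORDS = {"صد": "100", "ده": "10", "دو": "2"}


def fair_tokens(text):
    """Split into words, dropping punctuation, half-spaces and diacritics."""
    words = []
    for raw in text.replace(ZWNJ, "").translate(DIGITS).split():
        # Unicode categories M* are combining marks (diacritics), P* punctuation.
        kept = "".join(
            ch for ch in raw
            if not unicodedata.category(ch).startswith(("M", "P"))
        )
        if kept:
            words.append(NUMBER_WORDS.get(kept, kept))
    return words


def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def fair_cer(ref, hyp):
    """Character error rate after normalization, with spaces ignored."""
    r, h = "".join(fair_tokens(ref)), "".join(fair_tokens(hyp))
    return edit_distance(r, h) / max(len(r), 1)


def weighted_errors(ref_words, hyp_words, importance, threshold=0.8):
    """Return (importance-weighted error, essential-word error).

    `importance` holds one hypothetical score in [0, 1] per reference word.
    Simplification: only lost or altered reference words are counted;
    words inserted by the recognizer are ignored.
    """
    hit = [False] * len(ref_words)
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        if tag == "equal":
            for i in range(i1, i2):
                hit[i] = True
    total = sum(importance)
    lost = sum(w for w, ok in zip(importance, hit) if not ok)
    essential = [ok for w, ok in zip(importance, hit) if w >= threshold]
    weighted = lost / total if total else 0.0
    ess_err = essential.count(False) / len(essential) if essential else 0.0
    return weighted, ess_err


def avg_fair(wer, cer, s3):
    """Avg Fair as described above: the plain mean of the three scores."""
    return (wer + cer + s3) / 3


ref = fair_tokens("من صد کتاب دارم.")
hyp = fair_tokens("من ۱۰۰ دارم")
weighted, ess_err = weighted_errors(ref, hyp, [0.2, 0.9, 0.9, 0.5])
```

In this toy example the recognizer writes «۱۰۰» where the reference has «صد», which the normalization treats as a match, while the dropped word «کتاب» carries a high assumed importance and so dominates both the weighted error and the essential-word error.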
Sortable leaderboard — sorted by Avg Fair (Triple Threat)
| 1 | 0.115 | 10.65 | 12.12 | 10.06 | 12.11 | 16.04 | 11.64 | 10.22 | 1916.5 | Closed browser API / Google Web Speech | complete | Google Chrome Web Speech API capture at locale fa-IR using BlackHole virtual audio routing. Full Golden6669 run: original ws6669_* capture overlaid with wsretry78_* retry rows; 78 retry rows, 41 blank-to-text recoveries, 37 genuine blank results remain. Scored with the official Persian normalizer on all 6,669 clips. Online-only browser/cloud recognition; no local weights. Decode: 1×RT (real-time streaming) — finalizes during playback; measured per-clip wall ~7.3s/6669, ~13.6s/FLEURS. |