Persian ASR Triple Threat 🔥 Benchmark

A same-test leaderboard for Persian speech recognition. Two real-world sets, VisualEars6669 (6,669 rows / 10.49 h; a noisy benchmark with clean, far-field and obstructed conditions) and FLEURS-fa Full (the 4,341-row public Persian FLEURS set), are scored three ways (WER, CER and S³), with an essential-content error rate reported alongside:

WER: word error rate
CER: character error rate
S³ (ShenavaSanj Score): semantic, weighted for Deaf/hard-of-hearing (DHH) readers
🔑 Ess Err: essential-content error rate

S³ is a Persian-first semantic metric focused on Deaf/hard-of-hearing (DHH) readers. It weights every ASR error by how much the affected word matters for understanding, so dropping a keyword costs far more than dropping a filler. 🔑 Ess Err is the error rate on only the most meaning-critical words (importance ≥ 0.8), the words a DHH reader can least afford to lose. S³ is still undergoing validation with Persian DHH annotators.

Avg Fair 🔥 is the Triple Threat: the mean of WER, CER and S³. Fair metrics ignore punctuation, spaces, half-spaces and diacritics, and normalize numbers (spelled-out ↔ digits) so that a correctly heard «صد» isn't penalized against «۱۰۰». Lower is better. 🔒 marks closed paid APIs.
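The scoring ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the leaderboard's actual scorer: the function names (`fair_word`, `align`, `score`), the tiny `SPELLED` number table and the per-word importance weights are all hypothetical, and the `weighted` value is only a stand-in for S³ (its real formula is not given here). The sketch charges each missed reference word its importance and ignores inserted words.

```python
import unicodedata

# Persian (U+06F0..) and Arabic-Indic (U+0660..) digits mapped to ASCII digits.
DIGITS = {base + i: str(i) for base in (0x06F0, 0x0660) for i in range(10)}
# Hypothetical, deliberately tiny spelled-out-number table (assumption).
SPELLED = {"صد": "100", "ده": "10"}


def fair_word(word):
    """Strip diacritics, half-spaces and punctuation, then unify numbers."""
    word = word.translate(DIGITS)
    kept = "".join(
        ch for ch in word
        if ch != "\u200c"                                  # half-space (ZWNJ)
        and not unicodedata.category(ch).startswith("M")   # diacritics
        and not unicodedata.category(ch).startswith("P")   # punctuation
    )
    return SPELLED.get(kept, kept)


def align(ref, hyp):
    """Levenshtein-align two sequences.

    Returns (edit distance, one flag per ref item: True if matched exactly).
    """
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    hits = [False] * n
    i, j = n, m
    while i > 0 and j > 0:  # backtrace; leftover ref items stay unmatched
        if d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            hits[i - 1] = ref[i - 1] == hyp[j - 1]
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1  # deletion: reference item missing from hypothesis
        else:
            j -= 1  # insertion: extra hypothesis item
    return d[n][m], hits


def score(ref_text, hyp_text, importance, essential=0.8):
    """importance: one weight in [0, 1] per whitespace-separated reference word."""
    pairs = [(fair_word(w), imp) for w, imp in zip(ref_text.split(), importance)]
    pairs = [(w, imp) for w, imp in pairs if w]  # drop punctuation-only tokens
    ref = [w for w, _ in pairs]
    weights = [imp for _, imp in pairs]
    hyp = [w for w in map(fair_word, hyp_text.split()) if w]

    word_dist, hits = align(ref, hyp)
    char_dist, _ = align("".join(ref), "".join(hyp))  # spaces ignored for CER
    wer = 100 * word_dist / len(ref)
    cer = 100 * char_dist / len("".join(ref))
    # Stand-in for S³: importance mass of the reference words that were missed.
    missed = [imp for imp, hit in zip(weights, hits) if not hit]
    weighted = 100 * sum(missed) / sum(weights)
    # Ess Err: miss rate restricted to words with importance >= essential.
    key = [hit for imp, hit in zip(weights, hits) if imp >= essential]
    ess_err = 100 * key.count(False) / len(key) if key else 0.0
    return {"wer": wer, "cer": cer, "weighted": weighted,
            "ess_err": ess_err, "avg_fair": (wer + cer + weighted) / 3}


result = score("قیمت صد تومان است", "قیمت ۱۰۰ تومان", [0.9, 0.9, 0.7, 0.1])
print(result)
```

In this made-up example the hypothesis writes «۱۰۰» where the reference has «صد» and drops only the low-importance final word. After normalization the number counts as correct, so WER is 25% (one of four words missed), the importance-weighted error is much smaller, and the essential-content error is zero.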


1	🥇 🔒 gemini-3.5-flash	null	5.63	7.85	7.59	6.38	1.25	7.52	2.95	1916.5	Closed paid API / LLM audio	complete	Closed paid OpenRouter benchmark via chat/completions audio input. Golden6669: 6,669 rows / 10.49h, reported OpenRouter ...
2	🥈 🔒 Chrome Web Speech API (fa-IR)	null	6.74	9.08	10.06	7.02	1.7	9.04	4.52	1×RT	Closed browser API / Google Web Speech	complete	Google Chrome Web Speech API capture at locale fa-IR using BlackHole virtual audio routing. Full Golden6669 run: origina...
3	🥉 Shenava Koochik v1.0 (114M)	0.115	8.09	12.12	16.39	7.21	1.6	11.64	3.87	10.7	Shenava (FastConformer streaming)	complete	On-device FastConformer-CTC, ve_tok_v4. Decoded CTC @[70,13]; spoken-form + ITN. fp16 ONNX/CoreML on-device variants ~pa...
4	Vosk	0.495	8.76	11.97	11.8	11	2.51	11.25	3.88	754.1	Vosk / Kaldi	complete	Vosk Persian model local CPU decode on Reza2kn/fleurs-fa-benchmark full 4,341-row FLEURS-fa set. Text column: raw_transc...
5	Shenava Rizeh v1.0 (32M)	0.032	10.65	14.16	22.49	12.11	3.94	14.45	5.1	5.8	Shenava (FastConformer streaming)	complete	32M Hybrid RNNT/CTC distilled from Koochik 114M (logit+feature KD). CTC head deployed @[70,13]. fp16 ONNX/CoreML variant...
6	nezamisafa/whisper-persian-v4	1.543	12.35	17.73	19.91	16.71	4.91	12.7	4.32	390.5	Whisper Large v3	complete	Local Stallion Transformers Whisper decode on Reza2kn/fleurs-fa-benchmark full 4,341-row FLEURS-fa set. Text column: raw...
7	vhdm/whisper-large-fa-v1	0.809	13.12	19.07	22.3	16.94	4.75	14.07	4.84	131.8	Whisper Large v3 Turbo	complete	Local Stallion Transformers Whisper decode on Reza2kn/fleurs-fa-benchmark full 4,341-row FLEURS-fa set. Text column: raw...
8	Shenava Rizeh Pizeh v1.0 (6.9M)	0.007	21.43	28.99	42.75	24.55	8.89	26.95	10.22	4.2	Shenava (FastConformer streaming)	complete	6.9M CTC distilled 2 hops from Koochik. Real-time on a 2015 Cortex-A7 (RTF~0.91 fp32, tract). fp16 ONNX/CoreML + fp32 tr...
9	nvidia/stt_fa_fastconformer_hybrid_large	0.115	35.06	44.61	51.27	37.05	16.04	43.05	25	10.0	NVIDIA FastConformer Hybrid	complete	Public NVIDIA Persian FastConformer Hybrid Large native NeMo decode on Reza2kn/fleurs-fa-benchmark full 4,341-row FLEURS...
10	Qwen/Qwen3-ASR-0.6B	0.782	46.4	60.84	65.1	60.87	26.49	48.18	21.15	294.2	Qwen3 ASR	complete	Qwen3-ASR 0.6B local Stallion decode on Reza2kn/fleurs-fa-benchmark full 4,341-row FLEURS-fa set. Text column: raw_trans...
11	🔒 google/chirp-3	null	null	null	null	null	null	null	null	1590.1	Closed paid API / STT	complete	Closed paid OpenRouter benchmark on VisualEars269 via audio/transcriptions. Includes one upstream timeout counted as bla...
12	🔒 microsoft/mai-transcribe-1.5	null	null	null	null	null	null	null	null	588.0	Closed paid API / STT	complete	Closed paid OpenRouter benchmark on VisualEars269 via audio/transcriptions. Reported OpenRouter cost: $0.10160000. No FL...
13	🔒 openai/gpt-audio-mini	null	null	null	null	null	null	null	null	847.7	Closed paid API / LLM audio	complete	Closed paid OpenRouter benchmark on VisualEars269 via chat/completions audio input. Two benchmark rows contain effective...