Benchmark 🧪
History
This page tracks how benchmark results evolve from version to version, with the goal of improving overall performance in every new release.
TIP
Want to see a detailed breakdown of the latest results? Go to Latest.
Stable Modules and Providers
Not all modules and providers are stable. Some are in beta or alpha stages, which means they might have breaking changes in future releases, and their benchmark results can vary significantly between versions.
This section focuses on the stable modules and providers:
- Modules: DataExtractor, Comparator, and Scorer.
- Providers: OpenAI, Anthropic, Google, and DeepSeek.
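
The table below summarizes the three stable modules together. Each cell reads `min ~ max (average)`: the lowest and highest scores across DataExtractor, Comparator, and Scorer, and the mean of their per-module averages. The per-module tables further down use the same `min ~ max (average)` format, and `-` means no result was recorded for that model in that version.
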
| Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-pro | gpt-4.1-mini | gpt-5 | gpt-5-mini |
|---|---|---|---|---|---|---|---|---|
| v0.26.5 | - | - | 70 ~ 92 (79) | 64 ~ 93 (82.7) | - | 70 ~ 100 (87.7) | - | - |
| v0.26.6 | 64 ~ 91 (80.3) | - | 64 ~ 92 (79.7) | 64 ~ 94 (83) | - | 64 ~ 100 (85.7) | - | - |
| v0.27.1 | 70 ~ 100 (85) | - | 76 ~ 92 (86.3) | 76 ~ 93 (86.7) | - | 70 ~ 100 (87) | - | - |
| v0.28.0 | 64 ~ 85 (75) | - | 64 ~ 95 (81) | 70 ~ 94 (82.7) | - | 58 ~ 92 (78) | - | - |
| v0.28.1 | 64 ~ 85 (75) | - | 70 ~ 95 (83) | 64 ~ 94 (78) | - | 52 ~ 94 (71.7) | - | - |
| v0.28.2 | 58 ~ 84 (75.3) | - | 76 ~ 93 (84.3) | 52 ~ 91 (70.7) | - | 70 ~ 94 (82.7) | - | - |
| v0.29.0 | 70 ~ 84 (79) | - | 70 ~ 95 (83) | 70 ~ 94 (80) | - | 70 ~ 91 (81.7) | - | - |
| v0.29.1 | 64 ~ 86 (78) | - | 82 ~ 94 (86.7) | 58 ~ 92 (73) | - | 58 ~ 90 (77.3) | - | - |
| v0.30.3 | 74 ~ 96 (84) | 41 ~ 96 (71.3) | 83 ~ 96 (90) | 61 ~ 95 (83.1) | 54 ~ 95 (80.3) | - | 58 ~ 96 (79.7) | 51 ~ 96 (80.4) |
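
The summary cells appear to follow a simple rule, inferred here from the published numbers rather than from the library's source: the range spans the lowest and highest individual scores across DataExtractor, Comparator, and Scorer, and the value in parentheses is the mean of the three per-module averages, rounded to one decimal. For example, deepseek-chat in v0.26.5 scored 92, 75, and 70, which gives `70 ~ 92 (79)`. A minimal Python sketch of that rule; `summarize` and its input layout are hypothetical, not part of the library:

```python
from statistics import mean


def summarize(module_scores: dict[str, list[float]]) -> str:
    """Collapse per-module scores into a 'min ~ max (average)' summary cell.

    Assumed rule (inferred from the published tables, not documented):
    min and max are taken over every individual score, and the average is
    the mean of the per-module means, rounded to one decimal place.
    """
    all_scores = [s for scores in module_scores.values() for s in scores]
    module_means = [mean(scores) for scores in module_scores.values()]
    avg = round(mean(module_means), 1)
    # The :g format drops a trailing ".0", matching cells such as "(79)".
    return f"{min(all_scores):g} ~ {max(all_scores):g} ({avg:g})"


# deepseek-chat, v0.26.5: one score per stable module.
print(summarize({"DataExtractor": [92], "Comparator": [75], "Scorer": [70]}))
# → 70 ~ 92 (79)

# gpt-4.1-mini, v0.26.5.
print(summarize({"DataExtractor": [93], "Comparator": [100], "Scorer": [70]}))
# → 70 ~ 100 (87.7)
```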

DataExtractor

| Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gemini-2.5-pro | gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-nano |
|---|---|---|---|---|---|---|---|---|---|---|
| v0.26.5 | - | - | 92 ~ 92 (92) | 93 ~ 93 (93) | - | - | 93 ~ 93 (93) | - | - | - |
| v0.26.6 | 86 ~ 86 (86) | - | 92 ~ 92 (92) | 94 ~ 94 (94) | - | - | 93 ~ 93 (93) | - | - | - |
| v0.27.1 | 85 ~ 85 (85) | - | 92 ~ 92 (92) | 93 ~ 93 (93) | - | - | 91 ~ 91 (91) | - | - | - |
| v0.28.0 | 85 ~ 85 (85) | - | 95 ~ 95 (95) | 94 ~ 94 (94) | - | - | 92 ~ 92 (92) | - | - | - |
| v0.28.1 | 85 ~ 85 (85) | - | 95 ~ 95 (95) | 94 ~ 94 (94) | - | - | 94 ~ 94 (94) | - | - | - |
| v0.28.2 | 84 ~ 84 (84) | - | 93 ~ 93 (93) | 91 ~ 91 (91) | - | - | 94 ~ 94 (94) | - | - | - |
| v0.29.0 | 83 ~ 83 (83) | - | 95 ~ 95 (95) | 94 ~ 94 (94) | - | - | 91 ~ 91 (91) | - | - | - |
| v0.29.1 | 86 ~ 86 (86) | - | 94 ~ 94 (94) | 92 ~ 92 (92) | - | - | 90 ~ 90 (90) | - | - | - |
| v0.30.3 | 84 ~ 85 (84.7) | 77 ~ 77 (77) | 93 ~ 94 (93.7) | 93 ~ 95 (94.3) | 73 ~ 76 (74.3) | 95 ~ 95 (95) | - | 85 ~ 85 (85) | 88 ~ 93 (91.3) | 78 ~ 80 (79) |

Comparator

| Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gemini-2.5-pro | gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-nano |
|---|---|---|---|---|---|---|---|---|---|---|
| v0.26.5 | - | - | 75 ~ 75 (75) | 91 ~ 91 (91) | - | - | 100 ~ 100 (100) | - | - | - |
| v0.26.6 | 91 ~ 91 (91) | - | 83 ~ 83 (83) | 91 ~ 91 (91) | - | - | 100 ~ 100 (100) | - | - | - |
| v0.27.1 | 100 ~ 100 (100) | - | 91 ~ 91 (91) | 91 ~ 91 (91) | - | - | 100 ~ 100 (100) | - | - | - |
| v0.28.0 | 76 ~ 76 (76) | - | 84 ~ 84 (84) | 84 ~ 84 (84) | - | - | 84 ~ 84 (84) | - | - | - |
| v0.28.1 | 76 ~ 76 (76) | - | 84 ~ 84 (84) | 76 ~ 76 (76) | - | - | 69 ~ 69 (69) | - | - | - |
| v0.28.2 | 84 ~ 84 (84) | - | 84 ~ 84 (84) | 69 ~ 69 (69) | - | - | 84 ~ 84 (84) | - | - | - |
| v0.29.0 | 84 ~ 84 (84) | - | 84 ~ 84 (84) | 76 ~ 76 (76) | - | - | 84 ~ 84 (84) | - | - | - |
| v0.29.1 | 84 ~ 84 (84) | - | 84 ~ 84 (84) | 69 ~ 69 (69) | - | - | 84 ~ 84 (84) | - | - | - |
| v0.30.3 | 92 ~ 96 (93.3) | 96 ~ 96 (96) | 92 ~ 96 (93.3) | 92 ~ 92 (92) | 85 ~ 92 (89.7) | 92 ~ 92 (92) | - | 96 ~ 96 (96) | 92 ~ 96 (94.7) | 92 ~ 100 (96) |

Scorer

| Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gemini-2.5-pro | gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-nano |
|---|---|---|---|---|---|---|---|---|---|---|
| v0.26.5 | - | - | 70 ~ 70 (70) | 64 ~ 64 (64) | - | - | 70 ~ 70 (70) | - | - | - |
| v0.26.6 | 64 ~ 64 (64) | - | 64 ~ 64 (64) | 64 ~ 64 (64) | - | - | 64 ~ 64 (64) | - | - | - |
| v0.27.1 | 70 ~ 70 (70) | - | 76 ~ 76 (76) | 76 ~ 76 (76) | - | - | 70 ~ 70 (70) | - | - | - |
| v0.28.0 | 64 ~ 64 (64) | - | 64 ~ 64 (64) | 70 ~ 70 (70) | - | - | 58 ~ 58 (58) | - | - | - |
| v0.28.1 | 64 ~ 64 (64) | - | 70 ~ 70 (70) | 64 ~ 64 (64) | - | - | 52 ~ 52 (52) | - | - | - |
| v0.28.2 | 58 ~ 58 (58) | - | 76 ~ 76 (76) | 52 ~ 52 (52) | - | - | 70 ~ 70 (70) | - | - | - |
| v0.29.0 | 70 ~ 70 (70) | - | 70 ~ 70 (70) | 70 ~ 70 (70) | - | - | 70 ~ 70 (70) | - | - | - |
| v0.29.1 | 64 ~ 64 (64) | - | 82 ~ 82 (82) | 58 ~ 58 (58) | - | - | 58 ~ 58 (58) | - | - | - |
| v0.30.3 | 74 ~ 74 (74) | 41 ~ 41 (41) | 83 ~ 83 (83) | 61 ~ 64 (63) | 70 ~ 77 (73.7) | 54 ~ 54 (54) | - | 58 ~ 58 (58) | 51 ~ 61 (55.3) | 41 ~ 48 (43.3) |
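
Given the goal of improving performance with each release, one quick way to read a column is to compare each version's average with the one before it and flag drops. A minimal Python sketch, assuming the averages are copied by hand from a table (here the gpt-4.1-mini column of the Scorer table); `find_regressions` and its `tolerance` parameter are hypothetical, not part of the library:

```python
def find_regressions(
    history: list[tuple[str, float]], tolerance: float = 0.0
) -> list[tuple[str, float]]:
    """Return (version, change) pairs where the average fell by more than `tolerance`.

    `history` must be ordered from oldest to newest version.
    """
    regressions = []
    # Walk consecutive pairs: (previous version, current version).
    for (_, prev_avg), (version, avg) in zip(history, history[1:]):
        change = avg - prev_avg
        if change < -tolerance:
            regressions.append((version, change))
    return regressions


# Scorer averages for gpt-4.1-mini, taken from the Scorer table.
scorer_gpt_41_mini = [
    ("v0.26.5", 70), ("v0.26.6", 64), ("v0.27.1", 70), ("v0.28.0", 58),
    ("v0.28.1", 52), ("v0.28.2", 70), ("v0.29.0", 70), ("v0.29.1", 58),
]

for version, change in find_regressions(scorer_gpt_41_mini):
    print(f"{version}: {change:+g}")
# prints, one per line: v0.26.6: -6, v0.28.0: -12, v0.28.1: -6, v0.29.1: -12
```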

Lister (unstable)

| Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gemini-2.5-pro | gpt-5 | gpt-5-mini | gpt-5-nano |
|---|---|---|---|---|---|---|---|---|---|
| v0.30.3 | 73 ~ 75 (74.3) | 70 ~ 70 (70) | 14 ~ 79 (56) | 61 ~ 63 (61.7) | 51 ~ 55 (52.7) | 52 ~ 52 (52) | 42 ~ 42 (42) | 45 ~ 51 (48.7) | 22 ~ 36 (29) |
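
Because results for the unstable modules can vary widely, the gap between the lowest and highest score in a cell is a useful signal. This reading assumes the range comes from repeated runs of the same version, which is not stated on this page. A minimal Python sketch that parses a `min ~ max (average)` cell, accepting either a decimal point or a decimal comma, and reports that spread for a few v0.30.3 Lister values; `parse_cell` and `spread` are hypothetical helpers, not part of the library:

```python
import re
from typing import Optional

# Matches cells such as "73.0 ~ 75.0 (74.3)" or "0 ~ 100 (36,1)".
CELL_PATTERN = re.compile(r"^\s*([\d.,]+)\s*~\s*([\d.,]+)\s*\(\s*([\d.,]+)\s*\)\s*$")


def parse_cell(cell: str) -> Optional[tuple[float, float, float]]:
    """Parse a 'min ~ max (average)' cell; return None for '-' (no result)."""
    if cell.strip() == "-":
        return None
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    # Normalize a decimal comma to a decimal point before converting.
    low, high, avg = (float(group.replace(",", ".")) for group in match.groups())
    return low, high, avg


def spread(cell: str) -> Optional[float]:
    """Gap between the highest and lowest score in a cell, or None if empty."""
    parsed = parse_cell(cell)
    if parsed is None:
        return None
    low, high, _ = parsed
    return high - low


# A few v0.30.3 values from the Lister table.
lister_v0_30_3 = {
    "claude-3-5-haiku-20241022": "73.0 ~ 75.0 (74.3)",
    "deepseek-chat": "14.0 ~ 79.0 (56.0)",
    "gpt-5-nano": "22.0 ~ 36.0 (29.0)",
}

for model, cell in lister_v0_30_3.items():
    print(f"{model}: spread {spread(cell):g}")
# prints spreads of 2, 65, and 14
```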

Ranker (unstable)

| Version | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gpt-4.1-mini | gpt-5-mini | gpt-5-nano |
|---|---|---|---|---|---|---|
| v0.26.5 | 50 ~ 50 (50) | 50 ~ 50 (50) | - | 100 ~ 100 (100) | - | 50 ~ 100 (66.7) |
| v0.26.6 | 100 ~ 100 (100) | 100 ~ 100 (100) | - | 50 ~ 50 (50) | - | 0 ~ 100 (62.5) |
| v0.27.1 | 50 ~ 50 (50) | 50 ~ 50 (50) | - | 50 ~ 50 (50) | - | 0 ~ 50 (37.5) |
| v0.28.2 | 100 ~ 100 (100) | 100 ~ 100 (100) | - | 50 ~ 50 (50) | - | 0 ~ 100 (62.5) |
| v0.29.0 | 100 ~ 100 (100) | 50 ~ 50 (50) | - | 50 ~ 50 (50) | - | 0 ~ 100 (50) |
| v0.29.1 | - | 50 ~ 50 (50) | - | 50 ~ 50 (50) | - | 0 ~ 50 (25) |
| v0.30.3 | 50 ~ 50 (50) | 50 ~ 100 (66.7) | 0 ~ 50 (33.3) | - | 50 ~ 100 (66.7) | 0 ~ 100 (36.1) |

PS: Logs for versions prior to v0.26.5 are unavailable, but I will keep this page updated with every new version.
