Benchmark 🧪
History
This page tracks the evolution of benchmark results over time, with the goal of ensuring continuous improvement in overall performance with each new version.
TIP
Want to see a detailed breakdown of the latest results? Go to Latest.
Stable Modules and Providers
Not all modules and providers are stable. Some are still in alpha or beta, so they may introduce breaking changes in future releases, and their benchmark results can vary significantly between versions.
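In every table on this page, a cell reads min ~ max (mean), and a dash marks a model with no recorded result for that version.
This section focuses on the stable modules and providers: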
- Modules: DataExtractor, Comparator, and Scorer.
- Providers: OpenAI, Anthropic, Google, and DeepSeek.
Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-pro | gpt-4.1-mini | gpt-5 | gpt-5-mini |
---|---|---|---|---|---|---|---|---|
v0.26.5 | - | - | 70 ~ 92 (79) | 64 ~ 93 (82.7) | - | 70 ~ 100 (87.7) | - | - |
v0.26.6 | 64 ~ 91 (80.3) | - | 64 ~ 92 (79.7) | 64 ~ 94 (83) | - | 64 ~ 100 (85.7) | - | - |
v0.27.1 | 70 ~ 100 (85) | - | 76 ~ 92 (86.3) | 76 ~ 93 (86.7) | - | 70 ~ 100 (87) | - | - |
v0.28.0 | 64 ~ 85 (75) | - | 64 ~ 95 (81) | 70 ~ 94 (82.7) | - | 58 ~ 92 (78) | - | - |
v0.28.1 | 64 ~ 85 (75) | - | 70 ~ 95 (83) | 64 ~ 94 (78) | - | 52 ~ 94 (71.7) | - | - |
v0.28.2 | 58 ~ 84 (75.3) | - | 76 ~ 93 (84.3) | 52 ~ 91 (70.7) | - | 70 ~ 94 (82.7) | - | - |
v0.29.0 | 70 ~ 84 (79) | - | 70 ~ 95 (83) | 70 ~ 94 (80) | - | 70 ~ 91 (81.7) | - | - |
v0.29.1 | 64 ~ 86 (78) | - | 82 ~ 94 (86.7) | 58 ~ 92 (73) | - | 58 ~ 90 (77.3) | - | - |
v0.30.3 | 74 ~ 96 (84) | 41 ~ 96 (71.3) | 83 ~ 96 (90) | 61 ~ 95 (83.1) | 54 ~ 95 (80.3) | - | 58 ~ 96 (79.7) | 51 ~ 96 (80.4) |
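The combined figures in the table above appear to follow from the per-module tables on this page: the lowest module minimum, the highest module maximum, and the mean of the three module means. The sketch below reproduces one cell under that assumption. The names `parse_cell` and `combine` are hypothetical helpers written for this illustration, not part of the project, and because it works from the rounded values shown on the page, a result may occasionally differ from the published one by 0.1.

```python
import re
from statistics import mean

# One number, written with either a decimal comma or a decimal point.
NUM = r"(\d+(?:[.,]\d+)?)"
# One table cell, e.g. "64 ~ 93 (82,7)" or "73.0 ~ 75.0 (74.3)".
CELL = re.compile(rf"^\s*{NUM}\s*~\s*{NUM}\s*\({NUM}\)\s*$")


def parse_cell(cell):
    """Return (low, high, avg) as floats, or None for an empty "-" cell."""
    match = CELL.match(cell)
    if match is None:
        return None
    return tuple(float(group.replace(",", ".")) for group in match.groups())


def combine(cells):
    """Combine per-module cells into one (low, high, avg) triple.

    Assumption: the combined average is the mean of the module averages.
    """
    parsed = [p for p in (parse_cell(c) for c in cells) if p is not None]
    if not parsed:
        return None
    low = min(p[0] for p in parsed)
    high = max(p[1] for p in parsed)
    avg = round(mean([p[2] for p in parsed]), 1)
    return low, high, avg


# gemini-2.5-flash at v0.26.5: DataExtractor, Comparator, Scorer cells.
print(combine(["93 ~ 93 (93)", "91 ~ 91 (91)", "64 ~ 64 (64)"]))
# (64.0, 93.0, 82.7)
```

The printed triple matches the gemini-2.5-flash cell for v0.26.5 in the combined table. Cells holding a dash are skipped, so a model with no results for any module yields `None`.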
DataExtractor
Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gemini-2.5-pro | gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-nano |
---|---|---|---|---|---|---|---|---|---|---|
v0.26.5 | - | - | 92 ~ 92 (92) | 93 ~ 93 (93) | - | - | 93 ~ 93 (93) | - | - | - |
v0.26.6 | 86 ~ 86 (86) | - | 92 ~ 92 (92) | 94 ~ 94 (94) | - | - | 93 ~ 93 (93) | - | - | - |
v0.27.1 | 85 ~ 85 (85) | - | 92 ~ 92 (92) | 93 ~ 93 (93) | - | - | 91 ~ 91 (91) | - | - | - |
v0.28.0 | 85 ~ 85 (85) | - | 95 ~ 95 (95) | 94 ~ 94 (94) | - | - | 92 ~ 92 (92) | - | - | - |
v0.28.1 | 85 ~ 85 (85) | - | 95 ~ 95 (95) | 94 ~ 94 (94) | - | - | 94 ~ 94 (94) | - | - | - |
v0.28.2 | 84 ~ 84 (84) | - | 93 ~ 93 (93) | 91 ~ 91 (91) | - | - | 94 ~ 94 (94) | - | - | - |
v0.29.0 | 83 ~ 83 (83) | - | 95 ~ 95 (95) | 94 ~ 94 (94) | - | - | 91 ~ 91 (91) | - | - | - |
v0.29.1 | 86 ~ 86 (86) | - | 94 ~ 94 (94) | 92 ~ 92 (92) | - | - | 90 ~ 90 (90) | - | - | - |
v0.30.3 | 84 ~ 85 (84.7) | 77 ~ 77 (77) | 93 ~ 94 (93.7) | 93 ~ 95 (94.3) | 73 ~ 76 (74.3) | 95 ~ 95 (95) | - | 85 ~ 85 (85) | 88 ~ 93 (91.3) | 78 ~ 80 (79) |
Comparator
Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gemini-2.5-pro | gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-nano |
---|---|---|---|---|---|---|---|---|---|---|
v0.26.5 | - | - | 75 ~ 75 (75) | 91 ~ 91 (91) | - | - | 100 ~ 100 (100) | - | - | - |
v0.26.6 | 91 ~ 91 (91) | - | 83 ~ 83 (83) | 91 ~ 91 (91) | - | - | 100 ~ 100 (100) | - | - | - |
v0.27.1 | 100 ~ 100 (100) | - | 91 ~ 91 (91) | 91 ~ 91 (91) | - | - | 100 ~ 100 (100) | - | - | - |
v0.28.0 | 76 ~ 76 (76) | - | 84 ~ 84 (84) | 84 ~ 84 (84) | - | - | 84 ~ 84 (84) | - | - | - |
v0.28.1 | 76 ~ 76 (76) | - | 84 ~ 84 (84) | 76 ~ 76 (76) | - | - | 69 ~ 69 (69) | - | - | - |
v0.28.2 | 84 ~ 84 (84) | - | 84 ~ 84 (84) | 69 ~ 69 (69) | - | - | 84 ~ 84 (84) | - | - | - |
v0.29.0 | 84 ~ 84 (84) | - | 84 ~ 84 (84) | 76 ~ 76 (76) | - | - | 84 ~ 84 (84) | - | - | - |
v0.29.1 | 84 ~ 84 (84) | - | 84 ~ 84 (84) | 69 ~ 69 (69) | - | - | 84 ~ 84 (84) | - | - | - |
v0.30.3 | 92 ~ 96 (93.3) | 96 ~ 96 (96) | 92 ~ 96 (93.3) | 92 ~ 92 (92) | 85 ~ 92 (89.7) | 92 ~ 92 (92) | - | 96 ~ 96 (96) | 92 ~ 96 (94.7) | 92 ~ 100 (96) |
Scorer
Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gemini-2.5-pro | gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-nano |
---|---|---|---|---|---|---|---|---|---|---|
v0.26.5 | - | - | 70 ~ 70 (70) | 64 ~ 64 (64) | - | - | 70 ~ 70 (70) | - | - | - |
v0.26.6 | 64 ~ 64 (64) | - | 64 ~ 64 (64) | 64 ~ 64 (64) | - | - | 64 ~ 64 (64) | - | - | - |
v0.27.1 | 70 ~ 70 (70) | - | 76 ~ 76 (76) | 76 ~ 76 (76) | - | - | 70 ~ 70 (70) | - | - | - |
v0.28.0 | 64 ~ 64 (64) | - | 64 ~ 64 (64) | 70 ~ 70 (70) | - | - | 58 ~ 58 (58) | - | - | - |
v0.28.1 | 64 ~ 64 (64) | - | 70 ~ 70 (70) | 64 ~ 64 (64) | - | - | 52 ~ 52 (52) | - | - | - |
v0.28.2 | 58 ~ 58 (58) | - | 76 ~ 76 (76) | 52 ~ 52 (52) | - | - | 70 ~ 70 (70) | - | - | - |
v0.29.0 | 70 ~ 70 (70) | - | 70 ~ 70 (70) | 70 ~ 70 (70) | - | - | 70 ~ 70 (70) | - | - | - |
v0.29.1 | 64 ~ 64 (64) | - | 82 ~ 82 (82) | 58 ~ 58 (58) | - | - | 58 ~ 58 (58) | - | - | - |
v0.30.3 | 74 ~ 74 (74) | 41 ~ 41 (41) | 83 ~ 83 (83) | 61 ~ 64 (63) | 70 ~ 77 (73.7) | 54 ~ 54 (54) | - | 58 ~ 58 (58) | 51 ~ 61 (55.3) | 41 ~ 48 (43.3) |
Lister (unstable)
Version | claude-3-5-haiku-20241022 | claude-sonnet-4-20250514 | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gemini-2.5-pro | gpt-5 | gpt-5-mini | gpt-5-nano |
---|---|---|---|---|---|---|---|---|---|
v0.30.3 | 73.0 ~ 75.0 (74.3) | 70.0 ~ 70.0 (70.0) | 14.0 ~ 79.0 (56.0) | 61.0 ~ 63.0 (61.7) | 51.0 ~ 55.0 (52.7) | 52.0 ~ 52.0 (52.0) | 42.0 ~ 42.0 (42.0) | 45.0 ~ 51.0 (48.7) | 22.0 ~ 36.0 (29.0) |
Ranker (unstable)
Version | deepseek-chat | gemini-2.5-flash | gemini-2.5-flash-lite | gpt-4.1-mini | gpt-5-mini | gpt-5-nano |
---|---|---|---|---|---|---|
v0.26.5 | 50 ~ 50 (50) | 50 ~ 50 (50) | - | 100 ~ 100 (100) | - | 50 ~ 100 (66.7) |
v0.26.6 | 100 ~ 100 (100) | 100 ~ 100 (100) | - | 50 ~ 50 (50) | - | 0 ~ 100 (62.5) |
v0.27.1 | 50 ~ 50 (50) | 50 ~ 50 (50) | - | 50 ~ 50 (50) | - | 0 ~ 50 (37.5) |
v0.28.2 | 100 ~ 100 (100) | 100 ~ 100 (100) | - | 50 ~ 50 (50) | - | 0 ~ 100 (62.5) |
v0.29.0 | 100 ~ 100 (100) | 50 ~ 50 (50) | - | 50 ~ 50 (50) | - | 0 ~ 100 (50) |
v0.29.1 | - | 50 ~ 50 (50) | - | 50 ~ 50 (50) | - | 0 ~ 50 (25) |
v0.30.3 | 50 ~ 50 (50) | 50 ~ 100 (66.7) | 0 ~ 50 (33.3) | - | 50 ~ 100 (66.7) | 0 ~ 100 (36.1) |
P.S. Logs for versions prior to v0.26.5 are unavailable. I will, however, keep this page updated with each new version.