branch: externals/minuet
commit 6959bfa64717b93d24ce792651f5d967c490a14e
Author:     Milan Glacier <d...@milanglacier.com>
Commit:     Milan Glacier <d...@milanglacier.com>
    doc: update doc to prefer `gemini-2.0-flash` over `gemini-2.5-flash`.
---
 README.md | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index cb4332b125..81b9f961cc 100644
--- a/README.md
+++ b/README.md
@@ -61,8 +61,8 @@ as dancers move during a minuet.
 
 **With minibuffer frontend**:
 
-Note: Previewing insertion results within the buffer requires the
-`consult` package.
+Note: Previewing insertion results within the buffer requires the `consult`
+package.
 
 
 
@@ -289,16 +289,28 @@ significantly slow down the default
 provider used by Minuet (`openai-fim-compatible` with deepseek). We recommend
 trying alternative providers instead.
 
+For Gemini model users:
+
+<details>
+
+We recommend using `gemini-2.0-flash` over `gemini-2.5-flash`, as the 2.0
+version offers significantly lower costs with comparable performance. The
+primary improvement in version 2.5 lies in its extended thinking mode, which
+provides minimal value for code completion scenarios. Furthermore, the thinking
+mode substantially increases latency, so we recommend disabling it entirely.
+
+</details>
+
 ## Understanding Model Speed
 
 For cloud-based providers,
-[Openrouter](https://openrouter.ai/google/gemini-2.0-flash-001/providers)
-offers a valuable resource for comparing the speed of both closed-source and
+[Openrouter](https://openrouter.ai/google/gemini-2.0-flash-001/providers) offers
+a valuable resource for comparing the speed of both closed-source and
 open-source models hosted by various cloud inference providers.
 
 When assessing model speed, two key metrics are latency (time to first token)
-and throughput (tokens per second). Latency is often a more critical factor
-than throughput.
+and throughput (tokens per second). Latency is often a more critical factor than
+throughput.
 
 Ideally, one would aim for a latency of less than 1 second and a throughput
 exceeding 100 tokens per second.
@@ -561,6 +573,12 @@ settings following the example:
 :threshold "BLOCK_NONE")])
 ```
 
+We recommend using `gemini-2.0-flash` over `gemini-2.5-flash`, as the 2.0
+version offers significantly lower costs with comparable performance. The
+primary improvement in version 2.5 lies in its extended thinking mode, which
+provides minimal value for code completion scenarios. Furthermore, the thinking
+mode substantially increases latency, so we recommend disabling it entirely.
+
 </details>
 
 ## OpenAI-compatible
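
The added paragraphs advise disabling Gemini 2.5's thinking mode entirely but stop short of showing how. A minimal sketch of one way to do this from an Emacs config, assuming the `minuet-set-optional-options` helper and the `minuet-gemini-options` plist from the package README, and that entries under `:optional` are forwarded verbatim into the request body (`generationConfig.thinkingConfig.thinkingBudget` is the Gemini API field; a budget of 0 turns thinking off for `gemini-2.5-flash`):

```elisp
;; Sketch under the assumptions above, not a verbatim excerpt from the README.
;; Minuet merges keys under :optional into the Gemini request payload, so
;; setting thinkingBudget to 0 asks gemini-2.5-flash to skip its thinking
;; phase; with gemini-2.0-flash the field is simply absent from the model's
;; feature set and would be rejected or ignored by the API.
(use-package minuet
  :config
  (setq minuet-provider 'gemini)
  (minuet-set-optional-options minuet-gemini-options
                               :generationConfig
                               '(:thinkingConfig (:thinkingBudget 0))))
```

With thinking disabled, the latency gap the commit message cites narrows, which is consistent with its advice that 2.0 remains preferable on cost alone.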