Qwen3 Reranker 4B GGUF
A GGUF quantized version of the Qwen3 4B reranker model for efficient local inference via llama.cpp.
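As a rough illustration of local inference, the sketch below sends a query and candidate documents to a locally running llama.cpp server and sorts the candidates by score. It rests on assumptions that are not stated in this card: that the server was started with reranking enabled and this GGUF file loaded, that it listens at the hypothetical address `http://localhost:8080`, that it exposes a `/v1/rerank` endpoint, and that the response contains a `results` list of `index` / `relevance_score` entries. The helper names are illustrative only; check your llama.cpp build for the actual endpoint and response shape.

```python
import json
import urllib.request

# Assumption: a llama.cpp server is running locally with reranking enabled
# and this GGUF model loaded. The host, port, and path are hypothetical.
RERANK_URL = "http://localhost:8080/v1/rerank"


def build_rerank_payload(query, documents, top_n=None):
    """Build the JSON body for a rerank request (assumed request shape)."""
    payload = {"query": query, "documents": list(documents)}
    if top_n is not None:
        payload["top_n"] = top_n
    return payload


def order_by_relevance(documents, results):
    """Pair each document with its score, most relevant first.

    `results` is assumed to be a list of dicts shaped like
    {"index": int, "relevance_score": float}.
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


def rerank(query, documents, url=RERANK_URL, timeout=30):
    """POST the query and documents to the server and return ranked pairs."""
    body = json.dumps(build_rerank_payload(query, documents)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return order_by_relevance(documents, data["results"])
```

With a server running, `rerank("what is GGUF?", ["GGUF is a model file format.", "Bananas are yellow."])` would return the documents paired with their scores, highest first. Keeping payload construction and sorting in separate functions lets both be checked without a live server.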
Specs
context window 41K tokens
max output 41K tokens
Capabilities
streaming
Dates
released Apr 2025