Optimized for complex reasoning, with context windows of up to 200k tokens.
Sub-millisecond inference for simple tasks and conversational UIs.