We present an eight-stage, single-pass pipeline for natural-language-to-SQL translation on the BIRD benchmark, built around a public-API LLM (openai/gpt-oss-120b) at temperature 0. The system combines database-level value grounding; BM25 few-shot retrieval over the BIRD training set; priority-scored column annotations under a per-table token budget; a structured schema-linking LLM call that jointly emits tables, key columns, JOIN conditions, and filter hints; a weighted Steiner-tree fallback over a schema graph; a rich schema prompt with double-emphasized business evidence; and a cascading self-correction loop. Without fine-tuning or training data beyond BIRD's released materials, the pipeline attains 63.6% execution accuracy on the full 1,534-question BIRD dev set, a +16.8 pp improvement over a strong same-model baseline (46.8%). The submitted system was independently evaluated by the BIRD team on the held-out private test set (1,789 questions), where it attains 66.46% execution accuracy (simple 73.55, moderate 63.60, challenging 48.42), a Soft-F1 of 67.77, and an R-VES of 60.07. This places it #60 on the public BIRD Overall leaderboard under the entry "Elysian-SQL + gpt-oss-120b," ahead of several systems built on substantially larger proprietary models such as GPT-4o.
Publication Date: 2026-06-14
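
To make the BM25 few-shot retrieval stage concrete, the following is a minimal, dependency-free sketch of how training questions could be ranked against an incoming question. It is an illustration under stated assumptions, not the system's actual implementation: the `BM25Index` class, the `tokenize` helper, the parameter values `k1=1.5` and `b=0.75`, and the three question/SQL pairs are all hypothetical and are not taken from the pipeline or from BIRD.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Okapi BM25 over a small in-memory list of questions (illustrative only)."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # length-normalization strength (assumed default)
        self.docs = [tokenize(d) for d in documents]
        self.doc_len = [len(d) for d in self.docs]
        avg = sum(self.doc_len) / len(self.docs) if self.docs else 0.0
        self.avg_len = avg or 1.0  # guard against division by zero
        self.term_freqs = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        # Smoothed IDF that stays non-negative for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, i):
        """BM25 score of document i for the given query tokens."""
        tf = self.term_freqs[i]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avg_len)
        total = 0.0
        for t in query_tokens:
            f = tf.get(t, 0)
            if f:
                total += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query, k=3):
        """Indices of the k best-scoring documents, dropping zero-score ones."""
        q = tokenize(query)
        scored = [(self.score(q, i), i) for i in range(len(self.docs))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda p: (-p[0], p[1]))  # best score first, stable on ties
        return [i for _, i in scored[:k]]


# Hypothetical (question, SQL) training pairs, invented for this sketch.
train = [
    ("How many schools are in Alameda county?",
     "SELECT COUNT(*) FROM schools WHERE county = 'Alameda'"),
    ("List the names of all customers who paid in euros.",
     "SELECT name FROM customers WHERE currency = 'EUR'"),
    ("What is the average score of schools in Fresno county?",
     "SELECT AVG(score) FROM schools WHERE county = 'Fresno'"),
]

index = BM25Index([question for question, _ in train])
hits = index.top_k("How many schools are located in Fresno county?", k=2)
few_shot = [train[i] for i in hits]  # pairs that would be placed in the prompt
```

In this toy setup the two school-related questions are retrieved and the unrelated customer question is left out, so the few-shot examples shown to the model share vocabulary, and plausibly SQL structure, with the incoming question.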