Gallery

Ask your database in English, locally

Plain-English questions over a local SQLite database. Two implementation variants.

Apple Silicon (M1, 32 GB) · Python 3.13 + FastAPI · SQLite · LM Studio / Ollama / llama.cpp

A local-only consulting demo that shows Australian businesses they can run an AI assistant against their own data with zero data leaving the machine. We demonstrate two variants of the same underlying idea: using a locally run LLM to query a local relational database with natural language. Each version has different performance and accuracy characteristics.

Notes

This is a demo, not a product. It is not designed for production use, and it has not been tested for security, robustness, or edge cases. It is a sketch of what is possible with current technology, and a starting point for discussion about how to build real-world applications that use LLMs to query databases.

We use Qwen 2.5-14B-Instruct at Q4 quantisation for the LLM, served by llama-server on the same machine as the database. This model is particulary suidted to the task because of its strong instruction-following capabilities and good performance at writing SQL. In addition, it allows a wide context window that can accommodate both the user's question and the returned rows for post-processing. The database is a SQLite file seeded with fictional Queensland property-management data.

1. Direct SQL, raw data response

The simplest path. The LLM reads the question, emits one or more SQL statements, the app runs them read-only against SQLite, and the rows are rendered in a table. The user reads the table.

This version is fast — one LLM call in the whole loop. The cost is that raw rows are not always an answer. For some questions the table is exactly what was asked for; for others the user has to do the interpretation themselves.

Architecture diagram for the direct-SQL scenario. The App sits at the top centre. Below the App, on the left, is SQLite; on the right, llama-server with Qwen2.5 nested inside. Arrows show the App sending the user's text question to llama-server and receiving SQL back, then the App sending that SQL to SQLite and receiving rows.
The App brokers everything. The LLM writes SQL; the database speaks only to the App.

In practice it looks like this. The model is Qwen2.5-14B-Instruct at Q4 quantisation, served by llama-server on the same machine; the database is a SQLite file seeded with fictional Queensland property-management data.

Two queries against the demo. The first asks for a random property with a tenant and animal — a multi-table join — and returns a row for property 299, 35 Dornoch Tce, Fortitude Valley, with tenant Matilda Bailey and a dog, in 12.9 seconds. The second asks for a property whose weekly rent is between $800 and $900 and returns property 874, 98 Gregory Tce, Kelvin Grove, $820, in 3.9 seconds. A banner at the top reads: All data stays on this machine. No external API calls.
Two queries from the demo. Latency varies with query complexity: 12.9 s for a multi-table join (tenant + pet), 3.9 s for a simple range filter.

Result-set size barely changes things. A query that returns 100 rows comes back in 5.3 s — only marginally slower than the single-row queries above, because the LLM's job (writing one SQL statement) is the same regardless of how many rows it ends up matching.

The query 'print 100 random properties which have no tenant' followed by the start of a long table — properties 504, 156, 600, 855, 129, 216, and others — with id, address, suburb, type, bedrooms, bathrooms, and weekly rent columns.
The tail of the same 100-row table, ending at properties 318, 789, 92, 445, 54, 201, 100, and 196. Below the table, the elapsed time reads 5.3 s.
Top and tail of the same 100-row response. 5.3 s end-to-end. The LLM writes one SQL statement; SQLite does the rest.

2. Tool-calling, no SQL written

The LLM is not asked to write SQL at all. Instead it is given a small set of typed tools, exposed by a layer the demo calls the Domain Tool Engine — functions that wrap the database with named parameters, validation, and a constrained surface. The model decides which tool to call and with what arguments; the engine runs the actual query and returns structured results.

This shifts the contract. The model no longer has to get SQL syntactically and semantically right; it only has to pick a tool and fill in its arguments. In practice this is the most accurate of the three — the engine's schema does the work the model would otherwise have to do, and the surface area for hallucinated table or column names collapses.

Architecture diagram for the tool-calling scenario. User on top; the Coastal Property Management App in the middle; llama-server with Qwen2.5 below-left; below-right, a Domain Tool Engine that sits between the App and SQLite. The App sends the question and available tools to llama-server and gets back a tool call with parameters; the App invokes the engine, which is the only component that talks to SQLite.
The Domain Tool Engine sits between the App and SQLite. The LLM produces tool calls; the engine is the only thing that writes SQL.

The downsides of this approach become immediately visible. Consider this example:

The same query 'Give me details of a random property with a tenant and animal' answered by the Domain Tool Engine in 1.5 seconds. The result is a single row for property 859, 181 Wynnum Rd, Fortitude Valley, QLD, with a tenant name of Mackenzie Morgan, but with no animal or pet column in the response.
Same question as before. 1.5 s — but the answer drops the “animal” half of the question.

What we gained in accuracy we lose in flexibility. The engine selects the tool that is most suitable to the task, and simultaneously has to make compromises — here, a tool that returns property and tenant data, with no slot for the pet, so the pet quietly drops out of the answer.

So we try again, this time being a little more explicit:

A second attempt at the query: 'Give me details of a random property with a tenant and animal including the pet name.' The model now returns three sections — Properties (property 859 at 181 Wynnum Rd, Fortitude Valley), Tenants (Mackenzie Morgan, with email, phone, and lease dates), and the start of a Leases section.
The tail of the same response. A 'Pet Records' section reads 'No results found.' followed by the start of an 'Emergency Contacts' section.
2.2 s. The model invokes a separate pet-records tool — but selects a property that has no pet, so the lookup returns nothing.

This time the model attempts to get information about the pet by invoking a different function, but it omits to set the pet constraint — so a property is returned that does not actually have a pet. This immediately exposes a new limitation: the model is not trained for the specific tool-calling required for our database. It can pick tools, but it does not know that “with an animal” should restrict the property selection to ones with a pet record on file. The constraint is a domain rule, and the off-the-shelf model has no way to know it.

Where each one fits

VersionLatencyScales with result sizeAccuracyQuestion shape
Direct SQL, raw rowsFastYes (5.3 s for 100 rows)Depends on the SQLAnything the schema can answer
Tool-callingFastest on small queries (1.5 s)YesHighest, by constructionOnly what the engine has tools for

Summary

Two viable patterns came out of the demo. They are not equivalent, and neither is shippable on its own.

Direct SQL, raw rows trades narration for flexibility. The LLM only has to write a single SQL statement, which keeps it fast at any result-set size — 100 rows came back in 5.3 s — and lets the user ask anything the schema can answer. The cost is that the result is a table; turning that into a sentence is an exercise for the reader.

Tool-calling via the Domain Tool Engine trades flexibility for accuracy. Hallucinated columns are impossible by construction, and small queries return in 1–2 s. The cost is that the model can only ask the questions the engine has tools for, and the questions it asks are bounded by what the model already knows about the schema — which is, currently, nothing specific to our database.

Where this would go next

  • Agentic loops. Both patterns above are single-shot: the LLM gets one chance to write the SQL, or one chance to pick a tool. An agentic version — multiple LLM calls, the model checking its own work and retrying when the result does not match the question — would address both kinds of failure on the page. In direct SQL it would catch a query that returned nothing useful and revise it; in tool-calling it would chain tools correctly, picking a property that has a pet before looking up the pet record.
  • More capable models, in a sovereign region. Agentic loops mean more LLM calls per question, and that is the cost. A 14B model running locally on an M1 starts to feel slow under multi-step reasoning — we ran experiments that confirmed this; they are not shown here. The natural next step is a more capable model hosted in a region-locked cloud environment: an Australian-resident hyperscaler region, or a sovereign AI provider. The “no data leaves the machine” purity goes; the “no data leaves the country” property is preserved, and the model has the headroom to plan, retry, and chain tool calls without timing out.