Querying Federation of Malaya Budget from 1959
Some time back REFSA and the Japan Society for the Promotion of Science donated scanned budgets and financial statements, some from all the way back to 1957 to Sinar Project.
You can browse or search for them at the Malaysian Government Documents Archive, maintained by yours truly.
I did some some cleanups for some of the documents, including OCR about 10 years ago.
But other than searching for text, it would have been quite tedious to extract the tables manually with technology back then.
Let's try see how well we can query this document when fed into a VLM.

A lot of tables in old documents were formatted for human visual clarity and not for machines. So the structure of the tables, often have merged cells or labels that don't convert cleanly into flat machine readable formats like CSVs.
The trick with VLMs is to try extract the information in a nested data format like JSON.
Extract the figures of this Malaysian government statement of assets and
liabilities document as JSON
You'll get something like this, and from here you can already get an idea on how well the VLM is parsing the tables.
{
"consolidated_revenue_account": {
"balance_31_12_58": 82087297.59,
"add_appropriations_written_back": 17150000.00,
"surplus_1959": {
"revenue_for_year": 890972163.39,
"expenditure_for_year": 838092333.35,
"appreciation_of_investments": 3699676.09
},
"balance_31_12_59": 155816803.72
},
"consolidated_loan_account": {
"premium_bonds": 1139450.00,
"treasury_deposits": 104400000.00,
"treasury_bills": 93395804.21,
"total_liabilities": 198935254.21
},
... <snip> ...
Let's ask some questions to see how accurately it answers it:
Based on the provided "STATEMENT OF ASSETS AND LIABILITIES", the balance of the Consolidated Revenue Account at the end of 1959 (31-12-59) is $155,816,803.72.
This figure is explicitly stated in the document under the "CONSOLIDATED REVENUE ACCOUNT" section.

Checks out.
The next one is more impressive. The Consolidated Trust Account table is not so straightforward, because there is no header for the last column for the subtotals. And some of the row labels are on two lines, so there is a blank row above it.

Under the Consolidated Trusts Account, how much funds are left for Advances, Government Officers Accounts?
Based on the "STATEMENT OF ASSETS AND LIABILITIES", under the CONSOLIDATED TRUST ACCOUNT section, the final balance for the Advances, Government Officers Accounts at the end of 1959 is $2,536,147.50.
This is found in the "STATUTORY FUNDS" subsection, in the final column labeled "final_balance" for that specific account. The calculation is as follows:
- Starting Balance (31-12-58): $489,141.75
- Appropriations (1959): $2,600,000.00
- Disbursements (Net 1959): $552,994.25
The final balance is calculated as: $489,141.75 + $2,600,000.00 - $552,994.25 = $2,536,147.50
This matches the value provided in the table.
Got it right. The full JSON output was also correct.
This is on just consumer level hardware with Qwen3-VL 8B model, and it takes about 1.5 seconds for each answer.
With small modern VLM models, we can now extract and query structured data from a lot of old documents easily, and it's quite accessible.
My setup:
- Radeon 9070XT 16GB VRAM
- llama.cpp
- Qwen3-VL-8B-Instruct (FP8)
Document Actions