Filed - Introducing AI tax prep

Intro to Filed

Filed is an AI tool for tax preparers in US CPA firms. A tax preparer using Filed can prepare tax returns up to 9x faster than without it.

A typical medium-complexity tax return that used to take 2 to 3 hours to prepare can now be done in 20 to 30 minutes with Filed.

We do this by preparing the first draft of the tax return in the tax software that the tax preparer already uses and providing the tax preparer with a workpaper (a detailed report of what’s done and what needs attention). The tax preparer then takes over from there and does the final touches in their tax software.

Why using AI to prepare tax returns is harder than it looks

What is in a tax return - a short primer

A tax return is basically an exercise in figuring out your total taxable income and, from that, how much tax you owe or get back.

Here’s the mental model:

You start with everything you earned: wages, investments, side gigs, etc.
You subtract things the IRS says shouldn’t be taxed (retirement contributions, HSA deposits, etc.).
You then take deductions (standard or itemized) to get to your taxable income.
From that, you calculate how much tax you owe using IRS rates.
Finally, you compare it to how much tax you’ve already paid (through paycheck withholdings or estimated payments).

If you paid more than what you owe → refund. If you paid less → tax bill.
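The mental model above can be sketched in a few lines of Python. This is only an illustration: the bracket limits and rates in `EXAMPLE_BRACKETS` are made-up numbers, not IRS rates, and the names `ReturnInputs`, `tax_from_brackets`, and `settle` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical progressive brackets for illustration only; these are NOT real IRS rates.
# Each entry is (upper limit of the bracket, marginal rate).
EXAMPLE_BRACKETS = [(10_000, 0.10), (40_000, 0.20), (float("inf"), 0.30)]


@dataclass
class ReturnInputs:
    total_income: float  # wages, investments, side gigs, etc.
    adjustments: float   # retirement contributions, HSA deposits, etc.
    deductions: float    # standard or itemized
    tax_paid: float      # paycheck withholdings plus estimated payments


def tax_from_brackets(taxable_income, brackets=EXAMPLE_BRACKETS):
    """Apply marginal rates: each slice of income is taxed at its own bracket's rate."""
    tax, lower = 0.0, 0.0
    for upper, rate in brackets:
        if taxable_income <= lower:
            break
        tax += (min(taxable_income, upper) - lower) * rate
        lower = upper
    return tax


def settle(inputs):
    """Return (taxable_income, tax_owed, balance); a positive balance is a refund."""
    taxable = max(0.0, inputs.total_income - inputs.adjustments - inputs.deductions)
    owed = tax_from_brackets(taxable)
    return taxable, owed, inputs.tax_paid - owed
```

With these example brackets, `settle(ReturnInputs(80_000, 5_000, 15_000, 14_000))` gives a taxable income of 60,000, about 13,000 in tax owed, and a refund of about 1,000, because 14,000 was already paid.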

Why is this a hard problem for AI?

Problem 1: Large amount of contextual data
Problem 2: Lack of contextual data
Problem 3: Accuracy & trust requirements

Problem 1: Large context: Solving the needle-in-a-haystack problem

A tax preparer typically gets hundreds of pages of documents from each client. These documents come in different formats (paper, PDF, images, etc.) and often have no explanations; when they do, the explanations are rarely detailed. Humans are good at absorbing a large amount of background context almost instantly. A human tax preparer can figure out and keep track of all the people in a tax return, how they are related to each other, what their financial situation is, and so on. To get AI to do this effectively today, we need multiple AI agents that each work on a specific part. And just as with a team of people, multiple AI agents need a plan to orchestrate their work.

Filed: AI with a plan

At a high level, Filed’s AI does the following using multiple AI agents:

Analyze prior year

Analyze the prior year tax return and uncover insights: missing documents, errors, financial details, and complexity.

Create a plan

Create a plan for the current year tax return by breaking down the documents into smaller tasks.

Create data entry tasks

Create a set of data entry tasks: W-2, 1099, Schedule A, etc.

Create workpaper

Create a workpaper to keep track of what the AI is doing and what the tax preparer needs to do to finish the tax return.

Execute tasks

Do the data entry tasks one by one.

Final review

Create a final review of the tax return so the tax preparer can finish it.

This collaborative approach of breaking down work into smaller parts is what allows the AI to pick the right context from the documents and do the tax prep effectively.
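To make the idea of planning before executing concrete, here is a minimal sketch in Python. It is not Filed’s implementation: the function names, the dictionary-based workpaper, the list of known forms, and the `enter_data` callable (standing in for whatever writes into the tax software) are all assumptions for illustration.

```python
def create_plan(documents, known_forms=("W-2", "1099", "Schedule A")):
    """Split the document pile into data entry tasks and documents we cannot place."""
    tasks, unrecognized = [], []
    for doc in documents:
        (tasks if doc["form"] in known_forms else unrecognized).append(doc)
    return tasks, unrecognized


def run_pipeline(documents, enter_data):
    """Plan first, open the workpaper, then execute the tasks one by one."""
    tasks, unrecognized = create_plan(documents)
    workpaper = {
        "done": [],
        "needs_attention": [f"Unrecognized document: {d['name']}" for d in unrecognized],
    }
    for task in tasks:
        try:
            # Each task sees only its own document, which keeps the context small.
            enter_data(task)
            workpaper["done"].append(f"Entered {task['form']} from {task['name']}")
        except ValueError as err:
            # A failed task is handed to the preparer instead of being guessed at.
            workpaper["needs_attention"].append(f"{task['name']}: {err}")
    return workpaper
```

The design choice worth noting is that a task that fails does not stop the run. It is recorded in the workpaper so the tax preparer knows exactly where to pick up.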

Problem 2: Missing context: Figuring out what’s missing before agents do work

A tax preparer can look at the prior year tax return and quickly figure out what is missing and what the client needs to submit this year. We trained an AI agent specifically for this task - the prior year agent.

Analyze prior year return

The prior year agent looks at the client’s prior year tax return and lists all the documents that the client is likely to need to submit this year. It does this even before looking at the current year source documents.

Confirm document presence

Then the agent looks through the current year source documents and confirms whether each expected document is present. If one is missing, the agent notes which document it is and why it is needed.

Find related missing documents

Finally, the agent looks through the current year source documents and checks whether any related documents are missing.

This gives the tax preparer a detailed list of what is missing and why, which is very important so the tax preparer can take over and finish the work.
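The three steps above amount to comparing an expected checklist against what was actually received. A minimal sketch in Python, assuming each document is a dictionary with hypothetical `form` and `payer` keys; the `RELATED` pairing rule is an invented example, not a statement of how Filed decides which documents belong together.

```python
# Hypothetical pairing rules for illustration; a real list would come from tax experts.
RELATED = {"1098": "property tax statement"}


def expected_documents(prior_year_items):
    """Step 1: build this year's checklist from the prior year return alone."""
    return {(item["form"], item["payer"]) for item in prior_year_items}


def missing_documents(prior_year_items, current_documents):
    """Step 2: confirm presence and explain every gap."""
    received = {(doc["form"], doc["payer"]) for doc in current_documents}
    report = []
    for form, payer in sorted(expected_documents(prior_year_items) - received):
        report.append({
            "form": form,
            "payer": payer,
            "reason": f"{form} from {payer} appeared on the prior year return "
                      "but was not found in this year's documents",
        })
    return report


def related_missing(current_documents):
    """Step 3: flag companions of received documents that are absent."""
    forms = {doc["form"] for doc in current_documents}
    return [
        f"{form} received but no {companion} found"
        for form, companion in sorted(RELATED.items())
        if form in forms and companion not in forms
    ]
```

Building the checklist before reading the current year documents keeps the two sources independent, so a gap shows up as a set difference with a stated reason.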

Problem 3: Accuracy & trust: How do we build trust with AI?

Today, when we use AI agents or ChatGPT in general, we often see a disclaimer that the AI’s output is not guaranteed to be accurate and should be verified by a human. This problem has a popular name: “hallucinations”.

Taxes, on the other hand, need to be 100% accurate.

This is the hardest problem we had to solve. Filed solved this using two clever approaches:

Guardrails: Constrained pathways that guide AI behavior
Transparency: Clear handoff with workpapers for human review

Guardrails

We built a proprietary guardrails system that guides the AI to follow specific rules and pathways.

These pathways and rules are defined by tax engineers (our CPAs who train the AI).
These handcrafted rules and pathways ensure that the AI has only a limited set of paths it can take to solve a specific problem while preparing a tax return.
Think of our AI as the chef in a restaurant. The chef has to make the dish but the recipe — the ingredients, techniques, and plating styles are created by our CPAs. The chef can make minor adjustments to balance the flavor but never invents random recipes.
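One simple way to picture a constrained pathway is an allowlist that every AI-proposed step is checked against. The sketch below is an assumption-laden illustration in Python, not the proprietary guardrails system: the `PATHWAYS` table, the field names, and `GuardrailViolation` are all hypothetical.

```python
# Hypothetical pathway table: for each task, the only fields the AI may fill in.
# In the chef analogy, this is the recipe written by the CPAs.
PATHWAYS = {
    "W-2": {"wages", "federal_withholding", "employer_name"},
    "1099-INT": {"interest_income", "payer_name"},
}


class GuardrailViolation(Exception):
    """Raised when a proposed step leaves the allowed pathway."""


def apply_guardrails(task, proposed_entries):
    """Accept AI-proposed entries only if every field is on the task's pathway."""
    allowed = PATHWAYS.get(task)
    if allowed is None:
        raise GuardrailViolation(f"No pathway defined for task {task!r}")
    off_path = sorted(set(proposed_entries) - allowed)
    if off_path:
        raise GuardrailViolation(f"Fields not allowed for {task}: {off_path}")
    return dict(proposed_entries)
```

Here `apply_guardrails("W-2", {"wages": 52000})` passes through unchanged, while a proposal that adds a field outside the W-2 pathway is rejected rather than silently entered.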

Transparency

We designed Filed to reduce the number of hours it takes to complete a tax return. The idea is that the AI does the first draft of the tax return and then hands the work over to a tax preparer to polish and finish. To do this effectively, the AI needs to not only do the work but also indicate what to glance through and what to pay close attention to. We do this with our AI workpaper. The workpaper is how the AI hands over the return and indicates what’s missing and where the preparer needs to pay attention.
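As a rough picture of what such a handoff could look like as data, here is a small Python sketch. The `WorkpaperItem` fields and the plain-text layout are assumptions made for illustration; they do not describe the actual format of Filed’s workpaper.

```python
from dataclasses import dataclass


@dataclass
class WorkpaperItem:
    description: str
    source: str            # where the figure came from, so the preparer can verify it
    needs_attention: bool
    note: str = ""


def render_workpaper(items):
    """List the items that need attention first, then the completed work."""
    lines = []
    for heading, flag in (("NEEDS ATTENTION", True), ("DONE", False)):
        lines.append(heading)
        for item in items:
            if item.needs_attention == flag:
                suffix = f" ({item.note})" if item.note else ""
                lines.append(f"- {item.description} [{item.source}]{suffix}")
    return "\n".join(lines)
```

Putting the attention items at the top and attaching a source to every line reflects the goal described above: the preparer should be able to see quickly where to look and how to check it.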

Benchmarks

We run our AI agent against rigorous real-world test cases before releasing new updates.

Filed achieved 72.5% strict accuracy on complete federal tax returns in Column Tax’s TaxCalcBench evaluation, well ahead of leading standalone LLMs (GPT-5: 41.7%, Gemini 2.5 Pro: 32.4%, Claude Opus 4: 27.5%). That is about 1.7 times the best standalone score listed here and more than double the other two.

The key differentiator is Filed’s multi-agent architecture with layered validation and deterministic checks, not just the underlying models.
Generic LLMs struggle with tax accuracy, achieving only 23-42% on the benchmark, while Filed’s specialized system demonstrates that thoughtful architecture can substantially improve AI reliability.
Calculation represents just one phase of real tax work. Filed also handles document intake, data entry into professional tax software, and provides transparent workpapers for review.
