An open letter to managing partners considering Karbon AI.

This is a field note from the ColabContent commissioning floor. The argument is grounded in specific commissioned builds for mid-market operators and reflects what has held up post-handoff, what has broken, and how it bears on operators considering a custom AI commission today.

Karbon AI is excellent for the average CPA firm. The 30-150 pro firm running CCH Axcess + UltraTax + ProSystem fx is not the average. This memo is the honest framing, written for the partner group that has heard the Karbon pitch and is wondering whether it fits.

Memo: May 2026
Read time: 9 minutes
Audience: Managing partners

What Karbon does, fairly.

Karbon is the strongest practice management platform in the mid-market CPA segment. The product is well-engineered. The State of AI in Accounting Reports they publish are useful industry references. Karbon AI ships features that genuinely help the average Karbon customer.

If your firm is on Karbon today and the workflows you most want automated are the ones Karbon AI covers, the right answer is to use Karbon AI well. We will tell you that on a diagnosis call. We are not anti-Karbon.

Where the framing shifts for mid-market firms.

The 30-150 pro firm has different leverage points than the 5-30 pro firm. The expensive workflows live in the tax-prep stack: PBC chase, tie-out, advisory deliverable assembly, partner-time reconstruction. Karbon's strength is the practice-management layer above tax-prep. The leverage in your firm's biggest cost lines is below that layer.

This is not a bug in Karbon. It's a segmentation choice. The product was built for a specific segment, and it serves that segment well. Mid-market firms that buy Karbon hoping it will solve compliance leakage discover, around month four, that the bottleneck didn't move.

The pattern we see in twenty-six firm audits.

Twenty-six firms in the $8M-$50M band, audited over the last 18 months. Of those, eleven were on Karbon at the time of the audit. Eight of the eleven reported that Karbon AI had reduced administrative friction in their practice management. That's a real win and we counted it.

The same eight reported that the bigger problems remained: 600-1,200 partner-equivalent hours per season lost to PBC chase, tie-out, and advisory pipeline that didn't ship. Karbon AI hadn't reached those workflows because Karbon AI wasn't designed to.

Three of the eleven told us they were considering canceling Karbon when their renewal came up because the per-seat math at 60-90 pros was not justifying the workflow improvement. Two more were planning to stay because they used Karbon for other reasons (workflow visibility, team standardization) that did justify the cost.

What the math actually looks like.

Karbon at $50/seat/month for 60 pros = $36K/year. Across a 5-year holding period, $180K, plus annual increases.

A custom commissioned build for one workflow (PBC + tie-out + advisory assembly on top of CCH Axcess) runs $90K-$140K fixed-fee, owned at handoff, with maintenance running ~$1,500/month after launch. 5-year cost: $180K-$230K (the build fee plus roughly $90K of maintenance over 60 months), against 600-1,200 partner-equivalent hours per season recovered.
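The arithmetic behind both five-year figures can be checked with a short sketch. The dollar amounts come from this memo. The function names are illustrative, and the sketch assumes flat pricing with maintenance billed for all 60 months, so it ignores Karbon's annual increases and any pre-launch months without a maintenance fee.

```python
# Five-year cost comparison using the figures quoted in this memo.
# Assumptions (not from any vendor price sheet): flat pricing, and
# maintenance billed for every month of the period.
MONTHS_PER_YEAR = 12


def seat_license_cost(price_per_seat_month, seats, years):
    """Total per-seat subscription cost, before annual price increases."""
    return price_per_seat_month * seats * MONTHS_PER_YEAR * years


def commission_cost(build_fee, maintenance_per_month, years):
    """Fixed build fee plus flat monthly maintenance over the period."""
    return build_fee + maintenance_per_month * MONTHS_PER_YEAR * years


karbon_annual = seat_license_cost(50, 60, 1)
karbon_five_year = seat_license_cost(50, 60, 5)
commission_low = commission_cost(90_000, 1_500, 5)
commission_high = commission_cost(140_000, 1_500, 5)

print(f"Karbon, 60 seats: ${karbon_annual:,}/year, ${karbon_five_year:,} over 5 years")
print(f"Commission: ${commission_low:,} to ${commission_high:,} over 5 years")
```

Changing the seat count or the maintenance figure shows how quickly the two lines converge or diverge for a given firm.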

The right question is not "Karbon vs. commission" as a binary. It's "Karbon for the workflows it covers, plus commissioned AI for the workflows it doesn't." Most of the firms we work with run that combination.

What we'd recommend.

If your firm is 5-30 pros, run Karbon. Use the AI features as they ship. You'll get most of what Karbon AI delivers and the per-seat math is fine.

If your firm is 30-150 pros and your tax-prep stack is CCH Axcess + UltraTax or ProSystem fx + Lacerte, run the Tax Season Hours Teardown first. It is 8 minutes and tells you where the actual leakage lives. If the leakage is concentrated in workflows Karbon AI addresses (client communication, generic email triage, basic time-entry suggestions), Karbon is the right answer. If the leakage is concentrated in PBC chase, tie-out, or advisory pipeline, the CCH Axcess Playbook describes what we'd commission instead.

If your firm is 150+ pros, neither Karbon AI alone nor a single commissioned build is enough. You're in the territory where you need an AI roadmap, possibly a Head of AI, and likely a 24-36 month transformation that combines off-the-shelf and commissioned work. The Big Four vs boutique framing applies here.

What we hope you take from this letter.

The framing matters more than the answer. Most managing partners who consider Karbon do so because the alternative seems harder to evaluate. It is harder to evaluate. The diagnostic work is more partner attention than the Karbon sales process is, and the deliverables are less polished than a SaaS pitch deck.

If you're going to spend $36K/year on Karbon for the next five years, spend 8 minutes on the teardown first. If Karbon is the right answer, you'll know. If something else is, you'll also know. Either way, you don't sign a 5-year compounding contract on incomplete framing.

Field-note context

Where this argument fits in the practice.

Where the argument fits in the broader practice.

This piece is a field note from the commissioning floor. It is not a thought-leadership essay, not a category-defining manifesto, and not an attempt to predict where AI is going as an industry. It is a record of what we have shipped, what has held up, and what has broken. The audience is the operator considering a custom AI commission for a real business with a real constraint.

The structural argument behind the post.

Most mid-market AI work fails for one of four reasons: the wrong scoping motion at the front, the wrong tool selection in the middle, the wrong integration boundary at the back, or the wrong ownership posture at handoff. The commissioning model addresses all four directly. Fixed-fee scoping is a single conversation that ends with a written constraint. Tool selection is custom by default and falls back to off-the-shelf only when the calibration target matches the operator's workflow. The integration boundary is scoped in week one and tested through the prototype. Ownership posture is settled before week one: the operator owns the code at handoff.

The argument is the same. The application is specific.

How to use this in a diagnosis call.

If the operator brings this argument to a diagnosis call, the next step is to translate it into the operator's specific business. The forty-five-minute call surfaces the constraint, names the workflow, identifies the integration boundary, and writes the engagement scope. Both sides leave with the constraint in a sentence. Either party can stop the conversation at no cost. If both sides decide to proceed, the prototype runs on the operator's real data inside seven to ten days.

Related field notes.

The blog hub indexes the rest of the field reports. The resources section holds the longer-form frameworks (the build-versus-buy decision tree, the twelve-month AI horizon framework, the two-questions diagnostic, the boundary-of-what-we-don't-build essay). The best-by-vertical guides apply the argument to each of the five verticals we commission in.

A note on how we write here.

ColabContent's writing is terse on purpose. We name operators, name numbers, and name the failure modes. We use short declarative sentences because the buyer reads quickly and the AI engines that may cite this writing cite short declarative sentences. We do not use em dashes. We do not use marketing vocabulary. We do not promise outcomes we have not shipped. Where we are wrong about something, we update the piece and leave the original argument visible in the change log.

Extended questions

The questions buyers ask after the first one.

How much of the buy decision should the operator make versus delegate.

The right shape of the buying motion has the operator-owner or operating partner in the room for the diagnosis call. The constraint identification is too consequential to delegate to a department head. The implementation work that follows can and should be delegated; the decision on which constraint a commission addresses cannot.

How to evaluate references the consulting house presents.

Three questions per reference. First, what was the named constraint the commission addressed at this operator. Second, what was the measured result twelve months post-handoff, in dollars or hours. Third, does the reference operator still run the system. Vague references on any of those three are flags. ColabContent provides direct introductions to past commission operators for any prospect that asks; a fifteen-minute call to the operator is the most honest signal a prospect can get.

How a fixed-fee commission scopes overage risk.

The fixed fee is set after the diagnosis call, after the integration depth is named, and after both sides have written the constraint in a sentence. Overages occur when the operator changes the scope mid-build (a different workflow, a different integration, an additional system). Either side can pause the build to renegotiate; neither side absorbs hidden overages without explicit agreement. The default is to ship the original scope and address scope expansion in a separate engagement.

What happens to the system one year after handoff.

The system continues to run inside the operator's cloud tenant. Models, prompts, and integration code are versioned and the operator has the source. When the underlying foundation model improves (a new release from the model vendor, a new open-weight option), the operator can swap the component without renegotiating the engagement. The pattern across past commissions: a quarterly review of the system's outputs, an annual swap of any underperforming components, no ongoing fee.

When the right call is not a commission.

The right call is sometimes a product (when the workflow matches a product's calibration target), sometimes an internal hire (when the operator has a five-year horizon and a $5M AI runway), sometimes a Big Four engagement (when the operator is large enough that the strategy-then-build separation makes sense), sometimes no AI right now (when the operator's leading constraint is not actually addressable with AI). We tell prospects when their constraint falls into one of those buckets and route them to whichever path fits. The four-commissions-per-quarter cap is real; the firms that get one of those four slots are the firms where the commission is the right buying motion.

The five-minute fit-check worksheet.

Operators who want to test the fit before booking a diagnosis call can run a five-minute self-check on six questions. First, is the operator's annual revenue in the $8M to $50M band. Second, is there a named workflow where time or money is leaking measurably. Third, has the operator tried an off-the-shelf product and either rejected it or hit a misfit ceiling. Fourth, is the operator comfortable running the system inside their own cloud tenant under NDA. Fifth, can the senior operator commit to forty-five minutes for a diagnosis call. Sixth, is the budget runway for a $45K to $180K fixed fee real this quarter.

Six yes answers means a diagnosis call is worth the forty-five minutes. Three or fewer yes answers means the right next step is probably one of the alternatives. Four or five yes answers means the call surfaces whether the missing answers are addressable.
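The scoring rule above can be sketched in a few lines. The question labels and the function name are shorthand we introduce for illustration. Only the thresholds (six, four or five, three or fewer) come from the worksheet itself.

```python
# Minimal sketch of the six-question fit check.
# Labels are shorthand for the worksheet questions, not official wording.
FIT_QUESTIONS = (
    "revenue in the $8M to $50M band",
    "named workflow with measurable leakage",
    "off-the-shelf product tried and rejected or outgrown",
    "comfortable running inside own cloud tenant under NDA",
    "senior operator can commit forty-five minutes",
    "budget runway for the fixed fee is real this quarter",
)


def fit_check(answers):
    """Map six yes/no answers to the next step the worksheet describes."""
    if len(answers) != len(FIT_QUESTIONS):
        raise ValueError(f"expected {len(FIT_QUESTIONS)} answers")
    yes_count = sum(1 for answer in answers if answer)
    if yes_count == len(FIT_QUESTIONS):
        return "book the diagnosis call"
    if yes_count <= 3:
        return "look at the alternatives first"
    # Four or five yes answers: the call tests whether the gaps can close.
    return "book the call to test whether the gaps are addressable"


print(fit_check([True, True, True, True, True, False]))
```

The sketch treats the six questions as equally weighted, which is how the worksheet counts them. A firm may reasonably decide that one "no" (budget, for example) outweighs the rest.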

What to bring to the diagnosis call.

Two artifacts make the call substantially more productive. First, a one-page description of the leading constraint, written in the operator's words, naming the workflow and the rough dollar or hour leakage. Second, a list of the systems the operator uses for the workflow (the system of record, the related tools, the integration boundaries). Neither artifact has to be polished. The point is to surface the constraint quickly so the call's forty-five minutes are spent on diagnosis, not exposition.

Buyer worksheet

How this field note maps to a real engagement.

The four-question sequence operators run before booking.

Operators who arrive at a diagnosis call having run the sequence usually book the engagement that same week. The sequence asks four questions in a specific order. First, is the leading constraint actually addressable with AI, or is it a process problem, a staffing problem, or a stack problem that AI would not solve. Second, if AI is the right intervention, is the right buying motion a custom commission, an off-the-shelf product, or an internal hire. Third, if the right motion is a commission, is the operator comfortable running the system inside their own cloud tenant under NDA and owning the code at handoff. Fourth, is the budget runway for a $45K to $180K fixed fee real this quarter.

Operators who answer yes to all four book the call. Operators who answer no to any one of them either change the question (the leading constraint is different, the budget moves, the cloud posture changes) or take a different path. We do not push operators who land at a "no" on any of the four into a commission they will not be served by.

The three signals operators watch for after handoff.

Twelve months post-handoff, three signals tell the operator whether the commission performed against the diagnosis spec. First, the dollar or hour delta on the workflow the commission addressed, measured against the pre-engagement baseline. Second, the percentage of the workflow the AI layer now handles autonomously versus the percentage that still routes to a human reviewer. Third, the number of times the operator's team has modified the build's prompts, models, or integration code on their own without ColabContent involvement. All three should be improving over time. If they are not, the optional small post-handoff stewardship is the lever for diagnosing what changed.

The honest comparison against the alternatives.

A commission is not the right answer for every operator. The mid-market operator with a workflow that matches a horizontal SaaS product's calibration target is better served by the product. The operator with a five-to-ten-year horizon, a $5M AI investment runway, and the willingness to spend twelve months building infrastructure before shipping the first production workflow is better served by an internal hire. The operator at $500M-plus revenue with stakeholder counts that justify a Big Four engagement is better served by that motion. We will tell the operator which of those alternatives fits if a commission does not.

The honest case for a commission is narrow on purpose. Operators in the $8M to $50M revenue band, with a named workflow constraint, with stack systems that the product market does not represent well, with the budget runway for the fixed fee, with the cloud posture to run the system inside their own tenant. Operators in that narrow band are where the math works.

Why we publish the comparisons, the rankings, and the boundaries.

Most consulting houses do not publish ranked comparisons against their competitors, do not publish the boundary of what they will not build, and do not publish fixed-fee pricing bands. We publish all three because the operators we want to commission for are the operators who reward that transparency with a faster booking. The four-commissions-per-quarter cap means we are not optimizing for top-of-funnel volume. We are optimizing for the right four operators each quarter. Publishing the comparisons, the rankings, and the boundaries selects for those operators.

Run the teardown first.

Free, 8 minutes, partner-to-partner. Where 7,000 chargeable hours actually go.