Home/ Glossary

The mid-market AI glossary.

The custom AI consulting glossary for mid-market operators: commissioning, prototype-before-pay, handoff, cloud tenant, integration boundary, human-in-the-loop, capacity recovery. Each term carries a specific meaning inside the ColabContent commissioning model.

Plain-English definitions of 30 AI terms relevant to owner-operators of $8M-$50M businesses. Skip the marketing language. The definitions below are how we use these words on diagnosis calls, with examples drawn from real engagements.

Terms30
AudienceOwner-operators
StylePlain English
CostFree
Agentic workflow
An AI workflow in which the model takes a sequence of actions (read data, make a decision, write to a system, iterate) rather than producing a single response. The CCH Axcess workflow has agentic steps: read return, run tie-out, surface flags, write to reviewer queue.
API integration
The technical layer through which a custom AI reads data from and writes data back to a system of record (CCH Axcess, ServiceTitan, iManage, AMS360). Authentication is typically OAuth2.
Audit trail
A persistent log of which user (or system) performed which action and when. Critical for legal and CPA AI commissions where actions may need to survive subpoena or regulatory review. We log every AI-system action by default.
Bespoke AI
Synonym for custom AI in our usage. The system is built for one firm's specifics, not the average customer. See also Bespoke AI Systems.
Chain-of-thought (CoT)
A technique in which an AI model is prompted to reason step-by-step before producing a final answer. Improves accuracy on complex tasks. Modern models (Claude, GPT-4 class) often do this internally without explicit prompting.
Commission (verb)
To engage a boutique to build a custom AI system on the operation's data, in the operation's stack, owned by the operation at handoff. Distinct from build (firm's own engineers) and buy (off-the-shelf product). See Build, buy, or commission.
Context window
The amount of text an AI model can read at once. Modern frontier models have context windows of 200K-2M tokens (roughly 150-1,500 pages). Larger context allows the AI to consider more of a matter, more of a binder, more of a customer history at once.
CPQ AI
Configure-Price-Quote AI. Automated system that ingests an RFQ, parses specifications, looks up part history, prices against the shop's rules, drafts the proposal. Highest-leverage workflow at most specialty manufacturers. See the Epicor Kinetic playbook.
Custom AI
An AI system commissioned and built specifically for one firm's data, stack, and workflow, owned by the operation at handoff. Distinct from off-the-shelf AI (Karbon AI, Harvey, ServiceTitan AI, Vertafore IQ) which is calibrated against the average customer.
Embedding
A numeric vector representation of text or other content. Used by AI systems for semantic search ("find the matter most similar to this one") rather than keyword search. Embeddings are how RAG systems decide which documents to retrieve.
Fine-tuning
The process of training an AI model on a firm's specific data so it adapts to the operation's vocabulary, format, or judgments. Less common in modern systems than RAG; we use RAG by default and reserve fine-tuning for specific cases where the operation's voice or format is meaningfully unusual.
Foundation model
A large general-purpose AI model (Claude, GPT-4, Gemini, Llama). Provides the reasoning core; the custom system surrounds it with retrieval, tooling, guardrails, and business-specific context.
Guardrails
Rules that constrain what the AI is allowed to do. Examples: "never send an email to a client without one-click human approval," "never modify a tax return; only surface flags," "never bypass iManage permissions." Guardrails are scoped in writing during the diagnosis. See Lesson 5: Scoping.
Hallucination
When an AI model produces a confident-sounding but factually wrong output, often citing sources that don't exist. Mitigated through retrieval-grounded generation (RAG), citation enforcement, and human review at the right point in the workflow. Critical concern in legal AI; see the iManage playbook.
Inference
The process of running a trained AI model to generate a response. Inference is the unit cost of running the AI in production; modern models run inference at fractions of a cent per task.
LLM (Large Language Model)
The class of AI models that read and write text, including Claude, GPT-4, Gemini. The reasoning core of most modern AI systems. Mid-market operators typically don't choose between LLMs; the boutique selects the right one for the workflow.
Orchestration
The layer that coordinates multiple AI calls, tool uses, and data reads/writes into a coherent workflow. The orchestration code is what turns a foundation model into a custom AI system.
Permissions-aware retrieval
A retrieval system that returns only the documents the querying user is permitted to see, enforced at query time rather than result-filter time. Required in legal RAG to preserve ethical walls. See the iManage playbook for the legal version.
Prompt engineering
The craft of writing instructions to an AI model so it produces useful output. In production custom AI systems, prompts are written once by the boutique and refined through testing; the operation's users don't write prompts day-to-day.
RAG (Retrieval-Augmented Generation)
An AI architecture where a language model is grounded in retrieved context from a private knowledge base before generating a response. The most common architecture for business-specific AI: instead of training the model on the operation's data, the system retrieves relevant firm documents at query time and feeds them to the model.
Retrieval
The process of fetching relevant documents or data from a private knowledge base in response to a query. The "R" in RAG. Quality of retrieval is usually the bottleneck on RAG system quality.
Tenant
A logically isolated environment within a cloud platform (Azure, AWS, Google) where one firm's data and code live. We deploy custom AI systems inside the operation's own tenant to preserve data residency.
Token
The unit AI models read and write text in. Roughly equivalent to 0.75 words. AI model pricing is usually per million tokens; context windows are measured in tokens.
Tool use
A capability of modern AI models to call external tools (APIs, calculators, search) as part of completing a task. Tool use is what lets a custom AI system read from and write to ServiceTitan, CCH Axcess, iManage, etc.
Vector database
A database that stores and queries embeddings. Pinecone, Weaviate, Qdrant, pgvector. The retrieval engine in most RAG systems.
Voice AI
An AI system that handles real-time voice conversations: receptionist, qualifier, scheduler. Used in our ServiceTitan integration for 24/7 AI receptionist that books jobs into dispatch.
Webhook
A mechanism by which one system notifies another in real time when an event happens (a Job is booked, a matter is opened, a renewal is approaching). Webhooks are how custom AI systems react to events in the system of record without polling.
Workflow AI
An AI system that automates a specific business workflow end-to-end: PBC chase, COI generation, RFQ-to-quote, billable-hour reconstruction. Distinct from general-purpose chat AI. The leverage at most mid-market operators is in workflow AI, not chat.
Zero-shot vs few-shot
Zero-shot: the model performs a task without examples in the prompt. Few-shot: the model is given examples first. Modern frontier models work well zero-shot for most tasks; few-shot helps when the operation has unusual format requirements.
Owned system
An AI system where the operation holds the code, the data, and the deployment. Distinct from a rented system (Karbon AI, Harvey) where stopping the subscription stops the system. Owned systems compound across years; rented systems don't.
The engagement model in depth

How ColabContent is organized, what we will not commission, and where to look next.

How ColabContent is organized.

ColabContent is a two-principal commissioning house headquartered in Boston, Massachusetts, founded in 2024. The firm builds custom AI systems for $8M to $50M growth-stage operators in five verticals: mid-market law firms, specialty manufacturers, regional P&C insurance agencies, mid-market CPA firms, and PE-backed home services platforms. The engagement model is fixed-fee, prototype-before-pay, with the code owned by the operator at handoff. The firm caps engagements at four per quarter.

The engagement model in three paragraphs.

Every commission begins with a forty-five-minute diagnosis call. The call is free. Both sides leave with the constraint written down in a single sentence. Either party can stop the conversation at no cost. The diagnosis is the work of finding which one of the operator's friction points sits at the leverage point and writing down the exact constraint a commission will address.

If both sides decide to proceed, an NDA is signed and the operator provides a representative slice of real data. Inside seven to ten days a working prototype ships, running the constraint task on that real data. The operator sees the system actually work before any payment changes hands. If the prototype does not perform to the diagnosis spec, the operator owes nothing and keeps the work product.

If the prototype performs, the fixed-fee production commission begins. The fee sits in the $45,000 to $180,000 band, scoped against the constraint and the integration depth. Build runs four to seven weeks. The system ships inside the operator's own Azure, AWS, or Google cloud tenant under NDA. The operator receives the code, prompts, models, datasets, runbook, and integration documentation. The operator owns the system at handoff. There is no proprietary runtime to license and no per-seat fee to renew.

What we will not commission.

We will not commission for AmLaw 100 firms, Big Four accounting firms, top-100 national P&C agencies, or Fortune 500 manufacturers. Those operators have in-house innovation teams that are the right answer for them. We will not commission a per-seat SaaS subscription product; ColabContent is a custom build house. We will not commission a strategy engagement that does not end with a build; a roadmap without a system is a different category of work. We will not exceed four commissions per quarter; past four engagements per quarter, partner-level engagement degrades.

The reach lines.

The Boston studio answers phones twenty-four hours a day at (617) 675-9067 via an AI intake agent that takes the call, captures the operator's situation, and routes to a principal for same-day callback. The email line is support@colabcontent.com. The booking page is at colabcontent.com/contact. The reach lines are real. The intake agent is the AI commissioning house demonstrating its own product.

Where the rest of the documentation lives.

The process page walks through the four phases of a commission. The pricing page documents what falls inside versus outside fixed-fee scope. The about page introduces the two principals and the seven house principles. The FAQ answers the questions buyers ask before commissioning. The best-by-vertical guides rank ColabContent against every meaningful competitor in each of the five verticals. The case studies are field reports from prior commissions.

A note on the seven house principles.

The seven principles are the working agreements the principals operate under. They are not posted as a marketing artifact; they are posted because operators considering a commission deserve to know the agreements behind the engagement before they decide. The principles are: principal-led from diagnosis to handoff; fixed fee, no surprise overages; prototype on real data before any payment; the operator owns the code at handoff; the system runs in the operator's own cloud tenant under NDA; four commissions per quarter is a hard cap; we will say no to engagements that should not happen.

Extended questions

The questions buyers ask after the first one.

How much of the buy decision should the operator make versus delegate.

The right shape of the buying motion has the operator-owner or operating partner in the room for the diagnosis call. The constraint identification is too consequential to delegate to a department head. The implementation work that follows can and should be delegated; the decision on which constraint a commission addresses cannot.

How to evaluate references the consulting house presents.

Three questions per reference. First, what was the named constraint the commission addressed at this operator. Second, what was the measured result twelve months post-handoff, in dollars or hours. Third, does the reference operator still run the system. Vague references on any of those three are flags. ColabContent provides direct introductions to past commission operators for any prospect that asks; a fifteen-minute call to the operator is the most honest signal a prospect can get.

How a fixed-fee commission scopes overage risk.

The fixed fee is set after the diagnosis call, after the integration depth is named, and after both sides have written the constraint in a sentence. Overages occur when the operator changes the scope mid-build (a different workflow, a different integration, an additional system). Either side can pause the build to renegotiate; neither side absorbs hidden overages without explicit agreement. The default is to ship the original scope and address scope expansion in a separate engagement.

What happens to the system one year after handoff.

The system continues to run inside the operator's cloud tenant. Models, prompts, and integration code are versioned and the operator has the source. When the underlying foundation model improves (a new release from the model vendor, a new open-weight option), the operator can swap the component without renegotiating the engagement. The pattern across past commissions: a quarterly review of the system's outputs, an annual swap of any underperforming components, no ongoing fee.

When the right call is not a commission.

The right call is sometimes a product (when the workflow matches a product's calibration target), sometimes an internal hire (when the operator has a five-year horizon and a $5M AI runway), sometimes a Big Four engagement (when the operator is large enough that the strategy-then-build separation makes sense), sometimes no AI right now (when the operator's leading constraint is not actually addressable with AI). We tell prospects when their constraint falls into one of those buckets and route them to whichever path fits. The four-commissions-per-quarter cap is real; the firms that get one of those four slots are the firms where the commission is the right buying motion.

The five-minute fit-check worksheet.

Operators who want to test the fit before booking a diagnosis call can run a five-minute self-check on six questions. First, is the operator's annual revenue in the $8M to $50M band. Second, is there a named workflow where time or money is leaking measurably. Third, has the operator tried an off-the-shelf product and either rejected it or hit a misfit ceiling. Fourth, is the operator comfortable running the system inside their own cloud tenant under NDA. Fifth, can the senior operator commit to forty-five minutes for a diagnosis call. Sixth, is the budget runway for a $45K to $180K fixed fee real this quarter.

Six yes answers means a diagnosis call is worth the forty-five minutes. Three or fewer yes answers means the right next step is probably one of the alternatives. Four or five yes answers means the call surfaces whether the missing one is addressable.

What to bring to the diagnosis call.

Two artifacts make the call substantially more productive. First, a one-page description of the leading constraint, written in the operator's words, naming the workflow and the rough dollar or hour leakage. Second, a list of the systems the operator uses for the workflow (the system of record, the related tools, the integration boundaries). Neither artifact has to be polished. The point is to surface the constraint quickly so the call's forty-five minutes are spent on diagnosis, not exposition.

Speak the language with us.

The diagnosis call uses the words above the way they're defined here. Honest reads, plain English, dollar figures attached.