The mid-market AI glossary.
Plain-English definitions of 30 AI terms relevant to owner-operators of $8M-$50M businesses. Skip the marketing language. The definitions below are how we use these words on diagnosis calls, with examples drawn from real engagements.
Agentic
API
Audit Trail
Bespoke AI
CoT
Commission
Context
CPQ AI
Custom AI
Embedding
Fine-tuning
Foundation
Guardrails
Hallucination
Inference
LLM
Orchestration
Permissions-aware
Prompt Eng.
RAG
Retrieval
Tenant
Token
Tool Use
Vector DB
Voice AI
Webhook
Workflow AI
Zero-Shot
Owned
- Agentic workflow
- An AI workflow in which the model takes a sequence of actions (read data, make a decision, write to a system, iterate) rather than producing a single response. The CCH Axcess workflow has agentic steps: read return, run tie-out, surface flags, write to reviewer queue.
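The read, check, flag, write sequence can be sketched in a few lines of Python. This is an illustration only: the return fields, the tolerance, and the in-memory reviewer queue are hypothetical stand-ins rather than the CCH Axcess API, and a simple arithmetic check stands in for the model's decision.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewerQueue:
    """Hypothetical stand-in for the reviewer queue in the system of record."""
    items: list = field(default_factory=list)

    def write(self, return_id: str, flags: list) -> None:
        self.items.append({"return_id": return_id, "flags": flags})


def run_tie_out(tax_return: dict, tolerance: float = 0.01) -> list:
    """Compare each reported total with the sum of its line items."""
    flags = []
    for schedule, data in tax_return["schedules"].items():
        difference = abs(sum(data["lines"]) - data["reported_total"])
        if difference > tolerance:
            flags.append(f"{schedule}: off by {difference:.2f}")
    return flags


def process_return(tax_return: dict, queue: ReviewerQueue) -> list:
    """Read the return (the dict passed in), run tie-out, surface flags, write to queue."""
    flags = run_tie_out(tax_return)
    if flags:  # only returns with open flags are sent to a human reviewer
        queue.write(tax_return["id"], flags)
    return flags


queue = ReviewerQueue()
sample = {
    "id": "R-001",
    "schedules": {
        "Schedule C": {"lines": [1200.0, 800.0], "reported_total": 2000.0},
        "Schedule E": {"lines": [500.0, 250.0], "reported_total": 700.0},
    },
}
flags = process_return(sample, queue)
print(flags)
```

Note that the sketch never edits the return itself; it only reads, checks, and hands flags to a person.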
- API integration
- The technical layer through which a custom AI reads data from and writes data back to a system of record (CCH Axcess, ServiceTitan, iManage, AMS360). Authentication is typically OAuth 2.0.
- Audit trail
- A persistent log of which user (or system) performed which action and when. Critical for legal and CPA AI commissions where actions may need to survive subpoena or regulatory review. We log every AI-system action by default.
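A minimal sketch of the who, what, and when structure follows. The class and field names are hypothetical, and the log here lives in memory only; a real audit trail would write each line to durable, access-controlled storage.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str      # user ID or system name
    action: str     # what was done
    target: str     # the record acted on
    timestamp: str  # ISO 8601, UTC


class AuditLog:
    """Append-only log: events can be added and read, never edited."""

    def __init__(self) -> None:
        self._lines: list[str] = []

    def record(self, actor: str, action: str, target: str) -> AuditEvent:
        event = AuditEvent(actor, action, target,
                           datetime.now(timezone.utc).isoformat())
        self._lines.append(json.dumps(asdict(event)))
        return event

    def entries(self) -> list[dict]:
        return [json.loads(line) for line in self._lines]


log = AuditLog()
log.record("ai-system", "surfaced_flag", "return:R-001")
log.record("j.doe", "approved_flag", "return:R-001")
for entry in log.entries():
    print(entry["timestamp"], entry["actor"], entry["action"], entry["target"])
```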
- Bespoke AI
- Synonym for custom AI in our usage. The system is built for one firm's specifics, not the average customer. See also Bespoke AI Systems.
- Chain-of-thought (CoT)
- A technique in which an AI model is prompted to reason step-by-step before producing a final answer. Improves accuracy on complex tasks. Modern models (Claude, GPT-4 class) often do this internally without explicit prompting.
- Commission (verb)
- To engage a boutique to build a custom AI system on the operation's data, in the operation's stack, owned by the operation at handoff. Distinct from build (firm's own engineers) and buy (off-the-shelf product). See Build, buy, or commission.
- Context window
- The amount of text an AI model can read at once. Modern frontier models have context windows of 200K-2M tokens (roughly 150,000 to 1.5 million words, or several hundred to a few thousand pages). Larger context allows the AI to consider more of a matter, more of a binder, or more of a customer history at once.
- CPQ AI
- Configure-Price-Quote AI. An automated system that ingests an RFQ, parses specifications, looks up part history, prices against the shop's rules, and drafts the proposal. The highest-leverage workflow at most specialty manufacturers. See the Epicor Kinetic playbook.
- Custom AI
- An AI system commissioned and built specifically for one firm's data, stack, and workflow, owned by the operation at handoff. Distinct from off-the-shelf AI (Karbon AI, Harvey, ServiceTitan AI, Vertafore IQ) which is calibrated against the average customer.
- Embedding
- A numeric vector representation of text or other content. Used by AI systems for semantic search ("find the matter most similar to this one") rather than keyword search. Embeddings are how RAG systems decide which documents to retrieve.
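To make "most similar" concrete, here is a small sketch using cosine similarity, a common way to compare embeddings. The three-number vectors and matter names are made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for three existing matters
matters = {
    "Commercial lease dispute": [0.9, 0.1, 0.2],
    "Employment termination claim": [0.1, 0.8, 0.3],
    "Retail lease renewal": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.15]  # hypothetical embedding of the new matter

ranked = sorted(
    matters,
    key=lambda name: cosine_similarity(query, matters[name]),
    reverse=True,
)
print(ranked)
```

The two lease matters rank above the employment matter even though no keywords are compared, which is the point of semantic search.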
- Fine-tuning
- The process of training an AI model on a firm's specific data so it adapts to the operation's vocabulary, format, or judgments. Less common in modern systems than RAG; we use RAG by default and reserve fine-tuning for specific cases where the operation's voice or format is meaningfully unusual.
- Foundation model
- A large general-purpose AI model (Claude, GPT-4, Gemini, Llama). Provides the reasoning core; the custom system surrounds it with retrieval, tooling, guardrails, and business-specific context.
- Guardrails
- Rules that constrain what the AI is allowed to do. Examples: "never send an email to a client without one-click human approval," "never modify a tax return; only surface flags," "never bypass iManage permissions." Guardrails are scoped in writing during the diagnosis. See Lesson 5: Scoping.
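In code, a guardrail is usually a check that runs before the system is allowed to act. The sketch below assumes a hypothetical rule table and action names that mirror the examples above; it is not a specific product's API.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    kind: str                    # e.g. "send_client_email", "modify_tax_return"
    human_approved: bool = False


# Hypothetical rule table mirroring the written guardrails
FORBIDDEN = {"modify_tax_return", "bypass_permissions"}
NEEDS_APPROVAL = {"send_client_email"}


def check_guardrails(action: ProposedAction) -> tuple[bool, str]:
    """Return (allowed, reason). Runs before the system executes anything."""
    if action.kind in FORBIDDEN:
        return False, f"{action.kind} is never allowed"
    if action.kind in NEEDS_APPROVAL and not action.human_approved:
        return False, f"{action.kind} requires human approval"
    return True, "ok"


print(check_guardrails(ProposedAction("send_client_email")))
print(check_guardrails(ProposedAction("send_client_email", human_approved=True)))
print(check_guardrails(ProposedAction("modify_tax_return", human_approved=True)))
print(check_guardrails(ProposedAction("surface_flag")))
```

Forbidden actions stay blocked even with approval, while approval-gated actions pass once a human has signed off.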
- Hallucination
- When an AI model produces a confident-sounding but factually wrong output, often citing sources that don't exist. Mitigated through retrieval-grounded generation (RAG), citation enforcement, and human review at the right point in the workflow. Critical concern in legal AI; see the iManage playbook.
- Inference
- The process of running a trained AI model to generate a response. Inference cost is the unit cost of running the AI in production; modern models often run at fractions of a cent per task.
- LLM (Large Language Model)
- The class of AI models that read and write text, including Claude, GPT-4, and Gemini. The reasoning core of most modern AI systems. Mid-market operators typically don't choose between LLMs; the boutique selects the right one for the workflow.
- Orchestration
- The layer that coordinates multiple AI calls, tool uses, and data reads/writes into a coherent workflow. The orchestration code is what turns a foundation model into a custom AI system.
- Permissions-aware retrieval
- A retrieval system that returns only the documents the querying user is permitted to see, with permissions enforced at query time rather than by filtering results after the search. Required in legal RAG to preserve ethical walls. See the iManage playbook for the legal version.
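The difference between query-time enforcement and after-the-fact filtering is easiest to see in a toy example. The sketch below uses a hypothetical in-memory corpus and a word-overlap score in place of a real search engine; the key line is the one that narrows the candidate set before ranking.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_users: frozenset


def score(query: str, doc: Document) -> int:
    """Toy relevance score: number of query words that appear in the document."""
    words = set(doc.text.lower().split())
    return sum(1 for w in query.lower().split() if w in words)


def retrieve(query: str, user: str, corpus: list, k: int = 2) -> list:
    """Restrict the candidate set to permitted documents BEFORE ranking."""
    permitted = [d for d in corpus if user in d.allowed_users]
    ranked = sorted(permitted, key=lambda d: score(query, d), reverse=True)
    return [d.doc_id for d in ranked[:k] if score(query, d) > 0]


corpus = [
    Document("D1", "merger agreement draft for acme", frozenset({"alice"})),
    Document("D2", "acme merger due diligence memo", frozenset({"alice"})),
    Document("D3", "acme lease memo", frozenset({"alice", "bob"})),
    Document("D4", "holiday party schedule", frozenset({"alice", "bob"})),
]

print(retrieve("acme merger memo", "alice", corpus))
print(retrieve("acme merger memo", "bob", corpus))
```

Had the system ranked all four documents first and filtered the top two afterward, bob would have received nothing, because both top hits sit behind the wall. Filtering first keeps walled documents out of the candidate set entirely and still returns the relevant document bob is allowed to see.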
- Prompt engineering
- The craft of writing instructions to an AI model so it produces useful output. In production custom AI systems, prompts are written once by the boutique and refined through testing; the operation's users don't write prompts day-to-day.
- RAG (Retrieval-Augmented Generation)
- An AI architecture where a language model is grounded in retrieved context from a private knowledge base before generating a response. The most common architecture for business-specific AI: instead of training the model on the operation's data, the system retrieves relevant firm documents at query time and feeds them to the model.
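The retrieve-then-generate flow can be sketched as follows. The knowledge base, document IDs, and prompt wording are hypothetical, word overlap stands in for embedding search, and the final call to the language model is left out so the example runs on its own.

```python
def retrieve(query: str, knowledge_base: dict, k: int = 2) -> list:
    """Toy retrieval by word overlap; production systems typically use embeddings."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(text.lower().split())), doc_id)
        for doc_id, text in knowledge_base.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]


def build_prompt(query: str, knowledge_base: dict) -> str:
    """Assemble the grounded prompt that would be sent to the model."""
    doc_ids = retrieve(query, knowledge_base)
    context = "\n".join(f"[{doc_id}] {knowledge_base[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite the source ID.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )


knowledge_base = {
    "policy-12": "Warranty on water heater installs is two years parts and labor",
    "policy-07": "After hours dispatch fee is waived for members",
    "memo-03": "Office closed on the Friday after Thanksgiving",
}
prompt = build_prompt("water heater warranty length", knowledge_base)
print(prompt)
```

Only the matching policy reaches the prompt, so the model answers from the firm's own document rather than from memory.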
- Retrieval
- The process of fetching relevant documents or data from a private knowledge base in response to a query. The "R" in RAG. Quality of retrieval is usually the bottleneck on RAG system quality.
- Tenant
- A logically isolated environment within a cloud platform (Azure, AWS, Google) where one firm's data and code live. We deploy custom AI systems inside the operation's own tenant to preserve data residency.
- Token
- The unit in which AI models read and write text. One token is roughly 0.75 English words. AI model pricing is usually quoted per million tokens, and context windows are measured in tokens.
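A quick back-of-the-envelope calculation shows how the 0.75 words-per-token rule of thumb turns into size and cost. The page length and the price per million tokens below are assumptions chosen for illustration, not quoted rates.

```python
WORDS_PER_TOKEN = 0.75           # rule of thumb from the definition above
WORDS_PER_PAGE = 500             # assumption: a dense single-spaced page
PRICE_PER_MILLION_TOKENS = 3.00  # assumption: illustrative US dollars, not a quoted rate


def tokens_for_words(words: int) -> int:
    return round(words / WORDS_PER_TOKEN)


def cost_in_dollars(tokens: int) -> float:
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


document_words = 40 * WORDS_PER_PAGE                # a 40-page document
document_tokens = tokens_for_words(document_words)  # about 26,667 tokens
document_cost = cost_in_dollars(document_tokens)    # about 8 cents at the assumed price
fits_in_200k_window = document_tokens <= 200_000

print(document_tokens, round(document_cost, 4), fits_in_200k_window)
```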
- Tool use
- A capability of modern AI models to call external tools (APIs, calculators, search) as part of completing a task. Tool use is what lets a custom AI system read from and write to ServiceTitan, CCH Axcess, iManage, etc.
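Mechanically, tool use means the model emits a structured request and the surrounding code carries it out. The sketch below assumes a hypothetical JSON request format and two made-up tools; vendor APIs each define their own format, and a real tool would call the system of record rather than return canned data.

```python
import json


def lookup_customer(customer_id: str) -> dict:
    """Hypothetical tool; a real one would call the system of record's API."""
    return {"customer_id": customer_id, "open_jobs": 2}


def add(a: float, b: float) -> float:
    return a + b


# Registry of tools the model is allowed to request
TOOLS = {"lookup_customer": lookup_customer, "add": add}


def run_tool_call(model_output: str) -> str:
    """Execute one tool request emitted by the model and return the result as JSON."""
    request = json.loads(model_output)
    name = request["tool"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**request["arguments"])
    return json.dumps({"tool": name, "result": result})


# In a real system this string comes from the model; here it is hard-coded
model_output = '{"tool": "lookup_customer", "arguments": {"customer_id": "C-42"}}'
print(run_tool_call(model_output))
```

The result string is then handed back to the model so it can continue the task with real data.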
- Vector database
- A database that stores and queries embeddings. Examples include Pinecone, Weaviate, Qdrant, and pgvector (a PostgreSQL extension). The retrieval engine in most RAG systems.
- Voice AI
- An AI system that handles real-time voice conversations: receptionist, qualifier, scheduler. Used in our ServiceTitan integration for a 24/7 AI receptionist that books jobs into dispatch.
- Webhook
- A mechanism by which one system notifies another in real time when an event happens (a Job is booked, a matter is opened, a renewal is approaching). Webhooks are how custom AI systems react to events in the system of record without polling.
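On the receiving side, a webhook is just a function that runs when the other system sends an event. The sketch below assumes hypothetical event type names and payload fields; each system of record defines its own, and a production handler would sit behind an HTTP endpoint.

```python
import json

booked_jobs = []


def on_job_booked(data: dict) -> str:
    booked_jobs.append(data["job_id"])
    return f"queued follow-up for job {data['job_id']}"


def on_matter_opened(data: dict) -> str:
    return f"started intake for matter {data['matter_id']}"


# Hypothetical event types mapped to the code that reacts to them
HANDLERS = {"job.booked": on_job_booked, "matter.opened": on_matter_opened}


def handle_webhook(body: str) -> tuple[int, str]:
    """Called once per incoming HTTP POST; returns (status code, message)."""
    event = json.loads(body)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return 200, "ignored"  # acknowledge events we do not act on
    return 200, handler(event["data"])


print(handle_webhook('{"type": "job.booked", "data": {"job_id": "J-1001"}}'))
print(handle_webhook('{"type": "invoice.paid", "data": {}}'))
```

Nothing in this code asks the system of record whether anything changed; it simply waits to be called, which is what "without polling" means in practice.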
- Workflow AI
- An AI system that automates a specific business workflow end-to-end: PBC chase, COI generation, RFQ-to-quote, billable-hour reconstruction. Distinct from general-purpose chat AI. The leverage at most mid-market operators is in workflow AI, not chat.
- Zero-shot vs few-shot
- Zero-shot: the model performs a task without examples in the prompt. Few-shot: the model is given examples first. Modern frontier models work well zero-shot for most tasks; few-shot helps when the operation has unusual format requirements.
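The only difference between the two is what goes into the prompt, as this small sketch shows. The task wording and the example part descriptions are invented for illustration.

```python
def zero_shot_prompt(task: str, item: str) -> str:
    """Instructions and the new input only."""
    return f"{task}\n\nInput: {item}\nOutput:"


def few_shot_prompt(task: str, examples: list, item: str) -> str:
    """Same instructions, preceded by worked examples in the desired format."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{task}\n\n{shots}\n\nInput: {item}\nOutput:"


task = "Rewrite the part description in the shop's quote format."
examples = [
    ("steel bracket 4 inch qty 200", "BRKT-STL-4IN x200"),
    ("aluminum plate 12 inch qty 50", "PLT-ALU-12IN x50"),
]
new_item = "brass fitting 2 inch qty 75"

zero = zero_shot_prompt(task, new_item)
few = few_shot_prompt(task, examples, new_item)
print(zero)
print("---")
print(few)
```

With an unusual house format like the one above, the two examples tell the model what the output should look like in a way that instructions alone might not.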
- Owned system
- An AI system where the operation holds the code, the data, and the deployment. Distinct from a rented system (Karbon AI, Harvey) where stopping the subscription stops the system. Owned systems compound across years; rented systems don't.
Speak the language with us.
The diagnosis call uses the words above the way they're defined here. Honest reads, plain English, dollar figures attached.