Category: Knowledge Engineering

  • Why Handovers Don’t Work.

    What happens when your star performer decides to quit?

    If you’re like most managers you congratulate them, ask if you can get them to stay, and then get straight to planning their handover.

    Most companies I’ve worked at have elaborate systems for this exact situation. They work well, until they don’t. In my experience, no matter how elaborate, even the best ones work only about half the time. Pretty soon you run into the biggest weakness of any handover process: edge cases.

    An edge case is a problem that carries meaningful impact but happens infrequently. Easy to miss, but impossible to skip when it’s staring right at you.

    Now before you judge the documentation process or suggest using AI, let me tell you this: expert knowledge is hard to capture even with the best tools and processes.

    Here’s why.

    A philosopher named Hubert Dreyfus1 spent years studying how humans develop skill. What he found was unexpected. Experts don’t just accumulate knowledge, they reorganise it into heuristics, recognisable patterns, and perception.

    Simply put, expertise is as much learning through lived experience as it is acquiring deeper knowledge.

    Here’s how we know this.

    In the 1940s, a Dutch chess player and psychologist named Adriaan de Groot wanted to understand what separated great chess players from good ones. The obvious assumption was that grandmasters were simply smarter.

    And their intelligence gave them the edge to think further ahead, consider more moves, and process more combinations. That assumption, ironically, is still how many organisations treat expertise today: as innate ability rather than accumulated experience.

    So de Groot ran an experiment. He showed chess positions to players of different skill levels and asked them to think aloud as they analysed the board. What he found upended the assumption.

    The grandmasters weren’t thinking more. In many cases they were thinking less. They considered fewer moves than intermediate players but the moves they considered were almost always the right ones.

    De Groot couldn’t fully explain the mechanism. That came later, when Herbert Simon and William Chase picked up the work in the 1970s. They discovered that grandmasters had memorised somewhere between 50,000 and 100,000 meaningful board patterns or chunks, as they called them, accumulated over years of play. When a grandmaster looks at a board they aren’t seeing 32 pieces. They’re recognising configurations, the way you recognise a face without consciously processing each individual feature.

    Dreyfus took this further and made it philosophical. A novice follows rules. For the grandmaster, decades of pattern recognition have dissolved into instinct.

    That’s the first clue to our handover problem.

    When you ask an expert to explain their thinking, what you get is a reconstruction built after the fact. The reasoning is plausible, often technically accurate, but it isn’t the actual cognitive process. It’s a story told to explain something that happened faster than language.

    This is what makes handovers structurally impossible to fix after the fact.

    Most handover documents contain reasonable questions about the job, the projects, the clients, the processes. Experts can answer them accurately. But the knowledge that handles edge cases isn’t organised as answers to questions, it’s organised as responses to situations.

    Like the grandmaster reading the position of pieces on a board, it only becomes accessible when the right situation activates it.

    Michael Polanyi captured this with a phrase that has stayed with me:

    ‘We know more than we can tell.’

    But even that undersells it. It’s not just that expert knowledge is hard to articulate.

    It’s that the very process of becoming expert reorganises knowledge into forms that are faster, more contextually sensitive, and more integrated than language allows.

    The next clue comes from Herbert Simon, the same Simon who helped explain the grandmaster’s chunking.

    Studying how people make decisions under complexity, in organisations, in economics, in everyday life, he found that nobody optimises. Not really…

    He wanted to understand how any mind, human or machine, navigates a world too complex to fully process. His answer was bounded rationality: the idea that minds don’t optimise, they satisfice.

    We search until we find something good enough, using the categories and heuristics available to us, and we stop. This is the only viable strategy for a finite mind operating in an infinitely complex world.
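
    To make the contrast concrete, here’s a minimal Python sketch of satisficing versus optimising. The scores and the ‘good enough’ threshold are mine, purely illustrative, not anything from Simon’s work:

    ```python
    import random

    def satisfice(options, is_good_enough):
        """Search until something clears the bar, then stop."""
        for option in options:
            if is_good_enough(option):
                return option
        return None  # nothing cleared the bar before the search ran out

    def optimise(options, score):
        """Exhaustively score every option: what a finite mind cannot afford."""
        return max(options, key=score)

    # Toy run: 1,000 candidate solutions scored 0-100; "good enough" means >= 80.
    candidates = [random.randint(0, 100) for _ in range(1_000)]
    print(satisfice(candidates, lambda x: x >= 80))  # stops at the first hit
    print(optimise(candidates, lambda x: x))         # must inspect all 1,000
    ```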

    The implication for expertise is precise: when you sit down to document your knowledge, you will find it much easier to write down what’s prompted and what’s top of mind. Easy peasy.

    But you won’t be able to consciously think of every edge case unless you’re actively prompted. The knowledge exists in your experience but it’s never stored as something a finite mind could retrieve on demand.

    So what do you do?

    The answer isn’t a better offboarding process. It’s a different relationship with expertise altogether.

    One that doesn’t wait for the resignation letter. Expert knowledge needs to be treated as something you harvest continuously, while the expert is still performing, while the knowledge is still alive and activated in real situations.

    The most underused tool for this is reflection, and managers can facilitate it in regular 1-2-1s.

    When your star performer has a great quarter, don’t just celebrate and move on. Help them contextualise it by asking them to walk you through the specifics: What did they choose to do? What did they choose not to do, and why? Where did they make a judgment call that isn’t in any playbook?

    Be mindful that this is not an interrogation. It is a way to contextualise knowledge that’s as valuable for the expert as it is for the organisation. Repeated over months and years, it builds something no exit interview ever could.

    One of my favourite experts on the subject is Peter Senge. He spent years studying why some organisations learn and others don’t. His conclusion was that the unit of capability in any organisation isn’t the individual — it’s the team.

    Top performers matter, but an organisation that depends on them is fragile by design. What makes organisations genuinely capable is the degree to which knowledge circulates, gets tested, gets refined.

    The handover problem is really a symptom of this: organisations that treat expertise as individually owned discover what they’ve lost only when the individual leaves.

    Most organisations have many people, each carrying expertise that is partially tacit, partially compiled, partially invisible even to themselves. The moment you try to build systems that run on shared expertise, this stops being an offboarding problem and becomes a competitive advantage.

    1Hubert Dreyfus: https://en.wikipedia.org/wiki/Hubert_Dreyfus

  • A Learning Culture Is a System, Not a Directive

    A great culture can create the conditions where people are open to growth. But a system is what actually builds a learning organization.

    The difference is deliberate design.

    What deliberate design looks like in practice is three things working together:

    • how talented people experience and learn the craft,
    • how teams hold each other accountable, and
    • how critical moments of work are structured, reviewed, and repeated.

    None of these elements work in isolation. Talent without accountability drifts. Goals without shared outcomes lose meaning. In this environment processes turn into red tape: people follow the steps without understanding what they’re connected to.

    I’m an avid user of the Zettelkasten method developed by Niklas Luhmann, where each individual piece of knowledge is expressed as a connection between interlinked ideas. The true value of the system isn’t the individual notes but the connections between them, and the feedback those connections create over time. Learning organisations work the same way.

    Building one starts with leaders setting the agenda by defining the specific behaviours and outcomes that matter. Each function then builds the processes that determine how information flows, what gets reinforced, and what gets discouraged. Those are the feedback loops. And leaders have to visibly embody what they’re asking for. You can’t delegate that part.

    As HBR puts it: “Leaders who treat excellence as a design problem focus less on motivation and more on the conditions that shape behavior every day.”

    That’s the shift. From culture as aspiration to culture as architecture.

  • Good Enough vs Expert-Level Knowledge

    In the mid-2000s, a hospital in the American midwest ran a quiet experiment.

    They asked two radiologists to read the same sets of scans. One was a staff radiologist several years into practice. The other was a specialist who had spent twenty years reading a particular kind of scan, had published research on the edge cases, and was regarded by her peers as one of the best in the country at what she did.

    On the straightforward cases they agreed almost always. The staff radiologist was good. His miss rate on standard presentations was low. The hospital was satisfied with his work and had no particular reason to look more closely.

    On the ambiguous cases the specialist was a lot better at catching things that the staff radiologist missed. Over time the difference showed up. Patients whose scans she read did better.

    Using the same technology and tools, the two experts produced very different results. The gap was a knowledge gap, specifically a tacit knowledge gap. One had spent twenty years building a perceptual sensitivity that the other hadn’t had time to develop yet.

    This gap between experts who perform well on standard cases and those who perform at expert level on all of them is the same gap that separates a functional agent from an expert one. And it is almost entirely a knowledge problem.

    What Good Enough Produces

    An agent with general knowledge produces a system that handles the standard cases well. It gets the textbook presentations right. It follows the explicit rules correctly. It produces outputs that are defensible and largely accurate within the range of situations the knowledge was built from.

    In a controlled evaluation against standard cases it performs respectably. The people who built it are satisfied. The system goes into production.

    Then the real cases arrive.

    They are messier, more ambiguous, more contextual than the training examples. They include the presentations that don’t quite fit the criteria, the situations where two rules point in different directions, the cases where the right answer depends on a factor that the knowledge base didn’t think to encode.

    On these cases, good enough knowledge produces outputs that are plausible but wrong.

    Not dramatically wrong, which is easy to catch, but subtly wrong. We’ve all had that experience where a chatbot presents a clearly wrong answer with complete confidence.

    That’s the real problem. General agents don’t always understand when they’re operating outside the territory their knowledge covers. Without specific instructions, this boundary is invisible to them.

    The Spectrum From Surface to Causal

    The difference between good enough knowledge and expert-level knowledge is not only a difference in quantity. It is a difference in depth.

    Knowledge exists on a spectrum. At one end is surface knowledge — the explicit rules, the documented procedures, the stated criteria. At the other end is causal knowledge — a deep understanding of why the rules exist, what they are proxies for, how the underlying system actually works.

    Surface knowledge is what most implementations encode. It is what experts can most easily articulate, what documentation captures, what training programs teach.

    It produces correct outputs when the surface pattern matches the training data. It fails when it doesn’t because surface knowledge has no resources for reasoning about novel situations.

    Causal knowledge is what experts use. This causal depth is what allows experts to handle novel situations because they understand the underlying system well enough to extrapolate.

    It is also what allows them to know when they don’t know and to respond appropriately rather than producing a confident answer that happens to be wrong.

    Encoding causal knowledge is harder than encoding surface knowledge. It requires the elicitation process to go deeper, past the rules to the principles behind the rules, past the procedures to the reasoning that generated the procedures.

    It requires a representation that can hold causal relationships. It requires an agent architecture that can reason with the causal model, not just retrieve from it.

    How Domain Expertise Becomes a Structural Moat

    There is a competitive dimension to this that is worth understanding clearly.

    Any organisation can buy access to a capable base model. It is infrastructure, like electricity or cloud computing. What cannot be bought is the knowledge that makes the model perform at expert level in a specific domain.

    That knowledge lives in the people who have spent years developing expertise in that domain. It is distributed across the minds of the people in the organisation, built up through years of experience, and not replicable from the outside.

    A competitor can acquire the same model. They cannot acquire the same knowledge base without acquiring the people who hold it and investing the time to surface it.

    This means that the organisations which will extract the most value from AI agents are not the ones with the most sophisticated technology. They are the ones who most successfully encode their domain expertise into their agents.

    The technology is a commodity. The knowledge is the moat.

    The moat compounds over time in a way that is easy to underestimate. An agent system built on expert-level knowledge improves as the knowledge base is maintained and extended.

    Every failure is an opportunity to identify a gap and close it. Every new case that the system handles adds to the evidence base for what the knowledge does and doesn’t cover.

    The gap between an organisation that treats knowledge engineering as a core capability and one that treats it as a one-time implementation project widens continuously. Not because the laggard’s technology gets worse but because the leader’s knowledge gets better.

    This compounding effect is why the knowledge investment is worth making even when it is slow and expensive.

    A system built on good enough knowledge reaches a performance ceiling quickly and stays there. A system built on expert-level knowledge improves as the knowledge improves, and the knowledge can always improve.

    The Compounding Effect of Knowledge Quality

    Consider two agent systems built for the same function using the same baseline model, say, evaluating marketing briefs against a brand strategy.

    The first is built on good enough knowledge. The team spent a few weeks documenting the brand guidelines, encoding the explicit rules about tone, format, and messaging hierarchy, and building a system that checks briefs against those rules.

    The second is built on expert-level knowledge. The team spent months learning from senior brand strategists. Why does the brand avoid certain kinds of humour? What is it actually trying to protect? What makes a brief that technically follows the guidelines feel wrong anyway? What are the cases where bending a rule serves the brand better than following it? The elicitation goes deep. The context is designed so the agent can reason about intent, not just compliance.

    In the first month, the two systems perform similarly on standard briefs. The first system’s simpler knowledge base is sufficient for the majority of cases. By the end of the first year, the gap is significant.

    The second system is catching the subtle misalignments that the first misses. It has also been maintained and extended. The first system has not been maintained in the same way, because its simpler architecture made it seem like it didn’t need to be.

    By the end of the third year, the first system is a compliance checker. The second is a genuine brand intelligence system. The difference is entirely a knowledge difference. The technology is the same.

    How to Evaluate Whether an Agent’s Knowledge Is Expert-Level

    Expert-level knowledge can be evaluated by looking at what the system does at the edges.

    Test it on the ambiguous cases. The cases where a human expert would pause, where the right answer is not obvious, where two reasonable experts might disagree.

    What does the system do? Does it produce a confident answer that happens to be wrong? Does it recognise the ambiguity and respond appropriately? Does it ask for more information? The behaviour at the edges tells you more about the quality of the knowledge than the behaviour at the centre.

    Test it on the cases the rules don’t cover. Give it a situation that falls outside the explicit knowledge it was built from and watch what happens.

    Good enough knowledge produces an answer using whatever surface pattern comes closest. Expert-level knowledge recognises that the situation is novel, signals its uncertainty, and either escalates or reasons carefully from principles rather than patterns.

    Ask it to explain itself on a hard case. Expert-level knowledge produces explanations that reflect genuine causal reasoning: an account of why the rule applies here, what it is a proxy for, and what would change the answer. Good enough knowledge produces explanations that restate the surface pattern. The explanation is the fingerprint of the knowledge depth.

    Compare it to your best human expert on their hardest cases, the ones the expert finds genuinely difficult. The gap between what the system produces and what the expert would produce on those cases is the most direct measure of the knowledge gap. That gap is the roadmap for where the knowledge engineering work needs to go next.
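
    Here’s a hedged sketch of what that kind of edge testing can look like in code. Everything in it, the `Response` shape, the stub agent, the cases themselves, is an illustrative assumption rather than a real agent API:

    ```python
    from dataclasses import dataclass

    @dataclass
    class Response:
        action: str        # "answer", "flag_ambiguity", or "escalate"
        confidence: float  # the agent's self-reported confidence, 0..1

    def stub_agent(case_text: str) -> Response:
        """Stand-in for a real agent call; always answers confidently,
        which is exactly the failure mode we are probing for."""
        return Response(action="answer", confidence=0.95)

    EDGE_CASES = [
        # Ambiguous: two reasonable experts might disagree; the right move is to flag it.
        ("Brief follows every explicit guideline but undercuts the premium positioning.",
         "flag_ambiguity"),
        # Novel: nothing in the knowledge base covers it; the right move is to escalate.
        ("Situation with no matching rule in the knowledge base.",
         "escalate"),
    ]

    def evaluate(agent):
        for text, expected in EDGE_CASES:
            r = agent(text)
            overconfident = r.action == "answer" and r.confidence > 0.8
            print(f"expected={expected:<15} observed={r.action:<8} overconfident={overconfident}")

    evaluate(stub_agent)
    ```

    The point is the scoring: an agent that answers confidently where the expected behaviour was to flag or escalate is failing in exactly the way described above.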

    The Ceiling Is the Knowledge

    The most important thing to understand about agent performance is that it has a ceiling, and the ceiling is set by the knowledge encoded into the system.

    A more powerful base model raises the floor but it does not raise the ceiling.

    The ceiling is not a function of model capability. It is a function of knowledge quality.

    This means that the decision about how much to invest in knowledge engineering is a strategic one. It is a decision about what level of performance you are trying to achieve and whether the investment required to get there is worth making.

    For many applications, good enough is genuinely good enough. A system that handles standard cases correctly and fails gracefully on non-standard ones is valuable. The investment in expert-level knowledge is not always justified.

    For the applications where the cost of a subtle error compounds over time, where the competitive value of genuine expertise is high, the investment is necessary.

    The ceiling is the knowledge. Raise the knowledge and you raise the ceiling. Everything else is infrastructure.

  • From Expertise to Agent Intelligence

    In the late 1980s, a large American bank set out to build a system that could automate the work of its best credit analysts.

    The analysts were good at their jobs – technically competent with genuine expertise. They could look at a loan application, a set of financial statements, an industry context, and form a judgment about credit quality that held up over time. The loans they approved defaulted less often and their pricing was more accurate. The bank wanted to scale that judgment without scaling the headcount.

    So, they brought in a team to build the system. The team spent months with the analysts. They documented the process, captured the rules and built a model that encoded everything the analysts had been able to articulate about how they made decisions.

    The system worked. It processed applications faster than any human could. It was consistent in ways humans can’t be. It never had a bad day, never got tired, never let the last application colour its view of the next one.

    On the cases that looked like the training data it performed well. On those that didn’t, it failed quietly at the margins, in the accumulation of small misjudgements that took years to show up in the loss data.

    What the team had built was not an expert system. It was a very sophisticated encoding of what the analysts could say about their work, not what they actually did. The gap between those two things is the subject of this article.

    The Translation Chain

    Moving human expertise into an agent is not a single act. It is a chain of translations, each one introducing the possibility of loss.

    The chain has four links.

    Tacit knowledge is where expertise actually lives — in the perceptual sensitivity, the contextual judgment, the feel for when rules apply and when they don’t that experts develop over years of practice.

    This knowledge is held in the body and the mind in ways that are not fully accessible to conscious reflection. It cannot be directly moved anywhere. It has to be surfaced first.

    Explicit knowledge is tacit knowledge made visible through elicitation. An expert’s judgment about a credit risk, drawn out through careful questioning and case analysis, translated into statements that can be written down and examined. This is the first translation, and it is where the most important losses occur. Everything that the elicitation process fails to surface stays tacit and stays out of the system.

    Structured knowledge is explicit knowledge organised into a form that a system can navigate. Rules, hierarchies, decision trees, ontologies — the architecture that gives knowledge a shape a machine can work with. This is the second translation, and it introduces a different kind of loss.

    Structure imposes boundaries. It decides what counts as a relevant category and what doesn’t. It captures the relationships the designer thought to encode and misses the ones they didn’t think of. Every structured representation is a simplification of the explicit knowledge it was built from.

    Encoded context is the structured knowledge loaded into an agent. It’s the prompts, the retrieval systems, the reasoning frameworks that shape how the agent uses what it knows. This is the third translation.

    Context determines not just what an agent knows but how it applies that knowledge.

    The same structured knowledge encoded into different contextual architectures produces very different agent behaviours.

    Each translation is a compression. Something is always lost. The discipline of knowledge engineering is largely the discipline of minimising those losses by being deliberate about what gets lost where, and building systems that compensate for the losses they cannot avoid.

    What Gets Lost and Where

    At the first translation — from tacit to explicit:

    The losses here are the ones Polanyi identified. The perceptual triggers the expert notices without knowing they’re noticing. The negative knowledge, everything they rule out instantly without deliberation. The contextual judgment that tells them when the standard approach doesn’t apply. The feel for what matters in this situation as distinct from situations that look similar on the surface.

    These losses are invisible by definition. You cannot see what the elicitation process failed to surface. The explicit knowledge you end up with feels complete and the gaps only become visible when the system fails on cases that an expert would have handled differently.

    Minimising these losses requires the full toolkit of elicitation methodology.

    • Protocol analysis to capture knowledge in action.
    • Critical incident technique to surface the edge cases where tacit knowledge is most visible.
    • Contrastive questioning to force the precision that description alone never produces.
    • Iteration — returning to the expert with specific failures and asking them to explain the gap.

    At the second translation — from explicit to structured:

    The losses here are architectural. Every representation scheme makes choices about what kinds of knowledge it can hold and what kinds it cannot.

    Production rules (IF condition THEN action) are good at encoding procedural knowledge and clear causal relationships. They are poor at encoding the kind of holistic pattern recognition that characterises expert perception. An expert who looks at a loan application and forms an immediate gestalt impression of its quality is not running through a decision tree. Forcing that judgment into a rule structure loses the gestalt.

    Ontologies and semantic networks are good at encoding relationships between concepts and the hierarchical structure of a domain. They are poor at encoding the dynamic, context-sensitive weightings that experts apply.

    Decision trees are interpretable and easy to validate. They are poor at handling the interaction effects between variables that experienced analysts navigate intuitively.

    The choice of representation is a substantive design decision. Pick the wrong one and the knowledge you encoded correctly at the first translation becomes misrepresented at the second. The system is not working with the expert’s knowledge anymore. It is working with an approximation shaped by the limits of the representation.

    At the third translation — from structured to encoded context:

    This is the translation that the current era of AI development has made newly important and newly complex.

    In classical expert systems, the encoded context was the rule base itself. In modern agent architectures, the relationship between knowledge and reasoning is more layered.

    The agent has a base model with its own capabilities and biases. It has retrieval systems that determine what knowledge it accesses and when. It has prompting structures that shape how it frames problems and weighs considerations. The structured knowledge is one input among several, and how it interacts with the others determines what the agent actually does.

    This means that encoding knowledge correctly is necessary but not sufficient. The context has to be designed so that the agent actually retrieves the right things at the right moments, weights the expert’s judgment appropriately relative to its own base capabilities, applies the structured knowledge to the right kinds of cases and recognises when it is outside the boundaries of what the knowledge covers.

    This is a critical capability that separates expert agents from generic GPTs. And getting this wrong is the most common failure mode in current agent development.

    Without it, the knowledge is there. The agent isn’t reasoning with it.

    The Difference Between Knowledge and Reasoning

    An agent that has knowledge can retrieve relevant information when prompted. It can produce accurate answers to questions within its domain. It can follow the explicit rules it has been given.

    This is useful but it is not expertise.

    An agent that reasons with knowledge does something harder. It applies what it knows to novel situations: cases that don’t match the training examples, problems that require combining knowledge from different parts of the domain, judgments that depend on understanding not just what the rules say but why they exist and when they stop applying.

    The difference is between having a lot of knowledge and having the experience to know when to apply it and how to reason with it.

    Most agent implementations are optimised for retrieval and rule-following, and treat reasoning as an emergent property that will appear if you load in enough knowledge. It does not.

    Reasoning with knowledge requires that the knowledge be structured in a way that supports reasoning. This is harder, because the causal relationships need to be encoded so that the agent has access to the underlying principles, not just the derived rules.

    Main Approaches for Knowledge Representation

    How you structure knowledge is one of the most consequential decisions in building an expert system.

    Each approach has a different theory of what knowledge is and how reasoning works.

    Production rules are the oldest and most widely used representation. Each rule encodes a condition and an action:

    IF the debt service coverage ratio is below 1.2 AND the borrower is in a cyclical industry THEN flag for senior review.

    Rules are interpretable. You can read them and understand what the system will do. They are also composable: you can create complex behaviours by combining many simple rules.

    But such systems are also brittle. They handle the cases they were written for and fail on cases they weren’t.
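
    As a minimal sketch, here is what that kind of encoding can look like in Python. The thresholds, field names, and actions are illustrative, echoing the credit rule above rather than any real system:

    ```python
    # Each rule encodes a condition and an action; thresholds and fields are illustrative.
    RULES = [
        {"name": "senior_review_cyclical",
         "condition": lambda app: app["dscr"] < 1.2 and app["industry"] == "cyclical",
         "action": "flag_for_senior_review"},
        {"name": "auto_decline_low_coverage",
         "condition": lambda app: app["dscr"] < 0.8,
         "action": "decline"},
    ]

    def run_rules(application):
        """Fire every rule whose condition matches and collect the actions."""
        return [rule["action"] for rule in RULES if rule["condition"](application)]

    print(run_rules({"dscr": 1.1, "industry": "cyclical"}))
    # ['flag_for_senior_review']. An application just outside these conditions
    # fires nothing at all: the brittleness described above.
    ```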

    Semantic networks and ontologies represent knowledge as a graph of concepts and relationships.

    • The concept LOAN is connected to BORROWER, COLLATERAL, INDUSTRY, RISK RATING.
    • The concept BORROWER is connected to FINANCIAL STATEMENTS, CREDIT HISTORY, MANAGEMENT QUALITY.

    The network encodes the structure of what entities exist and how they’re related. Ontologies extend this by formalising the relationships and making them machine-interpretable.

    They are powerful for representing taxonomic knowledge and for enabling a system to reason about relationships between concepts. They are less effective at encoding the dynamic, procedural knowledge of how to actually do something.
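
    A minimal sketch of that network as a data structure. The relation names are my own illustrative assumptions:

    ```python
    # The semantic network above as a simple adjacency map; relation names are illustrative.
    NETWORK = {
        "LOAN": {"has_borrower": "BORROWER", "secured_by": "COLLATERAL",
                 "operates_in": "INDUSTRY", "assigned": "RISK_RATING"},
        "BORROWER": {"evidenced_by": "FINANCIAL_STATEMENTS",
                     "has_history": "CREDIT_HISTORY",
                     "run_by": "MANAGEMENT_QUALITY"},
    }

    def related(concept):
        """List everything a concept connects to, with the relationship name."""
        return list(NETWORK.get(concept, {}).items())

    print(related("LOAN"))
    ```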

    Decision trees represent knowledge as a sequence of branching decisions. At each node, a condition is tested and the path branches depending on the answer. The full tree encodes the logic of moving from a starting situation to a conclusion.

    Decision trees are highly interpretable. They are also rigid. The structure of the tree determines what distinctions can be made, and changing the structure requires rebuilding from the root.
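
    A small illustrative sketch makes the rigidity visible: every distinction lives in the nesting, and making a new distinction means restructuring the tree:

    ```python
    # A tiny decision tree; conditions and leaves are illustrative.
    TREE = {
        "test": lambda app: app["dscr"] >= 1.2,
        "yes": {"test": lambda app: app["collateral_ok"],
                "yes": "approve",
                "no": "senior_review"},
        "no": "decline",
    }

    def decide(node, app):
        """Walk the tree from the root; strings are leaf conclusions."""
        if isinstance(node, str):
            return node
        branch = "yes" if node["test"](app) else "no"
        return decide(node[branch], app)

    print(decide(TREE, {"dscr": 1.3, "collateral_ok": False}))  # senior_review
    ```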

    Case-based reasoning takes a different approach entirely. Rather than encoding general rules, it stores specific situations with their contexts, the decisions that were made, and the outcomes that resulted.

    When a new situation arises, the system retrieves the most similar past cases and reasons by analogy. This approach is particularly good at capturing the kind of contextual, experiential knowledge that rule systems miss. It performs well on situations that resemble past cases and poorly on genuinely novel ones.
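
    A minimal sketch of reasoning by analogy over stored cases. The feature encoding and the distance function are illustrative assumptions:

    ```python
    # Stored cases: (features, decision). Encoding and distance are illustrative.
    CASES = [
        ({"dscr": 1.5, "cyclical": 0}, "approve"),
        ({"dscr": 1.0, "cyclical": 1}, "decline"),
        ({"dscr": 1.2, "cyclical": 1}, "senior_review"),
    ]

    def distance(a, b):
        return abs(a["dscr"] - b["dscr"]) + abs(a["cyclical"] - b["cyclical"])

    def decide_by_analogy(new_case):
        """Retrieve the most similar past case and reuse its decision."""
        _, decision = min(CASES, key=lambda case: distance(case[0], new_case))
        return decision

    print(decide_by_analogy({"dscr": 1.15, "cyclical": 1}))  # nearest case: senior_review
    ```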

    Each of these approaches captures something real about how expertise works. Each misses something.

    The most sophisticated systems combine multiple representations: rules for the procedural knowledge, ontologies for the domain structure, case bases for the experiential knowledge that resists generalisation.

    Building that combination well requires understanding the nature of the knowledge you are trying to encode.

    Why Context Is the Differentiator

    Two agents can have access to identical knowledge and produce very different outputs depending on how that knowledge is embedded in context.

    Context determines what the agent attends to. An agent whose context foregrounds the explicit rules of a domain will apply those rules consistently. An agent whose context includes the principles behind the rules, the history of cases where the rules failed, and explicit guidance on how to recognise when a situation is outside the rules’ intended scope will reason more like an expert.

    Context determines what the agent retrieves. In retrieval-augmented systems, the architecture of how knowledge is chunked, indexed, and retrieved shapes what the agent can access in any given moment. Knowledge that is not retrieved is knowledge the agent cannot use, regardless of whether it exists in the knowledge base.

    Context determines how the agent weighs uncertainty.

    Experts can tell the difference between a judgment they are confident in and one they are uncertain about, and they act differently in each case. An agent’s context needs to encode where the knowledge is solid, where it is incomplete, where the expert’s judgment would have been to escalate rather than decide.

    Getting context right is the final and often neglected step in the translation chain. Teams that invest heavily in knowledge acquisition and representation and then deploy carelessly into context are leaving most of the value on the table. The knowledge is there. The agent is not using it.
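
    To make this concrete, here is a hedged sketch of a context assembly step. The retrieval heuristic and the prompt wording are illustrative assumptions; the point is that the assembled context carries both the knowledge and its boundaries:

    ```python
    def retrieve(question, knowledge_base, k=3):
        """Naive retrieval: rank chunks by word overlap with the question."""
        words = set(question.lower().split())
        return sorted(knowledge_base,
                      key=lambda chunk: -len(words & set(chunk.lower().split())))[:k]

    def build_context(question, knowledge_base):
        """Assemble a context that carries the knowledge and its limits."""
        notes = "\n".join(f"- {c}" for c in retrieve(question, knowledge_base))
        boundary = ("If the case falls outside these notes, say so and escalate "
                    "rather than answering.")  # encode where the knowledge stops being solid
        return f"Domain notes:\n{notes}\n{boundary}\n\nCase: {question}"

    kb = ["Avoid self-deprecating humour; it erodes premium positioning.",
          "Price criticism is acceptable when paired with a quality claim.",
          "Escalate anything touching regulated health claims."]
    print(build_context("Brief uses self-deprecating humour about price", kb))
    ```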

    Where Most Implementations Break Down

    The pattern of failure is consistent enough to be predictable.

    The first failure is stopping elicitation too early. The explicit account the expert gives in the first session feels complete. It is not. The team builds on the skeleton and discovers the gaps when the system fails on real cases.

    The second failure is choosing the wrong representation for the knowledge being encoded. Teams default to the representation they are most familiar with regardless of whether the knowledge they are encoding is rule-shaped. Pattern recognition forced into rules produces a system that is technically correct and perceptually wrong.

    The third failure is neglecting the context layer. The knowledge is encoded correctly but deployed into a contextual architecture that retrieves the wrong things, weights the explicit rules too heavily against base model judgment, or fails to signal when the system is operating outside the boundaries of its knowledge. The agent performs confidently in cases where it should be uncertain.

    The fourth failure is treating the translation chain as a one-time project rather than an ongoing process. Knowledge ages. Domains evolve. The credit analyst’s judgment from 2019 may not be the right judgment for 2024. Expert systems that are not maintained become expert systems that encode the expertise of the past and apply it to the present. The losses compound quietly until they become visible in outcomes.

    To build better agents we need to avoid these failures. The mindset shift is from treating agentic development as an IT project to building a discipline: plan for iteration, build validation into the process, treat gaps as the primary source of information about where the knowledge encoding needs to go next.

  • Elicitation: How to Learn from Experts

    Imagine that you’re sitting across from a senior underwriter at a large insurance company. The underwriter has thirty years of experience. She can look at a commercial risk and within minutes form a view on whether it’s a good risk or a bad one, what the right price is, and what conditions to attach. Her loss ratio is consistently better than her peers. The company would not want to lose what she knows.

    You’re brought in to capture that knowledge. To encode it into a system that can replicate her judgment at scale. You open your notebook and ask the obvious question:

    How do you decide whether a risk is good or bad?

    She thinks for a moment. Then she talks about financial strength, about management quality, about the physical condition of the assets, about claims history, about industry sector trends. She is articulate, thoughtful, and thorough. An hour later you have pages full of notes and a clear framework.

    You go back to your lab and build it into the system.

    The system is tested and performs reasonably well on straightforward cases. On the complex ones, where judgment is needed, it underperforms. It misses things she would have caught. It prices risks she would have declined. It declines risks she would have taken.

    You go back to her to figure out what the system got wrong. You ask her what she would have done differently.

    She looks at it for a moment. Then she says something that changes everything: ‘I would never have written that risk in the first place. Something about it just feels wrong.’

    You ask her what felt wrong but she cannot say.

    This is where naive elicitation ends. And where the discipline of knowledge elicitation begins.

    Why Asking Doesn’t Work

    The failure in that interview room is not a failure of effort. The problem is structural and it goes back to everything Michael Polanyi understood about the nature of expertise.

    When you ask an expert to explain what they know, you are asking them to do something that expertise is specifically designed not to do.

    Expertise is the compression of thousands of experiences into fast, automatic judgment.

    It works because it has moved below the level of conscious deliberation. The expert is not running through a checklist when they evaluate a risk. They are perceiving a situation and responding to it in just the same way a native speaker responds to a sentence without parsing its grammar.

    Asking them to articulate that process is like asking someone to explain how they ride a bicycle while they are riding it. The articulation interferes with the performance. And what comes out is a plausible, well-intentioned, genuinely believed account of how they think they decide, which is not the same thing as how they actually decide.

    This reconstruction has a specific shape. It tends to be more logical, more sequential, and more complete than the real process. It leaves out the perceptual triggers, things the expert notices that they don’t realize they’re noticing.

    It leaves out the negative knowledge: all the things they ruled out instantly without conscious deliberation.

    It leaves out the contextual judgment: the feel for when the standard approach doesn’t apply.

    What remains is the skeleton of expertise without the flesh. Building a system on that skeleton produces something that works on textbook cases and fails on the ones that matter most.

    The discipline of elicitation exists because the direct approach consistently fails. The lesson: You cannot get at tacit knowledge by asking for it directly. You have to come at it sideways.

    The Toolkit

    Over decades of practice, knowledge engineers developed a set of techniques for surfacing what experts cannot easily volunteer.

    Each one approaches the problem from a different angle. Each one is designed to bypass the reconstructed account and get closer to what the expert actually does.

    Thinking Aloud — Protocol Analysis

    The most direct way to get at tacit knowledge is not to ask about it after the fact but to capture it in real time.

    In protocol analysis, the expert is given a real problem to solve and asked to narrate their thinking as they go. They’re not asked to explain their reasoning, but simply to say whatever is in their mind as they work through it. The knowledge engineer sits alongside and records everything.

    What comes out is nothing like the clean account you get in an interview.

    It is messier, more fragmented, more associative. The expert notices things they don’t explain. They hesitate in places they can’t account for. They reject options for reasons that turn out to be revealing. The noise in the protocol is often more informative than the signal, because this is where the tacit knowledge leaks through.

    The technique was developed by Herbert Simon and Allen Newell in the 1950s and 1960s as a method for studying problem solving.

    They were interested in the cognitive processes underlying human reasoning and found that verbal protocols gave them access to those processes in a way that no other method could. Knowledge engineering borrowed the technique because it works for the same reason. It captures knowledge in action rather than knowledge in reflection.

    The limitation is that not all expertise is verbal. Some experts go quiet when they are doing their best work. The thinking that produces the best judgment is sometimes the thinking that produces no words at all.

    Laddering — Getting Below the Surface

    Laddering is a technique borrowed from psychology where it was originally developed to understand personal values and how they connect to behavior.

    In a knowledge elicitation context it works like this: the expert gives an account of why they made a decision, and the knowledge engineer asks why that matters. The expert gives another reason, and the knowledge engineer asks why that matters. The conversation moves down through layers of reasoning until it reaches a foundational belief or a principle that the expert holds but has rarely been asked to articulate.

    The value of laddering is that it surfaces the causal structure underneath explicit reasoning.

    Experts can usually tell you what they did. They can often tell you why in immediate terms. What they rarely surface unprompted is the deeper structure of beliefs and judgments that makes their reasoning work the way it does. Laddering pulls that structure into the open.

    The technique requires patience and a degree of persistence that can feel uncomfortable. Asking why repeatedly can seem like you are questioning the expert’s judgment rather than trying to understand it.

    Repertory Grid — Making Implicit Distinctions Explicit

    One of the most powerful things experts do is make distinctions that novices can’t see. The senior underwriter doesn’t just evaluate risks; she also categorises them in ways that carry implicit judgments about quality, reliability, and probability of loss.

    Those categories are often tacit. She uses them fluently without being able to name them.

    Repertory grid technique, developed by the psychologist George Kelly in the 1950s, is designed to surface exactly these implicit distinctions. The process works by presenting the expert with sets of three things (three risks, three clients, three cases) and asking them to identify how two of the three are similar to each other and different from the third.

    The expert names the dimension of difference. Then the knowledge engineer asks them to rate all the items in their domain on that dimension.

    What emerges, across many rounds of this exercise, is a map of the expert’s implicit categorisation system. The dimensions along which they actually organise their domain. The grid makes visible a structure of judgment that exists in the expert’s mind but has never been externalised.
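
    A minimal sketch of the structure that emerges. The cases, construct names, and 1-to-5 ratings are illustrative stand-ins for what a real expert would produce:

    ```python
    from itertools import combinations

    # Cases the expert knows well; names are illustrative.
    cases = ["risk_A", "risk_B", "risk_C", "risk_D"]

    # Each triad is put to the expert: how are two alike and one different?
    triads = list(combinations(cases, 3))

    # After several rounds, the grid holds the elicited constructs,
    # with every case rated on each implicit dimension (1-5).
    grid = {
        "owner_managed_vs_corporate": {"risk_A": 1, "risk_B": 5, "risk_C": 2, "risk_D": 4},
        "stable_vs_volatile_revenue": {"risk_A": 2, "risk_B": 2, "risk_C": 5, "risk_D": 3},
    }

    for construct, ratings in grid.items():
        print(construct, "->", sorted(ratings, key=ratings.get))
    ```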

    The technique is particularly useful when the knowledge engineer suspects that the expert’s explicit account of how they decide doesn’t match how they actually decide. The grid bypasses the reconstruction and gets at the actual cognitive structure underneath.

    Critical Incident Technique

    Abstract questions produce abstract answers. Concrete questions produce concrete knowledge.

    The critical incident technique, developed by the psychologist John Flanagan in the 1950s, works by asking experts not to describe their general approach but to recall specific cases where they made a consequential decision, particularly ones where things went well or badly in ways that were not predictable from the standard approach.

    A critical incident interview sounds like this: Tell me about a time when you looked at a risk and knew immediately it was wrong but couldn’t have explained why at the time. What did you eventually figure out? What were you noticing that you didn’t know you were noticing?

    What the technique exploits is the difference between episodic memory and semantic memory.

    Asking experts to describe their general knowledge activates semantic memory, which is where the reconstructed, idealised account lives.

    Asking them to recall a specific incident activates episodic memory, which is closer to the actual experience, with all its texture and detail intact.

    The incidents that are most valuable are often the ones where the standard approach failed, where the expert’s judgment was later proven right for reasons they didn’t fully understand at the time, or where they made a mistake and figured out why.

    These edge cases are where the tacit knowledge is most visible, because they are the cases where the expert had to work harder than usual to know what to do.

    Contrastive Questioning

    When you ask an expert to describe what they do, they give you an account of the general case. When you ask them to explain the difference between two specific situations they are forced into a level of precision that general description never reaches.

    Contrastive questioning works because comparison activates a different cognitive process than description. To explain a difference, the expert has to identify the specific features that drove the distinction. Those features are often things they noticed without realising they were noticing them.

    The technique is most powerful when the two cases being compared are superficially similar but produced different judgments.

    The underwriter who approved one manufacturing risk and declined another that looks almost identical on paper is sitting on a piece of tacit knowledge that a contrastive question can help surface.

    What Good Elicitation Looks Like

    All techniques are tools. What makes elicitation work is the way you use them and the relationship you build with the expert in order to use them well.

    Good elicitation looks like a genuine collaboration.

    Good elicitation is not information extraction. You need to work alongside an expert to surface something that they can’t see clearly alone.

    The expert has the knowledge. As a knowledge engineer you have the methods to make it visible. The work requires both.

    This means the knowledge engineer has to earn the expert’s trust. Experts are often sceptical of the process, and not unreasonably. They have been asked before to explain themselves and found the process frustrating or reductive. They are protective of their judgment and wary of having it misrepresented.

    The knowledge engineer who comes in with a clipboard and a fixed agenda will get the reconstructed account. The one who comes with genuine curiosity and patience will get something closer to the truth.

    Good elicitation is also iterative. No single session surfaces everything. Knowledge engineers build a model of what they think the expert knows, test it against cases, find where it fails, and go back to the expert with specific questions about the gaps. The process cycles between drawing out knowledge and checking whether what has been drawn out is actually what the expert does.

    The Knowledge Engineer as Interviewer

    Knowledge engineers need enough domain knowledge to follow what the expert is saying.

    This is an essential ingredient of effective interviewing. It helps you know when an account is incomplete, recognise the gaps, and ask the follow-up question that opens the right door.

    Be mindful not to fill in the gaps yourself and import your own understanding where the expert’s actual knowledge should go.

    Beyond sufficient technical skills you need the ability to listen carefully, to sit with silence, to ask the question that hasn’t been asked yet.

    Above all you need patience for a process that moves slowly and produces results that are often ambiguous.

    In essence, you need a whole lot of intellectual humility to understand how the expert understands the domain. Your own intuitions about how things work are a liability as much as an asset.

    The best knowledge engineers are the ones who can hold their own understanding lightly enough to see what the expert is actually doing rather than what they expect the expert to be doing.

  • The Tacit Knowledge Problem

    In the 1970s, a team of researchers set out to build a computer system that could teach surgery.

    The idea was straightforward. Find the best surgeons in the world. Record everything they did and turn it into a training program that could transmit world-class surgical skill to the next generation of doctors.

    They found the surgeons. They recorded everything. And then they ran into a problem that nobody had anticipated.

    The best surgeons couldn’t explain what made them good.

    When they sat down and tried to articulate what they were doing and why, the accounts they gave were incomplete. They described the mechanics. But what they couldn’t describe was the accumulated judgment of ten thousand procedures compressed into instinct, judgment that had become invisible even to themselves.

    The researchers had set out to capture expertise. What they discovered instead was that expertise, at its highest levels, resists capture.

    That discovery has a name. It is one of the most important and most underappreciated ideas in the history of human knowledge. And understanding it is the only way to understand why building systems that perform at expert level is so much harder than it looks.

    The Philosopher Who Saw It First

    Michael Polanyi was not the kind of person you would expect to reshape the field of artificial intelligence. He was a Hungarian-born chemist who had fled Nazi Germany in the 1930s, eventually landing at the University of Manchester where he spent the second half of his career not doing chemistry but thinking about what scientific knowledge actually is, how it develops, and how it moves from one generation of scientists to the next.

    By the 1950s Polanyi had become increasingly troubled by a dominant assumption in Western philosophy of knowledge, the idea that genuine knowledge is knowledge that can be made fully explicit.

    That if you truly understand something you should be able to state it clearly, defend it logically, and transmit it to anyone willing to pay attention. Knowledge, in this view, is essentially propositional. It lives in sentences. It can be written down.

    Polanyi thought this was fundamentally wrong.

    And he spent the better part of two decades building the argument against it.

    His most concentrated statement of that argument came in 1966 in a slim book called The Tacit Dimension. The book opens with a single sentence that contains the entire problem: “We can know more than we can tell.”

    It sounds simple. It is not.

    What Polanyi Actually Meant

    Polanyi’s argument begins with perception, the most basic act of knowing something.

    Consider how you recognise a face. You can look at a photograph of someone you know and identify them instantly, across years, across changes in weight and hair and age.

    You are doing something genuinely sophisticated – processing a complex pattern and matching it against memory with remarkable reliability. But you wouldn’t be able to explain to someone exactly how you did it: which features you used, how you weighted them, what the decision rule was. Not because the knowledge isn’t there.

    Because the knowledge doesn’t exist in a form that can be stated.

    Polanyi called this tacit knowledge: knowledge that we hold and use reliably but cannot fully articulate. He distinguished it from explicit knowledge: knowledge that can be stated, written down, and transmitted through language and instruction.

    The distinction sounds straightforward but its implications are radical. Because Polanyi’s claim was not just that some knowledge happens to be tacit. His claim was that tacit knowledge is foundational.

    He believed that all explicit knowledge rests on a substrate of tacit knowledge that can never be fully surfaced. You cannot make everything explicit because the act of making something explicit always relies on tacit capacities that are doing the work underneath.

    He illustrated this with what he called the subsidiary-focal distinction. When you use a hammer, your attention is focused on the nail. The hammer itself (its weight, its balance, the feel of it in your hand) is present to you, but subsidiarily – you are not focusing on it. You are focusing through it. If you shift your attention to the hammer itself, you lose your grip on the task. The tacit knowledge that makes you competent with the tool only functions when it stays tacit.

    This is why expertise is so hard to teach and so hard to transfer.

    The expert is not withholding anything. They are focusing through their knowledge, not on it. Asking them to articulate it is like asking them to stare at the hammer instead of the nail. The act of articulation disrupts the very thing you are trying to capture.

    The Iceberg

    The most useful way to think about expertise is as an iceberg.

    Above the surface sits explicit knowledge, everything that can be stated, taught, written down, encoded in manuals and training programs and textbooks. This is the knowledge that moves easily. You can put it in a document and send it across the world. It survives the death of the person who held it. It can be transmitted to ten people as easily as to one.

    Below the surface sits tacit knowledge. It’s vastly larger, and entirely invisible from above. This is the knowledge that makes the difference between someone who knows the rules and someone who can actually perform.

    It includes:

    Perceptual knowledge: the ability to notice what matters. The experienced radiologist who sees something in a scan that a resident misses. The fund manager who reads a room full of executives and knows within minutes whether the business is actually healthy. They are perceiving things that are genuinely there, but their perception has been trained by years of experience into a sensitivity that cannot be directly transmitted.

    Procedural knowledge: knowing how, as distinct from knowing that. You can read every book ever written about riding a bicycle without being able to ride one. The knowledge of how to ride lives in the body, in the calibration of balance and response that only practice builds. Professional skills work the same way. The senior copywriter who reads a brief and knows immediately what angle to take is not applying a rule. They are drawing on something built from thousands of briefs processed over years.

    Contextual judgment: knowing when the rules apply and when they don’t. This is perhaps the most valuable and most elusive dimension of expertise. Textbooks describe how things work in general. Experts know how they work in this situation, with these constraints, given what happened last time. That situational sensitivity is almost impossible to encode because it is not a rule at all.

    The knowledge of what to ignore is perhaps the least discussed but most practically important. Experts are not just better at processing relevant information. They are better at filtering out irrelevant information. They have learned, through experience, what doesn’t matter. That negative knowledge is as hard to transfer as the positive kind.

    When Tacit Knowledge Is Lost

    The organisational implications of tacit knowledge loss are severe and largely invisible until it is too late.

    When an expert leaves an organisation what walks out the door is not just the explicit knowledge they held. That part, if the organisation was reasonably diligent, has probably been documented somewhere. What walks out the door is everything below the surface. The perceptual sensitivity built over decades. The contextual judgment that knew when the documented process didn’t apply. The feel for what mattered and what didn’t.

    This loss is structurally invisible because explicit knowledge is easy to see and tacit knowledge is not.

    Organisations inventory what they can measure. They document processes, capture decisions, build knowledge bases. And then they are surprised when the person who wrote the process document leaves and everything quietly starts going wrong, gradually, in the accumulation of small decisions that the documentation doesn’t cover and the new person doesn’t know how to make.

    NASA experienced this in one of its most documented forms. After the Apollo program ended in the early 1970s, the organisation went through waves of restructuring and downsizing.

    When NASA began planning a return to the moon decades later, it discovered that significant tacit knowledge about how to build certain components had simply ceased to exist within the organisation. The documentation was there, but the embodied, practised, judgment-laden knowledge was not.

    This pattern repeats across industries: new graduates, however well trained, cannot replicate what experienced staff did, because those staff could never fully say why they did it.

    Why This Problem Is Acute Now

    For most of the history of organisations, tacit knowledge loss was a serious but manageable problem. It was addressed, imperfectly, through apprenticeship and practice rather than instruction.

    The medieval guild system was essentially a tacit knowledge transfer mechanism. So is the residency system in medicine or the partnership track in professional services. You spend years watching someone who knows what they’re doing, and eventually some of what they know moves into you.

    Apprenticeship is slow and expensive. But it works, because tacit knowledge can be transferred through observation, through practice under guidance, and through the accumulated experience of being in the room while an expert makes a hundred decisions, slowly developing a feel for why.

    The agentic era has broken this in a specific and important way.

    The promise of AI agents is that you can encode expert-level performance into a system and deploy it faster, cheaper, and more consistently than any human expert. The appeal is obvious. The problem is that the entire premise depends on being able to get the expertise into the system in the first place.

    The real difficulty is that the expertise you need to transfer is mostly tacit.

    This means that most agent implementations are not actually encoding expertise. They are encoding the explicit layer – the documented processes, the stated rules, the guidebook version of how things work.

    Most organisations have deployed that explicit layer at scale and called it an expert system.

    What they have actually built is a very fast, very consistent, very scalable average.

    It performs well on the cases that the explicit rules cover. It fails, sometimes catastrophically, on the cases that require the judgment, the contextual sensitivity, the feel for when the rules don’t apply that lives below the surface of what any expert can easily say.

    The gap between a competent agent and an expert-level agent is almost entirely a tacit knowledge gap.

    It is not a technology or a model problem. It is the same problem the surgical researchers hit in the 1970s, the same problem Feigenbaum hit sitting with chemists in the 1960s, the same problem Polanyi was describing in 1966.

    We can know more than we can tell. And until you have a method for surfacing what can’t easily be told, you are building on the visible part of the iceberg and wondering why the system keeps running into things it didn’t see coming.

    What This Means in Practice

    Polanyi’s insight makes this problem legible. And this is where every solution begins.

    If tacit knowledge cannot be extracted through direct questioning, it can be approached through other means. Through observation rather than interview. Through cases rather than principles. Through contrast rather than description. Through the careful, patient work of watching experts perform and finding ways to surface the knowledge they are focusing through rather than on.

    That work has a name and a methodology.

    It is the discipline of elicitation and it is where the practical response to the tacit knowledge problem lives.

    But before elicitation can work, you have to understand what you are trying to elicit. You have to know that the knowledge you need is not sitting on the surface waiting to be asked for.

    You have to know that the iceberg is mostly underwater, and that the part you can see is not representative of the part you can’t.

    That understanding is what Polanyi gave us. And it is why, sixty years after he wrote it, his single sentence still contains everything you need to know about why this problem is hard.

    We can know more than we can tell.

  • Knowledge Engineering

    Picture a group of scientists in the summer of 1956. Not just any scientists: these are the sharpest minds of their generation, the kind of people who genuinely believe they can do what nobody has done before. They’ve gathered at Dartmouth College in New Hampshire with a bold idea and a proposal to match.

    The proposal was written by a mathematician named John McCarthy, the kind of person who believed that if you couldn’t solve a problem, it was because you hadn’t thought about it carefully enough.

    The proposal said they could figure out how to make machines think. Not someday. This summer.

    Now, nobody in that room actually believed it would take just two months. But they did believe it was possible.

    Human reasoning, they argued, was essentially a formal system. A set of rules. And if you could identify the rules, you could replicate the reasoning in a machine.

    The summer came and went. They didn’t crack it. Most participants drifted in and out — only McCarthy, Minsky, and a mathematician named Ray Solomonoff stayed for the full eight weeks.

    They argued about approaches, worked mostly on their own ideas, and eventually packed up and went home.

    But something did happen that summer that none of them had planned for. For the first time, all the people working on this problem had been in the same room.

    McCarthy had insisted on calling the field artificial intelligence. Claude Shannon thought the name was too dramatic. Shannon lost that argument. The name stuck.

    They went back to their universities. They kept working. But now they were working on the same named thing, and they knew who else was in the game.

    The summer project failed. The idea it launched and the community that formed around it turned out to be worth more than any result they could have produced that summer.

    The Man Who Asked a Different Question

    Fast forward about a decade. The AI field is now a real institution. DARPA is writing serious checks. The Stanford AI Lab, MIT, Carnegie Mellon are the places to be if you’re brilliant and ambitious and want to work on the hardest problems in science.

    Graduate students are being recruited from the best programs in the country. The culture is intense, competitive, and deeply optimistic. Nobody has built the thinking machine yet, but the prevailing mood is that it’s only a matter of time.

    Into this world arrives Edward Feigenbaum.

    Feigenbaum had studied under Herbert Simon at Carnegie Mellon. Simon is one of the founding figures of AI, a man who believed human decision-making could be modeled as a computational process and had the Nobel Prize to show for a career built on that belief. Feigenbaum absorbed that conviction but added a practical instinct that would eventually set him apart from his peers.

    He thought the field was aiming at the wrong target.

    Building general intelligence (a machine that could think about anything) was too hard and too vague.

    The better move was to go narrow and go deep. Pick a specific domain, find the world’s best experts in it, and build a system that could perform at their level. Prove it works. Then do it again in another domain.

    It sounds obvious now, but at the time it was a minority view in a field riding high on the dream of general intelligence. Feigenbaum didn’t care. He was more interested in what actually worked than in what was theoretically elegant.

    He got his first real chance to test the idea when a Nobel Prize-winning geneticist named Joshua Lederberg knocked on his door.

    The Problem in the Lab

    Lederberg had a practical problem. His lab was generating mass spectrometry data faster than his chemists could interpret it.

    A mass spectrometer fires a beam of electrons at a molecule, breaks it apart, and produces a kind of fingerprint that an expert chemist can read to figure out what the original molecule was. It’s painstaking work that requires deep expertise. Lederberg wanted to know if a machine could do it.

    Feigenbaum said yes. And together they began building DENDRAL: the first serious attempt to encode expert-level knowledge into a computer system.

    Chemistry has known rules. Molecules behave according to principles that can be written down. Feigenbaum and his team encoded those principles and the system started producing reasonable results.

    The early work was promising.

    DENDRAL could handle straightforward cases when the molecules behaved exactly the way the textbooks said they would. But chemistry in practice is messier than chemistry in textbooks.

    When they pushed the system into more complex cases it started producing answers that were technically consistent with the rules but wrong. And no amount of refining the existing rules fixed it.

    Something was missing. The knowledge in the published literature wasn’t the whole picture. Expert chemists were doing something beyond what had ever been written down. The only way forward was to go and ask them.

    The Wall

    The chemists could solve the problems. They were reliable, accurate, and fast. You could test them and they’d get it right. But when Feigenbaum and his team asked them to explain exactly how, they couldn’t really give a clear answer.

    The accounts they gave were good. They walked through cases, described what they were looking for, traced their reasoning step by step.

    But they were incomplete in ways nobody in the room could see at the time. They skipped steps they didn’t realise they were skipping. They applied judgment they couldn’t fully articulate. They knew things they could not say.

    Feigenbaum had stumbled onto something that would reshape his entire understanding of the problem. The barrier to building an expert system wasn’t computational. The inference engine, the search algorithms, the formal logic: none of that was the hard part. The hard part was getting the knowledge in.

    He called it the knowledge acquisition bottleneck. And naming this challenge reframed the entire question.

    It was no longer just ‘can machines reason?’ It was something deeper and harder:

    • What is knowledge?
    • How do humans actually hold it?
    • How on earth do you move it from a human mind into a machine?

    That question would take decades to answer. It still hasn’t been fully answered. But asking it clearly was the beginning of a new discipline.

    The Doctor in the Room

    A few years later, a medical student named Edward Shortliffe sat down to do something similar in medicine.

    MYCIN, the system Shortliffe built under Feigenbaum’s supervision in the early 1970s, was designed to diagnose bacterial blood infections and recommend antibiotic treatments.

    The stakes were as high as they get. The wrong antibiotic at the wrong dose can kill a patient. So MYCIN needed to reason at the level of a specialist in infectious disease.

    Shortliffe spent years in sessions with those specialists. They were cooperative, intelligent, and genuinely trying to help. They would walk him through cases, explain their reasoning, describe what they were looking for. And the accounts they gave were good enough to build a working system.

    But Shortliffe noticed the same gaps.

    It wasn’t just that the doctors struggled to explain their reasoning. It was that large parts of their reasoning were no longer available to them as conscious thought.

    Years of seeing thousands of patients had compressed their knowledge into something faster and more automatic than deliberate thinking. They didn’t work through a diagnosis so much as they recognised it. The way an experienced driver doesn’t think about changing gears. The way a native speaker doesn’t think about grammar.

    This is what made the problem structural rather than just difficult. You couldn’t solve it by asking better questions or running longer sessions.

    The knowledge had been absorbed so completely into expert intuition that it no longer existed as something that could be directly retrieved and handed over.

    MYCIN eventually encoded the knowledge of those specialists into roughly 600 production rules — IF this condition, THEN this action, with an attached certainty factor. When it was tested against specialist physicians in controlled evaluations, it performed at or above their level.
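
    To make the shape of such a rule concrete, here is a minimal sketch in Python. The condition, conclusion, and certainty value are invented for illustration; MYCIN itself was written in Lisp, and its real rules were far more elaborate.

        # A minimal sketch of a MYCIN-style production rule with a certainty
        # factor. The clinical content below is invented for illustration,
        # not taken from MYCIN's actual rule base.

        def rule_gram_negative_rod(findings):
            """IF the stain is gram-negative AND the morphology is rod,
            THEN conclude enterobacteriaceae with certainty 0.7."""
            if findings.get("stain") == "gram-negative" and findings.get("morphology") == "rod":
                return ("organism", "enterobacteriaceae", 0.7)
            return None

        findings = {"stain": "gram-negative", "morphology": "rod"}
        print(rule_gram_negative_rod(findings))  # ('organism', 'enterobacteriaceae', 0.7)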

    A machine, reasoning from encoded rules, was diagnosing infections as well as doctors.

    It was never deployed clinically. Not because it failed, but because nobody could agree on who was responsible if it got something wrong. The liability questions proved harder than the technical ones. But the proof of concept was undeniable. The question now was how to do it reliably, at scale, across domains.

    What Is Knowledge Engineering?

    Knowledge Engineering is the practice of getting expertise out of human minds and into a form that machines can use.

    The knowledge engineer’s job is to surface what doctors, chemists, financial analysts, and other experts know, both what they’re consciously aware of and what they aren’t, and to translate it into a structure a system can reason with.

    Knowledge engineering moves through four stages.

    The first is knowledge acquisition: the elicitation sessions, the interviews, the case walkthroughs. This is the hardest stage and the most underestimated.

    Naïve approaches fail consistently. If you just ask experts what they know, you get the idealized version. So knowledge engineers use methods designed to surface what experts do rather than what they say they do.

    One approach is to watch them work. Sit alongside an expert as they solve a real problem and ask them to think out loud as they go. What comes out is messier and more revealing than any interview.

    Another is to work through specific past situations where the expert made a consequential decision. Walking through a real case unlocks detail that abstract questioning never reaches. The expert remembers what they noticed, what they ignored, what made this situation different from the last one.

    A third is contrastive questioning: instead of asking an expert to describe what they do, ask them to explain the difference between two situations. Why did you treat these two cases differently? What did you see in one that you didn’t see in the other? Comparison forces precision in a way that open description rarely does.

    None of these methods extract knowledge perfectly. But they get closer to what experts actually do than asking them to explain themselves ever could.

    The second is knowledge representation: translating what you’ve acquired into a formal structure. Production rules, semantic networks, decision trees, and ontologies are common examples.
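
    As one hedged illustration, here is what the simplest of these structures can look like in Python: knowledge held as subject-predicate-object triples, the skeleton of a semantic network. The entities and links are invented for illustration.

        # A minimal sketch of one representation choice: knowledge held as
        # subject-predicate-object triples, the skeleton of a semantic network.
        # The entities and links below are invented for illustration.

        triples = [
            ("e_coli", "is_a", "enterobacteriaceae"),
            ("enterobacteriaceae", "is_a", "gram_negative_organism"),
            ("gram_negative_organism", "treated_with", "aminoglycoside"),
        ]

        def classes_of(entity):
            """Walk is_a links so facts attached to a class apply to its members."""
            found, frontier = {entity}, [entity]
            while frontier:
                node = frontier.pop()
                for s, p, o in triples:
                    if s == node and p == "is_a" and o not in found:
                        found.add(o)
                        frontier.append(o)
            return found

        # e_coli inherits the treatment fact through two is_a links.
        for s, p, o in triples:
            if p == "treated_with" and s in classes_of("e_coli"):
                print(f"e_coli {p} {o}")  # e_coli treated_with aminoglycoside

    The specific formalism matters less than the consequence: once knowledge sits in a structure like this, a system can traverse it mechanically.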

    The third is knowledge validation: testing whether the encoded knowledge actually performs correctly across cases, both the easy ones and the edge cases. A system that works on textbook cases and fails on real ones is not an expert system. It is a simulation of one.
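
    A minimal sketch of what that testing can look like, with an invented rule and invented cases; the point is the split between textbook cases and edge cases, not the specifics.

        # A minimal sketch of knowledge validation: run an encoded rule against
        # labelled cases and score textbook cases and edge cases separately.
        # The rule, cases, and labels are all invented for illustration.

        def classify(findings):
            if findings.get("stain") == "gram-negative" and findings.get("morphology") == "rod":
                return "enterobacteriaceae"
            return None  # the rule base is silent outside its encoded knowledge

        cases = [
            ({"stain": "gram-negative", "morphology": "rod"}, "enterobacteriaceae", "textbook"),
            ({"stain": "gram-negative", "morphology": "coccus"}, "neisseria", "edge"),
        ]

        for tier in ("textbook", "edge"):
            subset = [(f, label) for f, label, t in cases if t == tier]
            correct = sum(classify(f) == label for f, label in subset)
            print(f"{tier}: {correct}/{len(subset)} correct")
        # textbook: 1/1 correct
        # edge: 0/1 correct

    Note how the sketch passes its textbook case and fails its edge case. That asymmetry is exactly what validation exists to expose.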

    The fourth is knowledge maintenance: managing the knowledge base as a living thing over time. Domains evolve. Expertise updates.

    The Philosopher Nobody Expected

    The person who gave all of this its deepest intellectual foundation wasn’t a computer scientist. He was a Hungarian-British chemist turned philosopher named Michael Polanyi who had been thinking about the nature of expertise since the 1950s.

    In a short book called The Tacit Dimension, published in 1966, Polanyi wrote a sentence that the AI community would eventually find its way to: “We can know more than we can tell.”

    Polanyi’s argument was that expertise is not a large inventory of facts and rules that an expert has accumulated and could, in principle, recite. It is something more integrated than that.

    Years of experience get processed into pattern recognition, intuition, and judgment that operates faster than conscious reflection and is only partially accessible to it. The expert chemist from the earlier example, reading a spectrogram, is not running through a checklist. They are doing something that feels perceptual and immediate, and that immediacy is precisely what makes it so hard to transfer.

    Polanyi never wrote about AI. He was thinking about scientific knowledge and how it gets transmitted between generations of researchers. But when the knowledge engineering community discovered his work, it landed like a precise diagnosis of exactly what they were dealing with.

    That insight didn’t make the problem easier. But it made it legible. And legibility is where every solution begins.

    The Boom, the Collapse, and What Survived

    By the mid-1980s, knowledge engineering had become an industry. XCON, the expert system Digital Equipment Corporation built to configure its VAX computers, was saving the company an estimated $40 million a year. Hundreds of companies were building expert systems. The commercial application of AI, after decades of promise, seemed to have genuinely arrived.

    Then the systems started breaking.

    Early expert systems were brittle. They performed well within the boundaries of their encoded knowledge and failed unpredictably outside them. Maintaining large rule bases as domains evolved was expensive and error-prone. The knowledge that was accurate in 1982 was wrong or incomplete by 1988.

    XCON eventually grew to over 10,000 rules. Keeping it current required a dedicated team of knowledge engineers working continuously. That ongoing cost was one of the reasons large expert systems eventually became unsustainable.

    The AI Winter arrived. Funding contracted. The expert systems industry collapsed.

    The field responded by turning toward machine learning: statistical approaches that could learn patterns from data rather than requiring knowledge to be extracted and encoded by hand. It was an appealing solution to the bottleneck problem. Instead of laboriously eliciting knowledge from experts, you trained systems on large datasets and let them figure it out.

    What was gained in scalability was lost in other ways.

    The key difference: Statistical systems learn patterns, not principles. They can’t explain their reasoning. They generalise within the distribution of their training data and fail outside it. They don’t encode the kind of structured, causal understanding that experts use when they encounter something genuinely new.

    For a while, knowledge engineering fell out of fashion, but the questions it was built to answer remained alive.

    Over time, it evolved into ontology engineering, into organisational learning theory, into research on how experts actually make decisions under uncertainty.

    Why You’re Reading This Now

    Large language models have changed what’s possible. They have absorbed an extraordinary volume of human knowledge and can perform across a remarkable range of domains with genuine fluency. The optimism in the AI field today echoes the optimism of that summer at Dartmouth.

    But the knowledge acquisition bottleneck has not been solved by scale. It has been displaced.

    What LLMs cannot do, without deliberate intervention, is perform at expert level in specific organisational contexts. They have breadth without depth. They produce plausible output, which is not the same thing as correct output. They have patterns without the underlying causal understanding that experts use to reason about situations they’ve never seen before.

    Anyone building a system that needs to perform at expert level is facing the same problem Feigenbaum faced sitting with those chemists in the 1960s.

    How do you move what an expert actually knows into a system that can use it?

    That problem doesn’t yield to better prompts or bigger models. It yields to method. The method has a name. It is sixty years old and more relevant now than it has ever been.

    That’s what knowledge engineering is. And that’s why it’s where this story starts.

  • There is a moment in the history of every major technology shift when access stops being the advantage

    It happened with the internet. In the early years, having a website was enough. Then everyone had one. The advantage moved — to the people who understood how to use it. How to design experiences, build communities, generate demand.

    The technology became table stakes. The thinking became the differentiator.

    We are at that moment with Agentic AI.

    McKinsey’s State of AI 2025 report found that 88% of enterprises now regularly use AI in at least one business function. The tools are everywhere. The budgets are being spent. The mandates are coming down from the top.

    And yet only 6% of those organisations are achieving meaningful, measurable bottom-line impact from their AI investments.

    Read that again. 88% have access. 6% are getting results.

    That gap is not a technology problem. It is not a training problem. And it is not an adoption problem — though that is what most organisations have been treating it as for the past three years.

    The problem is the strategy.

    And here is the part that should give marketing leaders pause: the people best positioned to fix it are already in the room. They just haven’t been given the right question yet.

    Why the diagnosis keeps failing

    The most common response to AI underperformance in marketing functions has been more training. More courses, more workshops, more internal communications about the importance of AI.

    It is a rational response. It is just the wrong one.

    The 2024 State of Marketing AI Report found that 67% of respondents identified lack of education and training as the top barrier to AI adoption. That finding has been remarkably consistent across years of similar research.

    So has the result: marginal improvement, at best.

    Here is what the persistence of that data actually tells us. When the same diagnosis produces the same response and the same result, year after year, the problem is not execution. It is the strategy.

    Training teaches people how to use a tool. But AI is unlike any tool we have had before.

    Every tool that came before it was passive. A spreadsheet does what you tell it. A CRM stores what you give it. A campaign platform executes what you configure. None of them think. None of them react. None of them make decisions.

    You can teach AI agents to reason, interpret context, weigh inputs, and produce judgments. You can give them a goal and let them find the path.

    Which means deploying AI without deciding what it should think about, react to, and decide on is not just inefficient. It is like hiring a highly capable person and never telling them what their job is.

    That question has a sixty-year history worth understanding.

    Training does not answer that question.

    It does not decide which tasks in the marketing function should be handled by an agent and which require human judgment. It does not design the handoffs, define the outputs, or build the infrastructure for autonomous execution.

    Training only fills a knowledge gap.

    Most organisations have made a category error with AI.

    They treat AI agents as just another piece of software.

    Software is a tool. You buy it, licence it, train people to use it, measure utilisation. The value is in the features. The ROI is in the efficiency gains. It has an implementation timeline and a renewal date.

    Agentic AI is not that.

    Agentic AI — the capability now scaling rapidly across enterprise organisations — does not sit in a stack. It changes what the stack is for.

    Agents can own a process, not just assist with a task.

    They can perceive their environment, set goals, make decisions, take actions, and learn from the results. Continuously. Without human intervention at every step.
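
    The loop underneath that description is simple to sketch. Here is a minimal version in Python; every function is a stub invented for illustration, and a real agent would wire these steps to models, tools, and feedback data rather than to toy lists.

        # A minimal sketch of the perceive-decide-act-learn loop described
        # above. Everything here is a stub invented for illustration.

        def perceive(environment):
            """Read signals from the environment."""
            return environment.get("signals", [])

        def decide(signals, goal):
            """Choose actions that advance the goal (stubbed as a keyword filter)."""
            return [s for s in signals if goal in s]

        def act(actions):
            """Execute the chosen actions and return their results."""
            return [f"done: {a}" for a in actions]

        def learn(results, memory):
            """Fold results back into memory so later decisions improve."""
            return memory + results

        memory = []
        environment = {"signals": ["draft weekly report", "reallocate budget"]}

        # A deployed agent would run this loop continuously, not once.
        actions = decide(perceive(environment), goal="report")
        memory = learn(act(actions), memory)
        print(memory)  # ['done: draft weekly report']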

    Gartner projects that by 2028, 33% of all enterprise software applications will include agentic AI — up from less than 1% in 2024. That is more than a 33-fold increase in four years.

    The organisations treating AI as a subscription cost are not just missing the opportunity. They are accumulating a design debt that will compound against them as this capability scales.

    What actually separates the winners

    Let’s go back to the 88% versus 6% gap, because it is worth sitting with.

    McKinsey’s research does not just show the gap. It identifies the single factor that most distinguishes organisations achieving meaningful impact from those that are not.

    Out of 25 attributes tested across organisations of all sizes, one factor has the strongest contribution to EBIT impact from AI.

    The redesign of workflows.

    AI high performers are 2.8 times more likely than their peers to have fundamentally redesigned their workflows around AI.

    That is a design advantage, not a technology advantage.

    Meanwhile, 42% of companies abandoned most of their AI initiatives in 2025. The average organisation scrapped 46% of its AI proof-of-concepts before reaching production. Only 26% have demonstrated the capability to move from pilot to production at all.

    The pilot worked. The design was not there to scale it. The tool performed. The workflow was not built to use it. The training happened. The task ownership was never defined.

    The problem runs deeper than task ownership.

    The silo is not a culture problem

    One of the most common patterns in marketing functions that have been actively investing in AI is fragmentation. Individual teams build their own approaches. Content develops one workflow. Demand generation develops another. Brand does something else entirely.

    A 2025 survey found that 71% of executives report AI applications being created in silos — and 68% report that fragmentation is creating active tension between teams.

    The temptation is to diagnose this as a collaboration problem. It is not.

    When there is no central decision about what AI should own across a function, every team makes its own decision. The silo is not the failure. The silo is the symptom.

    Fix the design, and the silo resolves itself. Not because people suddenly start collaborating better — but because the structure of the work makes alignment natural.

    Adobe’s 2026 Digital Trends Report adds a further dimension. The top challenge causing AI misalignment is not resistance to change, insufficient tools, or lack of budget. It is executive misunderstanding of AI — cited by 61% of respondents.

    In that vacuum, teams improvise. They experiment. They build workflows that do not connect, do not scale, and do not produce the compound value that systematic design would create.

    Why marketing leaders are the right people to lead this

    This is not a technology problem. It is an organisational design problem.

    And marketing leaders — not IT departments, not AI specialists, not consultants — are the right people to solve it.

    Here is why.

    Marketing sits at the intersection of data, customer behaviour, creativity, and commercial outcomes. It is the function that touches the customer most directly, generates the most varied and continuous data, and produces work that spans the full range — from highly repetitive to deeply creative.

    That complexity is what makes marketing hard to design. It is also what makes it the most important function to design well.

    Marketing leaders also bring something that technology teams do not. A deep, intuitive understanding of what human judgment in marketing actually looks like. They know which decisions require a strategist and which do not. They know the difference between a brief that needs a creative director and one that needs a system. They know what good looks like at every stage of the process.

    That knowledge is exactly what you need to design the division of labour correctly.

    Until now, marketing leaders have never had to make that knowledge explicit in this way. Job descriptions, briefs, OKRs — these are all forms of design. But none of them required leaders to specify what a thinking system should own, where its authority ends, and what it should hand back to a human.

    That is a new kind of decision. And it is one that marketing leaders are better equipped to make than anyone else in the organisation.

    The gap is compounding

    The Salesforce State of Marketing 2026 report found that 75% of marketers have adopted AI, yet most are still using it to deliver generic, one-way campaigns. Every marketer has access to the same AI models. What separates the winners is not the tools. It is the design decisions behind how those tools are used.

    The gap between organisations that are designing deliberately and organisations that are not is not a fixed distance. It is a growing one.

    That is the marketing renaissance.

    Not a new tool. Not a new platform. Not a new training programme. A fundamentally different way of thinking about how a marketing function is built — one that starts with the question of what agents should own and what humans should own, and designs everything else from there.

    When agents take over the work that is repetitive, processable, and execution-driven, human capacity does not disappear. It becomes available for the work that only humans can do. Strategic judgment. Creative direction. Stakeholder relationships. Decisions that require context, experience, and accountability.

    The organisations that get this right will not just outperform their competitors. They will build marketing functions that are more resilient, more adaptive, and more satisfying to work in.

    Where to begin

    The scale of this challenge can feel paralysing. Fifty tasks to redesign. Multiple teams to align. A function to rebuild while still running.

    Pick one team. List every recurring task they perform. Apply three questions to each one, as sketched in code after the list:

    • Is it repetitive and rule-based?
    • Does it require human judgment that cannot be specified in advance?
    • What is the cost of getting it wrong?
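
    To make the triage concrete, here is a minimal sketch in Python. The task names, fields, and routing logic are assumptions invented for illustration, not a prescribed scoring model.

        # A minimal sketch of the three-question triage. Task names, fields,
        # and routing logic are invented for illustration.

        from dataclasses import dataclass

        @dataclass
        class Task:
            name: str
            repetitive_rule_based: bool   # Q1: repetitive and rule-based?
            unspecifiable_judgment: bool  # Q2: judgment that can't be specified in advance?
            cost_of_error: str            # Q3: "low", "medium", or "high"

        def triage(task):
            if task.unspecifiable_judgment:
                return "human-owned"
            if task.repetitive_rule_based:
                return "agent-ownable today" if task.cost_of_error != "high" else "agent with human review"
            return "examine: why does this task exist?"

        tasks = [
            Task("weekly performance report", True, False, "low"),
            Task("campaign budget reallocation", True, False, "high"),
            Task("brand voice sign-off", False, True, "high"),
        ]

        for t in tasks:
            print(f"{t.name}: {triage(t)}")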

    You will find tasks that shouldn’t exist. You will find tasks that agents could own today. You will find decisions that have been made by habit for years that nobody has ever examined. And you will begin to see the shape of a marketing function that doesn’t just use AI — but is designed around it.

    That is where Renaissance begins. Not with the tools. With the thinking.
