Picture a group of scientists in the summer of 1956. Not just any scientists, these are the sharpest minds of their generation, the kind of people who genuinely believe they can do what nobody has done before. They’ve gathered at Dartmouth College in New Hampshire with a bold idea and a proposal to match.
The proposal was written by a mathematician named John McCarthy, the kind of person who believed that if you couldn’t solve a problem, it was because you hadn’t thought about it carefully enough.
The proposal said they could figure out how to make machines think. Not someday. This summer.
Now, nobody in that room actually believed it would take just two months. But they did believe it was possible.
Human reasoning, they argued, was essentially a formal system. A set of rules. And if you could identify the rules, you could replicate the reasoning in a machine.
The summer came and went. They didn’t crack it. Most participants drifted in and out; only McCarthy, Marvin Minsky, and a mathematician named Ray Solomonoff stayed for the full eight weeks.
They argued about approaches, worked mostly on their own ideas, and eventually packed up and went home.
But something did happen that summer that none of them had planned for. For the first time, all the people working on this problem had been in the same room.
McCarthy had insisted on calling the field artificial intelligence. Claude Shannon thought the name was too dramatic. He lost that argument. The name stuck.
They went back to their universities. They kept working. But now they were working on the same named thing, and they knew who else was in the game.
The summer project failed. The idea it launched and the community that formed around it turned out to be worth more than any result they could have produced that summer.
The Man Who Asked a Different Question
Fast forward about a decade. The AI field is now a real institution. DARPA is writing serious checks. The Stanford AI Lab, MIT, and Carnegie Mellon are the places to be if you’re brilliant and ambitious and want to work on the hardest problems in science.
Graduate students are being recruited from the best programs in the country. The culture is intense, competitive, and deeply optimistic. Nobody has built the thinking machine yet, but the prevailing mood is that it’s only a matter of time.
Into this world arrives Edward Feigenbaum.
Feigenbaum had studied under Herbert Simon at Carnegie Mellon. Simon is one of the founding figures of AI, a man who believed human decision-making could be modeled as a computational process and who would go on to win a Nobel Prize for a career built on that belief. Feigenbaum absorbed that conviction but added a practical instinct that would eventually set him apart from his peers.
He thought the field was aiming at the wrong target.
Building general intelligence (a machine that could think about anything) was too hard and too vague.
The better move was to go narrow and go deep. Pick a specific domain, find the world’s best experts in it, and build a system that could perform at their level. Prove it works. Then do it again in another domain.
It sounds obvious now, but at the time it was a minority view in a field riding high on the dream of general intelligence. Feigenbaum didn’t care. He was more interested in what actually worked than in what was theoretically elegant.
He got his first real chance to test the idea when a Nobel Prize-winning geneticist named Joshua Lederberg knocked on his door.
The Problem in the Lab
Lederberg had a practical problem. His lab was generating mass spectrometry data faster than his chemists could interpret it.
A mass spectrometer fires a beam of electrons at a molecule, breaks it apart, and produces a kind of fingerprint that an expert chemist can read to figure out what the original molecule was. It’s painstaking work that requires deep expertise. Lederberg wanted to know if a machine could do it.
Feigenbaum said yes. And together they began building DENDRAL: the first serious attempt to encode expert-level knowledge into a computer system.
Chemistry has known rules. Molecules behave according to principles that can be written down. Feigenbaum and his team encoded those principles and the system started producing reasonable results.
The early work was promising.
DENDRAL could handle straightforward cases when the molecules behaved exactly the way the textbooks said they would. But chemistry in practice is messier than chemistry in textbooks.
When they pushed the system into more complex cases, it started producing answers that were technically consistent with the rules but wrong. And no amount of refining the existing rules fixed it.
Something was missing. The knowledge in the published literature wasn’t the whole picture. Expert chemists were doing something beyond what had ever been written down. The only way forward was to go and ask them.
The Wall
The chemists could solve the problems. They were reliable, accurate, and fast. You could test them and they’d get it right. But when Feigenbaum and his team asked them to explain exactly how, they couldn’t really give a clear answer.
The accounts they gave were good. They walked through cases, described what they were looking for, traced their reasoning step by step.
But they were incomplete in ways nobody in the room could see at the time. They skipped steps they didn’t realise they were skipping. They applied judgment they couldn’t fully articulate. They knew things they could not say.
Feigenbaum had stumbled onto something that would reshape his entire understanding of the problem. The barrier to building an expert system wasn’t computational. The inference engine, the search algorithms, the formal logic: none of that was the hard part. The hard part was getting the knowledge in.
He called it the knowledge acquisition bottleneck. And naming this challenge reframed the entire question.
It was no longer just ‘can machines reason?’ It was something deeper and harder:
- What is knowledge?
- How do humans actually hold it?
- How on earth do you move it from a human mind into a machine?
That question would take decades to answer. It still hasn’t been fully answered. But asking it clearly was the beginning of a new discipline.
The Doctor in the Room
A few years later, a medical student named Edward Shortliffe sat down to do something similar in medicine.
MYCIN, the system Shortliffe built under Feigenbaum’s supervision in the early 1970s, was designed to diagnose bacterial blood infections and recommend antibiotic treatments.
The stakes were as high as they get. The wrong antibiotic at the wrong dose can kill a patient. So MYCIN needed to reason at the level of a specialist in infectious disease.
Shortliffe spent years in sessions with those specialists. They were cooperative, intelligent, and genuinely trying to help. They would walk him through cases, explain their reasoning, describe what they were looking for. And the accounts they gave were good enough to build a working system.
But Shortliffe noticed the same gaps.
It wasn’t just that the doctors struggled to explain their reasoning. It was that large parts of their reasoning were no longer available to them as conscious thought.
Years of seeing thousands of patients had compressed their knowledge into something faster and more automatic than deliberate thinking. They didn’t work through a diagnosis so much as recognise it. The way an experienced driver doesn’t think about changing gears. The way a native speaker doesn’t think about grammar.
This is what made the problem structural rather than just difficult. You couldn’t solve it by asking better questions or running longer sessions.
The knowledge had been absorbed so completely into expert intuition that it no longer existed as something that could be directly retrieved and handed over.
MYCIN eventually encoded the knowledge of those specialists into roughly 600 production rules — IF this condition, THEN this action, with an attached certainty factor. When it was tested against specialist physicians in controlled evaluations, it performed at or above their level.
A machine, reasoning from encoded rules, was diagnosing infections as well as doctors.
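To make the shape of those rules concrete, here is a minimal sketch of a production rule with a certainty factor, written in Python. MYCIN itself was written in Lisp and reasoned backwards from hypotheses rather than forwards from findings; the rules and certainty values below are invented for illustration, not drawn from its actual knowledge base.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    premises: list[str]   # conditions that must all be present
    conclusion: str       # what the rule asserts if they are
    cf: float             # certainty factor the expert attached (-1.0 to 1.0)

# Invented, illustrative rules -- not MYCIN's real knowledge base.
rules = [
    Rule(["gram_negative", "rod_shaped", "anaerobic"], "bacteroides", 0.6),
    Rule(["gram_negative", "rod_shaped"], "e_coli", 0.4),
]

def combine(cf_old: float, cf_new: float) -> float:
    """MYCIN-style combination of two positive certainty factors."""
    return cf_old + cf_new * (1 - cf_old)

def diagnose(findings: set[str]) -> dict[str, float]:
    """Fire every rule whose premises are all present; accumulate belief."""
    belief: dict[str, float] = {}
    for rule in rules:
        if all(p in findings for p in rule.premises):
            belief[rule.conclusion] = combine(belief.get(rule.conclusion, 0.0), rule.cf)
    return belief

print(diagnose({"gram_negative", "rod_shaped", "anaerobic"}))
# {'bacteroides': 0.6, 'e_coli': 0.4}
```

The certainty factors are the detail worth noticing: they let evidence from multiple rules reinforce a conclusion without claiming to be true probabilities, which was exactly the kind of hedged judgment the specialists themselves were making.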
It was never deployed clinically. Not because it failed, but because nobody could agree on who was responsible if it got something wrong. The liability questions proved harder than the technical ones. But the proof of concept was undeniable. The question now was how to do it reliably, at scale, across domains.
What Is Knowledge Engineering?
Knowledge Engineering is the practice of getting expertise out of human minds and into a form that machines can use.
The knowledge engineer’s job is to surface what doctors, chemists, financial analysts, and other experts know, both what they’re consciously aware of and what they aren’t, and shape it into a structure a system can reason with.
Knowledge engineering moves through four stages.
The first is knowledge acquisition: the elicitation sessions, the interviews, the case walkthroughs. This is the hardest stage and the most underestimated.
Naïve approaches fail consistently. If you just ask experts what they know, you get the idealized version. So knowledge engineers use methods designed to surface what experts do rather than what they say they do.
One approach is to watch them work. Sit alongside an expert as they solve a real problem and ask them to think out loud as they go. What comes out is messier and more revealing than any interview.
Another is to work through specific past situations where the expert made a consequential decision. Walking through a real case unlocks detail that abstract questioning never reaches. The expert remembers what they noticed, what they ignored, what made this situation different from the last one.
A third is contrastive questioning: instead of asking an expert to describe what they do, ask them to explain the difference between two situations. Why did you treat these two cases differently? What did you see in one that you didn’t see in the other? Comparison forces precision in a way that open description rarely does.
None of these methods extract knowledge perfectly. But they get closer to what experts actually do than asking them to explain themselves ever could.
The second is knowledge representation: translating what you’ve acquired into a formal structure. Production rules, semantic networks, decision trees, and ontologies are common forms; a short sketch of what this can look like appears after the four stages.
The third is knowledge validation: testing whether the encoded knowledge actually performs correctly across cases, both the easy ones and the edge cases. A system that works on textbook cases and fails on real ones is not an expert system. It is a simulation of one.
The fourth is knowledge maintenance: managing the knowledge base as a living thing over time. Domains evolve. Expertise updates.
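As promised above, here is a small sketch of what the second and third stages can look like in practice: a handful of elicited rules encoded as data, then replayed against decisions the expert has already made, including an awkward edge case. The domain, the rules, and the recorded outcomes are all invented for illustration; the point is the shape of the work, not the content.

```python
# Stage 2 (representation): elicited knowledge encoded as explicit IF/THEN rules.
# Stage 3 (validation): the encoded rules checked against cases the expert
# has already judged, including the awkward ones. All of this is invented.

rules = [
    # (name, condition over a case, conclusion)
    ("high_risk_fever", lambda c: c["temp_c"] >= 38.5 and c["age"] > 65, "escalate"),
    ("routine",         lambda c: c["temp_c"] < 38.5,                    "monitor"),
]

def decide(case: dict) -> str:
    """Return the first matching rule's conclusion, or flag a gap in coverage."""
    for name, condition, conclusion in rules:
        if condition(case):
            return conclusion
    return "no_rule_applies"   # a gap the knowledge engineer must go back and fill

# Validation set: cases paired with the expert's own decision.
validation_cases = [
    ({"temp_c": 39.1, "age": 72}, "escalate"),   # textbook case
    ({"temp_c": 37.2, "age": 30}, "monitor"),    # textbook case
    ({"temp_c": 39.1, "age": 40}, "escalate"),   # edge case the rules don't cover
]

for case, expert_says in validation_cases:
    system_says = decide(case)
    status = "OK" if system_says == expert_says else "MISMATCH"
    print(f"{status}: system={system_says!r}, expert={expert_says!r}, case={case}")
```

Every mismatch or uncovered case sends the engineer back to stage one, which is one reason the fourth stage, maintenance, never really ends.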
The Philosopher Nobody Expected
The person who gave all of this its deepest intellectual foundation wasn’t a computer scientist. He was a Hungarian-British chemist turned philosopher named Michael Polanyi who had been thinking about the nature of expertise since the 1950s.
In a short book called The Tacit Dimension, published in 1966, Polanyi wrote a sentence that the AI community would eventually find its way to: “We can know more than we can tell.”
Polanyi’s argument was that expertise is not a large inventory of facts and rules that an expert has accumulated and could, in principle, recite. It is something more integrated than that.
Years of experience get processed into pattern recognition, intuition, and judgment that operates faster than conscious reflection and is only partially accessible to it. The expert chemist from the earlier example, reading a spectrum, is not running through a checklist. They are doing something that feels perceptual and immediate, and that immediacy is precisely what makes it so hard to transfer.
Polanyi never wrote about AI. He was thinking about scientific knowledge and how it gets transmitted between generations of researchers. But when the knowledge engineering community discovered his work, it landed like a precise diagnosis of exactly what they were dealing with.
That insight didn’t make the problem easier. But it made it legible. And legibility is where every solution begins.
The Boom, the Collapse, and What Survived
By the mid-1980s, knowledge engineering had become an industry. XCON, the expert system Digital Equipment Corporation built to configure its VAX computers, was saving the company an estimated $40 million a year. Hundreds of companies were building expert systems. The commercial application of AI, after decades of promise, seemed to have genuinely arrived.
Then the systems started breaking.
Early expert systems were brittle. They performed well within the boundaries of their encoded knowledge and failed unpredictably outside them. Maintaining large rule bases as domains evolved was expensive and error-prone. The knowledge that was accurate in 1982 was wrong or incomplete by 1988.
XCON eventually grew to over 10,000 rules. Keeping it current required a dedicated team of knowledge engineers working continuously. That ongoing cost was one of the reasons large expert systems eventually became unsustainable.
The AI Winter arrived. Funding contracted. The expert systems industry collapsed.
The field responded by turning toward machine learning: statistical approaches that could learn patterns from data rather than requiring knowledge to be extracted and encoded by hand. It was an appealing solution to the bottleneck problem. Instead of laboriously eliciting knowledge from experts, you trained systems on large datasets and let them figure it out.
What was gained in scalability was lost in other ways.
The key difference: Statistical systems learn patterns, not principles. They can’t explain their reasoning. They generalise within the distribution of their training data and fail outside it. They don’t encode the kind of structured, causal understanding that experts use when they encounter something genuinely new.
For a while, knowledge engineering fell out of fashion, but the questions it was built to answer remained alive.
Over time, it evolved into ontology engineering, into organisational learning theory, into research on how experts actually make decisions under uncertainty.
Why You’re Reading This Now
Large language models have changed what’s possible. They have absorbed an extraordinary volume of human knowledge and can perform across a remarkable range of domains with genuine fluency. The optimism in the AI field today echoes the optimism of that summer at Dartmouth.
But the knowledge acquisition bottleneck has not been solved by scale. It has been displaced.
What LLMs cannot do, without deliberate intervention, is perform at expert level in specific organisational contexts. They have breadth without depth. They produce plausible output, which is not the same thing as correct output. They have patterns without the underlying causal understanding that experts use to reason about situations they’ve never seen before.
Anyone building a system that needs to perform at expert level is facing the same problem Feigenbaum faced sitting with those chemists in the 1960s.
How do you move what an expert actually knows into a system that can use it?
That problem doesn’t yield to better prompts or bigger models. It yields to method. The method has a name. It is sixty years old and more relevant now than it has ever been.
That’s what knowledge engineering is. And that’s why it’s where this story starts.