What if we allowed programs to evolve dynamically mid-way through their execution? It's a question I had often mused about but could never figure out how to approach. With no prior background in PLT (programming language theory), digging deeper seemed like a herculean task. With the advent of LLMs and the general clamor for ambitious projects, I decided to give it a try; having an unused weekly Claude quota helped. I ended up building a small project, and I describe some of the interesting observations below. The complete code is at llm-clojure.
The intent is to have programs that can dynamically evolve from within, based on arbitrary instructions. A few questions arise.
On initial exploration, it looked like I would be creating a new language and answering these questions from the ground up, with all the associated complexity of building a grammar, a lexer, and maybe even a VM. Then I remembered reading about Clojure/Lisp and its ability to change code at runtime using macros¹. That got me thinking: if I had a macro that could transform code at runtime, I wouldn't need to build my own machinery². The flip side was that my knowledge of Clojure is rudimentary, limited to a couple of tutorials I did almost a decade ago and a few meetups I attended. This means I would be relying on an LLM to generate code in a language that is not usually held up as a poster child for LLM-generated code. But since this is a weekend project and not a professional undertaking, it might be fun to explore riskier paths.
There were a couple of key decision points on how to build a working system.
How developers interact with the system is something of a sticking point. The intent is to make it as seamless as possible and to fit within the existing host-language paradigm. Since a macro system allows seamless transformation while keeping the rest of the programming model similar³, I decided to add a macro written as ^, which allows for writing interwoven code, e.g.:
(defn summarize [document max-words]
  ^{:intent "summarize document in plain language, stay under max-words"
    :max-iterations 2}
  (summarize-text-placeholder document max-words))

(summarize "Some long winded text..." 20)
Or the following, which shows how it can be interwoven with regular switch-style (cond) code:
(defn process-payment [payment]
  (let [status ^{:intent "classify payment as :approved :declined :fraud or :pending"}
               (classify-payment payment)]
    (cond
      (= status :approved) (complete-payment payment)
      (= status :declined) (notify-declined payment)
      (= status :fraud)    (flag-and-block payment)
      :else                (queue-for-review payment))))
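For readers new to Clojure: in standard Clojure, ^{...} is the reader's metadata syntax. The map is attached to the form that follows and can be recovered with meta, which is one plausible way for a runtime to find its intents. This is a sketch under that assumption only; the project may use its own reader, given the ^^ variant mentioned later. The helpers llm-form? and form-intent are hypothetical.

```clojure
;; In standard Clojure, ^{...} attaches the map as metadata to the next form.
;; read-string only reads; nothing is evaluated or resolved.
(def sample-form
  (read-string
    "^{:intent \"summarize document\" :max-iterations 2} (summarize-text-placeholder document max-words)"))

(defn llm-form?
  "Hypothetical helper: treat any list carrying :intent metadata as an LLM-resolved form."
  [form]
  (boolean (and (seq? form) (:intent (meta form)))))

(defn form-intent
  "Hypothetical helper: pull the guidance keys out of a form's metadata."
  [form]
  (select-keys (meta form) [:intent :max-iterations]))

;; (first sample-form) is still the plain symbol summarize-text-placeholder;
;; the intent travels alongside it as metadata.
```

Because the metadata rides along with an ordinary list, the surrounding code stays valid Clojure. The intent only matters to whatever evaluator chooses to look for it.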
The main intent is to keep the host and the LLM as interoperable as possible, so that developers don't have to context-switch mentally while programming. The system has a symbol lookup map for built-in functions like map, +, and filter, and uses apply to execute them. This currently breaks down when importing or requiring from other namespaces, but works with the baseline set of functions preloaded in the base environment.
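The symbol-lookup-plus-apply approach can be sketched as a tiny evaluator. This is a minimal illustration, not the project's implementation. The names base-env and evaluate are hypothetical, and the sketch handles only symbols, calls, vectors, and literals, with no special forms such as let or if. It also shows why requiring other namespaces is awkward: anything missing from the map simply fails to resolve.

```clojure
(def base-env
  ;; Hypothetical baseline environment: symbol -> host function
  {'map map, 'filter filter, '+ +, 'inc inc, 'even? even?})

(defn evaluate
  "Evaluate form against env: symbols are looked up, non-empty lists are
  applied, vectors are evaluated element-wise, everything else is itself."
  [env form]
  (cond
    (symbol? form)
    (if (contains? env form)
      (get env form)
      (throw (ex-info "Unresolved symbol" {:symbol form})))

    (and (seq? form) (seq form))
    (let [f    (evaluate env (first form))               ;; look up the head
          args (map #(evaluate env %) (rest form))]      ;; evaluate arguments
      (apply f args))                                    ;; call the host function

    (vector? form) (mapv #(evaluate env %) form)

    :else form))
```

For example, (evaluate base-env '(filter even? [1 2 3 4])) resolves both filter and even? through the map before applying them.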
REPL usability is also not great yet; daily use will need better debugging tooling.
Providing the correct context is important for LLMs to do their job. It comes in two parts: current context and historical memory. The current context is a must-have, while historical memory is a nice-to-have that gives the LLM additional guidance.
The current context is provided as part of the form itself and passed to the LLM, which helps steer it:
^{:intent   "extract product name"
  :language "detect automatically, normalize output to English"}
(extract-name (:description raw))
The keys are not enforced but serve as guidance for the LLM; here, extract-name is just a placeholder. Each LLM response is sent to a judge LLM. If the response is rejected, the judge's error information is added to the context before retrying, which gives the next call better guidance. The evaluator keeps trying to resolve the form up to a fixed number of iterations⁴.
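Here is a minimal sketch of that generate, judge, retry loop. It assumes the two LLM calls can be passed in as plain functions; generate and judge are hypothetical stand-ins. The verdict keys (:accepted, :confidence, :reason, :correction) follow the judge response format in footnote 4. Each rejection is accumulated and handed to the next attempt.

```clojure
(defn resolve-with-judge
  "Hypothetical sketch of a generate -> judge -> retry loop.
  generate: (fn [form intent failures] -> candidate value)
  judge:    (fn [form intent candidate failures] -> verdict map)"
  [generate judge form intent max-iterations]
  (loop [i        0
         failures []]
    (if (>= i max-iterations)
      ;; Out of attempts: report every rejected candidate
      {:status :failed :failures failures}
      (let [candidate (generate form intent failures)
            verdict   (judge form intent candidate failures)]
        (if (:accepted verdict)
          {:status :resolved :resolved candidate :confidence (:confidence verdict)}
          ;; Feed the judge's reason and correction into the next attempt
          (recur (inc i)
                 (conj failures {:attempt    candidate
                                 :reason     (:reason verdict)
                                 :correction (:correction verdict)})))))))
```

Passing the accumulated failures to both calls mirrors the judge prompt's instruction to "consider ... any prior failed attempts".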
In addition, the system needs to decide when to expand or apply the forms inside ^ and when to pass them as literals. This is similar to the templating and backtick-substitution concerns in bash. For example:
before: (map (:name input) :to category-ref)
after: (map "iPhone 15 Pro" :to {:electronics 1 :clothing 2 :books 3 ...})
The full flow for deciding when forms are resolved and when they are passed as-is:
node is a nested ^/^^ form?
  → leave intact (will be resolved in its own llm-eval call)
node is a seq?
  → first element is a keyword?
      → evaluate it: (:name input) → "iPhone 15 Pro"
  → otherwise?
      → walk each element recursively (keep structure, substitute data)
node is a map or vector?
  → walk keys/values/elements recursively
node is a symbol?
  → resolve it in env
      → resolves to a function (ifn?, not map/set/vector/keyword)?
          → keep the symbol name ← LLM sees `map`, `classify`, `generate-predicate`
      → resolves to data?
          → substitute the value ← LLM sees {:USA 1 :GBR 2}, #{:new :used}, "iPhone"
      → unresolvable?
          → keep the symbol name
node is a literal (string, number, keyword, boolean)?
  → pass as-is
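The decision flow above maps fairly directly onto a recursive walk. This sketch rests on two assumptions. First, a nested ^ form is recognized by :intent metadata on a list; the real ^^ handling isn't shown in the post. Second, env is a plain map of symbols to values. The names substitute, data-value?, and nested-llm-form? are hypothetical.

```clojure
(defn nested-llm-form?
  "Assumption: a nested ^ form is a list whose metadata carries :intent."
  [node]
  (boolean (and (seq? node) (:intent (meta node)))))

(defn data-value?
  "Maps, sets, vectors and keywords implement IFn, but are treated as data here."
  [v]
  (or (not (ifn? v)) (map? v) (set? v) (vector? v) (keyword? v)))

(defn substitute
  "Walk node, substituting data bound in env and keeping function names as symbols."
  [env node]
  (cond
    ;; Nested LLM form: leave intact for its own resolution
    (nested-llm-form? node) node

    (seq? node)
    (if (keyword? (first node))
      ;; (:name input) style lookup: apply the keyword to the substituted arguments
      (apply (first node) (map #(substitute env %) (rest node)))
      ;; Otherwise keep the list structure and substitute inside it
      (apply list (map #(substitute env %) node)))

    (map? node)
    (into {} (map (fn [[k v]] [(substitute env k) (substitute env v)]) node))

    (vector? node) (mapv #(substitute env %) node)

    (symbol? node)
    (let [v (get env node ::unresolved)]
      (if (or (= v ::unresolved) (not (data-value? v)))
        node   ;; unresolvable or a function: keep the symbol name
        v))    ;; data: substitute the value

    ;; Strings, numbers, keywords, booleans, nil: pass as-is
    :else node))
```

With input bound to {:name "iPhone 15 Pro"}, category-ref bound to a category map, and map bound to the core function, this turns the "before" form from the example into the "after" form.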
Each run or expansion is also captured and used for future guidance. The simplest approach is to keep track of all historical expansions and pick the best one based on a score. In the future, this can be extended to more complex optimization, including prompt and intent optimization. Another advantage of storing past resolutions is that the runtime memory is persisted to a file and can be shipped. The memory is a structured log whose entries look like this:
{:form       "(map \"UK\" :to {:USA 1 :GBR 2 :CAN 3})" ; (pr-str form), used as a string key
 :resolved   1
 :confidence 0.85
 :context    {...}
 :status     :resolved
 :at         #inst "2026-05-27T..."}
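The "pick the best one based on a score" strategy can be sketched over a sequence of entries shaped like the one above. This is a minimal sketch that assumes :confidence is the score. The name best-resolution is hypothetical, and non-:resolved statuses are ignored.

```clojure
(defn best-resolution
  "Pick the highest-confidence :resolved entry from a sequence of memory entries.
  Assumes :confidence is the score; returns nil when nothing has resolved."
  [entries]
  (let [resolved (filter #(= :resolved (:status %)) entries)]
    (when (seq resolved)
      ;; max-key returns the entry whose score is largest (the last one on ties)
      (apply max-key #(or (:confidence %) 0.0) resolved))))
```

A richer scorer could weight recency or judge feedback instead. Swapping the key function passed to max-key is enough for that.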
The memory interface supports methods to interact with the memory log. Some of the methods built are:
In theory, this allows a running program to reflect on its own memory, for example:
(defn most-common-resolution [runtime form]
  (let [last-10   (->> (runtime/mem-history runtime form)   ;; read entries matching this form
                       (filter #(= (:status %) :resolved))  ;; keep only successful resolutions
                       (take-last 10))                      ;; the 10 most recent
        freq-map  (frequencies (map :resolved last-10))
        [value n] (first (sort-by val > freq-map))]         ;; most frequent value and its count
    {:most-common value
     :count       n
     :out-of      (count last-10)
     :all-counts  freq-map}))
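most-common-resolution relies on runtime/mem-history, whose implementation isn't shown in the post. As a purely hypothetical sketch, an in-memory log with file persistence might look like this. It matches the point that memory is stored in a file and can be shipped. The names make-runtime, record!, mem-history, save-memory!, and load-memory are illustrative, not the project's API. Reading back with clojure.edn rather than eval means a shipped memory file cannot execute code.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical runtime: the log is an atom holding a vector of entries
(defn make-runtime [] {:log (atom [])})

(defn record!
  "Append an entry for form, keyed by (pr-str form) as in the log entry above."
  [runtime form entry]
  (swap! (:log runtime) conj
         (assoc entry :form (pr-str form) :at (java.util.Date.))))

(defn mem-history
  "All entries recorded for form, oldest first."
  [runtime form]
  (let [k (pr-str form)]
    (filterv #(= k (:form %)) @(:log runtime))))

(defn save-memory! [runtime path]
  ;; The log is plain data, so pr-str produces readable EDN (dates print as #inst)
  (spit path (pr-str @(:log runtime))))

(defn load-memory [path]
  ;; edn/read-string understands #inst by default and never evaluates code
  {:log (atom (edn/read-string (slurp path)))})
```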
The following sample program demonstrates some of the key concepts and shows what an actual program looks like:
(def category-set #{:clothing :electronics :books :food :none})
(defn fix-missing-category [data-row]
  ;; `into` is a normal Clojure call that merges the map returned by the ^ form,
  ;; e.g. {:category :electronics}, into data-row. The `into` runs like normal code,
  ;; while missing-data is only a placeholder and is never actually applied.
  (into data-row ^{:intent "Generate a Clojure map {}.
                   The key is :category and the value must be identified from the name of the item.
                   Ensure it is a valid value from the category set only.
                   If you cannot identify it with high confidence, use the value :none.
                   Return only the map form - no explanation, no markdown."}
                 (missing-data data-row category-set)))
(defn check-fix-data [data-row]
  ;; If the category is missing, use an LLM to assign it (regular and LLM functions used together)
  (if (contains? data-row :category)
    data-row
    (fix-missing-category data-row)))
(defn make-filter [description sample]
  ^{:intent "Generate a Clojure (fn [item] ...) predicate that implements the described filter.
             Use the sample map to understand the available keys and value types.
             Return only the fn form - no explanation, no markdown."}
  (generate-predicate description sample))
(def catalog-data [{:name "MacBook" :category :electronics :condition :new :price 1299}
                   {:name "Novel" :category :books :condition :new :price 18}
                   {:name "Old Phone" :category :electronics :condition :used :price 89}
                   ;; Sample row with a missing category (Headphones)
                   {:name "Headphones" :condition :new :price 249}
                   {:name "Jacket" :category :clothing :condition :new :price 320}])
(defn run-filter [condition data]
  (let [;; Check the shape of each row and fill in any missing attributes
        fixed-data  (map check-fix-data data)
        ;; A single row shows the LLM the available keys and value types
        sample-data (second fixed-data)
        ;; Use the LLM to generate a filter predicate from the input criteria
        pred        (make-filter condition sample-data)]
    (filter pred fixed-data)))
(pprint
  (run-filter "items that are electronics in new condition costing more than 200" catalog-data))
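To see what the program should produce without calling an LLM, the two ^ forms can be replaced with fixed stand-ins. Both stub-category and stub-predicate are hypothetical. The predicate is just one an LLM could plausibly return for the query above, not a recorded output.

```clojure
(def catalog-data [{:name "MacBook" :category :electronics :condition :new :price 1299}
                   {:name "Novel" :category :books :condition :new :price 18}
                   {:name "Old Phone" :category :electronics :condition :used :price 89}
                   {:name "Headphones" :condition :new :price 249}
                   {:name "Jacket" :category :clothing :condition :new :price 320}])

(defn stub-category
  "Hypothetical stand-in for the missing-data ^ form: a fixed lookup instead of an LLM."
  [data-row]
  (if (= "Headphones" (:name data-row))
    {:category :electronics}
    {:category :none}))

(def stub-predicate
  ;; One predicate an LLM could plausibly generate for
  ;; "items that are electronics in new condition costing more than 200"
  (fn [item]
    (and (= (:category item) :electronics)
         (= (:condition item) :new)
         (> (:price item) 200))))

(defn run-filter-stubbed [data]
  (let [fixed-data (map #(if (contains? % :category) % (into % (stub-category %))) data)]
    (filter stub-predicate fixed-data)))
```

Running (run-filter-stubbed catalog-data) returns the MacBook and Headphones rows. Headphones only passes the filter because its category was filled in first, which is the point of running check-fix-data before filtering.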
LLMs have a bootstrapping issue when it comes to deep vs. wide search. They usually suggest the most common acceptable answer unless forced to search widely. How to steer between deep and wide search might be an interesting topic for a future post.
↩This was also a token-minimization effort: generating less code kept me from eating up my Claude limits.
↩(((Insert your lisp syntax joke)))
↩Judge prompt
System message (static, strict):
You are a strict resolution judge for a homoiconic LLM runtime called LLM-Lisp.
Your job is to evaluate whether a resolved value is a correct answer for the given form and intent.
Rules:
- Be strict: a plausible-but-wrong answer must be rejected
- Consider the intent, the form structure, and any prior failed attempts
- Return ONLY a raw EDN map — no backticks, no markdown, no explanation outside the map
Response format (EDN map):
{:accepted true or false :confidence 0.0 to 1.0 :reason "one sentence explanation" :correction nil-or-the-correct-value}
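Because the judge is told to return a raw EDN map, its reply can be read with clojure.edn/read-string rather than eval, so a misbehaving response cannot execute code. This is a hedged sketch: parse-verdict, and the choice to treat anything malformed as a rejection, are assumptions rather than part of the original runtime.

```clojure
(require '[clojure.edn :as edn])

(defn parse-verdict
  "Parse a judge response in the EDN format above. Anything that is not a map
  with a boolean :accepted is treated as a rejection (an assumption of this sketch)."
  [response]
  (let [v (try (edn/read-string response)
               (catch Exception _ nil))]   ;; malformed EDN -> nil
    (if (and (map? v) (boolean? (:accepted v)))
      v
      {:accepted false :confidence 0.0 :reason "unparsable judge response" :correction nil})))
```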