Working draft · seeking academic sponsorship
Intrinsic Thought Process — alignment at the cognitive level, not the constraint level.
Mainstream AI alignment over the last two years has converged on a constraint-based shape: train preferences with RLHF, run a guardrail model in front of outputs, filter what slips through, audit incidents and tune. The pattern works, scales, and is what every frontier lab ships. It also has a ceiling, and the ceiling is structural: if values only show up at output time, the agent has already done all of its reasoning under a different objective.
Intrinsic Thought Process (ITP) is the architectural argument that values belong at the world-model and reward level, where reasoning actually happens.
Constraint-based vs intrinsic
A constraint-based alignment architecture is structurally adversarial. The agent reasons under one objective (task completion, reward maximization), and a separate constraint system tries to catch and redirect outputs that violate values. Misalignment in this architecture is detectable but not preventable in any deep sense — the underlying optimization is unchanged.
An intrinsic alignment architecture rebuilds the optimization itself. The world model the agent uses to predict outcomes carries value-laden representations of those outcomes. The reward function shaping the agent's policy isn't a single scalar weighted against constraint penalties — it's a structured signal where humanistic value satisfaction is part of the reward, not a deduction from it. The agent doesn't reason and then get filtered; the agent reasons in a structure where alignment is part of the gradient.
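The difference in reward shape can be made concrete with a minimal sketch. All names here (`constraint_based_reward`, `intrinsic_reward`, the weights) are illustrative assumptions, not taken from any implementation; the point is only the sign of the value term in the gradient.

```python
def constraint_based_reward(task_reward: float, violation: float,
                            penalty_weight: float = 10.0) -> float:
    """Constraint-based shape: a single scalar where values enter only
    as a deduction. A value-neutral policy loses nothing; the optimizer
    is pulled toward the task objective and merely away from violations."""
    return task_reward - penalty_weight * violation


def intrinsic_reward(task_reward: float, value_score: float,
                     value_weight: float = 1.0) -> float:
    """Intrinsic shape: value satisfaction is part of the reward, not a
    deduction from it. An outcome that achieves the task in a
    value-consistent way scores strictly higher than one that merely
    achieves the task, so alignment sits inside the gradient."""
    return task_reward + value_weight * value_score
```

Under the constraint shape, a violation-free trajectory is indistinguishable from a value-positive one; under the intrinsic shape, the two are separated by the reward itself.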
That sounds abstract until it's grounded.
DreamerV3 as the substrate
DreamerV3 is the world-model architecture chosen as the experimental substrate. The reasons are technical: DreamerV3 trains a learned world model that the agent uses to imagine rollouts internally, separate from environment interaction; the model's latent space is rich enough to encode structured outcome representations; and the published baseline is robust enough that modifications can be evaluated against a known control.
The ITP extension introduces moral reward structures into DreamerV3's reward predictor. The reward is no longer a single learned scalar over future states; it's a composition that distinguishes between outcomes that achieve the task and outcomes that achieve the task in ways consistent with embedded value structures. The architectural work is making this composition stable and trainable without losing the sample efficiency DreamerV3 is built for.
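The shape of that composition can be sketched. This is a hypothetical toy, not the ITP modification itself: `StructuredRewardHead`, the linear heads, and `value_weight` are all assumptions for illustration (DreamerV3's published reward predictor is a learned network over the latent state, and the actual composition under test is not specified here). The sketch only shows a single scalar head being replaced by a task head plus a bounded value head over the same latent.

```python
import numpy as np

rng = np.random.default_rng(0)


class StructuredRewardHead:
    """Toy reward predictor: instead of one scalar over the latent,
    compose a task-reward component with a bounded value-alignment
    component read from the same latent state."""

    def __init__(self, latent_dim: int, value_weight: float = 1.0):
        # Stand-ins for learned parameters.
        self.w_task = rng.normal(scale=0.1, size=latent_dim)
        self.w_value = rng.normal(scale=0.1, size=latent_dim)
        self.value_weight = value_weight

    def __call__(self, latent: np.ndarray) -> float:
        task = float(self.w_task @ latent)
        # tanh keeps the value signal bounded so it shapes, rather than
        # dominates, the task gradient -- one ingredient of keeping the
        # composition stable and trainable.
        value = float(np.tanh(self.w_value @ latent))
        return task + self.value_weight * value
```

The stability question the text raises lives in choices like the bounding nonlinearity and `value_weight`: the value component has to survive training without washing out the sample efficiency of the base architecture.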
Minecraft as the game-theoretic environment
Minecraft is the experimental sandbox. The choice is deliberate: Minecraft has the open-world structure that surfaces value-loaded decisions naturally (cooperation vs exploitation, stewardship vs depletion, multi-agent dynamics), the action space is broad enough that interesting policies emerge, and the world-model framing of the environment fits DreamerV3's architecture cleanly. The game theory framing — agents in repeated interactions where prosocial choices have measurable cost — is where the experimental signal lives.
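The payoff structure behind that framing can be written down as a toy repeated interaction. This is not Minecraft and the numbers (`benefit=3.0`, `cost=1.0`) are illustrative assumptions; it only shows the shape of the signal: a prosocial choice carries an immediate, measurable cost that pays off across repeated rounds.

```python
def round_payoff(a_coop: bool, b_coop: bool,
                 benefit: float = 3.0, cost: float = 1.0):
    """One round of a cooperation game: each cooperator pays `cost`;
    mutual cooperation yields `benefit` to each player."""
    shared = benefit if (a_coop and b_coop) else 0.0
    a = shared - (cost if a_coop else 0.0)
    b = shared - (cost if b_coop else 0.0)
    return a, b

# Mutual cooperation earns 2.0 each per round, mutual defection 0.0,
# and unilateral cooperation loses 1.0: the prosocial policy is only
# optimal across repeated interactions, which is exactly the property
# the experiments need to measure.
```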
Toward academic sponsorship
The output of the work is a series of arXiv-formatted academic white papers — architectural framing, experimental design, early results. The current need is academic sponsorship: a research advisor or institution prepared to host the experimental work at the scale the architecture warrants, and to support submission and publication channels.
This is not commercial product work. It's the research direction that informs the practical guidance Protime brings to enterprise AI deployments, and that is where the work translates back. Enterprises adopting agentic AI inherit alignment as a deployment problem; understanding alignment at the architectural level is what makes the deployment guidance non-trivial.
Status and access
Architectural framing complete; experimental design specified; early DreamerV3 modifications under test. Academic white paper drafts available to research partners. Access available on request.