1. Extract
// Live DOM
<form id="login">
  <input type="email"
         placeholder="Email"
         class="field-3x..."
         data-testid="email"
         aria-label="Email" />
  <input type="password"
         placeholder="Password" />
  <button type="submit">
    Log In
  </button>
</form>
Full DOM with all attributes, classes, and nesting
2. Dehydrate
// Simplified HTML for LLM
// (text-based, no screenshots)
[14]<input Email />
[15]<input Password />
[16]<button>Log In</button>
// Each interactive element gets a numeric index.
// Non-interactive nodes are stripped away.
Stripped to indexed interactive elements only
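
The dehydration step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the `SketchNode` shape, the `INTERACTIVE_TAGS` set, the `dehydrateSketch` function, and the rule of preferring `aria-label` over `placeholder` for the label are all assumptions made for the example. The real step walks the live DOM rather than plain objects.

```typescript
// Hypothetical plain-object stand-in for a DOM node (assumption for this sketch).
interface SketchNode {
  tag: string;
  attrs?: Record<string, string>;
  text?: string;
  children?: SketchNode[];
}

// Assumed set of tags treated as interactive; the real rule is likely richer.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

function dehydrateSketch(
  root: SketchNode,
  startIndex = 0
): { lines: string[]; indexMap: Map<number, SketchNode> } {
  const lines: string[] = [];
  const indexMap = new Map<number, SketchNode>();
  let nextIndex = startIndex;

  const visit = (node: SketchNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const index = nextIndex++;
      indexMap.set(index, node); // remembered so the index can be resolved later
      const label = node.attrs?.["aria-label"] ?? node.attrs?.["placeholder"] ?? "";
      const text = (node.text ?? "").trim();
      if (text) {
        lines.push(`[${index}]<${node.tag}>${text}</${node.tag}>`);
      } else {
        lines.push(label ? `[${index}]<${node.tag} ${label} />` : `[${index}]<${node.tag} />`);
      }
      return; // assumption: do not descend into interactive elements
    }
    // Non-interactive nodes emit nothing; only their children are visited.
    for (const child of node.children ?? []) visit(child);
  };

  visit(root);
  return { lines, indexMap };
}

// The login form from step 1, rebuilt as plain objects.
const loginFormSketch: SketchNode = {
  tag: "form",
  attrs: { id: "login" },
  children: [
    { tag: "input", attrs: { type: "email", placeholder: "Email", "aria-label": "Email" } },
    { tag: "input", attrs: { type: "password", placeholder: "Password" } },
    { tag: "button", attrs: { type: "submit" }, text: "Log In" },
  ],
};

// Starting at 14 mirrors the indices in the diagram, where earlier page elements took 0..13.
const dehydrated = dehydrateSketch(loginFormSketch, 14);
console.log(dehydrated.lines.join("\n"));
```

Keeping an `indexMap` alongside the text output is what would let a later step turn an index chosen by the LLM back into a concrete node.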
3. LLM Thinks
// LLM response (MacroToolInput)
{
  "reflection": "I see a login form. I need to click Log In.",
  "action": {
    "name": "click_element_by_index",
    "args": {
      "index": 16
    }
  }
}
Reflects on state, picks a tool and target index
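
Because the LLM's reply arrives as text, it is worth validating before anything is executed. The sketch below checks only the fields visible in the example above. The `AgentResponseSketch` type and the `parseAgentResponseSketch` function are hypothetical names, and the real `MacroToolInput` schema may contain more fields and more action types than are shown here.

```typescript
// Assumed shape, reconstructed only from the example response above.
interface AgentResponseSketch {
  reflection: string;
  action: { name: "click_element_by_index"; args: { index: number } };
}

function parseAgentResponseSketch(raw: string): AgentResponseSketch {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (typeof data !== "object" || data === null) {
    throw new Error("response must be a JSON object");
  }
  const { reflection, action } = data as { reflection?: unknown; action?: unknown };
  if (typeof reflection !== "string") {
    throw new Error("reflection must be a string");
  }
  if (typeof action !== "object" || action === null) {
    throw new Error("action must be an object");
  }
  const { name, args } = action as { name?: unknown; args?: unknown };
  if (name !== "click_element_by_index") {
    // Only the one action shown in the diagram is handled in this sketch.
    throw new Error(`unsupported action: ${String(name)}`);
  }
  if (typeof args !== "object" || args === null) {
    throw new Error("args must be an object");
  }
  const { index } = args as { index?: unknown };
  if (typeof index !== "number" || !Number.isInteger(index) || index < 0) {
    throw new Error("args.index must be a non-negative integer");
  }
  return { reflection, action: { name: "click_element_by_index", args: { index } } };
}

const sampleReply = JSON.stringify({
  reflection: "I see a login form. I need to click Log In.",
  action: { name: "click_element_by_index", args: { index: 16 } },
});
const parsedReply = parseAgentResponseSketch(sampleReply);
console.log(parsedReply.action.args.index); // 16
```

Rejecting unknown action names and non-integer indices up front keeps a malformed reply from reaching the page.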
4. Execute
// PageController executes
pageController.clickElement(16)
// Resolves index 16 back to the real DOM node
// and fires a click event.
// SimulatorMask shows a visual highlight on the
// clicked element so the user sees what happened.
Index maps back to real DOM node, click fires
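
One way to picture the index-to-node resolution is a lookup table filled during dehydration. The sketch below is an assumption-laden stand-in, not the real `PageController`: only the `clickElement(16)` call is taken from the diagram, while `SketchPageController`, its `register` method, and the `Clickable` interface are hypothetical. The `SimulatorMask` highlight is not modeled.

```typescript
// Anything with a click() method; a browser HTMLElement satisfies this interface.
interface Clickable {
  click(): void;
}

class SketchPageController {
  private readonly elements = new Map<number, Clickable>();

  // Hypothetical helper: record which element an index refers to.
  register(index: number, element: Clickable): void {
    this.elements.set(index, element);
  }

  // Resolve the index back to its element and click it.
  clickElement(index: number): void {
    const element = this.elements.get(index);
    if (!element) {
      // A stale or invented index fails loudly instead of clicking the wrong node.
      throw new Error(`no element registered for index ${index}`);
    }
    element.click();
  }
}

// Stub element that counts clicks, standing in for the real Log In button.
let clickCount = 0;
const stubLoginButton: Clickable = {
  click: () => {
    clickCount += 1;
  },
};

const sketchController = new SketchPageController();
sketchController.register(16, stubLoginButton);
sketchController.clickElement(16);
console.log(clickCount); // 1
```

Throwing on an unknown index matters because the LLM can return an index that was never issued, or one left over from an earlier page state.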