The DOM Pipeline: From Webpage to LLM and Back

How PageAgent converts a live DOM into text the LLM can reason about, then translates actions back into real clicks

1. Extract
// Live DOM
<form id="login">
  <input type="email" placeholder="Email"
         class="field-3x..." data-testid="email"
         aria-label="Email" />
  <input type="password" placeholder="Password" />
  <button type="submit">Log In</button>
</form>
Full DOM with all attributes, classes, and nesting
2. Dehydrate
// Simplified HTML for the LLM
// (text-based, no screenshots)
[14]<input Email />
[15]<input Password />
[16]<button>Log In</button>
// Each interactive element gets a numeric index.
// Non-interactive nodes are stripped away.
Stripped to indexed interactive elements only
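Dehydration amounts to a tree walk that keeps only interactive elements and hands each one a numeric index. Here is a minimal sketch using plain objects in place of real DOM nodes; the node shape, the `dehydrate` helper, and the set of "interactive" tags are illustrative assumptions, not PageAgent's actual API:

```javascript
// Hypothetical node shape: { tag, attrs, text, children }.
const INTERACTIVE = new Set(["input", "button", "a", "select", "textarea"]);

// Walk the tree, collecting interactive elements with ascending indices.
// The counter starts at 14 to match the example; on a real page, earlier
// elements would have consumed indices 0-13.
function dehydrate(node, out = [], counter = { next: 14 }) {
  if (INTERACTIVE.has(node.tag)) {
    const label =
      node.attrs?.["aria-label"] || node.attrs?.placeholder || node.text || "";
    out.push({ index: counter.next++, tag: node.tag, label });
  }
  for (const child of node.children ?? []) dehydrate(child, out, counter);
  return out;
}

// The login form from step 1, as a plain-object tree:
const form = {
  tag: "form",
  children: [
    { tag: "input", attrs: { placeholder: "Email", "aria-label": "Email" } },
    { tag: "input", attrs: { placeholder: "Password" } },
    { tag: "button", text: "Log In" },
  ],
};

const elements = dehydrate(form);
const rendered = elements
  .map((e) => `[${e.index}]<${e.tag}${e.label ? " " + e.label : ""} />`)
  .join("\n");
console.log(rendered);
// [14]<input Email />
// [15]<input Password />
// [16]<button Log In />
```

The rendered string is the entire "view" the model receives: a few dozen tokens instead of a screenshot.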
3. LLM Thinks
// LLM response (MacroToolInput)
{
  "reflection": "I see a login form. I need to click Log In.",
  "action": {
    "name": "click_element_by_index",
    "args": { "index": 16 }
  }
}
Reflects on state, picks a tool and target index
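Because the response is plain JSON, it can be validated before anything touches the page. A hedged sketch of that guard step; the field names follow the MacroToolInput above, but the `parseAction` helper is hypothetical:

```javascript
// Parse and validate an LLM response shaped like MacroToolInput.
// parseAction is a hypothetical helper, not PageAgent's actual API.
function parseAction(raw) {
  const msg = JSON.parse(raw);
  const { name, args } = msg.action ?? {};
  if (name !== "click_element_by_index" || !Number.isInteger(args?.index)) {
    throw new Error(`Unsupported or malformed action: ${name}`);
  }
  return { name, index: args.index, reflection: msg.reflection };
}

// The response from the example above, as a raw JSON string:
const raw = JSON.stringify({
  reflection: "I see a login form. I need to click Log In.",
  action: { name: "click_element_by_index", args: { index: 16 } },
});

const parsed = parseAction(raw);
console.log(parsed.name, parsed.index); // click_element_by_index 16
```

Rejecting malformed actions here, rather than during execution, keeps a hallucinated tool name or index from ever reaching the page.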
4. Execute
// PageController executes
pageController.clickElement(16)
// Resolves index 16 back to the real DOM node
// and fires a click event.
// SimulatorMask shows a visual highlight on the
// clicked element so the user sees what happened.
Index maps back to real DOM node, click fires
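Execution is then a lookup in the index-to-node map built during dehydration, followed by a dispatched event. A sketch with stub nodes standing in for real DOM elements; only `pageController.clickElement(16)` appears in the text above, so the controller's internals here are assumptions:

```javascript
// Stub nodes stand in for real DOM elements; each records its clicks.
function makeNode(tag) {
  return { tag, clicks: 0, click() { this.clicks++; } };
}

// Hypothetical controller holding the index -> node map from step 2.
class PageController {
  constructor(indexToNode) {
    this.indexToNode = indexToNode;
  }
  clickElement(index) {
    const node = this.indexToNode.get(index);
    if (!node) throw new Error(`No element at index ${index}`);
    // On a live page this would be something like:
    //   node.dispatchEvent(new MouseEvent("click", { bubbles: true }))
    node.click();
    return node;
  }
}

const indexToNode = new Map([
  [14, makeNode("input")],
  [15, makeNode("input")],
  [16, makeNode("button")],
]);

const pageController = new PageController(indexToNode);
const clicked = pageController.clickElement(16);
console.log(clicked.tag, clicked.clicks); // button 1
```

The map is rebuilt on every step, so indices stay valid even as the page mutates between actions.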
Key insight: No screenshots. No multi-modal models. PageAgent converts the DOM to text, letting any cheap text-only LLM drive the interface. This keeps costs low and latency under a second per step.
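The four steps above compose into a single loop. A compact end-to-end sketch with a stub standing in for the real LLM call, and a log standing in for the live page; every name here is illustrative:

```javascript
// Step 2 output: the indexed elements the model will see.
const elements = [
  { index: 14, tag: "input", label: "Email" },
  { index: 15, tag: "input", label: "Password" },
  { index: 16, tag: "button", label: "Log In" },
];
const prompt = elements
  .map((e) => `[${e.index}]<${e.tag} ${e.label} />`)
  .join("\n");

// Step 3: stubbed text-only "LLM" that clicks the last button it sees.
function stubLLM(promptText) {
  const button = elements.filter((e) => e.tag === "button").at(-1);
  return {
    reflection: "I see a login form. I need to click Log In.",
    action: { name: "click_element_by_index", args: { index: button.index } },
  };
}

// Step 4: execute against a log instead of a real page.
const log = [];
const response = stubLLM(prompt);
if (response.action.name === "click_element_by_index") {
  log.push(`click ${response.action.args.index}`);
}
console.log(log[0]); // click 16
```

Swapping the stub for a real model call and the log for a PageController gives the text-only loop the insight describes: no pixels anywhere in the path.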