

Adding an AI Chat Assistant to a Static Website

Categories: AI · Cloudflare · Quarto · Web

Authors: Guillaume Gilles, Claude Sonnet 4.6

Published: March 8, 2026 · Modified: April 8, 2026

Abstract

Most personal websites are one-way streets: you publish, visitors read. What if they could ask instead? In this post I walk through adding a small AI assistant — powered by Cloudflare Workers AI — that lets anyone query my experience and résumé in plain English, with zero backend server to maintain and, under typical free-tier usage, zero API costs.

Why bother?

A PDF résumé is a frozen snapshot. Visitors have to skim it, guess what is relevant to them, and hope they find the right section. A recruiter looking for Python experience and a professor curious about your teaching history need very different things — but they both have to read the same document.

An AI assistant changes that dynamic. Instead of making the visitor do the work, you let them ask in plain language: “Does he have any experience with time-series forecasting?” or “What level does he teach at?” and get a direct, personalised answer in seconds.

The goal of this post is to build exactly that, in a way that is:

  • Free — Cloudflare Workers has a generous free tier (100,000 requests/day), and Workers AI includes a daily free usage allocation, so under typical personal-site traffic this stack costs nothing to operate.
  • Maintenance-free — no server to patch, no database, no uptime to worry about.
  • Portable — the approach works with any static site generator (Quarto, Hugo, Jekyll, plain HTML…).

The big picture

Three moving parts, one simple flow:

Visitor's browser
      │  1. types a question
      ▼
Chat widget (HTML/JS)
      │  2. POST { message, history } to Worker URL
      ▼
Cloudflare Worker  ──►  Cloudflare Workers AI
      │                     │
      │  3. system prompt   │  4. returns answer
      │     + message       │
      ◄─────────────────────┘
      │  5. forwards { reply }
      ▼
Chat widget displays reply

The widget is an inline HTML+JS block in index.qmd. It captures the visitor’s message and sends it to the Worker.

The Worker is a tiny JavaScript function running on Cloudflare’s global edge network. It prepends a system prompt containing your full background, then forwards the conversation to Cloudflare Workers AI using the env.AI binding — no external API key required.

The key insight is that the browser never talks to the AI model directly — everything goes through your Worker, where you can add rate limits or filters later if you need them.
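Because every request funnels through the Worker, adding a rate limit later is a small change. As a sketch — this helper is an assumption, not part of the post's code — you could track recent timestamps per IP in a Map. Note that Workers isolates don't share memory, so this only throttles bursts hitting the same isolate; a persistent limit would need KV or Durable Objects.

```javascript
// Hypothetical in-memory rate limiter (per-isolate only — an assumption,
// not part of the original Worker). Keyed by client IP, it allows at
// most `max` requests per `windowMs` milliseconds.
const hits = new Map(); // ip -> array of request timestamps (ms)

function isRateLimited(ip, now = Date.now(), windowMs = 60_000, max = 10) {
  // Drop timestamps that have aged out of the window.
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= max) {
    hits.set(ip, recent);
    return true; // over budget — caller should return a 429
  }
  recent.push(now);
  hits.set(ip, recent);
  return false;
}
```

Inside fetch, you would read the client IP from the CF-Connecting-IP header and return an early 429 when isRateLimited reports true.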


Part 1 — The Cloudflare Worker

Wrangler is the official Cloudflare CLI. Install it once, authenticate, and you’re ready to deploy:

npm install -g wrangler
wrangler login   # opens a browser window for OAuth

The project needs two files inside a worker/ directory at your site root.

website/
├── assets/
│   └── dark.scss          # site theme — includes chat widget styles
├── posts/
│   └── resume-chat/
│       └── index.qmd      # this post
├── worker/
│   ├── index.js           # Cloudflare Worker — calls Workers AI, returns reply
│   └── wrangler.toml      # Worker config — AI binding, model var, deployment
├── _quarto.yml            # site-wide Quarto config
└── index.qmd              # landing page — embeds the chat widget

wrangler.toml is the configuration:

name = "resume-chat"
main = "index.js"
compatibility_date = "2024-09-01"

[ai]
binding = "AI"

[vars]
AI_MODEL = "@cf/meta/llama-3.1-8b-instruct"

name sets the subdomain (resume-chat.<account>.workers.dev); main points to the Worker entry-point. The [ai] binding gives the Worker access to env.AI — Cloudflare’s built-in AI runtime, no external API key needed. AI_MODEL in [vars] makes the model configurable: change it in wrangler.toml and redeploy without touching any code.

The real work happens in worker/index.js. Every section has a job — the walkthrough below goes through them one by one.

The system prompt is the most important part of the setup. It is prepended to every conversation before the visitor’s message. The instruction “based strictly on the information below” keeps the model from hallucinating anything that isn’t in your résumé. Keep this section updated whenever your situation changes.

CORS headers tell the browser that your Worker accepts cross-origin requests. Your widget lives on ggilles.dev; the Worker lives on workers.dev — without these headers, the browser would silently block every fetch. Setting the allowed origin to https://ggilles.dev means browsers only accept the Worker's responses on pages served from that exact domain (it doesn't stop direct curl requests, but it keeps other sites from embedding your widget in their pages).

The OPTIONS preflight is an automatic check the browser sends before a cross-origin POST with a JSON content type: "Are you OK with this?" The Worker must respond to it correctly, or the real request never goes through. Parsing the body is straightforward but worth wrapping in a try/catch — you want clear error responses rather than uncaught exceptions if something malformed hits the endpoint.

Building the messages array is where context lives. The system prompt anchors every conversation. history.slice(-6) keeps the last three exchanges (six messages: three user, three assistant) so the model can handle follow-up questions without ballooning the token count.

Calling Cloudflare Workers AI uses env.AI.run() with the model read from env.AI_MODEL (falling back to @cf/meta/llama-3.1-8b-instruct if the variable is unset). Keeping the model name in wrangler.toml means you can swap it without touching the Worker code — just edit [vars] and redeploy. max_tokens: 400 caps reply length. The call is wrapped in a try/catch so failures return a clean, user-friendly error. On success, the Worker sends the browser a clean { reply: "…" } JSON object with CORS headers attached so the browser actually accepts it.

const SYSTEM_PROMPT = `You are an AI assistant representing Guillaume Gilles.
Answer questions based strictly on the information below.
Be concise, friendly, and professional.
If asked about something not covered, say you don't have that information.

--- PROFILE / EXPERIENCE / EDUCATION / SKILLS ---
…`;
const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "https://ggilles.dev",
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};
export default {
  async fetch(request, env) {
    // Answer the CORS preflight before doing anything else.
    if (request.method === "OPTIONS") {
      return new Response(null, { headers: CORS_HEADERS });
    }

    // Parse the JSON body defensively — malformed input gets a clear 400.
    let body;
    try {
      body = await request.json();
    } catch {
      return new Response("Invalid JSON", { status: 400, headers: CORS_HEADERS });
    }

    const { message, history = [] } = body;

    // System prompt anchors the conversation; keep only the last six
    // messages (three exchanges) to bound the token count.
    const messages = [
      { role: "system", content: SYSTEM_PROMPT },
      ...history.slice(-6),
      { role: "user", content: message },
    ];

    let reply;
    try {
      const aiRes = await env.AI.run(
        env.AI_MODEL ?? "@cf/meta/llama-3.1-8b-instruct",
        { messages, max_tokens: 400 }
      );
      reply = aiRes.response;
    } catch (err) {
      return new Response(
        JSON.stringify({ error: "AI service unavailable. Please try again later." }),
        { status: 502, headers: { "Content-Type": "application/json", ...CORS_HEADERS } }
      );
    }

    return new Response(JSON.stringify({ reply }), {
      headers: { "Content-Type": "application/json", ...CORS_HEADERS },
    });
  },
};

Deploying

No secrets to manage — the AI binding is configured in wrangler.toml and Cloudflare handles everything else. Just deploy:

cd worker/
wrangler deploy
# → Deployed: https://resume-chat.<account>.workers.dev

Test it immediately from the terminal:

curl -X POST https://resume-chat.<account>.workers.dev \
  -H "Content-Type: application/json" \
  -d '{"message": "What is his current job?"}'
# → {"reply":"Guillaume is currently a Regional Economist at the Banque de France…"}

Part 2 — The chat widget

The widget is a raw HTML block embedded directly in index.qmd using Quarto’s {=html} fence. It has two parts: markup and JavaScript.

The complete widget — HTML structure up top, self-contained JavaScript below. Let’s walk through it.

The HTML structure is minimal on purpose. #chat-box is a flex column: #chat-messages scrolls vertically as the conversation grows; #chat-input-row stays pinned at the bottom. No toggles, no overlays — the box lives inline in the page, right below the landing page text.

WORKER_URL is the only value you need to change when you adapt this for your own site. history accumulates the conversation turn by turn; busy is a simple flag that prevents double-sends while a fetch is in flight.

appendMsg is a small factory: it creates a div, stamps it with the right CSS class (user or bot), appends it to the message area, and scrolls to the bottom. It returns the element so the caller can mutate it later — e.g. replace “Thinking…” with the actual reply in place.

sendMessage opens with a guard clause — if the input is empty or a fetch is already running, it exits immediately. It then clears the input, renders the user bubble, and shows a “Thinking…” placeholder that gets replaced in-place once the response arrives. Both sides of the exchange are pushed into history for the next round. Error handling covers the case where the Worker is unreachable: after either path the busy flag is reset, the Send button re-enabled, and focus returned to the input.

<div id="chat-box">
  <div id="chat-messages">
    <div class="chat-msg bot">
      Hi! I'm an AI assistant. Ask me anything
      about Guillaume's background. 👋
    </div>
  </div>
  <div id="chat-input-row">
    <input id="chat-input" type="text"
           placeholder="e.g. What's his Python experience?"
           autocomplete="off" />
    <button id="chat-send">Send</button>
  </div>
</div>
(function () {
  const WORKER_URL = "https://resume-chat.<account>.workers.dev";

  const msgs  = document.getElementById("chat-messages");
  const input = document.getElementById("chat-input");
  const send  = document.getElementById("chat-send");
  let history = [];
  let busy    = false;

  // Create a message bubble, append it, scroll down, and return the
  // element so the caller can mutate it later (e.g. replace "Thinking…").
  function appendMsg(text, role) {
    const div = document.createElement("div");
    div.className = `chat-msg ${role}`;
    div.textContent = text;
    msgs.appendChild(div);
    msgs.scrollTop = msgs.scrollHeight;
    return div;
  }

  async function sendMessage() {
    const text = input.value.trim();
    if (!text || busy) return;
    busy = true; send.disabled = true; input.value = "";

    appendMsg(text, "user");
    const thinking = appendMsg("Thinking…", "bot typing");

    try {
      const res = await fetch(WORKER_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: text, history }),
      });
      const data = await res.json();
      const reply = data.reply || data.error || "Something went wrong.";
      thinking.className = "chat-msg bot";
      thinking.textContent = reply;
      history.push({ role: "user",      content: text  });
      history.push({ role: "assistant", content: reply });
    } catch {
      thinking.className = "chat-msg bot";
      thinking.textContent = "⚠️ Could not reach the assistant.";
    }

    busy = false; send.disabled = false;
    input.focus(); msgs.scrollTop = msgs.scrollHeight;
  }

  send.addEventListener("click", sendMessage);
  input.addEventListener("keydown", (e) => {
    if (e.key === "Enter" && !e.shiftKey) {
      e.preventDefault(); sendMessage();
    }
  });
})();

Part 3 — Integrating with Quarto

Quarto’s {=html} raw block passes content through to the rendered page verbatim. Drop the widget anywhere in a .qmd file:

```{=html}
<div id="chat-box">
  …
</div>
<script>
(function () {
  const WORKER_URL = "https://resume-chat.<account>.workers.dev";
  …
})();
</script>
```

Because the block is scoped to index.qmd, the widget only appears on the landing page — not on blog posts or teaching pages, where it would be out of place. If you later want it site-wide, Quarto’s include-after-body option in _quarto.yml can inject an HTML file into every page instead.


Putting it all together

# 1. Install Wrangler
npm install -g wrangler

# 2. Authenticate with Cloudflare (creates a free account if needed)
wrangler login

# 3. Deploy the Worker (the [ai] binding in wrangler.toml is all you need)
cd worker/
wrangler deploy
# Note the URL printed: https://resume-chat.<subdomain>.workers.dev

# 4. Update the widget with your Worker URL
# Open index.qmd and change:
#   const WORKER_URL = "https://resume-chat.YOUR-SUBDOMAIN.workers.dev";
# to your actual URL.

# 5. Rebuild the landing page
cd ..
quarto render index.qmd

# 6. Publish
git add -A
git commit -m "Add AI résumé chat assistant"
git push

What to customise

Update the résumé content — open worker/index.js and edit the SYSTEM_PROMPT string. Then run wrangler deploy again to push the update. The website itself does not need to be rebuilt.

Show the widget on every page — instead of embedding it in index.qmd, use Quarto’s include-after-body option in _quarto.yml:

format:
  html:
    include-after-body: _partials/chat-widget.html

Move the HTML+JS block to _partials/chat-widget.html and quarto render will inject it at the bottom of every page.

Update the CORS origin — the Worker is already restricted to a single origin. When you adapt this for your own site, change ggilles.dev to your domain in worker/index.js:

"Access-Control-Allow-Origin": "https://your-domain.com",
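
If your site answers on more than one origin (say, the apex domain and a www subdomain), a single hard-coded value won't cover both. One hedged variant — an assumption, not part of the post's Worker — is to echo back only origins found in an explicit allowlist:

```javascript
// Hypothetical multi-origin CORS helper — not in the original Worker.
// Echoes the request's Origin header only when it is on the allowlist;
// anything else falls back to the primary domain (and gets blocked
// by the browser's CORS check on the foreign page).
const ALLOWED_ORIGINS = new Set([
  "https://your-domain.com",
  "https://www.your-domain.com",
]);

function corsHeadersFor(origin) {
  const allowed = ALLOWED_ORIGINS.has(origin) ? origin : "https://your-domain.com";
  return {
    "Access-Control-Allow-Origin": allowed,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    "Vary": "Origin", // tell caches the response differs per Origin
  };
}
```

Inside fetch you would call corsHeadersFor(request.headers.get("Origin")) and spread the result wherever the static CORS_HEADERS were used.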

Swap the model — edit AI_MODEL in wrangler.toml and run wrangler deploy. No code change needed:

[vars]
AI_MODEL = "@cf/mistral/mistral-7b-instruct-v0.1"

Browse the full catalogue at developers.cloudflare.com/workers-ai/models.


Closing thoughts

The whole implementation is roughly 150-200 lines of code spread across three files (worker/index.js, assets/dark.scss, and the raw HTML block in index.qmd). Despite its small footprint, it covers several important concepts that show up in many real-world projects: serverless functions, CORS, prompt engineering, and progressive enhancement (the site works perfectly for visitors who never open the chat).

If you replicate this on your own site, feel free to open a GitHub issue or reach out on LinkedIn — I’d be happy to help you debug it.
