Auto-tagging feedback: category, priority, tags — none typed by the user
Structured outputs with Zod schemas, Claude Haiku on Vercel AI Gateway, project-aware tag hints, and a hand-written priority rubric that keeps the model from either under- or overreacting.

A user submits “checkout button is broken on mobile.” Before that row is fully committed, we want: a category (bug), a priority (high — revenue path), tags (checkout, mobile, ui), and a short machine-readable title. All derived, none typed by the user.
The whole pipeline runs in about 400ms on Vercel AI Gateway, doesn’t block the submission response, and costs roughly $0.0004 per feedback item. The interesting part is how we got it to stop making things up.
Structured outputs, not freeform
The first draft asked Claude “summarize this feedback and suggest tags.” That works for a demo and fails in production: the model returns prose, or extra commentary, or tags in three different casings. The fix is to treat the model as a function that returns a typed object.
We use the Vercel AI SDK’s generateObject with a Zod schema:
import { generateObject } from "ai";
import { z } from "zod";
const FeedbackClassification = z.object({
category: z.enum(["bug", "feature", "improvement", "question", "other"]),
priority: z.enum(["low", "medium", "high", "urgent"]),
tags: z.array(z.string()).max(5),
title: z.string().max(80),
});
const { object } = await generateObject({
model: "anthropic/claude-haiku-4-5",
schema: FeedbackClassification,
prompt: buildClassifyPrompt(content, projectContext),
});

The model is forced to output a shape the SDK can parse. If it drifts, the call errors and we retry once. If it errors twice, we fall back to heuristics (first noun phrase as the title, no tags, "medium" priority) so the row still gets saved in a reasonable shape.
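The fallback can be sketched as a pure function. This is a minimal illustration, not the production code: the names (`fallbackClassification`, the "first sentence as a stand-in for first noun phrase" shortcut, and the `"other"` category default) are assumptions; the source only specifies the title heuristic, empty tags, and "medium" priority.

```typescript
// Illustrative fallback used when generateObject fails twice.
type FeedbackClassification = {
  category: "bug" | "feature" | "improvement" | "question" | "other";
  priority: "low" | "medium" | "high" | "urgent";
  tags: string[];
  title: string;
};

function fallbackClassification(content: string): FeedbackClassification {
  // Cheap stand-in for "first noun phrase": take the first sentence
  // and clamp it to the schema's 80-char title limit.
  const firstSentence = content.split(/[.!?\n]/)[0].trim();
  return {
    category: "other", // assumed default; not specified in the text
    priority: "medium",
    tags: [],
    title: firstSentence.slice(0, 80) || "Untitled feedback",
  };
}
```

Because the fallback returns the same shape as the schema, downstream code (dashboard sort, tag filters) never has to special-case a failed classification.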
Why Haiku instead of Sonnet
Claude Haiku 4.5 at ~$0.80 per million input tokens is well below the noise floor of our other costs, and for classification tasks on short (~200 token) inputs it’s essentially indistinguishable from Sonnet. We reserve Sonnet for the step where Claude Code actually writes the PR — a task where you can feel the difference between models.
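A rough back-of-envelope check of the per-item figure, using the input price quoted above; the output price and token counts here are assumptions, not numbers from the text:

```typescript
// Input price from the text: ~$0.80 per million tokens.
const INPUT_PRICE_PER_TOKEN = 0.8 / 1_000_000;
// Output price assumed; flat-priced output is typical for small models.
const OUTPUT_PRICE_PER_TOKEN = 4 / 1_000_000;

const inputTokens = 300; // assumed: ~200-token feedback plus rubric/context
const outputTokens = 40; // assumed: a small JSON object

const costPerItem =
  inputTokens * INPUT_PRICE_PER_TOKEN + outputTokens * OUTPUT_PRICE_PER_TOKEN;
console.log(costPerItem.toFixed(6)); // ≈ $0.0004 per classified item
```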
Priority is the hard one
Category and tags are stable. Priority is where models either underreact (“medium” for everything) or overreact (“urgent” for cosmetic bugs). We gave the model a concrete rubric in the prompt:
- urgent: breaks core user flow (checkout, signup, login, data loss)
- high: affects revenue or retention, visible to many users
- medium: usability issue, non-blocking, one-page scope
- low: polish, copy, nice-to-have

With that rubric in the system prompt, priority predictions are reasonable ~85% of the time on our (small) hand-labeled eval set. That's enough that the dashboard's default sort by priority is actually useful; humans override it when the model is off.
Project context makes tags sharper
Tags are much better when the model knows what products/surfaces exist in the project. Each project in FeedbackIQ has an optional "surfaces" list (checkout, pricing, dashboard, admin). We inject those into the prompt as "prefer these tags when they fit." The result: tags are shared across the inbox and consistent week over week, which is what lets the roadmap actually cluster.
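Putting the rubric and the surface hints together, `buildClassifyPrompt` can look roughly like this. The exact wording and the `ProjectContext` shape are sketches; only the rubric lines and the "prefer these tags when they fit" instruction come from the text:

```typescript
type ProjectContext = { surfaces?: string[] };

const PRIORITY_RUBRIC = `Priority rubric:
- urgent: breaks core user flow (checkout, signup, login, data loss)
- high: affects revenue or retention, visible to many users
- medium: usability issue, non-blocking, one-page scope
- low: polish, copy, nice-to-have`;

function buildClassifyPrompt(content: string, project: ProjectContext): string {
  // Only hint tags when the project actually has a surfaces list.
  const tagHint = project.surfaces?.length
    ? `Prefer these tags when they fit: ${project.surfaces.join(", ")}.`
    : "";
  return [
    "Classify the following user feedback.",
    PRIORITY_RUBRIC,
    tagHint,
    `Feedback: """${content}"""`,
  ]
    .filter(Boolean)
    .join("\n\n");
}
```

Keeping the rubric as a constant string (rather than regenerating it per call) also makes it easy to version and eval-test the prompt on its own.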
Running after the response
Like everything else in the inbox pipeline, classification runs after the POST returns. The widget sees a 200 in ~200ms; the dashboard sees the tagged row ~1s later. Running this step synchronously wasn't worth the latency it would add, since its result is only consumed on the next dashboard refresh.
// Inside /api/v1/feedback route handler
const feedback = await prisma.feedback.create({ ... });
classifyFeedback(feedback.id).catch((err) =>
console.error("classifyFeedback failed:", err)
);
dedupeFeedback(feedback.id).catch((err) =>
console.error("dedupeFeedback failed:", err)
);
return Response.json({ id: feedback.id }, { status: 201 });

What we got wrong
Early on we asked the model to score “sentiment” alongside priority. It was a number 0-100 that no one ever looked at — a textbook example of shipping a feature because it was easy to prompt for instead of because it was useful. We deleted the column two weeks later and nobody noticed.
Next up: the PR pipeline. Once an item is tagged and prioritized, Claude Code actually opens the pull request. That step is where the product earns its name.
Try FeedbackIQ
Drop a widget on your site, ship PRs from feedback
Claude reads the report, writes the fix, opens the PR on your repo. Dedupe with pgvector so the backlog doesn’t drown in duplicates.
Start for free