April 21, 2026 · 7 min read · FeedbackIQ team

Deduping user feedback with pgvector and Vercel AI Gateway

How we kill '500 error' spam on the public roadmap: cosine similarity on OpenAI embeddings, HNSW indexes on Neon, auto-link above 0.92, and a dashboard review queue for the gray zone.


Every feedback-collection tool eventually hits the same wall: one broken API endpoint produces forty variations of “500 error on checkout,” thirty of them show up on the public roadmap, and the rest sit in the inbox waiting for a human to merge them. Upvotes split. The top-voted item is never the real top-voted item. The backlog looks like noise.

We just shipped the dedupe layer for FeedbackIQ. The rule is simple: every submission gets an embedding, we check cosine similarity against recent items in the same project, auto-link above 0.92, surface the 0.80–0.92 band to the dashboard for a human to confirm, and ignore below. This post walks through how it’s wired.

Why embeddings and not keyword matching

Keyword matching catches “500 error” vs “500 error.” It does not catch “checkout is broken” vs “can’t complete purchase” vs “pay button does nothing.” Those are semantically identical and syntactically different, which is exactly the case embeddings are for.

We use openai/text-embedding-3-small — 1536 dimensions, $0.02 per million tokens. At ~200 tokens per feedback submission, that’s roughly 250,000 submissions per dollar. The cost is a rounding error.
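Back-of-envelope, using the numbers above:

```typescript
// text-embedding-3-small: $0.02 per million tokens (OpenAI list price).
const dollarsPerMillionTokens = 0.02;
const tokensPerSubmission = 200; // rough average, per the text

const tokensPerDollar = 1_000_000 / dollarsPerMillionTokens; // 50M tokens
const submissionsPerDollar = tokensPerDollar / tokensPerSubmission;

console.log(submissionsPerDollar); // → 250000
```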

Routing through Vercel AI Gateway

We call the model through the Vercel AI SDK’s embed helper, which routes to Vercel AI Gateway when AI_GATEWAY_API_KEY is set:

import { embed } from "ai";

// Embed feedback text; the plain string model ID routes through AI Gateway.
export async function generateEmbedding(text: string): Promise<number[]> {
  const { embedding } = await embed({
    model: "openai/text-embedding-3-small",
    // Cap input length — 8,000 chars stays well under the model's token limit.
    value: text.trim().slice(0, 8000),
  });
  return embedding;
}

Gateway gives us observability, per-model rate limiting, and $5/mo free credits on any Vercel account. In production on Vercel, we don’t even set the key — VERCEL_OIDC_TOKEN is injected into every deployment and used as a fallback. Zero config.

pgvector + HNSW on Neon

Neon runs stock Postgres with pgvector available as an extension. We declared it in the Prisma schema with the postgresqlExtensions preview feature, added an Unsupported("vector(1536)") column to Feedback, and put an HNSW index on it for fast approximate nearest-neighbor search:

CREATE INDEX feedback_embedding_hnsw_idx
ON feedbackiq."Feedback"
USING hnsw (embedding vector_cosine_ops);
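For reference, the Prisma side of that setup looks roughly like this — the `embedding` column and the preview feature are from the text; the other fields are placeholders:

```prisma
generator client {
  provider        = "prisma-client-js"
  previewFeatures = ["postgresqlExtensions"]
}

datasource db {
  provider   = "postgresql"
  url        = env("DATABASE_URL")
  extensions = [vector]
}

model Feedback {
  id        String                       @id @default(cuid())
  // ...other fields elided
  embedding Unsupported("vector(1536)")?
}
```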

HNSW is overkill until you have tens of thousands of rows per project, but adding it now is cheaper than retrofitting later. Until then the planner falls back to a sequential scan and still finishes in a few milliseconds.
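Once the index does kick in, recall is tunable per session via pgvector's hnsw.ef_search setting (default 40 — higher means better recall at the cost of slower queries):

```sql
-- Trade speed for recall on this session; pgvector's default is 40.
SET hnsw.ef_search = 100;
```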

The similarity query

Prisma doesn’t (yet) know about vector operators, so we drop to $queryRaw:

// pgvector expects a literal like "[0.1,0.2,...]"; Prisma parameterizes it safely.
const literal = `[${embedding.join(",")}]`;

const rows = await prisma.$queryRaw<
  Array<{ id: string; similarity: number; duplicateOfId: string | null }>
>`
  SELECT
    id,
    "duplicateOfId",
    1 - (embedding <=> ${literal}::vector) AS similarity
  FROM feedbackiq."Feedback"
  WHERE "projectId" = ${projectId}
    AND id <> ${excludeId}
    AND embedding IS NOT NULL
  ORDER BY embedding <=> ${literal}::vector
  LIMIT 5
`;

The <=> operator is pgvector’s cosine distance. 1 - distance gives similarity. We order by distance so the closest neighbor is row zero.

Thresholds and the gray zone

Three buckets:

  • > 0.92 — auto-link as duplicate. Upvotes flow to the parent. The new item is hidden from the public roadmap.
  • 0.80–0.92 — surface as a “Possible duplicate” in the dashboard with both snippets side-by-side. Owner confirms (upvotes merge) or rejects (link clears).
  • < 0.80 — treat as a fresh item.
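The bucketing logic is small enough to sketch in full — the constant and function names here are illustrative, not from our codebase:

```typescript
// Thresholds from the three buckets above.
const AUTO_LINK = 0.92;
const REVIEW_FLOOR = 0.8;

type DedupeAction = "auto-link" | "review" | "fresh";

// Classify the similarity of the closest existing item.
function classify(similarity: number): DedupeAction {
  if (similarity > AUTO_LINK) return "auto-link"; // hide from roadmap, route votes to parent
  if (similarity >= REVIEW_FLOOR) return "review"; // queue for the dashboard
  return "fresh"; // new roadmap item
}
```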

We picked 0.92 by staring at actual submissions. Cosine similarity on text-embedding-3-small saturates fast — 0.85 is “same topic,” 0.92 is “same bug.” Higher thresholds miss real dupes; lower thresholds start merging unrelated-but-similar requests.

Upvote routing for confirmed duplicates

The widget’s upvote endpoint ignores child duplicates and votes on the parent instead. Same for un-upvote. The user never knows they clicked a child — they clicked the item that was showing, and their vote landed where it belongs:

// Route the vote to the parent when this item is a confirmed duplicate.
const targetId =
  feedback.duplicateConfirmed && feedback.duplicateOfId
    ? feedback.duplicateOfId
    : feedback.id;

await prisma.feedbackUpvote.create({
  data: { feedbackId: targetId, voterHash: hash },
});

What we got wrong on the first pass

Our first draft auto-linked above 0.85. It merged features that were “both about notifications” but meant different things — one person wanted email digests, another wanted push alerts. We raised the threshold to 0.92 and moved the gray zone into the dashboard. False positives vanished; real dupes are still caught.

The other miss: we originally blocked the submission endpoint on the embed call. Round-trips to the Gateway added 200–400 ms. We made it fire-and-forget instead — embed + dedupe now runs after the response is already back to the widget. The user never waits for our backend.
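The shape of that change looks roughly like this — `processDedupe` is a stub standing in for the real embed-plus-similarity pipeline, and the names are ours:

```typescript
type DedupeJob = { feedbackId: string; text: string };

const log: string[] = [];

// Stub for the real pipeline: generateEmbedding(text) → similarity query → link/flag.
async function processDedupe(job: DedupeJob): Promise<void> {
  log.push(`deduped:${job.feedbackId}`);
}

function handleSubmission(job: DedupeJob): { ok: true } {
  // Schedule the slow work without awaiting it; log failures so they
  // never surface to the widget.
  void processDedupe(job).catch((err) => console.error("dedupe failed", err));
  // The response goes out immediately — the user never waits on the Gateway.
  return { ok: true };
}
```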

What’s next

The dedupe layer is the foundation for expanding the inbox beyond the widget. Coming up: Sentry errors auto-filed as feedback, support tickets ingested from Intercom/email, server logs through a generic HTTP endpoint. Every new input source multiplies the noise — dedupe is the reason it won’t overwhelm the roadmap.
