@tinyclaw/matcher

Three-dimensional text matching that handles synonyms, typos, and partial matches without external embedding APIs.

Installation

bun add @tinyclaw/matcher

Overview

The hybrid matcher combines three scoring dimensions:
  1. Keyword overlap - TF-IDF-like weighting with stop-word filtering
  2. Fuzzy matching - Levenshtein distance for typo tolerance
  3. Synonym expansion - Built-in + user-extensible synonym groups
It is designed as a drop-in replacement for the simple Jaccard keyword overlap used in the delegation lifecycle.

API Reference

createHybridMatcher(config?)

Create a hybrid matcher instance.

Parameters:
  • config.minScore (number) - Minimum combined score to consider a match.
  • config.weights (object) - Weights for each scoring dimension. Must sum to 1.0.
      • keyword - Keyword overlap weight
      • fuzzy - Fuzzy matching weight
      • synonym - Synonym expansion weight
Returns: HybridMatcher

import { createHybridMatcher } from '@tinyclaw/matcher';

const matcher = createHybridMatcher({
  minScore: 0.4,
  weights: { keyword: 0.5, fuzzy: 0.2, synonym: 0.3 },
});
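
Because the weights must sum to 1.0, it can help to check a configuration before passing it in. The following is a minimal sketch: `weightsSumToOne` is a hypothetical helper, not part of @tinyclaw/matcher, and the non-negativity check is an added assumption. A small tolerance is used because floating-point sums are not always exact.

```typescript
interface Weights {
  keyword: number;
  fuzzy: number;
  synonym: number;
}

// Hypothetical helper (not a library function): returns true when every
// weight is a finite, non-negative number and the three sum to 1.0
// within a small floating-point tolerance.
function weightsSumToOne(w: Weights, epsilon = 1e-9): boolean {
  const values = [w.keyword, w.fuzzy, w.synonym];
  // Assumption: negative weights are rejected; the source only states the sum rule.
  const allValid = values.every((v) => isFinite(v) && v >= 0);
  const sum = values.reduce((acc, v) => acc + v, 0);
  return allValid && Math.abs(sum - 1) <= epsilon;
}
```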

HybridMatcher Interface

score
(query: string, target: string) => MatchResult
Score how well two text strings match semantically. Returns a MatchResult with:
  • score - Combined weighted score (0.0–1.0)
  • keywordScore - Keyword overlap sub-score
  • fuzzyScore - Fuzzy/Levenshtein sub-score
  • synonymScore - Synonym expansion sub-score

findBest
(query: string, candidates: Array<{id: string, text: string}>) => {id: string, result: MatchResult} | null
Find the best match from a list of candidates. Returns null if no candidate exceeds minScore.

addSynonyms
(group: string[]) => void
Register a custom synonym group. All words in the group are considered equivalent.
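
To make the findBest contract concrete, here is a stand-alone sketch of the selection logic. It is not the library's implementation: `findBestSketch`, `ScoreFn`, and `toyScore` are hypothetical names, the scorer is passed in as a parameter, and treating a score equal to minScore as a match is an assumption.

```typescript
interface MatchResult {
  score: number;
  keywordScore: number;
  fuzzyScore: number;
  synonymScore: number;
}

interface Candidate {
  id: string;
  text: string;
}

type ScoreFn = (query: string, target: string) => MatchResult;

// Hypothetical stand-alone version of the findBest contract.
function findBestSketch(
  query: string,
  candidates: Candidate[],
  score: ScoreFn,
  minScore: number,
): { id: string; result: MatchResult } | null {
  let best: { id: string; result: MatchResult } | null = null;
  for (const candidate of candidates) {
    const result = score(query, candidate.text);
    // Assumption: a score equal to minScore still counts as a match.
    if (result.score < minScore) continue;
    // Strict comparison means the earliest candidate wins a tie.
    if (best === null || result.score > best.result.score) {
      best = { id: candidate.id, result };
    }
  }
  return best;
}

// Toy scorer used only to exercise the sketch:
// the fraction of query words that also appear in the target.
const toyScore: ScoreFn = (query, target) => {
  const queryWords = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const targetWords = target.toLowerCase().split(/\s+/);
  const hits = queryWords.filter((w) => targetWords.indexOf(w) !== -1).length;
  const score = queryWords.length === 0 ? 0 : hits / queryWords.length;
  return { score, keywordScore: score, fuzzyScore: 0, synonymScore: 0 };
};
```

Passing the scorer as an argument keeps the selection rule (threshold, then highest score) separate from how the score itself is computed.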

Usage Examples

Basic Scoring

const matcher = createHybridMatcher();

const result = matcher.score(
  'fix authentication bug',
  'repair login security issue'
);

console.log(result);
// Illustrative output; actual values depend on the weights and synonym groups in effect:
// {
//   score: 0.75,
//   keywordScore: 0.0,    // No exact keyword overlap
//   fuzzyScore: 0.0,      // Words are different
//   synonymScore: 1.0     // 'fix' ↔ 'repair', 'authentication' ↔ 'security'
// }

Finding Best Match

const matcher = createHybridMatcher({ minScore: 0.5 });

const query = 'optimize database performance';
const candidates = [
  { id: 'agent-1', text: 'research API endpoints' },
  { id: 'agent-2', text: 'improve database speed' },
  { id: 'agent-3', text: 'write documentation' },
];

const best = matcher.findBest(query, candidates);
if (best) {
  console.log(`Best match: ${best.id} (score: ${best.result.score})`);
  // Best match: agent-2 (score: 0.85)
}

Custom Synonyms

const matcher = createHybridMatcher();

// Add domain-specific synonyms
matcher.addSynonyms(['react', 'preact', 'solid', 'svelte']);
matcher.addSynonyms(['postgres', 'postgresql', 'pg']);

const result = matcher.score(
  'build react component',
  'create svelte widget'
);

console.log(result.synonymScore); // High score due to react ↔ svelte

Built-in Synonyms

The matcher includes 20 synonym groups covering common agent task vocabulary:
  • developer, engineer, coder, programmer
  • research, analyze, investigate, study, examine
  • write, compose, draft, author, create
  • design, architect, blueprint, plan, layout
  • test, verify, validate, check, assess
  • fix, repair, patch, resolve, debug
  • review, evaluate, critique, audit, inspect
  • document, describe, explain, annotate, record
  • optimize, improve, enhance, refine, tune
  • deploy, release, ship, publish, launch
Plus 10 more groups for: database, frontend, backend, security, performance, monitoring, configuration, migration, summarization, and comparison tasks.
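
One way to picture how built-in and user-added groups could share a single lookup structure is an index from each word to the groups that contain it. This is only a sketch under assumptions: `SynonymIndex` and its methods are hypothetical and do not describe the library's internals.

```typescript
// Hypothetical index: maps each word to the ids of the groups containing it.
class SynonymIndex {
  private groups: string[][] = [];
  private wordToGroups = new Map<string, number[]>();

  // Register a group; every word in it becomes equivalent to the others.
  add(group: string[]): void {
    const id = this.groups.length;
    const normalized = group.map((w) => w.toLowerCase());
    this.groups.push(normalized);
    for (const word of normalized) {
      const ids = this.wordToGroups.get(word) ?? [];
      ids.push(id);
      this.wordToGroups.set(word, ids);
    }
  }

  // True when both words appear together in at least one group.
  areSynonyms(a: string, b: string): boolean {
    const ids = this.wordToGroups.get(a.toLowerCase()) ?? [];
    const other = b.toLowerCase();
    return ids.some((id) => this.groups[id].indexOf(other) !== -1);
  }
}

const index = new SynonymIndex();
index.add(["fix", "repair", "patch", "resolve", "debug"]); // group listed above
index.add(["postgres", "postgresql", "pg"]); // user-defined group
```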

Scoring Algorithm

1. Tokenization

Text is tokenized by:
  • Converting to lowercase
  • Removing punctuation
  • Splitting on whitespace
  • Filtering stop words (“the”, “is”, “and”, etc.)
  • Removing tokens shorter than 3 characters
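
The steps above can be sketched as a single function. The stop-word list here is a small hypothetical sample (the library's actual list is not shown), and two details are assumptions: punctuation is replaced with a space rather than deleted, and only ASCII letters and digits are kept.

```typescript
// Hypothetical sample; the library's real stop-word list is not documented here.
const STOP_WORDS = new Set(["the", "is", "and", "for", "with", "that", "this"]);

function tokenize(text: string): string[] {
  return text
    .toLowerCase() // 1. lowercase
    .replace(/[^a-z0-9\s]/g, " ") // 2. strip punctuation (assumption: replaced by a space)
    .split(/\s+/) // 3. split on whitespace
    .filter((token) => !STOP_WORDS.has(token)) // 4. drop stop words
    .filter((token) => token.length >= 3); // 5. drop tokens shorter than 3 characters
}
```

Note that the length filter also discards short but meaningful tokens such as "pg" or "db", which is worth keeping in mind when registering custom synonyms.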

2. Keyword Score

keywordScore = matches / min(queryTokens.length, targetTokens.length)
Counts exact token matches, normalized by the smaller set.
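
As a minimal sketch of this formula, the function below treats both token lists as sets, so a repeated token counts once; that deduplication and the zero result for empty input are assumptions, not documented behavior.

```typescript
function keywordScore(queryTokens: string[], targetTokens: string[]): number {
  const querySet = new Set(queryTokens);
  const targetSet = new Set(targetTokens);
  const smaller = Math.min(querySet.size, targetSet.size);
  // Assumption: no tokens on either side means no overlap rather than an error.
  if (smaller === 0) return 0;

  let matches = 0;
  querySet.forEach((token) => {
    if (targetSet.has(token)) matches += 1;
  });
  return matches / smaller;
}
```

For the token lists ["optimize", "database", "performance"] and ["improve", "database", "speed"], one of three tokens matches exactly, giving 1/3. Normalizing by the smaller set means a one-word query fully contained in a long target scores 1.0.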

3. Fuzzy Score

for each queryToken:
  bestSim = max(tokenFuzzySimilarity(queryToken, targetToken))
  if bestSim > 0.5: totalSim += bestSim
fuzzyScore = totalSim / queryTokens.length
Uses Levenshtein distance with substring match bonus (0.8 for 4+ char substrings).
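
The pseudocode leaves tokenFuzzySimilarity unspecified, so the sketch below fills it in under stated assumptions: similarity is 1 minus the edit distance divided by the longer token's length, and the substring bonus is read as "at least 0.8 when the shorter token has 4 or more characters and is contained in the longer one". The library's actual formula may differ.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed definition: normalized edit similarity, with a 0.8 floor for
// substring matches where the shorter token has at least 4 characters.
function tokenFuzzySimilarity(a: string, b: string): number {
  if (a === b) return 1;
  const maxLen = Math.max(a.length, b.length);
  if (maxLen === 0) return 0;
  const editSim = 1 - levenshtein(a, b) / maxLen;
  const shorter = a.length <= b.length ? a : b;
  const longer = a.length <= b.length ? b : a;
  const substringSim = shorter.length >= 4 && longer.indexOf(shorter) !== -1 ? 0.8 : 0;
  return Math.max(editSim, substringSim);
}

// Follows the pseudocode: best similarity per query token,
// counted only when it is above 0.5, averaged over the query tokens.
function fuzzyScore(queryTokens: string[], targetTokens: string[]): number {
  if (queryTokens.length === 0) return 0;
  let totalSim = 0;
  for (const queryToken of queryTokens) {
    let bestSim = 0;
    for (const targetToken of targetTokens) {
      bestSim = Math.max(bestSim, tokenFuzzySimilarity(queryToken, targetToken));
    }
    if (bestSim > 0.5) totalSim += bestSim;
  }
  return totalSim / queryTokens.length;
}
```

Under these assumptions, the typo "databse" scores 0.875 against "database" (one edit over eight characters), while "auth" against "authentication" relies on the substring bonus and scores 0.8.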

4. Synonym Score

for each queryToken:
  if queryToken in synonymGroup:
    if any synonym from group exists in target:
      synonymMatches++
synonymScore = synonymMatches / queryTokens.length
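
A direct TypeScript rendering of this pseudocode might look like the following. Groups are passed in as plain arrays for illustration, and the sketch follows the pseudocode literally, so a query token that itself appears in the target also counts when it belongs to a group. Whether the library excludes such exact matches is not stated.

```typescript
function synonymScore(
  queryTokens: string[],
  targetTokens: string[],
  groups: string[][],
): number {
  if (queryTokens.length === 0) return 0;
  const targetSet = new Set(targetTokens);

  let synonymMatches = 0;
  for (const token of queryTokens) {
    // A token matches when some group contains it and that same group
    // has at least one member present in the target.
    const matched = groups.some(
      (group) => group.indexOf(token) !== -1 && group.some((word) => targetSet.has(word)),
    );
    if (matched) synonymMatches += 1;
  }
  return synonymMatches / queryTokens.length;
}

// Two of the built-in groups listed in this document.
const sampleGroups = [
  ["fix", "repair", "patch", "resolve", "debug"],
  ["optimize", "improve", "enhance", "refine", "tune"],
];
```

With only these two groups, ["optimize", "database", "performance"] against ["improve", "database", "speed"] yields 1/3: "optimize" matches through "improve", and the other two tokens belong to no group in the sample.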

5. Combined Score

score = clamp(
  keywordScore * weights.keyword +
  fuzzyScore * weights.fuzzy +
  synonymScore * weights.synonym,
  0, 1
)
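
The combination step is a weighted sum followed by a clamp. The sketch below restates it in TypeScript, using the weights from the createHybridMatcher example earlier in this document; `combineScores` and `clamp` are illustrative names rather than exported functions.

```typescript
interface SubScores {
  keywordScore: number;
  fuzzyScore: number;
  synonymScore: number;
}

interface Weights {
  keyword: number;
  fuzzy: number;
  synonym: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Weighted sum of the three sub-scores, limited to the range [0, 1].
function combineScores(scores: SubScores, weights: Weights): number {
  const weighted =
    scores.keywordScore * weights.keyword +
    scores.fuzzyScore * weights.fuzzy +
    scores.synonymScore * weights.synonym;
  return clamp(weighted, 0, 1);
}

const exampleWeights: Weights = { keyword: 0.5, fuzzy: 0.2, synonym: 0.3 };
const combined = combineScores(
  { keywordScore: 0.5, fuzzyScore: 0.5, synonymScore: 1 },
  exampleWeights,
);
```

Here the result is approximately 0.65 (0.25 + 0.10 + 0.30). With the same weights, a pair that matches only through synonyms (synonymScore of 1, the other two at 0) would score 0.3, so each weight is the maximum contribution of its dimension.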

Performance

  • Tokenization: O(n) where n = text length
  • Keyword scoring: O(min(q, t)) where q, t = token counts
  • Fuzzy scoring: O(q × t × L) where L = average token length
  • Memory: O(synonyms) - constant per matcher instance
  • Typical latency: 1-5ms for short texts (<200 chars)

Use Cases

Delegation Reuse

Find existing sub-agents that can handle similar tasks:
const matcher = createHybridMatcher({ minScore: 0.6 });

const existingAgents = [
  { id: 'agent-1', text: 'research TypeScript best practices' },
  { id: 'agent-2', text: 'write API documentation' },
];

const newTask = 'investigate JavaScript coding standards';
const match = matcher.findBest(newTask, existingAgents);

if (match) {
  console.log(`Reusing ${match.id}`);
} else {
  console.log('Creating new agent');
}

Template Matching

Match tasks to role templates:
const matcher = createHybridMatcher();

const templates = [
  { id: 'researcher', text: 'research analyze investigate study' },
  { id: 'developer', text: 'code implement build develop engineer' },
  { id: 'writer', text: 'write document compose draft author' },
];

const task = 'study authentication methods';
const best = matcher.findBest(task, templates);
// Returns: researcher (high synonym match on 'study' → 'research')

| Feature | Hybrid Matcher | Vector/Embedding Search |
| --- | --- | --- |
| Dependency | Zero (built-in) | OpenAI API or local model |
| Cost | Free | $0.0001/token (OpenAI) |
| Latency | 1-5ms | 50–200ms (API) or 10–50ms (local) |
| Offline | ✅ Fully offline | ❌ Requires API or model |
| Typo tolerance | ✅ Levenshtein-based | ❌ Embedding-dependent |
| Interpretability | ✅ Clear scoring breakdown | ❌ Opaque vector similarity |