@tinyclaw/matcher

Hybrid text matching along three dimensions (keyword overlap, fuzzy matching, and synonym expansion) that handles synonyms, typos, and partial matches without external embedding APIs.

Installation

bun add @tinyclaw/matcher

Overview

The hybrid matcher combines three scoring dimensions:

Keyword overlap - TF-IDF-like weighting with stop-word filtering
Fuzzy matching - Levenshtein distance for typo tolerance
Synonym expansion - Built-in + user-extensible synonym groups

Designed as a drop-in replacement for the simple Jaccard keyword overlap used in the delegation lifecycle.

API Reference

`createHybridMatcher(config?)`

Create a hybrid matcher instance. Parameters:

config.minScore

number

Minimum combined score to consider a match.

config.weights

object

Weights for each scoring dimension. Must sum to 1.0.

keyword - Keyword overlap weight
fuzzy - Fuzzy matching weight
synonym - Synonym expansion weight

Returns: HybridMatcher

import { createHybridMatcher } from '@tinyclaw/matcher';

const matcher = createHybridMatcher({
  minScore: 0.4,
  weights: { keyword: 0.5, fuzzy: 0.2, synonym: 0.3 },
});
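The weights constraint from the parameter list can be checked before you construct a matcher. The block below is a minimal sketch of a hypothetical `validateWeights` helper. `@tinyclaw/matcher` is not documented to export it, and this page does not say whether the library rejects, normalizes, or ignores weights that do not sum to 1.0. The check uses a small tolerance because decimal weights are stored as binary floating-point values.

```typescript
// Hypothetical helper (NOT part of @tinyclaw/matcher): checks the documented
// rule that keyword + fuzzy + synonym weights sum to 1.0.
interface MatcherWeights {
  keyword: number;
  fuzzy: number;
  synonym: number;
}

function validateWeights(weights: MatcherWeights, tolerance = 1e-9): void {
  const values = [weights.keyword, weights.fuzzy, weights.synonym];
  if (values.some((w) => !Number.isFinite(w) || w < 0)) {
    throw new RangeError("Each weight must be a finite, non-negative number");
  }
  const sum = values.reduce((acc, w) => acc + w, 0);
  // Compare with a tolerance instead of ===, since sums of decimals like
  // 0.2 and 0.3 are subject to floating-point rounding.
  if (Math.abs(sum - 1) > tolerance) {
    throw new RangeError(`Weights must sum to 1.0 (got ${sum})`);
  }
}

// The weights from the example above pass the check.
validateWeights({ keyword: 0.5, fuzzy: 0.2, synonym: 0.3 });
```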

HybridMatcher Interface

score

(query: string, target: string) => MatchResult

Score how well two text strings match semantically. Returns a MatchResult with:

score - Combined weighted score (0.0–1.0)
keywordScore - Keyword overlap sub-score
fuzzyScore - Fuzzy/Levenshtein sub-score
synonymScore - Synonym expansion sub-score

findBest

(query: string, candidates: Array<{id: string, text: string}>) => {id: string, result: MatchResult} | null

Find the best match from a list of candidates. Returns null if no candidate exceeds minScore.

addSynonyms

(group: string[]) => void

Add a custom synonym group to this matcher instance, alongside the built-in groups.

Usage Examples

Basic Scoring

import { createHybridMatcher } from '@tinyclaw/matcher';

const matcher = createHybridMatcher();

const result = matcher.score(
  'fix authentication bug',
  'repair login security issue'
);

console.log(result);
// Illustrative output; exact values depend on the default weights and synonym groups:
// {
//   score: 0.75,
//   keywordScore: 0.0,   // No exact keyword overlap
//   fuzzyScore: 0.0,     // No close spellings
//   synonymScore: 1.0    // 'fix' ↔ 'repair', 'authentication' ↔ 'security'
// }

Finding Best Match

import { createHybridMatcher } from '@tinyclaw/matcher';

const matcher = createHybridMatcher({ minScore: 0.5 });

const query = 'optimize database performance';
const candidates = [
  { id: 'agent-1', text: 'research API endpoints' },
  { id: 'agent-2', text: 'improve database speed' },
  { id: 'agent-3', text: 'write documentation' },
];

const best = matcher.findBest(query, candidates);
if (best) {
  console.log(`Best match: ${best.id} (score: ${best.result.score})`);
  // e.g. Best match: agent-2 (score: 0.85)
}
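The selection logic described for findBest can be sketched as follows: score every candidate, keep the highest, and return null unless it exceeds minScore. This is a hypothetical re-implementation, not the package source. `scoreFn` stands in for `matcher.score`, `toyScore` is an invented stand-in scorer, and keeping the earlier candidate on ties is an assumption.

```typescript
// Hypothetical sketch of findBest's selection logic (not the package source).
interface MatchResult {
  score: number;
  keywordScore: number;
  fuzzyScore: number;
  synonymScore: number;
}

interface Candidate {
  id: string;
  text: string;
}

function findBest(
  query: string,
  candidates: Candidate[],
  scoreFn: (query: string, target: string) => MatchResult, // stands in for matcher.score
  minScore: number,
): { id: string; result: MatchResult } | null {
  let best: { id: string; result: MatchResult } | null = null;
  for (const candidate of candidates) {
    const result = scoreFn(query, candidate.text);
    // Must exceed minScore; on ties the earlier candidate is kept.
    if (result.score > minScore && (best === null || result.score > best.result.score)) {
      best = { id: candidate.id, result };
    }
  }
  return best;
}

// Toy scorer for illustration only: fraction of target words also in the query.
const toyScore = (query: string, target: string): MatchResult => {
  const queryWords = new Set(query.split(" "));
  const targetWords = target.split(" ");
  const s = targetWords.filter((w) => queryWords.has(w)).length / targetWords.length;
  return { score: s, keywordScore: s, fuzzyScore: 0, synonymScore: 0 };
};

const picked = findBest(
  "improve database speed",
  [
    { id: "agent-1", text: "research API endpoints" },
    { id: "agent-2", text: "improve database speed" },
  ],
  toyScore,
  0.5,
);
console.log(picked?.id); // "agent-2"
```

A linear scan keeps the behavior easy to reason about. Each candidate is scored exactly once, so the cost is the number of candidates times the cost of one `score` call.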

Custom Synonyms

import { createHybridMatcher } from '@tinyclaw/matcher';

const matcher = createHybridMatcher();

// Add domain-specific synonyms
matcher.addSynonyms(['react', 'preact', 'solid', 'svelte']);
matcher.addSynonyms(['postgres', 'postgresql', 'pg']); // Note: 'pg' is under 3 characters, so tokenization drops it from input text

const result = matcher.score(
  'build react component',
  'create svelte widget'
);

console.log(result.synonymScore); // Above 0: 'react' ↔ 'svelte' now counts as a synonym match
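This page does not say how addSynonyms stores groups. One plausible structure is sketched below as a hypothetical `SynonymIndex` class. It keeps each group as an array and maps every word to the groups that contain it, so checking whether two words are synonyms does not require scanning every group. Lowercasing on insert mirrors the lowercase step in tokenization. The class and its `areSynonyms` method are illustrative and are not part of the package API.

```typescript
// Hypothetical registry (NOT the package's API): one way addSynonyms could be backed.
class SynonymIndex {
  private groups: string[][] = [];
  // word -> indices of every group containing that word
  private byWord = new Map<string, number[]>();

  addSynonyms(group: string[]): void {
    // Lowercase and deduplicate so lookups match lowercased tokens.
    const normalized = Array.from(new Set(group.map((w) => w.toLowerCase())));
    const groupIndex = this.groups.push(normalized) - 1;
    for (const word of normalized) {
      const existing = this.byWord.get(word) ?? [];
      existing.push(groupIndex);
      this.byWord.set(word, existing);
    }
  }

  // True when both words appear together in at least one registered group.
  areSynonyms(a: string, b: string): boolean {
    const target = b.toLowerCase();
    const groupsOfA = this.byWord.get(a.toLowerCase()) ?? [];
    return groupsOfA.some((i) => this.groups[i].indexOf(target) !== -1);
  }
}

const index = new SynonymIndex();
index.addSynonyms(["react", "preact", "solid", "svelte"]);
index.areSynonyms("React", "svelte"); // true
index.areSynonyms("react", "postgres"); // false
```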

Built-in Synonyms

The matcher includes 20 synonym groups covering common agent task vocabulary:

Developer roles

developer, engineer, coder, programmer

Research & analysis

research, analyze, investigate, study, examine

Content creation

write, compose, draft, author, create

Design & planning

design, architect, blueprint, plan, layout

Testing & validation

test, verify, validate, check, assess

Bug fixing

fix, repair, patch, resolve, debug

Review processes

review, evaluate, critique, audit, inspect

Documentation

document, describe, explain, annotate, record

Optimization

optimize, improve, enhance, refine, tune

Deployment

deploy, release, ship, publish, launch

Plus 10 more groups covering database, frontend, backend, security, performance, monitoring, configuration, migration, summarization, and comparison tasks.

Scoring Algorithm

1. Tokenization

Text is tokenized by:

Converting to lowercase
Removing punctuation
Splitting on whitespace
Filtering stop words (“the”, “is”, “and”, etc.)
Removing tokens shorter than 3 characters
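The five tokenization steps can be sketched as a single pipeline. The stop-word set below is a hypothetical sample, because the package's full list is not documented. Punctuation removal is also an assumption: it keeps only ASCII letters, digits, and whitespace, which simplifies handling of non-ASCII text.

```typescript
// Hypothetical stop-word sample; the package's real list is not documented here.
const STOP_WORDS = new Set(["the", "is", "and", "a", "an", "of", "to", "in", "for", "on", "with"]);

function tokenize(text: string): string[] {
  return text
    .toLowerCase()                       // 1. lowercase
    .replace(/[^a-z0-9\s]/g, "")         // 2. remove punctuation (ASCII-only assumption)
    .split(/\s+/)                        // 3. split on whitespace
    .filter((t) => !STOP_WORDS.has(t))   // 4. drop stop words
    .filter((t) => t.length >= 3);       // 5. drop tokens shorter than 3 characters (also drops "")
}

const tokens = tokenize("Fix the login bug, and deploy to PG!");
// ["fix", "login", "bug", "deploy"]  ("pg" is removed by step 5)
```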

2. Keyword Score

keywordScore = matches / min(queryTokens.length, targetTokens.length)

Counts exact token matches, normalized by the smaller set.
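The sketch below implements the keyword formula exactly as written, with no per-term weighting. Two details are assumptions the formula leaves open: duplicate tokens are collapsed with `Set`, and an empty token list scores 0 rather than dividing by zero.

```typescript
// Sketch of keywordScore = matches / min(|query|, |target|).
function keywordScore(queryTokens: string[], targetTokens: string[]): number {
  const query = new Set(queryTokens);   // assumption: duplicates counted once
  const target = new Set(targetTokens);
  const smaller = Math.min(query.size, target.size);
  if (smaller === 0) return 0;          // empty input: avoid 0 / 0
  const matches = Array.from(query).filter((token) => target.has(token)).length;
  return matches / smaller;
}

const overlap = keywordScore(["optimize", "database", "performance"], ["improve", "database", "speed"]);
// overlap === 1 / 3: only "database" matches, and both lists have 3 tokens
```

Dividing by the smaller set is what separates this from Jaccard similarity (intersection over union). A short query fully contained in a long target scores 1.0 here. Jaccard would penalize it for the target's extra words.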

3. Fuzzy Score

for each queryToken:
  bestSim = max over all targetTokens of tokenFuzzySimilarity(queryToken, targetToken)
  if bestSim > 0.5: totalSim += bestSim
fuzzyScore = totalSim / queryTokens.length

Uses Levenshtein distance with substring match bonus (0.8 for 4+ char substrings).
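The fuzzy step can be sketched with a standard two-row Levenshtein implementation. This page does not specify how a distance becomes a similarity. The sketch assumes `1 - distance / longerLength`, raised to 0.8 when one token contains the other and the contained token has at least 4 characters, which is one reading of the substring bonus above. `tokenFuzzySimilarity` here is therefore a hypothetical reconstruction.

```typescript
// Standard dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical similarity in [0, 1]: normalized edit distance plus the substring bonus.
function tokenFuzzySimilarity(a: string, b: string): number {
  if (a === b) return 1;
  const shorter = a.length <= b.length ? a : b;
  const longer = a.length <= b.length ? b : a;
  let sim = 1 - levenshtein(a, b) / longer.length;
  if (shorter.length >= 4 && longer.includes(shorter)) sim = Math.max(sim, 0.8);
  return sim;
}

// Average best similarity per query token; best matches of 0.5 or below add nothing.
function fuzzyScore(queryTokens: string[], targetTokens: string[]): number {
  if (queryTokens.length === 0 || targetTokens.length === 0) return 0;
  let totalSim = 0;
  for (const queryToken of queryTokens) {
    const bestSim = Math.max(...targetTokens.map((t) => tokenFuzzySimilarity(queryToken, t)));
    if (bestSim > 0.5) totalSim += bestSim;
  }
  return totalSim / queryTokens.length;
}

levenshtein("kitten", "sitting");                                     // 3
tokenFuzzySimilarity("auth", "authentication");                       // 0.8 (substring bonus)
fuzzyScore(["optimze", "databse"], ["optimize", "database", "speed"]); // 0.875
```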

4. Synonym Score

synonymMatches = 0
for each queryToken:
  if queryToken belongs to a synonym group G:
    if any word in G appears in targetTokens:
      synonymMatches++
synonymScore = synonymMatches / queryTokens.length
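Here is a direct translation of the synonym pseudocode, with synonym groups passed in as plain string arrays. `sampleGroups` is an excerpt of the built-in groups listed earlier. For a word that belongs to several groups, the sketch assumes it counts once if any of its groups has a word in the target.

```typescript
// Sketch of the synonym score: share of query tokens whose group has a word in the target.
function synonymScore(queryTokens: string[], targetTokens: string[], groups: string[][]): number {
  if (queryTokens.length === 0) return 0;
  const target = new Set(targetTokens);
  let synonymMatches = 0;
  for (const token of queryTokens) {
    // Assumption: a token in several groups counts once if ANY of its groups hits.
    const hit = groups.some(
      (group) => group.indexOf(token) !== -1 && group.some((word) => target.has(word)),
    );
    if (hit) synonymMatches++;
  }
  return synonymMatches / queryTokens.length;
}

// Excerpt of two built-in groups from the list above.
const sampleGroups = [
  ["fix", "repair", "patch", "resolve", "debug"],
  ["optimize", "improve", "enhance", "refine", "tune"],
];

synonymScore(["fix", "login", "bug"], ["repair", "auth", "issue"], sampleGroups); // 1 / 3
```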

5. Combined Score

score = clamp(
  keywordScore * weights.keyword +
  fuzzyScore * weights.fuzzy +
  synonymScore * weights.synonym,
  0, 1
)
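The combination step can be written as a single function. `SubScores` and `Weights` are hypothetical type names for this sketch. Each sub-score lies in [0, 1], and non-negative weights summing to 1.0 form a weighted average, so valid inputs already land in [0, 1]. The clamp only guards against out-of-range inputs, such as weights that do not sum to 1.

```typescript
// Hypothetical type names for this sketch.
interface Weights {
  keyword: number;
  fuzzy: number;
  synonym: number;
}

interface SubScores {
  keywordScore: number;
  fuzzyScore: number;
  synonymScore: number;
}

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

// Weighted sum of the three sub-scores, clamped to [0, 1].
function combineScores(sub: SubScores, weights: Weights): number {
  return clamp(
    sub.keywordScore * weights.keyword +
      sub.fuzzyScore * weights.fuzzy +
      sub.synonymScore * weights.synonym,
    0,
    1,
  );
}

// Weights from the createHybridMatcher example: 0.5*0.5 + 0.5*0.2 + 1*0.3 ≈ 0.65
combineScores(
  { keywordScore: 0.5, fuzzyScore: 0.5, synonymScore: 1 },
  { keyword: 0.5, fuzzy: 0.2, synonym: 0.3 },
);
```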

Performance

Tokenization: O(n) where n = text length
Keyword scoring: O(min(q, t)) where q, t = token counts
Fuzzy scoring: O(q × t × L²) where L = average token length (each Levenshtein comparison is O(L²))
Memory: O(s) per matcher instance, where s = total words across built-in and custom synonym groups
Typical latency: 1-5ms for short texts (<200 chars)

Use Cases

Delegation Reuse

Find existing sub-agents that can handle similar tasks:

import { createHybridMatcher } from '@tinyclaw/matcher';

const matcher = createHybridMatcher({ minScore: 0.6 });

const existingAgents = [
  { id: 'agent-1', text: 'research TypeScript best practices' },
  { id: 'agent-2', text: 'write API documentation' },
];

const newTask = 'investigate JavaScript coding standards';
const match = matcher.findBest(newTask, existingAgents);

if (match) {
  console.log(`Reusing ${match.id}`);
} else {
  console.log('Creating new agent');
}

Template Matching

Match tasks to role templates:

import { createHybridMatcher } from '@tinyclaw/matcher';

const matcher = createHybridMatcher();

const templates = [
  { id: 'researcher', text: 'research analyze investigate study' },
  { id: 'developer', text: 'code implement build develop engineer' },
  { id: 'writer', text: 'write document compose draft author' },
];

const task = 'study authentication methods';
const best = matcher.findBest(task, templates);
// Expected: best?.id === 'researcher' ('study' appears in that template and is in the research synonym group)

Comparison with Vector Search

Feature	Hybrid Matcher	Vector/Embedding Search
Dependency	Zero (built-in)	OpenAI API or local model
Cost	Free	$0.0001/token (OpenAI)
Latency	1-5ms	50–200ms (API) or 10–50ms (local)
Offline	✅ Fully offline	❌ Requires API or model
Typo tolerance	✅ Levenshtein-based	❌ Embedding-dependent
Interpretability	✅ Clear scoring breakdown	❌ Opaque vector similarity

Related Packages

@tinyclaw/delegation - Uses the matcher for sub-agent reuse
@tinyclaw/memory - Uses SQLite FTS5 full-text search instead of the matcher
