Entity Matching Engine

Multi-Signal Entity Matching

This demo shows how combining multiple matching signals (exact ID matches plus n-gram cosine similarity on names) improves entity resolution compared to basic string distance alone.

The algorithm uses character bigram vectors and cosine similarity to handle abbreviations, reorderings, and typos that break simple Levenshtein matching. Toggle between modes and adjust the threshold to see the difference.

Threshold range: 50% (permissive) to 100% (strict)

Match Rate: 32%
Avg Confidence: 99%
Matched: 8
Unmatched: 17

Multi-signal matching achieves a 32% match rate versus 12% for basic matching, a 20 percentage point improvement.

| Input Entity | Matched To | Signals | Confidence | Details |
|---|---|---|---|---|
| J.S. Bach (IPI: 00012345678) | Johann Sebastian Bach | ID | 100% | IPI: 100% |
| Mozart, Wolfgang A. (ID: M1002) | Wolfgang Amadeus Mozart | ID | 100% | ID: 100% |
| Beethoven, Ludwig Van | Ludwig van Beethoven | NAME | 95% | Name: 95% |
| F. Chopin | No match | | | No signals above threshold |
| Debussy Claude | No match | | | No signals above threshold |
| Universal Music Publ. | No match | | | No signals above threshold |
| Sony Music Ent. (IPI: 00078901234) | Sony Music Entertainment | ID | 100% | IPI: 100% |
| Warner Chapel Music | Warner Chappell Music | NAME | 95% | Name: 95% |
| Kobalt Music Grp | No match | | | No signals above threshold |
| Peer Musik Publishing (IPI: 00001234567) | Peer Music Publishing | ID, NAME | 100% | IPI: 100%, Name: 90% |
| Vivaldi Antonio | No match | | | No signals above threshold |
| Tchaikovsky P. | No match | | | No signals above threshold |
| Stravinsky Igor | No match | | | No signals above threshold |
| S. Rachmaninoff | No match | | | No signals above threshold |
| BMG Rights Mgmt (IPI: 00055667788) | BMG Rights Management | ID | 100% | IPI: 100% |
| Concord Music Pub (ID: P2007) | Concord Music Publishing | ID | 100% | ID: 100% |
| Downtown Music | No match | | | No signals above threshold |
| Reservoir Media | No match | | | No signals above threshold |
| Hal Leonard Corp | No match | | | No signals above threshold |
| Schubert Franz | No match | | | No signals above threshold |
| Acme Publishing Inc (IPI: 00099999999) | No match | | | No signals above threshold |
| John Williams | No match | | | No signals above threshold |
| Hans Zimmer Music | No match | | | No signals above threshold |
| Max Martin Publishing | No match | | | No signals above threshold |
| Billie Eilish Songs | No match | | | No signals above threshold |

How it works

1. Text Normalization

Input names are lowercased, stripped of accents and special characters, and normalized to handle Unicode variations. "Frédéric François Chopin" becomes "frederic francois chopin".
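The normalization step could look roughly like the following. This is a minimal sketch, not the demo's actual code: the function name `normalize_name` and the choice to replace punctuation with spaces are assumptions made for illustration.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and drop special characters (illustrative only)."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base letters.
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = without_accents.lower()
    # Assumption: anything other than a-z, 0-9, or a space becomes a space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", lowered)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(cleaned.split())


print(normalize_name("Frédéric François Chopin"))
```

Replacing punctuation with a space, rather than deleting it, keeps initials such as "J.S." as separate tokens; whether the demo does the same is not stated.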

2. N-gram Vectorization

Each name is converted into a set of character bigrams (2-character sequences). "bach" → {ba, ac, ch}. This creates a vector representation that's robust to word reordering and partial matches.
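As a rough illustration of this step, the sketch below builds the bigram set for a name and shows how much two orderings of the same words share. The helper name `bigram_set` is hypothetical, and the input is assumed to be already normalized.

```python
def bigram_set(text: str) -> set[str]:
    """Return the set of 2-character substrings of an already-normalized name."""
    # A string of length n yields n - 1 overlapping bigrams; duplicates collapse.
    return {text[i:i + 2] for i in range(len(text) - 1)}


print(bigram_set("bach"))

# Reordering the words only changes the bigrams that touch the space.
shared = bigram_set("vivaldi antonio") & bigram_set("antonio vivaldi")
print(len(shared), "bigrams shared")
```

Because most bigrams sit inside a word, swapping word order leaves the bulk of the set unchanged, which is what makes the representation tolerant of reordering.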

3. Cosine Similarity

The cosine of the angle between two n-gram vectors gives a similarity score from 0 to 100%. Unlike Levenshtein distance, the score does not depend on where characters sit in the string, so it copes with reordered words and shortened forms: "Vivaldi Antonio" still scores highly against "Antonio Lucio Vivaldi".
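For bigram sets treated as binary vectors, cosine similarity reduces to the number of shared bigrams divided by the geometric mean of the two set sizes. The sketch below assumes that set-based formulation; the demo may weight bigrams by frequency instead, and the function names are hypothetical.

```python
import math


def bigram_set(text: str) -> set[str]:
    """Set of 2-character substrings of an already-normalized name."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two names' binary bigram vectors, in [0, 1]."""
    set_a, set_b = bigram_set(a), bigram_set(b)
    if not set_a or not set_b:
        # A name shorter than two characters has no bigrams to compare.
        return 0.0
    # Dot product of binary vectors = number of shared bigrams;
    # each vector's norm = square root of its set size.
    return len(set_a & set_b) / math.sqrt(len(set_a) * len(set_b))


score = cosine_similarity("vivaldi antonio", "antonio lucio vivaldi")
print(f"{score:.0%}")
```

Whether a given score counts as a match depends on the threshold setting, so a pair that scores well can still appear as unmatched at a strict threshold.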

4. Multi-Signal Fusion

ID-based signals (IPI numbers, internal IDs) are combined with name similarity. When "J.S. Bach" fails name matching but the IPI number matches exactly, the system still identifies the correct entity with full confidence.
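One way to picture the fusion step is sketched below. It assumes that each signal is kept only if it clears the threshold and that the final confidence is the maximum of the surviving signals. That rule is consistent with the table rows above (for example, IPI 100% and Name 90% giving 100%) but is not confirmed by the source. The `Entity` class, the function names, and the 0.9 default threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    name: str                  # assumed to be already normalized
    ipi: Optional[str] = None  # IPI number, if one was supplied


def name_similarity(a: str, b: str) -> float:
    """Cosine similarity over character-bigram sets (binary vectors)."""
    set_a = {a[i:i + 2] for i in range(len(a) - 1)}
    set_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / math.sqrt(len(set_a) * len(set_b))


def collect_signals(query: Entity, candidate: Entity,
                    threshold: float = 0.9) -> dict[str, float]:
    """Return every signal that clears the threshold, keyed by signal type."""
    signals: dict[str, float] = {}
    # An exact ID match is treated as a full-confidence signal.
    if query.ipi is not None and query.ipi == candidate.ipi:
        signals["ID"] = 1.0
    score = name_similarity(query.name, candidate.name)
    if score >= threshold:
        signals["NAME"] = score
    return signals


def fuse(signals: dict[str, float]) -> float:
    """Assumed fusion rule: overall confidence is the strongest signal."""
    return max(signals.values(), default=0.0)


bach = Entity("johann sebastian bach", ipi="00012345678")
query = Entity("j.s. bach", ipi="00012345678")
signals = collect_signals(query, bach)
print(signals, fuse(signals))
```

In this sketch the abbreviated name falls well below the threshold, so only the ID signal survives and the match is still reported at full confidence, mirroring the "J.S. Bach" case described above.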