VRTCLS.AI

Identity Graphing: From Signals to Individuals

An identity graph resolves disconnected behavioral signals — a mobile session, an email open, a household address, a hashed phone — into a unified individual or household entity. This is the unglamorous infrastructure that determines whether everything else works. This article describes how the graph is built, how its confidence is calibrated, and what its limits are.

Updated 2026-05-13 · v4.7 model

What an identity graph actually is

An identity graph is a probabilistic, time-varying mapping of identifiers (device IDs, hashed contact records, addresses, behavioral fingerprints) to a unified entity (a person, or in some cases a household). It is probabilistic because linkage is not certain — most observed identifiers do not include a deterministic key. It is time-varying because identifiers change: people move, change phones, change emails, marry, divorce.
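
As a rough illustration of "probabilistic" and "time-varying", the mapping can be pictured as a set of scored edges, each carrying an observation interval. The sketch below is not the production schema: the `Edge` fields, the `IdentityGraph.resolve` method, and the sample identifiers are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Edge:
    """A scored link from one identifier to one entity (hypothetical schema)."""
    identifier: str     # e.g. a hashed phone number or a device ID
    entity_id: str      # the person or household the identifier resolves to
    confidence: float   # linkage probability in [0, 1]
    first_seen: date    # start of the observation interval
    last_seen: date     # end of the observation interval


@dataclass
class IdentityGraph:
    edges: List[Edge] = field(default_factory=list)

    def resolve(self, identifier: str, as_of: date) -> Optional[Edge]:
        """Return the most confident edge whose interval contains `as_of`."""
        active = [
            e for e in self.edges
            if e.identifier == identifier and e.first_seen <= as_of <= e.last_seen
        ]
        return max(active, key=lambda e: e.confidence, default=None)


# The same hashed phone number maps to different people at different times,
# for example after a number is reassigned.
graph = IdentityGraph([
    Edge("hash:phone-1", "person-A", 0.93, date(2023, 1, 1), date(2024, 6, 30)),
    Edge("hash:phone-1", "person-B", 0.88, date(2024, 9, 1), date(2026, 5, 1)),
])
```

Resolving the identifier as of mid-2023 returns the edge to `person-A`, while the same query as of early 2025 returns `person-B`. A date that falls in the gap between the two intervals returns `None`.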

Construction

Edges in the graph are scored by Fellegi-Sunter linkage logic adapted for high-cardinality identifier classes. Strong-edge candidates (matched hashed PII fields, deterministic shared keys) anchor the graph; weak-edge candidates (co-occurrence, shared IP-time patterns, behavioral similarity) are scored and added with explicit confidence. Edges are continuously refreshed; identifiers that have not been observed within their decay window are demoted but not deleted.
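
For readers unfamiliar with Fellegi-Sunter scoring, the following is a minimal sketch of the classic formulation, not the adapted version used in the graph. Each field contributes a log-likelihood ratio built from two probabilities: m, the chance the field agrees when two records are the same entity, and u, the chance it agrees when they are not. The field names, the m/u values, and the prior are invented for illustration.

```python
import math
from typing import Dict

# Hypothetical (m, u) pairs per field:
#   m = P(field agrees | records belong to the same entity)
#   u = P(field agrees | records belong to different entities)
FIELD_PARAMS = {
    "hashed_email": (0.95, 0.0001),
    "hashed_phone": (0.90, 0.0005),
    "postal_code": (0.85, 0.02),
}


def match_weight(agreements: Dict[str, bool]) -> float:
    """Sum the per-field log2 likelihood ratios for a candidate edge."""
    total = 0.0
    for name, agrees in agreements.items():
        m, u = FIELD_PARAMS[name]
        if agrees:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def match_probability(weight: float, prior: float) -> float:
    """Turn a match weight into a posterior probability with Bayes' rule."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2.0 ** weight
    return posterior_odds / (1 + posterior_odds)


strong = match_weight({"hashed_email": True, "hashed_phone": True})
weak = match_weight({"postal_code": True, "hashed_email": False})
```

Under these assumed parameters, agreement on a rare field such as a hashed email moves the weight far more than agreement on a postal code. That difference is the intuition behind treating matched hashed PII as strong edges and co-occurrence patterns as weak ones.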

Confidence calibration

Every node in the graph carries an aggregate linkage confidence in [0,1]. Confidence is calibrated against a labeled panel where ground-truth linkage is known. On that panel, v4.7 of the graph reaches 97.4% precision at the 0.9 threshold: when the graph claims linkage with confidence above 0.9, the claim is correct 97.4% of the time across the test cohort. Downstream scoring weights each signal's contribution by node confidence, so a high-signal observation from a low-confidence linkage contributes less than the same observation from a high-confidence linkage.
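
The panel check and the confidence weighting described above can be sketched in a few lines. This is an illustration under stated assumptions: the panel is represented as hypothetical (confidence, is_true_link) pairs with made-up values, and the weighting is assumed to be a simple product, which the production scoring may not use.

```python
from typing import Iterable, Optional, Tuple


def precision_at_threshold(
    panel: Iterable[Tuple[float, bool]], threshold: float = 0.9
) -> Optional[float]:
    """Share of links claimed above `threshold` that are true links.

    `panel` holds (confidence, is_true_link) pairs from a labeled cohort.
    """
    claimed = [is_true for conf, is_true in panel if conf > threshold]
    if not claimed:
        return None  # nothing was claimed at this threshold
    return sum(claimed) / len(claimed)


def weighted_signal(signal_strength: float, node_confidence: float) -> float:
    """Scale a signal by node confidence (assumed multiplicative weighting)."""
    return signal_strength * node_confidence


# Toy panel with invented values: five links claimed above 0.9, four correct.
toy_panel = [
    (0.95, True), (0.97, True), (0.92, False),
    (0.99, True), (0.91, True), (0.60, False),
]
toy_precision = precision_at_threshold(toy_panel)
```

On the toy panel, four of the five links claimed above 0.9 are correct, so the measured precision is 0.8. The same signal strength attached to a node with confidence 0.5 contributes less than it would on a node with confidence 0.9.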

Household resolution

For many verticals (real estate, finance, insurance, healthcare), household resolution matters more than individual resolution. Household entities are constructed by clustering individuals on address co-location, name patterns, and shared behavioral indicators. Household confidence is reported separately from individual confidence.
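
One simple way to picture household clustering is a union-find pass over individuals who share an address and either a surname or a device. The sketch below uses that rule as a stand-in for "name patterns" and "shared behavioral indicators". The rule itself, the record fields, and the sample data are assumptions for illustration, not the production clustering logic.

```python
from collections import defaultdict


def cluster_households(people):
    """Group people linked by a shared address plus a shared surname or device.

    `people` maps person_id -> {"address", "surname", "devices"}; all three
    fields are hypothetical and assumed to be normalized upstream.
    """
    parent = {pid: pid for pid in people}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Bucket people by each linking key; every key includes the address.
    buckets = defaultdict(list)
    for pid, rec in people.items():
        buckets[(rec["address"], "surname", rec["surname"].lower())].append(pid)
        for device in rec["devices"]:
            buckets[(rec["address"], "device", device)].append(pid)

    # Everyone in the same bucket joins the same household.
    for members in buckets.values():
        for other in members[1:]:
            union(members[0], other)

    households = defaultdict(set)
    for pid in people:
        households[find(pid)].add(pid)
    return sorted(households.values(), key=lambda group: sorted(group))


people = {
    "p1": {"address": "addr-A", "surname": "Smith", "devices": {"d1"}},
    "p2": {"address": "addr-A", "surname": "smith", "devices": {"d2"}},
    "p3": {"address": "addr-A", "surname": "Jones", "devices": {"d2"}},
    "p4": {"address": "addr-B", "surname": "Smith", "devices": {"d1"}},
}
households = cluster_households(people)
```

Here `p3` joins the household of `p1` and `p2` through a shared device despite a different surname, while `p4` stays separate because neither the surname nor the device match occurs at the same address.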

Privacy and compliance

All operations are hashed-first. The graph ingests hashed identifiers; raw PII is not stored or transacted on. Consent provenance is tracked per identifier; identifiers without verifiable consent are excluded from outputs. The architecture supports GDPR Article 17 (right to erasure) at the entity level — a verified request removes the entity and all derived signals.
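
The hashed-first flow, the consent filter, and entity-level erasure can be sketched together. Everything here is illustrative: the article does not name a hash function, so SHA-256 over a trimmed, lowercased value is an assumption, and `EntityStore` with its methods is a hypothetical in-memory stand-in for the real storage layer.

```python
import hashlib


def hash_identifier(raw: str) -> str:
    """Normalize and hash a raw identifier upstream of the graph (assumed SHA-256)."""
    normalized = raw.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class EntityStore:
    """Hypothetical in-memory store that only ever sees hashed identifiers."""

    def __init__(self):
        self.links = {}    # hashed identifier -> (entity_id, has_consent)
        self.signals = {}  # entity_id -> list of derived signals

    def ingest(self, hashed_id, entity_id, has_consent):
        self.links[hashed_id] = (entity_id, has_consent)

    def add_signal(self, entity_id, signal):
        self.signals.setdefault(entity_id, []).append(signal)

    def exportable(self):
        """Hashed identifiers eligible for output: only those with consent."""
        return sorted(h for h, (_, ok) in self.links.items() if ok)

    def erase(self, entity_id):
        """Remove the entity, its identifiers, and all derived signals."""
        self.links = {h: v for h, v in self.links.items() if v[0] != entity_id}
        self.signals.pop(entity_id, None)


store = EntityStore()
email_a = hash_identifier(" Ann@Example.com ")
phone_a = hash_identifier("555-0100")
email_b = hash_identifier("bob@example.com")
store.ingest(email_a, "entity-1", has_consent=True)
store.ingest(phone_a, "entity-1", has_consent=False)
store.ingest(email_b, "entity-2", has_consent=True)
store.add_signal("entity-1", "opened-email")
```

Calling `store.exportable()` at this point omits the phone hash because it lacks consent, and `store.erase("entity-1")` leaves only the second entity's identifier behind. A real deployment would also need to decide on salting or keyed hashing and on per-type normalization rules, which this sketch leaves out.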

[Figure: Calibrated decay reference (signal half-life, production model)]

[Figure: Conversion velocity reference (predictive cohort vs. cold list)]

Citations

  • Fellegi, I., & Sunter, A. A Theory for Record Linkage. JASA, 1969.
  • Christen, P. Data Matching: Concepts and Techniques for Record Linkage. Springer, 2012.
  • Steorts, R. C., et al. Performance bounds for graphical record linkage. AISTATS, 2017.
