Beyond the Depth Wall: Solving Deep Graph Traversal

When you ask a graph database to find every user reachable within six hops of a starting node, you are asking it to walk an exponentially expanding tree of relationships. On a real social graph, each additional hop multiplies the working set, and most graph engines hit a wall somewhere between hop four and hop seven. The wall is visible as response times measured in minutes rather than milliseconds, or as queries that simply never complete.

We have been benchmarking DataGraphs (DG) against Neo4j on the Pokec social network dataset, and the depth-of-query results reveal an architectural difference with implications well beyond benchmark headlines. This post walks through the data, the underlying mathematics, and what it means for product builders working with deeply connected graph data.

Bottom line

Across the depths that matter for real product features (3 to 7 hops), DG returns in tens of milliseconds where Neo4j takes minutes, hours, or days, and does so in a memory footprint roughly 178× smaller on the medium dataset. The compound effect across latency, memory, and concurrent-load percentiles changes what is buildable:

Recommendation and social-graph products (friends-of-friends, "people you may know") move from yesterday's overnight batch to live suggestions on the response path.
Fraud and anti-abuse signal extraction at 3-to-6 hops out from a suspect node moves from offline reports to an inline check at the moment of a flagged transaction.
Knowledge-graph entity expansion at typical 3-to-5-hop depths moves from analyst tool to interactive product feature.
Real-time API serving from a graph backend becomes practical: sub-millisecond p50 latency under 32-way concurrent load fits inside any customer-facing SLA budget.
Deployment economics shift by an order of magnitude; a 1.6M-node graph fits in 4 GB of RAM, where Neo4j needs at least 48 GB to operate comfortably. Low-latency graphs for on-device AI are a reality.

The detailed case analysis is at the end of the post.

The setup

Pokec is the real-world Slovak social network. We run two sizes:

Dataset	Nodes	Edges (approx.)
Medium	~100K	~1.5M
Large	~1.6M	~30M

These are modest by production standards. Most real graphs are orders of magnitude larger. That matters for how to read the results below: the depth wall does not appear at "big data" scale; it appears almost immediately. Even a 100K-node graph is large enough that deep traversal is simply not achievable for most graph databases.

All measurements run on an identical AWS EC2 i4i.2xlarge instance: 8 vCPU Intel Xeon Platinum 8375C at 2.9 GHz, 64 GB RAM, NVMe SSD. Two engines under test:

DG (datagraphs-rs-mmap v2.25.0): the DataGraphs memory-mapped engine, backed by the OS page cache
Neo4j 2026.02.2: configured per Neo4j's recommended production defaults

The query mix has 30 queries covering five categories: aggregations (node and edge counts), point lookups, scan-and-filter operations (distinct ages, demographic filters), shallow traversal (1-hop and 2-hop), and deep traversal (2 to 7 hops). Each query has a warm-median timing computed from 20 post-warmup iterations. The full GQL text for every query referenced below is in the addendum.

The math of deep traversal

To understand why depth is the killer dimension, consider the branching structure of a social graph. If every user has, on average, k friends, then a 1-hop traversal touches k nodes; 2-hop touches up to k²; 3-hop touches up to k³; and so on. With the typical social-network branching factor in the tens, the working set grows from tens to hundreds to tens of thousands of candidate paths inside four hops. The small-world property of human social graphs (six-degrees-of-separation) means that by depth five or six, you have effectively reached most of the connected component.

Growth regimes for query time as a function of depth d include:

Polynomial growth: T(d) ∝ d^k for some small constant k. The engine bounds work-per-hop through structural reasoning. Manageable at all depths.
Exponential growth: T(d) ∝ b^d. Each hop multiplies cost by a constant factor. Tractable to depth five or six, expensive beyond.
Super-exponential growth: T(d) ∝ b^(d^c) for c > 1. The effective branching factor itself accelerates with depth. Intractable beyond depth four or five.

When we fit these models to the measured depth curves for each engine, the qualitative picture is clear. Data Graphs fits a power-law model with k in the 2 to 3 range, well-behaved linear polynomial growth. Neo4j fits a super-exponential model with c notably above 1 on the single-user series, meaning the effective branching factor is not just multiplying the working set per hop, it is multiplying it by an increasing factor each hop.

This is not a constant-factor difference. It is a different complexity class.

All-users distinct N-hop queries on Pokec medium, log-scale latency. Solid lines = measured points (hops 2–7); dashed lines = best-fit projection. Data Graphs holds polynomial growth; Neo4j fits a super-exponential model.

Latency: orders of magnitude on deep traversals

On the medium dataset (100K nodes) the warm-median latency comparison at depth 6 and 7 looks like this:

Query	DG	Neo4j	Speedup
All-users distinct 6-hop	82 ms	158,023 s (≈ 1.83 days)	1,936,558 ×
Single-user distinct 7-hop	70 ms	33,445 s (≈ 9.3 hr)	477,985 ×
Single-user 4-hop (women)	26 ms	374 s (≈ 6.2 min)	14,525 ×
Single-user distinct 4-hop	29 ms	366 s (≈ 6.1 min)	12,759 ×
All-users distinct 5-hop	69 ms	718 s (≈ 12 min)	10,340 ×
Single-user distinct 6-hop	54 ms	110 s (≈ 1.8 min)	2,039 ×
Single-user distinct 3-hop	13 ms	12.2 s	939 ×
cver_same_age_3hop	23 ms	18.5 s	802 ×
All-users distinct 4-hop	54 ms	22.5 s	414 ×

These are not typos. Neo4j takes minutes to days to traverse depths where DG is back under 100 milliseconds. The all-users 6-hop on the medium dataset takes Neo4j nearly two days; the seven-hop variant is omitted because it does not complete in reasonable time at all.
On the 1.6M-node large dataset, the directionality is the same: DG is 2 to 168 times faster on multi-hop queries that Neo can still complete, and Neo cannot complete five-or-more-hop all-users queries or six-or-more-hop single-user queries.

Query (large dataset)	DG	Neo4j	Speedup
Single-user 4-hop (women)	90 ms	15.1 s	168 ×
Single-user 4-hop	105 ms	15.6 s	148 ×
All-users 5-hop	1.19 s	10.4 s	8.7 ×
Single-user 3-hop	30 ms	205 ms	6.7 ×
All-users 4-hop	1.03 s	2.15 s	2.1 ×

The version 2.25.0 release of the DG engine alone halved deep-traversal latency on medium relative to the previous month's run, applying uniformly across multi-hop queries (45 to 55 percent reduction at every depth from 3 to 7). The story matters in operational terms: at the higher end of these deltas, the engine moved from "this query takes longer than a typical HTTP timeout" to "this query is well under the response budget for an interactive page."

Memory: a different category of footprint

Performance is academic if you cannot afford the deployment. By profiling each query in a freshly restarted process, we can attribute peak memory cleanly and compare resident set size directly.

Engine, dataset	Import (loaded baseline)	Peak (worst query)
DG, medium	101 MB	238 MB
DG, large	1,451 MB	3,719 MB
Neo4j, medium	42,138 MB	42,291 MB
Neo4j, large	42,118 MB	43,283 MB

Three observations stand out.

Neo4j's footprint is essentially constant across dataset size. The JVM claims its heap budget upfront and barely grows under query load. Medium and large produce the same ~42 GB working set. This is not "memory needed" so much as "memory occupied," but on a host the distinction is moot: that RAM is unavailable to anything else.

On the medium dataset the gap is 178×. A 100K-node graph that fits in ~240 MB for DG consumes 42 GB for Neo4j.

DG's working set is roughly half evictable. The peak RSS for DG on large splits as 1.86 GB anonymous (heap, pinned) and 1.86 GB file-backed (mmap, OS page cache, reclaimable under memory pressure). Neo4j has effectively no file-backed component: its 43 GB is all anonymous heap. So the truly pinned working set on the 1.6M-node graph is ~1.8 GB for DG versus ~43 GB for Neo4j, a 24-fold difference in non-evictable footprint.

Peak resident set size vs a 64 GB host. Data Graphs bars (medium 238 MB, large 3.6 GB) are slivers; Neo4j bars (~42 GB) consume two-thirds of the host.

This translates directly to deployment economics. On a typical 64 GB cloud host:

Neo4j large: 68% of RAM consumed at idle
DG large: 5.8% of RAM consumed at idle
DG could host a dataset roughly 12× larger in the same RAM envelope, or could run on a 4 GB instance where Neo4j requires at least 48 GB to operate comfortably

Per-query Data Graphs peak RAM on Pokec large (log scale). Darker portion = anonymous heap (pinned); lighter portion = file-backed page cache (OS-evictable). Dashed red line = Neo4j whole-process peak (42 GB).

Throughput: real-time versus interactive

Single-query latency tells you the best-case response time for an isolated request. Production workloads have many concurrent users, and tail-latency under load is what governs user experience.
We ran each engine through a 21-query mix at four concurrency levels (8, 16, 24, 32 parallel clients) over a sustained 30-second window. Aggregate throughput at saturation (concurrency = 8, matching the host vCPU count):

Engine	Medium QPS	Large QPS
DG	122,068	116,712
Neo4j	16,083	29,958
DG / Neo ratio	7.6 ×	3.9 ×

Both engines plateau at concurrency = 8. They are host-CPU-bound, not engine-contended. Adding more parallel clients beyond the vCPU count gains nothing for either of them.

Aggregate queries-per-second across concurrency levels 8 → 32 on Pokec medium. Both engines plateau at c=8 (host-CPU-bound on 8 vCPUs). Data Graphs ~122K QPS, Neo4j ~16K QPS, a 7.6× ratio that holds across the entire concurrency sweep.

The latency-under-load picture is sharper. At concurrency = 32, aggregate latencies on the medium dataset:

Percentile	DG	Neo4j	Neo / DG
p50	0.83 ms	8.8 ms	10.6 ×
p95	26 ms	239 ms	9.2 ×
p99	85 ms	399 ms	4.7 ×
p99.9	253 ms	1,344 ms	5.3 ×

DG's p50 of 0.83 ms means half of all queries return in under one millisecond even under 32-way contention. That is a different product category. Sub-millisecond p50 is the latency profile of a memory cache or an in-process call, not of a database query. On the large dataset the p50 holds at 0.83 ms. Concurrency and graph size do not move the median.

Per-query p95 latency on Pokec medium under 32-way concurrent load (log scale). Data Graphs (blue) stays in the low tens of ms across the workload mix; Neo4j (red) shows multi-hundred-ms tails on most queries.

What this means for deep-query graph use cases

The benchmark numbers above are not abstract. Each headline corresponds to a class of product behavior that becomes possible or impossible depending on engine choice.

Fraud and anti-abuse signal extraction. Money-laundering, sybil attacks, coordinated inauthentic behavior: these all operate on the assumption that the graph structure 3 to 6 hops out from a suspect node carries the signal. If a six-hop query takes 1.83 days (Neo4j's all-users 6-hop on medium) it is not even a batch report; it is unachievable. If it takes 82 milliseconds (DG) it is an inline check at the moment of a flagged transaction.

Recommendation and social-graph products. Friends-of-friends, second-degree connections, "people you might know" features all need 2-to-4 hop traversals over the active user graph. With Neo4j's super-exponential scaling, every additional hop multiplies the response budget. Product teams reach for ad-hoc workarounds: precomputed reachability tables, cached suggestion lists, batch-mode generation overnight. With DG, the same queries return inside the response budget at runtime. The product can serve second-degree suggestions live, not from yesterday's batch.

Knowledge graphs and entity expansion. Walking from an entity to its N-hop neighborhood is the bread-and-butter operation of knowledge-graph products. The useful depth for entity expansion is typically 3 to 5. At those depths, DG's 100x to 10,000x speedups translate into the difference between an interactive entity explorer and a research tool used by data analysts.

Real-time API serving from a graph backend. This is the operational implication of the sub-millisecond p50 under load. With a serving SLA in the 10 to 50 ms range (typical for a customer-facing API), Neo4j's median latency consumes most of the budget before the application does any work of its own. DG's 0.83 ms p50 leaves the budget effectively untouched.

Capacity and deployment economics. The 10× to 180× memory advantage means a DG-backed graph service can ship in container limits that would not accommodate Neo4j at all. A 1.6M-node graph fits in 4 GB of RAM with DG, leaving headroom on any developer laptop. The same graph in Neo4j requires a host class that costs several times more per hour and is rarely co-locatable with other services on a shared box.
Low latency and low-footprint graphs can be deployed at the edge, for on-device AI context layers, with consistent and predictable performance.

Conclusion

The story the data tells is consistent across all three measured dimensions: latency, memory, throughput. At trivial query depths the engines look superficially comparable. Beyond that, the gap is not 2× or 10×. It is compound, multi-order-of-magnitude, and growing with both depth and graph size.

Pokec's medium and large datasets are small by production standards, and that is part of the point: the depth wall does not appear at "big data" scale, it appears almost immediately. A 100K-node graph is already large enough to make six-hop traversal a days-long batch job on the established engine, and still well under a tenth of a second on DG.

For products built on deep graph queries, the choice of engine is not a tuning parameter. It defines what features are possible to ship at all. The combination of polynomial-scaling traversal, sub-millisecond median latency under load, and a non-evictable memory footprint 10 to 180 times smaller than the established benchmark reframes graph databases from "interactive analytical store" to "real-time serving infrastructure."

If you are building anything that depends on traversal at depth, the depth wall is the most important property of your stack. The data here suggests it can be pushed out by several orders of magnitude with the right engine architecture, and that the gains compound as your graph grows.

Addendum: GQL query reference

The queries below are the exact GQL/OpenCypher text issued to both engines, as referenced from the latency tables above. The user id 34885 is the single fixed starting node used across all "single user" queries; the "all users" variants traverse from every node in the graph.

Single-user distinct 3-hop adult friend

MATCH (s:User)-[]->(a)-[]->(b)-[]->(n:User)
WHERE s.id = 34885 AND n.age >= 18
RETURN distinct n.id as id, n.age as age
ORDER BY n.age DESC, n.id DESC LIMIT 10

Single-user distinct 4-hop adult friend

MATCH (s:User)-[]->(a)-[]->(b)-[]->(c)-[]->(n:User)
WHERE s.id = 34885 AND n.age >= 18
RETURN distinct n.id as id, n.age as age
ORDER BY n.age DESC, n.id DESC LIMIT 10

Single-user 4-hop adult woman friend

MATCH (s:User)-[]->(a)-[]->(b)-[]->(c)-[]->(n:User)
WHERE s.id = 34885 AND n.age >= 18 AND n.gender = 'woman'
RETURN distinct n.id as id, n.age as age
ORDER BY n.age DESC, n.id DESC LIMIT 10

Single-user distinct 6-hop adult friend

MATCH (s:User)-[]->(a)-[]->(b)-[]->(c)-[]->(d)-[]->(e)-[]->(n:User)
WHERE s.id = 34885 AND n.age >= 18
RETURN distinct n.id as id, n.age as age
ORDER BY n.age DESC, n.id DESC LIMIT 10

Single-user distinct 7-hop adult friend

MATCH (s:User)-[]->(a)-[]->(b)-[]->(c)-[]->(d)-[]->(e)-[]->(f)-[]->(n:User)
WHERE s.id = 34885 AND n.age >= 18
RETURN distinct n.id as id, n.age as age
ORDER BY n.age DESC, n.id DESC LIMIT 10

All-users distinct 4-hop adult friend

MATCH (s:User)-[]->(a)-[]->(b)-[]->(c)-[]->(n:User)
WHERE n.age >= 18
RETURN distinct n.id as id, n.age as age
ORDER BY n.age DESC, n.id DESC LIMIT 10

All-users distinct 5-hop adult friend

MATCH (s:User)-[]->(a)-[]->(b)-[]->(c)-[]->(d)-[]->(n:User)
WHERE n.age >= 18
RETURN distinct n.id as id, n.age as age
ORDER BY n.age DESC, n.id DESC LIMIT 10

All-users distinct 6-hop adult friend

MATCH (s:User)-[]->(a)-[]->(b)-[]->(c)-[]->(d)-[]->(e)-[]->(n:User)
WHERE n.age >= 18
RETURN distinct n.id as id, n.age as age
ORDER BY n.age DESC, n.id DESC LIMIT 10

cver_same_age_3hop

MATCH (s:User)-[]->(a)-[]->(b)-[]->(n:User)
WHERE s.id = 34885 AND s.age = n.age
RETURN DISTINCT n.id as id, n.age as age
ORDER BY n.id LIMIT 20