[PATCH] D59407: [clangd] Add RelationSlab

Sam McCall via Phabricator via cfe-commits Thu, 28 Mar 2019 11:11:53 -0700

sammccall added a comment.

(Sorry to arrive late at this discussion, I'm just back from leave.
I have some suggestions and would appreciate your thoughts, but if simply this 
feels too much like going around in circles I'm happy to work out how we can 
address this after this patch lands instead.
Have discussed offline with @gribozavr and I think we're roughly on the same 
page. @kadircet is now out on leave)


I think the data model might be overly complicated here.
I can see how the discussion got here: it mirrors the other `*Slab` types quite 
closely. But:

- those types were designed to solve specific problems that we don't have here 
(string deduplication and symbol merging)
- I think the class-parent relation is pretty sparse (maybe 1 edge per 10 
symbols?) so lots of options will work fine
- I don't think we yet know what the more resource-critical (denser) relations 
and queries are, so it's unclear what to optimize for

I think the simplest model that can work here is something like:

- `Relation` is `struct { SymbolID Subject; SymbolRole Relation; SymbolID 
Object }`
- this is a value type, so could be passed around simply as 
`std::vector<Relation>`. If `RelationSlab` is desired for symmetry, it could be 
an alias or simple wrapper around `std::vector<Relation>`.

This has a few advantages:

- the `Relation` value_type is self-contained, has clearer semantics, and works 
as the result of any type of query: based on subject and/or predicate and/or 
object. (Similar to how it's convenient that Symbol contains SymbolID).
- if lookup is desired, lookup by subject, subject+predicate, or full-triple is 
possible by sorting in SPO order with binary search. Return type is simple 
`ArrayRef<Relation>`. (Though fancy lookup maybe better belongs in MemIndex/Dex 
rather than in the slab). For more query types, storing a copies in OSP and/or 
POS order is pretty cheap too.
- memory efficiency: it costs `20*distinct(subject, predicate, object)` bytes, 
vs `28*distinct(subject, predicate) + 8 * distinct(subject, predicate, 
object)`. Unless the average number of objects for a `(subject, predicate)` 
pair (that has at least one object) is >2.3, the former is smaller. For the 
current use case, this average is certainly <1.1.
- simplicity: no arenas, easy mental model.

I see discussion of (something like) this option stalled around the index 
constructors, but I don't see a fundamental block there. The concept that the 
index constructor should be templated over would be `Iterable<Relation>`. LMK 
if I'm missing something.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59407/new/

https://reviews.llvm.org/D59407



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D59407: [clangd] Add RelationSlab

Reply via email to