Mikhail,
On 6/2/22 04:16, Mikhail Khludnev wrote:
Hi, Chris.
Usually I'm concerned when values (enums) leak into field names. It's
better to control the set of fields.
I'd rather go with separate values for faster searching and faceting, plus
concatenations:
"locations" : [ "denver", "chicago", "washington" ],
"roles" : [ "admin", "staff", "basic" ],
"location_roles" : [ "denver_admin", "chicago_staff",
"washington_basic" ]
so:
users who are admins in denver?
q=location_roles:denver_admin
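A minimal Python sketch of the flattening above, assuming the per-location roles are available as a plain {location: role} mapping when the document is built. The helper names build_user_doc and admins_in are hypothetical, and the "_" separator assumes neither locations nor roles contain underscores:

```python
def build_user_doc(username, roles_by_location):
    """Flatten a {location: role} mapping into multi-valued Solr fields.

    Hypothetical helper: field names follow the example above, and the "_"
    separator assumes neither locations nor roles contain underscores.
    """
    return {
        "username": username,
        # Separate values for plain searching and faceting.
        "locations": list(roles_by_location),
        "roles": list(roles_by_location.values()),
        # Concatenated values keep each location paired with its role.
        "location_roles": [f"{loc}_{role}" for loc, role in roles_by_location.items()],
    }


def admins_in(location):
    # Query for "users who are admins in <location>".
    return f"location_roles:{location}_admin"


doc = build_user_doc("chris", {"denver": "admin", "chicago": "staff", "washington": "basic"})
print(doc["location_roles"])  # ['denver_admin', 'chicago_staff', 'washington_basic']
print(admins_in("denver"))    # location_roles:denver_admin
```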
That's a really great idea. Unfortunately, my _actual_ use-case is
arbitrary integral numeric data and not a set of well-defined enums. On
the other hand, upon re-examination, the only thing I really care about
for /searching/ is the presence of the item in a specific list, not
necessarily its value. So I believe I *can* add a multi-valued field like:
"locations" : [ "denver", "chicago", "washington" ],
"flagged_locations" : [ "chicago" ],
"location_roles" : [ "denver_admin", "chicago_staff",
"washington_basic" ]
And search for flagged_locations:chicago
Once I have found my "chicago" user, I can display the roles that user
has in Chicago using the stored-field "location_roles" which I don't
actually have to use for searching.
Apologies for the imprecise analogy. I incorrectly stated
that I needed to search for users who were staff in chicago. When I
build the index, I will know if "chicago" is an important location and
can put *that* into the index.
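A rough Python sketch of the flagged-location idea. The important-location set, build_flagged_doc and roles_at are all hypothetical; the point is that only flagged_locations is searched, while the stored location_roles values are parsed after the fact for display:

```python
# Hypothetical: locations known to be important when the index is built.
IMPORTANT_LOCATIONS = {"chicago"}


def build_flagged_doc(username, roles_by_location, important=IMPORTANT_LOCATIONS):
    return {
        "username": username,
        "locations": list(roles_by_location),
        # Searched: only locations flagged at indexing time.
        "flagged_locations": [loc for loc in roles_by_location if loc in important],
        # Stored for display only; not needed for searching.
        "location_roles": [f"{loc}_{role}" for loc, role in roles_by_location.items()],
    }


def roles_at(doc, location):
    # Recover the role(s) for one location from the stored concatenated values.
    prefix = location + "_"
    return [v[len(prefix):] for v in doc["location_roles"] if v.startswith(prefix)]


doc = build_flagged_doc("chris", {"denver": "admin", "chicago": "staff", "washington": "basic"})
query = "flagged_locations:chicago"
print(doc["flagged_locations"])  # ['chicago']
print(roles_at(doc, "chicago"))  # ['staff']
```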
The next level of complexity is dependent facets, i.e. counting role facets
within denver.
In this case Chris shouldn't contribute "staff" and "basic" as facet values,
but they are counted by facet.field.
You can still count such dependent facets with these concatenations via
tricky post-processing.
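One way the post-processing might look, as a hedged Python sketch: it assumes facet.field=location_roles and the default flat JSON layout, in which facet_fields holds alternating term and count entries. The counts below are made-up sample data. Keeping only terms that start with "denver_" yields role counts within denver, so Chris's chicago_staff and washington_basic values drop out:

```python
def dependent_role_counts(flat_facets, location, sep="_"):
    """Turn location_roles facet counts into role counts for one location.

    flat_facets: [term, count, term, count, ...] as in a flat facet_fields list.
    """
    prefix = location + sep
    counts = {}
    for term, count in zip(flat_facets[0::2], flat_facets[1::2]):
        # Skip other locations and zero-count terms (facet.mincount may be 0).
        if term.startswith(prefix) and count > 0:
            role = term[len(prefix):]
            counts[role] = counts.get(role, 0) + count
    return counts


# Hypothetical facet counts for the location_roles field.
facets = ["denver_admin", 4, "denver_staff", 7, "chicago_staff", 2, "washington_basic", 5]
print(dependent_role_counts(facets, "denver"))  # {'admin': 4, 'staff': 7}
```

Solr's facet.prefix parameter (e.g. facet.prefix=denver_) could push the prefix filtering to the server; stripping the prefix off each term would still be a client-side step.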
FWIW, the bulletproof solution for this is indexing roles as the user's
subdocuments. It's really performant, but deadly complex.
https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-nested-documents.html
https://solr.apache.org/guide/solr/latest/query-guide/json-query-dsl.html#faceting-nested-documents
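For comparison, a sketch of what the nested-document layout might look like, built as a Python dict. This assumes a schema configured for nested documents as the guide above describes; the doc_type field, the "assignments" child label and the ids are hypothetical. The query string uses the block join parent parser to find users having a single child that matches both denver and admin:

```python
import json


def nested_user_doc(username, roles_by_location):
    # Parent "user" document with one child document per location (hypothetical layout).
    return {
        "id": f"user-{username}",
        "doc_type": "user",
        "username": username,
        "assignments": [
            {
                "id": f"user-{username}-{loc}",
                "doc_type": "assignment",
                "location": loc,
                "role": role,
            }
            for loc, role in roles_by_location.items()
        ],
    }


def parent_query(location, role):
    # Block join: return parents whose *same* child matches both clauses.
    return f'{{!parent which="doc_type:user"}}+location:{location} +role:{role}'


doc = nested_user_doc("chris", {"denver": "admin", "chicago": "staff", "washington": "basic"})
print(json.dumps(doc, indent=2))
print(parent_query("denver", "admin"))
```

Because location and role live on the same child, a user who is admin in one city and staff in another cannot produce a false match, which is what makes this approach bulletproof at the cost of the extra schema and query complexity.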
I've known about subdocuments for years, but never had a use-case
that seemed to justify the complexity. I may be getting close to that here,
but I think I have a simpler solution brewing in my mind.
Thanks a ton for your feedback.
Thanks,
-chris
On Thu, Jun 2, 2022 at 1:04 AM Christopher Schultz wrote:
All,
Since Solr / Lucene can't define arbitrary fields in documents, I wonder
what the recommended technique is for storing structured information in
a document?
I'd like to store information about an entity that is specific to related
entities (not stored in the index). This isn't my actual use-case, but
let's take an example of a user who has different privileges in
different locations.
If Solr / Lucene allowed me to do so, I might model it like this:
users : [
{
"username": "chris",
"locations" : [ "denver", "chicago", "washington" ],
"location_denver_role" : "admin",
"location_chicago_role" : "staff",
"location_washington_role" : "basic"
},
{ ... }
]
Since I can't have a field called "location_denver_role",
"location_chicago_role", etc. (well, I can, but I have a huge number of
locations to deal with and defining a field for each seems stupid), I
was thinking of maybe something like this:
users : [
{
"username": "chris",
"locations" : [ "denver", "chicago", "washington" ],
"location_roles" : [ { "denver" : "admin", "chicago" : "staff",
"washington" : "basic" } ]
},
{ ... }
]
So now I have a single field called "location_roles" but it's got
"structure" inside of it. I could obviously search the other fields
directly in Solr and then filter out the records I want manually, but
what's the best way to structure the index so that I can tell Solr I
only care about users who are admins in denver?
Lest you think I can invert the index and use:
"admin" : [ "denver" ],
"staff" : [ "chicago" ],
"basic" : [ "washington" ]
... I can't. The "role" is just a proxy for a bunch of user metadata
that may need to grow over time, including a large range of possible
values, so I can't just invert the index.
Thanks,
-chris