What about using the defaults in requestHandlers
along with SOLR-3191 to accomplish this? Let's
say that there were an fl-exclusion parameter. You'd
then be able to define an exclusion in your request
handler that would exclude your field(s) unless
overridden. This could be either a default or an
invariant, depending on how strictly you wanted to
enforce not being able to retrieve the field.
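
To make the default-vs-invariant distinction concrete, here is a rough sketch
in Python. It only simulates how I understand request handler parameters to
layer (request values override defaults, invariants override both); it is not
Solr code. The parameter name fl.exclude and the field names are made up for
illustration, since no such parameter exists today.

```python
def resolve_params(defaults, request, invariants):
    """Layer parameters: request overrides defaults, invariants override both."""
    merged = dict(defaults)
    merged.update(request)
    merged.update(invariants)
    return merged


def returned_fields(all_fields, params):
    """Apply fl plus the HYPOTHETICAL fl.exclude parameter to a field list."""
    requested = [f for f in params.get("fl", "*").split(",") if f]
    excluded = {f for f in params.get("fl.exclude", "").split(",") if f}
    if "*" in requested:
        candidates = list(all_fields)
    else:
        candidates = [f for f in all_fields if f in requested]
    return [f for f in candidates if f not in excluded]


# Hypothetical schema: raw_source is the large field we want hidden.
FIELDS = ["id", "title", "raw_source"]
EXCLUSION = {"fl.exclude": "raw_source"}

# As a default, a wildcard request skips the field...
as_default = returned_fields(
    FIELDS, resolve_params(EXCLUSION, {"fl": "*"}, {}))

# ...but the client can override the default by clearing the exclusion.
overridden = returned_fields(
    FIELDS, resolve_params(EXCLUSION, {"fl": "*", "fl.exclude": ""}, {}))

# As an invariant, the same override attempt has no effect.
as_invariant = returned_fields(
    FIELDS, resolve_params({}, {"fl": "*", "fl.exclude": ""}, EXCLUSION))
```

With the exclusion as a default, the client can still get the field back by
clearing the parameter; with it as an invariant, it cannot.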

I'm not entirely sure how I feel about this option, but
I wanted to throw it out for discussion. It does seem
easier to keep track of than another schema field
option.

I see no reason to make a distinction between
docValues-only and stored-only fields, though.

One thing about your idea, though: docValues only support
primitive types, i.e. string in this case. I believe there's
a limit on how big these values can be, 32K? That seems
rather restrictive for this use case, so we're back to stored.

I'm not sure whether that limit is configurable.

Erick



On Fri, Jan 13, 2017 at 11:40 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> I've got an idea for a feature that I think could be very useful.  I'd
> like to get some community feedback about it, see whether it's worth
> opening an issue for discussion.
>
> First, some background info:
>
> As I understand it, the fact that stored fields are compressed means
> that even if a particular stored field is not requested in the fl
> parameter, the data on disk for that field must still be read, in order
> to decompress the data and find the fields that ARE desired.  If one of
> the stored fields that's NOT requested is really large, that would
> pollute the OS disk cache with useless data.
>
> If the data for a field in the results comes from docValues instead of
> stored fields, I don't think it is compressed, which hopefully means
> that if a field is NOT requested, the corresponding docValues data is
> never read.
>
> And now for the idea:
>
> What if there were a schema option that would skip docValue retrieval
> for a field unless the fl parameter were to *explicitly* ask for that
> field?  With a typical wildcard value in fl, fields with this option
> enabled would not be retrieved.  If the field is not stored, not
> indexed, but has docValues, I *think* its presence on the disk would not
> affect performance (OS disk cache efficiency) unless its data is
> returned in results.
>
> One practical application, should my theory about docValues prove to be
> accurate:  Implementing a field that contains all the data sent for
> indexing, which could then be used for completely internal reindexing.
> A field like this would probably be detrimental to performance unless it
> could be automatically excluded without the client asking for the exclusion.
>
> SOLR-3191 is a sort-of related issue.  This links to SOLR-9467, which
> made me think of another potential use -- making it so certain fields
> are semi-secure because they aren't returned unless they are explicitly
> requested.  It wouldn't be TRULY secure, of course.
>
> Thanks,
> Shawn
>