Re: Is there a way to tell if multivalued field actually contains multiple values?

Michael McCandless Fri, 11 Nov 2016 03:09:51 -0800

I think you can use the term stats that Lucene tracks for each field.

Compare Terms.getSumTotalTermFreq and Terms.getDocCount.  If they are
equal, every document that has this field has exactly one token in it.
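
To make the arithmetic concrete, here is a minimal sketch that models the two
statistics without touching Lucene at all. The class name, the method names and
the per-document token counts are all made up for illustration; in a real index
the two numbers would come from Terms.getSumTotalTermFreq and
Terms.getDocCount for the field in question.

```java
// Toy model of the two per-field statistics; hypothetical names, no Lucene dependency.
final class TermStatsCheck {

    // Models Terms.getSumTotalTermFreq: total number of tokens indexed for the field.
    static long sumTotalTermFreq(int[] tokensPerDoc) {
        long sum = 0;
        for (int n : tokensPerDoc) {
            sum += n;
        }
        return sum;
    }

    // Models Terms.getDocCount: documents with at least one token in the field.
    static int docCount(int[] tokensPerDoc) {
        int count = 0;
        for (int n : tokensPerDoc) {
            if (n > 0) {
                count++;
            }
        }
        return count;
    }

    // True when every document that has the field has exactly one token in it.
    static boolean everyDocHasOneToken(long sumTotalTermFreq, int docCount) {
        return docCount > 0 && sumTotalTermFreq == docCount;
    }

    public static void main(String[] args) {
        int[] single = {1, 1, 0, 1}; // third doc lacks the field entirely
        int[] mixed = {1, 3, 0, 1};  // second doc has three tokens
        System.out.println(everyDocHasOneToken(sumTotalTermFreq(single), docCount(single)));
        System.out.println(everyDocHasOneToken(sumTotalTermFreq(mixed), docCount(mixed)));
    }
}
```

Note that the check counts tokens, not values: on an analyzed text field a
single value of two words already makes the sums differ, so equality shows the
field is effectively single-valued, while inequality alone does not prove that
any document has multiple values.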


Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 11, 2016 at 5:50 AM, Mikhail Khludnev <m...@apache.org> wrote:
> I suppose it's needless to remind you that norm(field) is derived from
> (but, by default, does not precisely encode) the number of tokens in a
> doc's field (not the number of actual text values).
>
> On Fri, Nov 11, 2016 at 5:08 AM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> Hello,
>>
>> Say I indexed a large dataset against a schemaless configuration. Now
>> I have a bunch of multivalued fields. Is there any way to say which of
>> these (text) fields have (for given data) only single values? I know I
>> am supposed to look at the original data, and all that, but this is
>> more for debugging/troubleshooting.
>>
>> Turning on termOffsets/termPositions would make it easy, but that's a
>> bit messy for troubleshooting purposes.
>>
>> I was thinking that one giveaway is the positionIncrementGap causing
>> the second value's first token to start at a position above a hundred.
>> But I am not sure how to craft a query against a field to see whether
>> such a token is present at all.
>>
>>
>> Any ideas?
>>
>> Regards,
>>     Alex.
>>
>> ----
>> Solr Example reading group is starting November 2016, join us at
>> http://j.mp/SolrERG
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
