Check. The problem is that norms don't encode the exact length. I _think_ this patch shows you'd be OK with shorter lengths, but check: https://issues.apache.org/jira/browse/LUCENE-7730.
Note it's not the patch that counts here, just look at the table of lengths.

Best,
Erick

On Wed, Oct 4, 2017 at 4:25 AM, John Blythe <johnbly...@gmail.com> wrote:
> Interesting idea.
>
> The field in question is one that can have a good deal of stray zeros,
> based on distributor SKUs for a product and bad entries from those entering
> them. Part of the matching logic for some operations looks for these
> discrepancies with a simple regex that removes zeroes, so 400010 can match
> 40010 (and rightly so). Issues come up in rare cases where 41 is a SKU from
> the same distributor or manufacturer and thus can end up being an erroneous
> match. Having a means of looking at the length would help us know that
> going from 6 characters to 2 is too far a leap to be counted as a match.
>
> --
> John Blythe
>
> On Wed, Oct 4, 2017 at 6:22 AM, alessandro.benedetti <a.benede...@sease.io> wrote:
>> Are the norms a good approximation for you?
>> If you preserve norms at indexing time (a configuration you can set in
>> the schema.xml), you can retrieve them with this specific function query:
>>
>> *norm(field)*
>> Returns the "norm" stored in the index for the specified field. This is
>> the product of the index-time boost and the length normalization factor,
>> according to the Similarity for the field.
>> norm(fieldName)
>>
>> This will not be the exact length of the field, but it can be a good
>> approximation.
>>
>> Cheers
>>
>> -----
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> Sease Ltd. - www.sease.io
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
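[Editor's note: Alessandro's `norm(field)` suggestion would look roughly like the request below. The core name `products` and field name `sku` are hypothetical; the field must be indexed with `omitNorms="false"` for the norm to be stored.]

```
http://localhost:8983/solr/products/select?q=*:*&fl=id,sku,len_approx:norm(sku)
```

Remember the caveat from both replies: the norm is a coarse, lossily-encoded length signal (see the table of lengths in LUCENE-7730), so it can distinguish 6 characters from 2, but not necessarily 6 from 5.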
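[Editor's note: John's zero-stripping match plus a length guard could be sketched as below. The function names and the 0.5 length-ratio threshold are hypothetical illustrations, not from the thread; the threshold would need tuning against real SKU data.]

```python
import re

def normalize_sku(sku: str) -> str:
    """Strip zeroes so e.g. '400010' and '40010' normalize alike."""
    return re.sub(r"0", "", sku)

def skus_match(a: str, b: str, min_len_ratio: float = 0.5) -> bool:
    """Match on zero-stripped SKUs, but reject pairs whose raw lengths
    differ too much (e.g. '400010' vs '41': going 6 -> 2 characters)."""
    if normalize_sku(a) != normalize_sku(b):
        return False
    shorter, longer = sorted((len(a), len(b)))
    return shorter / longer >= min_len_ratio

print(skus_match("400010", "40010"))  # True: lengths 6 and 5 are close
print(skus_match("400010", "41"))     # False: 2/6 fails the ratio guard
```

Doing this comparison client-side avoids needing the exact field length in the index at all; the norm-based approach below is only needed when the check must happen inside a Solr query.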