Read the Lucene analysis package summary section entitled "Field Section Boundaries":
http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/analysis/package-summary.html

TL;DR - if you leave it at the default (0), then a word at the end of one section and a word at the start of the next section are at adjacent positions, so they would be an exact phrase match. You might ask why Lucene chose that default - I don't know, but Solr "best practice" is the opposite. I suspect that Solr chose a large number like 100 so that a phrase query could use a significant slop like 10 and still not match across sections.
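
To make the effect concrete, here is a rough sketch in plain Python. It is not Lucene or Solr code: index_positions and phrase_matches are hypothetical helpers that only imitate how token positions are assigned across the values of a field, and the phrase check handles just two terms in order, ignoring the reordering that real sloppy phrase queries allow.

```python
def index_positions(values, gap):
    """Assign a position to every token across all values of one field.

    Simplified model (an assumption, not Lucene's code): each token
    advances the position by 1, and `gap` extra positions are added
    between consecutive values.
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the position increment gap between values
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def phrase_matches(positions, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


values = ["quick brown", "fox jumps"]  # two values of the same field

no_gap = index_positions(values, gap=0)
big_gap = index_positions(values, gap=100)

print(phrase_matches(no_gap, "brown", "fox"))            # True
print(phrase_matches(big_gap, "brown", "fox", slop=10))  # False
print(phrase_matches(big_gap, "quick", "brown"))         # True
```

With a gap of 0, "brown" (end of the first value) and "fox" (start of the second) sit at positions 1 and 2, so the exact phrase matches across the boundary. With a gap of 100, "fox" lands at position 102, and even a slop of 10 does not bridge it, while phrases inside a single value still match.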

In my e-book I have a section entitled "Position Increment Gap" in Chapter 2 "Analyzers Overview" that details the reasoning as well. There is also another section with the same title in the Term Vector Component chapter that runs through an example in more detail.

See:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Sunday, October 12, 2014 7:40 PM
To: solr-user
Subject: What happens if you don't set positionIncrementGap

Hello,

I am working on - yet another - minimal schema, which involves finding the
settings that match the defaults (or that do no harm if the defaults are
used). The one I am trying to figure out now is: positionIncrementGap

We set it to 100 in all text field definitions. Does that mean the
default is NOT some reasonable number?

I tried to trace it and all I can find is a default value in
SolrAnalyzer, which is 0.

But if it is 0 (zero), then why do we explicitly define it to be 0 in all
non-text fields? That would seem to be redundant and - frankly - confusing.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
