Re: Exact substring search with ngrams

2015-08-27 Thread Christian Ramseyer
On 26/08/15 18:05, Erick Erickson wrote:
> bq: my dog
> has fleas
> I wouldn't want some variant of "og ha" to match,
>
> Here's where the mysterious "positionIncrementGap" comes in. If you
> make this field "multiValued", and index this like this:
>
> my dog
> has fleas
>
>
> then the positi

Re: Exact substring search with ngrams

2015-08-26 Thread Upayavira
analysis tab does not support multi-valued fields. It only analyses a single field value.

On Wed, Aug 26, 2015, at 05:05 PM, Erick Erickson wrote:
> bq: my dog
> has fleas
> I wouldn't want some variant of "og ha" to match,
>
> Here's where the mysterious "positionIncrementGap" comes in. If you

Re: Exact substring search with ngrams

2015-08-26 Thread Erick Erickson
bq: my dog
has fleas
I wouldn't want some variant of "og ha" to match,

Here's where the mysterious "positionIncrementGap" comes in. If you make this field "multiValued", and index this like this:

my dog
has fleas

or equivalently in SolrJ just

doc.addField("blah", "my dog");
doc.addField("blah
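
To make the positionIncrementGap idea concrete, here is a rough toy model in plain Python. It is not Lucene or Solr code, and the exact position numbers Lucene assigns may differ; the function names `index_positions` and `phrase_matches` are made up for this sketch, and it uses whitespace tokens instead of ngrams to keep things short. The only point is that a gap between the values of a multi-valued field keeps a zero-slop phrase from matching across two values.

```python
def index_positions(values, position_increment_gap=100):
    """Assign a position to every whitespace token of a multi-valued field.

    Toy model: before the first token of each value after the first one,
    the position counter is advanced by position_increment_gap.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += position_increment_gap
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


def phrase_matches(positions, phrase):
    """Return True if the phrase tokens occur at consecutive positions (no slop)."""
    tokens = phrase.split()
    if not tokens:
        return False
    index = {}
    for token, pos in positions:
        index.setdefault(token, set()).add(pos)
    return any(
        all(start + k in index.get(tok, ()) for k, tok in enumerate(tokens))
        for start in index.get(tokens[0], ())
    )


with_gap = index_positions(["my dog", "has fleas"], position_increment_gap=100)
no_gap = index_positions(["my dog", "has fleas"], position_increment_gap=0)

print(phrase_matches(with_gap, "my dog"))   # phrase inside one value
print(phrase_matches(with_gap, "dog has"))  # phrase spanning two values, gap 100
print(phrase_matches(no_gap, "dog has"))    # same phrase, gap 0
```

With a gap of 0 the two values behave like one continuous token stream, which is exactly the cross-line match the original poster wants to avoid.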

Re: Exact substring search with ngrams

2015-08-26 Thread Christian Ramseyer
On 26/08/15 00:24, Erick Erickson wrote:
> Hmmm, this sounds like a nonsensical question, but "what do you mean
> by arbitrary substring"?
>
> Because if your substrings consist of whole _tokens_, then ngramming
> is totally unnecessary (and gets in the way). Phrase queries with no slop
> fulfill

Re: Exact substring search with ngrams

2015-08-25 Thread Erick Erickson
Hmmm, this sounds like a nonsensical question, but "what do you mean by arbitrary substring"?

Because if your substrings consist of whole _tokens_, then ngramming is totally unnecessary (and gets in the way). Phrase queries with no slop fulfill this requirement. But let's assume you need to march
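
For the case where the query does not line up with whole tokens, a small Python sketch of the character-ngram idea may help. It is only an illustration under assumptions: it treats the whole text as one string with whitespace kept, the names `char_ngrams` and `contains_via_ngrams` are hypothetical, and it does not reproduce what any particular Solr tokenizer or filter emits. A query matches when all of its n-grams appear at consecutive offsets in the text, which is the ngram analogue of a phrase query with no slop.

```python
def char_ngrams(text, n):
    """Return all overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def contains_via_ngrams(text, query, n=3):
    """Substring test expressed as 'query n-grams occur consecutively in text'."""
    if len(query) < n:
        raise ValueError("query must be at least n characters long")
    grams = char_ngrams(text, n)
    wanted = char_ngrams(query, n)
    span = len(wanted)
    return any(grams[i:i + span] == wanted
               for i in range(len(grams) - span + 1))


print(char_ngrams("my dog", 3))
print(contains_via_ngrams("my dog has fleas", "og ha"))
print(contains_via_ngrams("my dog has fleas", "dog fleas"))
```

Note that "og ha" is not a sequence of whole tokens, so a plain phrase query on word tokens would not find it, while the n-gram sequence does.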

Exact substring search with ngrams

2015-08-25 Thread Christian Ramseyer
Hi, I'm trying to build an index for technical documents that basically works like "grep", i.e. the user gives an arbitrary substring somewhere in a line of a document and the exact matches will be returned. I specifically want no stemming etc. and keep all whitespace, parentheses etc. because they
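
As a reference for the behaviour described here, a minimal Python sketch of the "grep"-like semantics could look like the following. It is a brute-force scan, not an index, and the name `grep_lines` is invented for this example; it just pins down what any index-based solution would be expected to return: exact, case-sensitive substring matches within a single line, with whitespace and punctuation left untouched.

```python
def grep_lines(lines, needle):
    """Return (line_number, line) pairs whose line contains needle verbatim.

    No stemming, no case folding, no whitespace normalisation, and a match
    never spans two lines. Line numbers start at 1.
    """
    return [(number, line)
            for number, line in enumerate(lines, start=1)
            if needle in line]


document = ["foo(bar, baz)", "my dog", "has fleas"]

print(grep_lines(document, "(bar, "))  # punctuation and spaces are significant
print(grep_lines(document, "og ha"))   # would only match across a line break
```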