Re: Help on matching a shingle in a query to a shingle in the document

2014-11-25 Thread vit
Erick, what you are saying of course makes perfect sense. But in our particular situation there is a high probability that an essential part of the query will match a meaningful part or a business name in a short description indexed as shingles. Also, it is better than just a broad match. Besides I

Re: Help on matching a shingle in a query to a shingle in the document

2014-11-25 Thread Alexandre Rafalovitch
Sounds like an attempt to identify stable Multi Word Units, a technique sometimes used in Natural Language Processing. In that case, a Shingle factory plus using the field as a facet might do the trick. The shingle will generate a "token" that is "this kind of winter", and the facet will give back a count for it.
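As a rough illustration of the shingle-plus-facet idea, this Python sketch approximates the word n-grams a shingle filter would emit for each document. It then counts, per shingle, how many documents contain it, which is what a facet on that field would report. `shingles` and `facet_counts` are hypothetical helpers. The lowercasing whitespace tokenization and the 2 to 4 word shingle sizes are assumptions. This is not Solr's or Lucene's API.

```python
from collections import Counter


def shingles(text, min_size=2, max_size=4):
    """Return word n-grams ("shingles") of min_size..max_size words.

    Hypothetical helper: a rough stand-in for a shingle filter applied after
    an assumed lowercasing whitespace tokenizer, joining words with a space.
    """
    tokens = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def facet_counts(docs, min_size=2, max_size=4):
    """Count how many documents contain each shingle, like a facet on the field."""
    counts = Counter()
    for doc in docs:
        # set(): a facet counts each matching document once, not each occurrence
        counts.update(set(shingles(doc, min_size, max_size)))
    return counts


docs = [
    "Hi, John likes this kind of winter weather",
    "Nobody enjoys this kind of winter",
    "this kind of winter is rare",
]
counts = facet_counts(docs)
print(counts["this kind of winter"])  # 3
```

In Solr itself the counting would be done by faceting on the shingled field. The sketch only mimics the arithmetic so it is clear why a frequent multi-word unit such as "this kind of winter" rises to the top of the facet counts.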

Re: Help on matching a shingle in a query to a shingle in the document

2014-11-25 Thread Erick Erickson
Tokenizers, filters and the like have no real way to figure out that some words in the query are to be ignored. In your example, how would one algorithmically determine that "this kind of winter" is important and that "Hi", "likes" and "weather" aren't? What's different about like/likes that indica