Re: ngrams with position

2016-03-11 Thread elisabeth benoit
Jack, Emir, Thanks for your answers. Moving the ngram logic to the client side would be a fast and easy way to test the solution and compare it with the phonetic one. Best regards, Elisabeth 2016-03-11 10:52 GMT+01:00 Emir Arnautovic: > Hi Elizabeth, > In order to see if you will get better results, y

Re: ngrams with position

2016-03-11 Thread Emir Arnautovic
Hi Elizabeth, In order to see if you will get better results, you can move the ngram logic outside of the analysis chain; the simplest solution is to move it to the client. In such a setup, you should be able to use pf2 and pf3 and see if that produces the desired result. Regards, Emir On 10.03.2016 13:47, elisa
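
As a rough illustration of the client-side approach Emir describes, the sketch below splits each query word into character trigrams before the query reaches Solr, so that edismax builds its pf2/pf3 phrases from grams instead of whole words. The field name `name_grams`, the underscore padding and the helper names are assumptions made for this example, not something given in the thread; it also assumes the field is indexed with the same grams and a whitespace tokenizer, so that each gram is one token.

```python
def char_ngrams(word, n=3, pad="_"):
    """Return the fixed-length character ngrams of a word.

    Assumption: n - 1 pad characters on the left and one on the right,
    mirroring the "__a _am ams ... am_" grams quoted in the thread.
    """
    padded = pad * (n - 1) + word.lower() + pad
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_edismax_params(user_input, field="name_grams"):
    """Build request parameters for an edismax query over client-side grams.

    `field` is a hypothetical field name; defType, q, qf, pf2 and pf3 are
    standard edismax request parameters.
    """
    grams = []
    for word in user_input.split():
        grams.extend(char_ngrams(word))
    return {
        "defType": "edismax",
        "q": " ".join(grams),  # each gram is now a separate query term
        "qf": field,
        "pf2": field,          # boosts adjacent pairs of grams
        "pf3": field,          # boosts adjacent triples of grams
    }


params = build_edismax_params("Asmtreadm")
print(params["q"])  # __a _as asm smt mtr tre rea ead adm dm_
```

The resulting dictionary can be passed as query parameters to the select handler with whatever HTTP client is already in use.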

Re: ngrams with position

2016-03-10 Thread Jack Krupansky
I suspect that what you really want is analogous to PF2/PF3, but based on the ngram terms that come out of query token analysis rather than using pairs/triples of source terms before analysis that are then analyzed as phrases so that all of the ngrams for a PF2/PF3 phrase must be in order rather po

Re: ngrams with position

2016-03-10 Thread elisabeth benoit
Oh yeah, now that you say it, you're right: pf2 and pf3 will boost proximity between words, not between ngrams. Thanks again, Elisabeth 2016-03-10 12:31 GMT+01:00 Alessandro Benedetti: > The reason pf2 and pf3 seems not a good solution to me is the fact that the > edismax query parser ca

Re: ngrams with position

2016-03-10 Thread Alessandro Benedetti
The reason pf2 and pf3 do not seem a good solution to me is that the edismax query parser calculates those grams on top of word shingles. It takes the input query and produces the shingles based on the whitespace separator, i.e. if you search: "white tiger jumping" and pf2 configur
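
To make the point about word shingles concrete, here is a simplified sketch of how whitespace-based shingles of two or three words are formed from a query. It only illustrates the idea and is not the edismax parser's actual implementation; the function name is made up for this example.

```python
def word_shingles(query, size):
    """Return the runs of `size` consecutive whitespace-separated words."""
    words = query.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(word_shingles("white tiger jumping", 2))  # ['white tiger', 'tiger jumping']
print(word_shingles("white tiger jumping", 3))  # ['white tiger jumping']
print(word_shingles("Asmtreadm", 2))            # []
```

A single-word query such as "Asmtreadm" yields no shingles at all, which is why phrase boosts built this way cannot reward the order of character ngrams inside one word.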

Re: ngrams with position

2016-03-10 Thread elisabeth benoit
That's the use case, yes. Find Amsterdam with Asmtreadm. And yes, we're only doing approximate search if we get 0 results. I don't quite get why pf2 and pf3 are not a good solution. We're actually testing a solution close to phonetic. Some kind of word reduction. Thanks for the suggestion (and the link

Re: ngrams with position

2016-03-10 Thread Alessandro Benedetti
If I followed, your use case is: I type Asmtreadm and I want documents matching Amsterdam (even if the edit distance is greater than 2). First of all, this is something I hope you do only if you get 0 results; if not, the overhead can be great and you are going to lose a lot of precision causing con

Re: ngrams with position

2016-03-10 Thread elisabeth benoit
I am trying to do approximate search with Solr. We've tried fuzzy search and spellcheck search; they work OK, but the edit distance is limited (to 2 for DirectSolrSpellChecker in Solr 4.10.1). With the fuzzy operator, we've had performance issues, and I don't think you can have an edit distance more t

Re: ngrams with position

2016-03-09 Thread Alessandro Benedetti
If you store the positions for your tokens (and they are stored by default if you don't omit them), you have the relative position in the index. [1] I attach a blog post of mine, describing the Lucene internals in a little more detail. Apart from that, can you explain the problem you are trying to sol

Re: ngrams with position

2016-03-09 Thread elisabeth benoit
Hello Alessandro, You may be right. What would you use to keep relative order between, for instance, grams __a _am ams mst ste ter erd rda dam am_ of amsterdam? pf2 and pf3? That's all I can think of. Please let me know if you have more insights. Best regards, Elisabeth 2016-03-08 17:46 GMT

Re: ngrams with position

2016-03-08 Thread Alessandro Benedetti
Elizabeth, out of curiosity, could we know what you are trying to solve with that complex way of tokenisation? Solr is really good at storing positions along with tokens, so I am curious to know why you are mixing things up. Cheers On 8 March 2016 at 10:08, elisabeth benoit wrote: > Thank

Re: ngrams with position

2016-03-08 Thread elisabeth benoit
Thanks for your answer, Emir, I'll check that out. Best regards, Elisabeth 2016-03-08 10:24 GMT+01:00 Emir Arnautovic: > Hi Elisabeth, > I don't think there is such a token filter, so you would have to create your > own token filter that takes a token and emits ngram tokens of a specific length. > It

Re: ngrams with position

2016-03-08 Thread Emir Arnautovic
Hi Elisabeth, I don't think there is such a token filter, so you would have to create your own token filter that takes a token and emits ngram tokens of a specific length. It should not be too hard to create such a filter - you can take a look at how the ngram filter is coded - yours should be simpler than th

ngrams with position

2016-03-07 Thread elisabeth benoit
Hello, I'm using Solr 4.10.1. I'd like to index words with ngrams of fixed length with a position at the end. For instance, with fixed length 3, Amsterdam would be something like: a0 (two spaces added at beginning) am1 ams2 mst3 ste4 ter5 erd6 rda7 dam8 am9 (one more space in the end) The number a
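
The tokenisation described in this message can be sketched outside Solr as follows. This is only an illustration of the intended output, not a Solr token filter; the function name is hypothetical, and an underscore is assumed as the padding character (instead of a space) so that the padding stays visible.

```python
def positional_ngrams(word, n=3, pad="_"):
    """Return fixed-length ngrams of a word, each suffixed with its position.

    Assumption: n - 1 pad characters are added at the beginning and one at
    the end, following the Amsterdam example in the message above.
    """
    padded = pad * (n - 1) + word.lower() + pad
    return [f"{padded[i:i + n]}{i}" for i in range(len(padded) - n + 1)]


print(positional_ngrams("Amsterdam"))
# ['__a0', '_am1', 'ams2', 'mst3', 'ste4', 'ter5', 'erd6', 'rda7', 'dam8', 'am_9']
```

Because the position is part of the term itself, a gram only matches when it occurs at exactly the same offset in both words, so a transposition near the start of a misspelled word changes every term that overlaps it.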