Oh yeah, now that you say it, you're right: pf2 and pf3 will boost proximity between words, not between ngrams.
Thanks again,
Elisabeth

2016-03-10 12:31 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:

> The reason pf2 and pf3 do not seem a good solution to me is that the
> edismax query parser calculates those grams on top of word shingles.
> It takes the input query and produces the shingles based on the
> whitespace separator.
>
> I.e. if you search:
> "white tiger jumping"
> with pf2 configured on field1,
> you are going to end up searching in field1:
> "white tiger", "tiger jumping".
> This is really useful in full-text search oriented to phrase and
> partial phrase matching.
> But at this moment it has nothing to do with the analysis type
> associated at query time.
> First the query parser tokenisation is used to build the grams, and
> then the query-time analysis is applied.
> This is as far as I remember;
> I will double check in the code and let you know.
>
> Cheers
>
> On 10 March 2016 at 11:02, elisabeth benoit <elisaelisael...@gmail.com>
> wrote:
>
> > That's the use case, yes. Find Amsterdam with Asmtreadm.
> >
> > And yes, we're only doing approximate search if we get 0 results.
> >
> > I don't quite get why pf2 and pf3 are not a good solution.
> >
> > We're actually testing a solution close to phonetic. Some kind of word
> > reduction.
> >
> > Thanks for the suggestion (and the link), this makes me think maybe
> > phonetic is the right solution.
> >
> > Thanks for your help,
> > Elisabeth
> >
> > 2016-03-10 11:32 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
> >
> > > Mmmm, if I followed, your use case is:
> > >
> > > I type Asmtreadm and I want documents matching Amsterdam (even if the
> > > edit distance is greater than 2).
> > > First of all, this is something I hope you do only if you get 0
> > > results; if not, the overhead can be great and you are going to lose
> > > a lot of precision, causing confusion for the customer.
> > >
> > > pf2 and pf3 are ngrams of whitespace-separated tokens, used so that
> > > partial phrase queries affect the scoring.
> > > Not a good fit for your problem.
> > >
> > > Rather than grams, have you considered using some sort of phonetic
> > > matching?
> > > Could this help:
> > > https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
> > >
> > > Cheers
> > >
> > > On 10 March 2016 at 08:47, elisabeth benoit <elisaelisael...@gmail.com>
> > > wrote:
> > >
> > > > I am trying to do approximate search with Solr. We've tried fuzzy
> > > > search and spellcheck search; it's working OK, but edit distance is
> > > > limited (to 2 for DirectSolrSpellChecker in Solr 4.10.1). With the
> > > > fuzzy operator we've had performance issues, and I don't think you
> > > > can have an edit distance of more than 2.
> > > >
> > > > What we used to do with a database was more efficient: storing
> > > > trigrams with position, and then searching around that position (not
> > > > precisely at that position, since it's approximate search).
> > > >
> > > > Position is there to prevent a trigram like ams (amsterdam) from
> > > > matching answers where the same trigram is, for instance, at the end
> > > > of the word. I would like answers with the same relative position
> > > > between trigrams to score higher.
> > > > Maybe using edismax's pf2 and pf3 is a way to do this. I don't see
> > > > any other way. Please tell me if you do.
> > > >
> > > > From your answer, I get that position is stored, but I don't
> > > > understand how I can preserve relative order between trigrams, apart
> > > > from using pf2 and pf3.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > > > 2016-03-10 0:02 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
> > > >
> > > > > If you store the positions for your tokens (and they are stored by
> > > > > default if you don't omit them), you have the relative position in
> > > > > the index.
> > > > > [1]
> > > > > I attach a blog post of mine, describing the Lucene internals in
> > > > > a bit more detail.
> > > > >
> > > > > Apart from that, can you explain the problem you are trying to solve?
> > > > > The high-level user experience?
> > > > > What kind of search/autocompletion/relevancy tuning are you trying
> > > > > to achieve?
> > > > > Maybe we can help better if we start from the problem :)
> > > > >
> > > > > Cheers
> > > > >
> > > > > [1]
> > > > > http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
> > > > >
> > > > > On 9 March 2016 at 15:02, elisabeth benoit <elisaelisael...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello Alessandro,
> > > > > >
> > > > > > You may be right. What would you use to keep relative order
> > > > > > between, for instance, the grams
> > > > > >
> > > > > > __a
> > > > > > _am
> > > > > > ams
> > > > > > mst
> > > > > > ste
> > > > > > ter
> > > > > > erd
> > > > > > rda
> > > > > > dam
> > > > > > am_
> > > > > >
> > > > > > of amsterdam? pf2 and pf3? That's all I can think of. Please let
> > > > > > me know if you have more insights.
> > > > > >
> > > > > > Best regards,
> > > > > > Elisabeth
> > > > > >
> > > > > > 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
> > > > > >
> > > > > > > Elisabeth,
> > > > > > > out of curiosity, could we know what you are trying to solve
> > > > > > > with that complex way of tokenisation?
> > > > > > > Solr is really good at storing positions along with tokens, so
> > > > > > > I am curious to know why you are mixing things up.
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On 8 March 2016 at 10:08, elisabeth benoit <elisaelisael...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for your answer Emir,
> > > > > > > >
> > > > > > > > I'll check that out.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Elisabeth
> > > > > > > >
> > > > > > > > 2016-03-08 10:24 GMT+01:00 Emir Arnautovic <emir.arnauto...@sematext.com>:
> > > > > > > >
> > > > > > > > > Hi Elisabeth,
> > > > > > > > > I don't think there is such a token filter, so you would have
> > > > > > > > > to create your own token filter that takes a token and emits
> > > > > > > > > ngram tokens of a specific length.
> > > > > > > > > It should not be too hard to create such a filter - you can
> > > > > > > > > take a look at how the ngram filter is coded - yours should
> > > > > > > > > be simpler than that.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Emir
> > > > > > > > >
> > > > > > > > > On 08.03.2016 08:52, elisabeth benoit wrote:
> > > > > > > > >
> > > > > > > > >> Hello,
> > > > > > > > >>
> > > > > > > > >> I'm using Solr 4.10.1. I'd like to index words with ngrams of
> > > > > > > > >> fixed length, with a position at the end.
> > > > > > > > >>
> > > > > > > > >> For instance, with fixed length 3, Amsterdam would be something
> > > > > > > > >> like:
> > > > > > > > >>
> > > > > > > > >> a0 (two spaces added at beginning)
> > > > > > > > >> am1
> > > > > > > > >> ams2
> > > > > > > > >> mst3
> > > > > > > > >> ste4
> > > > > > > > >> ter5
> > > > > > > > >> erd6
> > > > > > > > >> rda7
> > > > > > > > >> dam8
> > > > > > > > >> am9 (one more space in the end)
> > > > > > > > >>
> > > > > > > > >> The number at the end being the position.
> > > > > > > > >>
> > > > > > > > >> Does anyone have a clue how to achieve this?
> > > > > > > > >>
> > > > > > > > >> Best regards,
> > > > > > > > >> Elisabeth
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > > > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > > > >
> > > > > > > --
> > > > > > > --------------------------
> > > > > > >
> > > > > > > Benedetti Alessandro
> > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > > >
> > > > > > > "Tyger, tyger burning bright
> > > > > > > In the forests of the night,
> > > > > > > What immortal hand or eye
> > > > > > > Could frame thy fearful symmetry?"
> > > > > > >
> > > > > > > William Blake - Songs of Experience -1794 England
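
For reference, the two kinds of grams discussed in this thread can be sketched in a few lines of Python. This is only an illustration and not Solr code: the function names word_shingles and positional_trigrams, and the use of "_" as the padding character, are assumptions made for the example. The first function mimics the word shingles that pf2 and pf3 are described as building from a whitespace-split query; the second produces the fixed-length character grams of Amsterdam listed in the thread, each paired with its start position.

```python
def word_shingles(query, size=2):
    """Word-level shingles built from a whitespace-split query.

    size=2 corresponds to what the thread describes for pf2,
    size=3 to pf3. Hypothetical helper, not part of Solr.
    """
    words = query.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def positional_trigrams(word, n=3, pad="_"):
    """Fixed-length character n-grams paired with their start position.

    The word is padded with n-1 pad characters in front and one at the
    end, matching the Amsterdam example in the thread.
    Hypothetical helper, not part of Solr.
    """
    padded = pad * (n - 1) + word.lower() + pad
    return [(padded[i:i + n], i) for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    # Grams over words: this is the level pf2 works at.
    print(word_shingles("white tiger jumping"))
    # ['white tiger', 'tiger jumping']

    # Grams over characters, with position: the indexing asked about.
    for gram, position in positional_trigrams("Amsterdam"):
        print(gram, position)
    # __a 0, _am 1, ams 2, mst 3, ste 4, ter 5, erd 6, rda 7, dam 8, am_ 9
```

The contrast is the point made at the top of the thread: the first function never looks inside a word, so it cannot express proximity between character grams.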