Hi Alessandro,

Thank you so much for the info. I will try that out.

Regards,
Edwin

On 8 May 2015 17:27, "Alessandro Benedetti" <benedetti.ale...@gmail.com> wrote:

> 2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>
> > Hi Alessandro,
> >
> > I'm using Solr 5.0.0, but it is still able to work. Actually I found
> > this to be better than <query>~1 or <query>~2, as it can automatically
> > detect and allow the 20% error rate that I want.
>
> I don't think that the "double" param is supported anymore, so we should
> take a look at the tricky underlying formula to understand how the exact
> edits are calculated.
>
> > For this <query>~1 or <query>~2, does it mean that I'll have to
> > manually detect how many characters I entered before I assign the
> > suitable ~ (tilde) param in order to achieve the 20% error rate?
>
> Yes.
>
> > I'll probably need an edit distance of 0 for words with 3 or fewer
> > characters, 1 for words with 4 to 9 characters, 2 for words with 10 to
> > 14 characters, and 3 for words with more than 15 characters.
>
> This would be quite easy: just check the length and assign the proper
> number of edits according to your requirements.
>
> > Yes, for the performance I'm checking whether the length check will
> > affect the query time. Thanks for your info on that. Currently my index
> > is small, so everything seems to run quite fast and the delay is
> > unnoticeable. But I'm not so sure whether it will slow down until it is
> > noticeable by the user if I have tens of collections with millions of
> > records.
>
> I think the length check will be constant time for any string (if you are
> using Java; most likely constant in all other languages too).
> So I would say it won't be a problem in comparison with the actual query
> time.
>
> > Regards,
> > Edwin
> >
> > On 8 May 2015 at 16:53, Alessandro Benedetti <benedetti.ale...@gmail.com>
> > wrote:
> >
> > > Hi Zheng,
> > > actually that version of the fuzzy search is deprecated!
> > > Currently the fuzzy search syntax is:
> > > <query>~1 or <query>~2
> > > The ~ (tilde) param is the number of edits we provide to generate
> > > all the expanded queries to run.
> > > Can I ask which version of Solr you are using?
> > >
> > > This article from 2011 shows the biggest change in fuzzy query, and
> > > I guess it's still the current approach!
> > > Regarding the performance, what do you mean?
> > > Are you worried that the length check will affect the query time?
> > > The answer is yes, but the delay will be unnoticeable, as you simply
> > > check the length and apply the matching fuzzy param.
> > > Regarding fuzzy queries being slower than normal queries, that is
> > > true, but the FST approach guarantees really fast fuzzy queries.
> > > So if you do need the fuzziness, it's something you can cope with.
> > >
> > > Cheers
> > >
> > > 2015-05-08 3:12 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> > >
> > > > Thank you for the information.
> > > >
> > > > I'm currently using the fuzzy search with the similarity value set
> > > > to ~0.79, and this has allowed a 20% error rate (i.e. for words
> > > > with 5 characters, it allows 1 misspelled character, and for words
> > > > with 10 characters, it allows 2 misspelled characters).
> > > >
> > > > However, for words with 4 characters, I'll need to set the value
> > > > to ~0.75 to allow 1 misspelled character, since 1 misspelled
> > > > character in a 4-character word is a 25% error rate. We probably
> > > > will not accommodate 3-character words.
> > > >
> > > > I've gotten the information from here:
> > > > http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches
> > > >
> > > > Just to check, will this affect the performance of the system?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > > On 7 May 2015 at 20:00, Alessandro Benedetti
> > > > <benedetti.ale...@gmail.com> wrote:
> > > >
> > > > > Hi!
> > > > > Currently Solr builds an FST to provide proper fuzzy search or
> > > > > spellcheck suggestions based on string distance.
> > > > > The current default algorithm is the Levenshtein distance (which
> > > > > returns the number of edits as the distance metric).
> > > > > In your case you should calculate, client side, the number of
> > > > > edits you want to apply to your search.
> > > > > In your client code, it should not be difficult to process the
> > > > > query and apply the proper number of edits depending on the
> > > > > length.
> > > > >
> > > > > Anyway, the max number of edits for the default Levenshtein
> > > > > distance is fixed to 2.
> > > > >
> > > > > Cheers
> > > > >
> > > > > 2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo
> > > > > <edwinye...@gmail.com>:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I would like to check: how do we implement character proximity
> > > > > > searching in terms of a percentage of the length of the word,
> > > > > > instead of a fixed edit distance (number of characters)?
> > > > > >
> > > > > > For example, if we have a proximity of 20%, a word with 5
> > > > > > characters will have an edit distance of 1, and a word with 10
> > > > > > characters will automatically have an edit distance of 2.
> > > > > >
> > > > > > Will Solr be able to do that for us?
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > >
> > > > > --
> > > > > --------------------------
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In the forests of the night,
> > > > > What immortal hand or eye
> > > > > Could frame thy fearful symmetry?"
> > > > >
> > > > > William Blake - Songs of Experience - 1794 England
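
The client-side length check discussed in this thread could be sketched roughly as follows. This is only a minimal Python sketch, not anything shipped with Solr: the function names `edits_for_length` and `build_fuzzy_query` are made up for illustration, the thresholds are the ones Edwin proposed, and the cap of 2 reflects the maximum edit distance Alessandro mentioned. It also assumes the input is plain whitespace-separated words with no query-parser special characters that would need escaping.

```python
# Hypothetical client-side helper: pick a fuzzy edit distance per term
# from its length, then append it with the <term>~N syntax.

MAX_EDITS = 2  # assumption taken from the thread: max supported edits is 2


def edits_for_length(term: str) -> int:
    """Map a term's length to an edit distance (thresholds from the thread)."""
    n = len(term)
    if n <= 3:
        return 0          # short words: exact match only
    if n <= 9:
        return 1          # 4 to 9 characters: one edit
    # 10 characters or more would ideally get 2 or 3 edits,
    # but anything above MAX_EDITS is capped.
    return min(2 if n <= 14 else 3, MAX_EDITS)


def build_fuzzy_query(text: str) -> str:
    """Append ~N to every whitespace-separated term that allows edits."""
    parts = []
    for term in text.split():
        edits = edits_for_length(term)
        parts.append(f"{term}~{edits}" if edits > 0 else term)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_fuzzy_query("cat house international"))
```

The length lookup is a handful of integer comparisons per term, so its cost should be negligible next to the fuzzy query itself, which matches the point made above about the check not being a problem for query time.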