Hi Alessandro,

Thank you so much for the info. Will try that out.
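
A minimal sketch of the client-side length check discussed below, assuming
the input is plain whitespace-separated words with no special query-syntax
characters. The function names are hypothetical helpers, not part of Solr,
and the edit distance is capped at 2 because that is the maximum mentioned
further down in this thread:

```python
MAX_EDITS = 2  # assumption taken from this thread: fuzzy edits are capped at 2


def edits_for(word: str) -> int:
    """Pick an edit distance from the word length (thresholds from this thread)."""
    n = len(word)
    if n <= 3:
        return 0          # short words: exact match only
    if n <= 9:
        return 1          # 4 to 9 characters: one edit
    return MAX_EDITS      # 10 or more characters: capped at two edits


def fuzzy_term(word: str) -> str:
    """Append the ~N suffix, or leave the word untouched when no edits apply."""
    edits = edits_for(word)
    return f"{word}~{edits}" if edits else word


def fuzzy_query(text: str) -> str:
    """Rewrite every whitespace-separated word into its fuzzy form."""
    return " ".join(fuzzy_term(w) for w in text.split())


if __name__ == "__main__":
    print(fuzzy_query("cat house accommodation"))
```

The length check is a single len() call per word, so its cost should be
negligible next to the query itself.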

Regards,
Edwin
On 8 May 2015 17:27, "Alessandro Benedetti" <benedetti.ale...@gmail.com>
wrote:

> 2015-05-08 10:14 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>
> > Hi Alessandro,
> >
> > I'm using Solr 5.0.0, and the ~0.79 syntax still works there. Actually, I
> > found it to be better than <query>~1 or <query>~2, as it automatically
> > detects and allows the 20% error rate that I want.
> >
> I don't think that the "double" param is supported anymore, so we should
> take a look at the tricky underlying formula to understand how the exact
> edits are calculated.
>
>
> >
> > For <query>~1 or <query>~2, does it mean that I'll have to manually check
> > how many characters I entered before assigning the suitable ~ (tilde)
> > param in order to achieve the 20% error rate?
> >
> Yes
>
> > I'll probably need an edit distance of 0 for words with 3 or fewer
> > characters, 1 for words with 4 to 9 characters, 2 for words with 10 to 14
> > characters, and 3 for words with more than 15 characters.
> >
> This would be quite easy: just check the length and assign the proper edit
> distance according to your requirements.
>
> >
> > Yes, regarding performance, I'm checking whether the length check will
> > affect the query time. Thanks for your info on that. Currently my index is
> > small, so everything seems to run quite fast and the delay is unnoticeable.
> > But I'm not sure whether it will slow down enough to be noticeable to the
> > user once I have tens of collections with millions of records.
> >
> I think the length check will be constant time for any string (if you are
> using Java; it is most likely constant in all other languages too).
> So I would say it won't be a problem in comparison with the actual query
> time.
>
> >
> >
> > Regards,
> > Edwin
> >
> >
> >
> > On 8 May 2015 at 16:53, Alessandro Benedetti <benedetti.ale...@gmail.com>
> > wrote:
> >
> > > Hi Zheng,
> > > actually that version of the fuzzy search is deprecated!
> > > Currently the fuzzy search syntax is:
> > > <query>~1 or <query>~2
> > > The ~ (tilde) param is the number of edits we allow when generating all
> > > the expanded queries to run.
> > > Can I ask which version of Solr you are using?
> > >
> > > This article from 2011 shows the biggest change in fuzzy query, and I
> > > guess it's still the current approach!
> > > Regarding performance, what do you mean?
> > > Are you worried that the length check will affect the query time?
> > > The answer is yes, but the delay will be unnoticeable, as you simply
> > > check the length and apply the corresponding fuzzy param.
> > > As for fuzzy queries being slower than normal queries, that is true,
> > > but the FST approach guarantees really fast fuzzy queries.
> > > So if you do need the fuzziness, it's something you can cope with.
> > >
> > > Cheers
> > >
> > > 2015-05-08 3:12 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> > >
> > > > Thank you for the information.
> > > >
> > > > I'm currently using the fuzzy search with the edit distance value set
> > > > to ~0.79, and this has allowed a 20% error rate (i.e. for words with 5
> > > > characters, it allows 1 misspelled character, and for words with 10
> > > > characters, it allows 2 misspelled characters).
> > > >
> > > > However, for words with 4 characters, I'll need to set the value to
> > > > ~0.75 to allow 1 misspelled character, since a 4-character word
> > > > requires a 25% error rate for 1 misspelled character. We probably
> > > > will not accommodate 3-character words.
> > > >
> > > > I've gotten the information from here:
> > > > http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches
> > > >
> > > > Just to check, will this affect the performance of the system?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 7 May 2015 at 20:00, Alessandro Benedetti <benedetti.ale...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi!
> > > > > Currently Solr builds an FST to provide proper fuzzy search or
> > > > > spellcheck suggestions based on the string distance.
> > > > > The current default algorithm is the Levenshtein distance (which
> > > > > returns the number of edits as the distance metric).
> > > > > In your case you should calculate, client side, the number of edits
> > > > > you want to apply to your search.
> > > > > In your client code, it should not be difficult to process the query
> > > > > and apply the proper number of edits depending on the length.
> > > > >
> > > > > Anyway, the max edit for the default Levenshtein distance is fixed
> > > > > at 2.
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > >
> > > > > 2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'd like to check: how do we implement character proximity
> > > > > > searching in terms of a percentage of the word's length, instead
> > > > > > of a fixed edit distance (number of characters)?
> > > > > >
> > > > > > For example, if we have a proximity of 20%, a word with 5
> > > > > > characters will have an edit distance of 1, and a word with 10
> > > > > > characters will automatically have an edit distance of 2.
> > > > > >
> > > > > > Will Solr be able to do that for us?
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
