Re: ngrams with position

elisabeth benoit Thu, 10 Mar 2016 04:48:40 -0800

Oh yeah, now that you say it, you're right: pf2 and pf3 will boost
proximity between words, not between ngrams.
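
To make that difference concrete, here is a minimal Python sketch. It is
not Solr code, and the function names are made up for illustration. It
assumes, as described in the quoted message below, that pf2 builds its
phrases from whitespace-separated words, and it uses the "_" padding from
the gram list further down the thread for the character trigrams.

```python
def word_shingles(query, size):
    """Phrases of `size` adjacent whitespace-separated words.

    Rough model of what pf2 (size=2) or pf3 (size=3) would phrase-match,
    assuming the parser splits the query on whitespace first.
    """
    words = query.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def positioned_trigrams(word, n=3):
    """(gram, position) pairs for a word padded with '_' (hypothetical helper).

    Padding: n - 1 underscores in front, one underscore at the end.
    """
    padded = "_" * (n - 1) + word.lower() + "_"
    return [(padded[i:i + n], i) for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    # Multi-word query: pf2 has word pairs to work with.
    print(word_shingles("white tiger jumping", 2))
    # Single misspelled word: there are no word pairs at all.
    print(word_shingles("Asmtreadm", 2))
    # Character trigrams with their offsets inside the padded word.
    for gram, pos in positioned_trigrams("Amsterdam"):
        print(pos, gram)
```

For a one-word query such as Asmtreadm the shingle list is empty, which is
why pf2 and pf3 cannot say anything about the order of the trigrams inside
that word.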


Thanks again,
Elisabeth

2016-03-10 12:31 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:

> The reason pf2 and pf3 do not seem a good solution to me is that the
> edismax query parser calculates those grams on top of word shingles.
> It takes the input query and produces the shingles based on the
> whitespace separator.
>
> For example, if you search for:
> "white tiger jumping"
> with pf2 configured on field1,
> you are going to end up searching field1 for:
> "white tiger", "tiger jumping".
> This is really useful in full-text search oriented to phrase and partial
> phrase matching.
> But at that point it has nothing to do with the analysis type associated
> at query time.
> First the query parser tokenisation is used to build the grams, and then
> the query-time analysis is applied.
> This is as far as I remember;
> I will double check in the code and let you know.
>
> Cheers
>
>
> On 10 March 2016 at 11:02, elisabeth benoit <elisaelisael...@gmail.com>
> wrote:
>
> > That's the use case, yes. Find Amsterdam with Asmtreadm.
> >
> > And yes, we're only doing approximate search if we get 0 results.
> >
> > I don't quite get why pf2 and pf3 are not a good solution.
> >
> > We're actually testing a solution close to phonetic. Some kind of word
> > reduction.
> >
> > Thanks for the suggestion (and the link), this makes me think maybe
> > phonetic is the right solution.
> >
> > Thanks for your help,
> > Elisabeth
> >
> > 2016-03-10 11:32 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
> >
> > > Mmm, if I followed, your use case is:
> > >
> > > I type Asmtreadm and I want documents matching Amsterdam (even if the
> > > edit distance is greater than 2).
> > > First of all, this is something I hope you do only if you get 0
> > > results; if not, the overhead can be great and you are going to lose
> > > a lot of precision, causing confusion for the customer.
> > >
> > > pf2 and pf3 are ngrams of whitespace-separated tokens, used to make
> > > partial phrase queries affect the scoring.
> > > Not a good fit for your problem.
> > >
> > > More than grams, have you considered using some sort of phonetic
> > > matching?
> > > Could this help:
> > > https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
> > >
> > > Cheers
> > >
> > > On 10 March 2016 at 08:47, elisabeth benoit <elisaelisael...@gmail.com>
> > > wrote:
> > >
> > > > I am trying to do approximate search with Solr. We've tried fuzzy
> > > > search and spellcheck search; they work OK, but the edit distance is
> > > > limited (to 2 for DirectSolrSpellChecker in Solr 4.10.1). With the
> > > > fuzzy operator we've had performance issues, and I don't think you
> > > > can have an edit distance of more than 2.
> > > >
> > > > What we used to do with a database was more efficient: storing
> > > > trigrams with their position, and then searching around that position
> > > > (not precisely at that position, since it's an approximate search).
> > > >
> > > > The position is there to avoid, for a trigram like ams (amsterdam),
> > > > getting answers where the same trigram is, for instance, at the end
> > > > of the word. I would like answers with the same relative position
> > > > between trigrams to score higher. Maybe using edismax's pf2 and pf3
> > > > is a way to do this. I don't see any other way. Please tell me if
> > > > you do.
> > > >
> > > > From your answer, I get that position is stored, but I don't
> > > > understand how I can preserve relative order between trigrams, apart
> > > > from using pf2 and pf3.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > > > 2016-03-10 0:02 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
> > > >
> > > > > If you store the positions for your tokens (and that is the default
> > > > > if you don't omit them), you have the relative position in the
> > > > > index. [1]
> > > > > I attach a blog post of mine, describing the Lucene internals in a
> > > > > little more detail.
> > > > >
> > > > > Apart from that, can you explain the problem you are trying to
> > > > > solve?
> > > > > The high-level user experience?
> > > > > What kind of search/autocompletion/relevancy tuning are you trying
> > > > > to achieve?
> > > > > Maybe we can help better if we start from the problem :)
> > > > >
> > > > > Cheers
> > > > >
> > > > > [1]
> > > > > http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
> > > > >
> > > > > On 9 March 2016 at 15:02, elisabeth benoit <elisaelisael...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello Alessandro,
> > > > > >
> > > > > > You may be right. What would you use to keep relative order
> > > > > > between, for instance, the grams
> > > > > >
> > > > > > __a
> > > > > > _am
> > > > > > ams
> > > > > > mst
> > > > > > ste
> > > > > > ter
> > > > > > erd
> > > > > > rda
> > > > > > dam
> > > > > > am_
> > > > > >
> > > > > > of amsterdam? pf2 and pf3? That's all I can think of. Please let
> > > > > > me know if you have more insights.
> > > > > >
> > > > > > Best regards,
> > > > > > Elisabeth
> > > > > >
> > > > > > 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti <abenede...@apache.org>:
> > > > > >
> > > > > > > Elisabeth,
> > > > > > > out of curiosity, could we know what you are trying to solve
> > > > > > > with that complex way of tokenisation?
> > > > > > > Solr is really good at storing positions along with tokens, so
> > > > > > > I am curious to know why you are mixing things up.
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On 8 March 2016 at 10:08, elisabeth benoit <elisaelisael...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for your answer Emir,
> > > > > > > >
> > > > > > > > I'll check that out.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Elisabeth
> > > > > > > >
> > > > > > > > 2016-03-08 10:24 GMT+01:00 Emir Arnautovic <emir.arnauto...@sematext.com>:
> > > > > > > >
> > > > > > > > > Hi Elisabeth,
> > > > > > > > > I don't think there is such a token filter, so you would
> > > > > > > > > have to create your own token filter that takes a token and
> > > > > > > > > emits ngram tokens of a specific length.
> > > > > > > > > It should not be too hard to create such a filter - you can
> > > > > > > > > take a look at how the ngram filter is coded - yours should
> > > > > > > > > be simpler than that.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Emir
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 08.03.2016 08:52, elisabeth benoit wrote:
> > > > > > > > >
> > > > > > > > >> Hello,
> > > > > > > > >>
> > > > > > > > > >> I'm using Solr 4.10.1. I'd like to index words with ngrams of
> > > > > > > > > >> fixed length, with a position at the end.
> > > > > > > > > >>
> > > > > > > > > >> For instance, with fixed length 3, Amsterdam would be
> > > > > > > > > >> something like:
> > > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> a0 (two spaces added at beginning)
> > > > > > > > >> am1
> > > > > > > > >> ams2
> > > > > > > > >> mst3
> > > > > > > > >> ste4
> > > > > > > > >> ter5
> > > > > > > > >> erd6
> > > > > > > > >> rda7
> > > > > > > > >> dam8
> > > > > > > > >> am9 (one more space in the end)
> > > > > > > > >>
> > > > > > > > >> The number at the end being the position.
> > > > > > > > >>
> > > > > > > > >> Does anyone have a clue how to achieve this?
> > > > > > > > >>
> > > > > > > > >> Best regards,
> > > > > > > > >> Elisabeth
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > > --
> > > > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > > > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > --------------------------
> > > > > > >
> > > > > > > Benedetti Alessandro
> > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > > >
> > > > > > > "Tyger, tyger burning bright
> > > > > > > In the forests of the night,
> > > > > > > What immortal hand or eye
> > > > > > > Could frame thy fearful symmetry?"
> > > > > > >
> > > > > > > William Blake - Songs of Experience -1794 England
> > > > > > >
> > > > > >
> > > >
> >
