Re: Extending solr analysis in index time

Ali Nazemian Tue, 13 Jan 2015 07:45:06 -0800

I decided to go with a function query, and to implement one that reads the
term frequency for each document from the index. However, I did not find any
tutorial that matches my problem well. I would really appreciate it if somebody
could point me to a useful tutorial or example for this case.
Thank you very much.
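
For reference, here is a minimal sketch of what such a function query could look like for the "hello" example quoted further down in this thread (weight 2.0, fields "a" and "b"). It only builds the request parameters and checks the arithmetic locally; it does not talk to a Solr server. The helper names (score_function, expected_score, request_params) and the match-all query are assumptions made for illustration. The Solr functions used are termfreq, product, pow and sum. Note that termfreq works on indexed terms, so the term would need to be given in its analyzed form.

```python
from urllib.parse import urlencode

# Hypothetical input taken from the example in this thread: one important
# term, "hello", with weight 2.0, looked up in fields "a" and "b".
IMPORTANT_TERMS = {"hello": 2.0}


def score_function(terms):
    """Build a Solr function-query string that sums, over all important
    terms, w*termfreq(a,term) + (w*termfreq(b,term))^2."""
    parts = []
    for term, weight in sorted(terms.items()):
        in_a = f"product({weight},termfreq(a,'{term}'))"
        in_b = f"pow(product({weight},termfreq(b,'{term}')),2)"
        parts.append(f"sum({in_a},{in_b})")
    # A single term needs no outer sum(); several terms are added together.
    return parts[0] if len(parts) == 1 else "sum(" + ",".join(parts) + ")"


def expected_score(terms, tf_a, tf_b):
    """Compute the same formula locally from known term frequencies."""
    return sum(
        w * tf_a.get(t, 0) + (w * tf_b.get(t, 0)) ** 2
        for t, w in terms.items()
    )


def request_params(terms):
    """URL-encode parameters that return the value as a pseudo-field
    named document_score and sort on it in descending order."""
    fn = score_function(terms)
    return urlencode({
        "q": "*:*",  # assumption: match all documents
        "fl": f"id,document_score:{fn}",
        "sort": f"{fn} desc",
    })


if __name__ == "__main__":
    print(score_function(IMPORTANT_TERMS))
    print(expected_score(IMPORTANT_TERMS, {"hello": 3}, {"hello": 6}))
    print(request_params(IMPORTANT_TERMS))
```

With term frequencies 3 and 6, the local check gives 2*3 + (2*6)^2 = 150. The value is still computed at query time, so this covers retrieval and sorting but does not store anything in the index.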


On Tue, Jan 13, 2015 at 4:21 PM, Jack Krupansky <[email protected]>
wrote:

> A function query or an update processor to create a separate field are
> still your best options.
>
> -- Jack Krupansky
>
> On Tue, Jan 13, 2015 at 4:18 AM, Ali Nazemian <[email protected]>
> wrote:
>
> > Dear Markus,
> >
> > Unfortunately I cannot use payloads, since I want to return this score to
> > each user as a simple field alongside the other fields, and payloads do
> > not provide that. Also, I don't want to change Lucene's default
> > similarity; I just want to have this field for sorting in some cases.
> > Best regards.
> >
> > On Mon, Jan 12, 2015 at 10:26 PM, Markus Jelsma <
> > [email protected]>
> > wrote:
> >
> > > Hi - You mention having a list of important terms, so using payloads
> > > would be the most straightforward approach, I suppose. You would still
> > > need a custom similarity and a custom query parser. Payloads work very
> > > well for us.
> > >
> > > M
> > >
> > >
> > >
> > > -----Original message-----
> > > > From:Ahmet Arslan <[email protected]>
> > > > Sent: Monday 12th January 2015 19:50
> > > > To: [email protected]
> > > > Subject: Re: Extending solr analysis in index time
> > > >
> > > > Hi Ali,
> > > >
> > > > Reading your example, if you could somehow replace the idf component
> > > > with your "importance weight", I think your use case looks like
> > > > TFIDFSimilarity. The tf component remains the same.
> > > >
> > > > https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> > > >
> > > > I also suggest you ask this on the Lucene mailing list. Someone
> > > > familiar with the similarity package can give insight on this.
> > > >
> > > > Ahmet
> > > >
> > > >
> > > >
> > > > On Monday, January 12, 2015 6:54 PM, Jack Krupansky <
> > > [email protected]> wrote:
> > > > Could you clarify what you mean by "Lucene reverse index"? That's not a
> > > > term I am familiar with.
> > > >
> > > > -- Jack Krupansky
> > > >
> > > >
> > > > On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian <[email protected]
> >
> > > wrote:
> > > >
> > > > > Dear Jack,
> > > > > Thank you very much.
> > > > > Yes, I was thinking of a function query for sorting, but I have two
> > > > > problems in this case: 1) a function query does the processing at
> > > > > query time, which I don't want; 2) I also want to have the score
> > > > > field for retrieving and showing to users.
> > > > >
> > > > > Dear Alexandre,
> > > > > Here is some more explanation of the business need behind the
> > > > > question:
> > > > > I am going to provide a field for each document; let's refer to it as
> > > > > "document_score". I am going to fill this field based on information
> > > > > that can be extracted from the Lucene reverse index. Assume I have a
> > > > > list of terms, called important terms, and I am going to extract the
> > > > > term frequency of each term in this list for each document. I want
> > > > > to use these term frequencies to calculate "document_score".
> > > > > "document_score" should be stored, since I am going to retrieve this
> > > > > field for each document. I also want to sort on "document_score" when
> > > > > the user prefers it.
> > > > > I hope I have conveyed my point.
> > > > > Best regards.
> > > > >
> > > > >
> > > > > On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky <
> > > [email protected]
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Won't function queries do the job at query time? You can add to or
> > > > > > multiply the tf*idf score by a function of the term frequency of
> > > > > > arbitrary terms, using the tf, mul, and add functions.
> > > > > >
> > > > > > See:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/solr/Function+Queries
> > > > > >
> > > > > > -- Jack Krupansky
> > > > > >
> > > > > > On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian <
> > > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Dear Jack,
> > > > > > > Hi,
> > > > > > > I think you misunderstood my need. I don't want to change the
> > > > > > > default scoring behavior of Lucene (tf-idf). I just want to have
> > > > > > > another field to sort on for some specific queries (not all of
> > > > > > > the search business). I am aware of Lucene payloads, however.
> > > > > > > Thank you very much.
> > > > > > >
> > > > > > > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <
> > > > > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > You would do that with a custom similarity (scoring) class.
> > > > > > > > That's an expert feature. In fact, a SUPER-expert feature.
> > > > > > > >
> > > > > > > > Start by completely familiarizing yourself with how TF*IDF
> > > > > > > > similarity already works:
> > > > > > > >
> > > > > > > > http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> > > > > > > >
> > > > > > > > And to use your custom similarity class in Solr:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
> > > > > > > >
> > > > > > > >
> > > > > > > > -- Jack Krupansky
> > > > > > > >
> > > > > > > > On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian <
> > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi everybody,
> > > > > > > > >
> > > > > > > > > I am going to add some analysis to Solr at index time. Here is
> > > > > > > > > what I have in mind:
> > > > > > > > > Suppose I have two different fields in the Solr schema, field
> > > > > > > > > "a" and field "b". I want to use the resulting reverse index so
> > > > > > > > > that some terms are treated as important ones, and tell Lucene
> > > > > > > > > to calculate a value based on the frequency of these terms in
> > > > > > > > > each document. For example, let the word "hello" be an
> > > > > > > > > important word with a weight of "2.0". Suppose the term
> > > > > > > > > frequency of this word for document 1 is 3 in field "a" and 6
> > > > > > > > > in field "b". Therefore the score value would be 2*3+(2*6)^2.
> > > > > > > > > I want to calculate this score based on these fields and put
> > > > > > > > > it in the index for retrieval. My question is: how can I do
> > > > > > > > > such a thing? At first I considered using the term component
> > > > > > > > > to calculate this value from outside and put it back into the
> > > > > > > > > Solr index, but it seems that is not efficient enough.
> > > > > > > > >
> > > > > > > > > Thank you very much.
> > > > > > > > > Best regards.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > A.Nazemian
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > A.Nazemian
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > A.Nazemian
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > A.Nazemian
> >
>



-- 
A.Nazemian
