Solr for NoSQL

2011-01-27 Thread Jianbin Dai
Hi,

 

Is there a data import handler to quickly read data from a NoSQL database, specifically MongoDB, which I am thinking of using?

Or, a more general question: how does Solr work with NoSQL databases?

Thanks.

 

Jianbin

 



embedded solrj doesn't refresh index

2011-07-19 Thread Jianbin Dai
Hi,

 

I am using embedded solrj. After I add a new doc to the index, I can see the change through the Solr web interface, but not from embedded solrj. But after I restart the embedded solrj, I do see the changes. It works as if there were a cache.
Does anyone know what the problem is? Thanks.

 

Jianbin



RE: embedded solrj doesn't refresh index

2011-07-20 Thread Jianbin Dai
Hi, thanks for the response. Here is the whole picture:
I use DIH to import and index data, and use embedded solrj, connected to the same index files, for search and other operations.
Here is what I found: once data is indexed (and committed), I can see the changes through the Solr web server, but not from embedded solrj. If I restart the embedded Solr server, I do see the changes.
Hope that helps. Thanks.


-Original Message-
From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] 
Sent: Wednesday, July 20, 2011 5:09 AM
To: solr-user@lucene.apache.org
Subject: Re: embedded solrj doesn't refresh index

You should send a commit to your embedded Solr

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/7/20 Jianbin Dai 

> Hi,
>
>
>
> I am using embedded solrj. After I add new doc to the index, I can see the
> changes through solr web, but not from embedded solrj. But after I restart
> the embedded solrj, I do see the changes. It works as if there was a
cache.
> Anyone knows the problem? Thanks.
>
>
>
> Jianbin
>
>



RE: embedded solrj doesn't refresh index

2011-07-29 Thread Jianbin Dai
Thanks, Marc.
I guess I was not clear in my previous statement, so let me rephrase.

I use DIH to import data into Solr and do the indexing. Everything works fine.

I have another embedded Solr server pointing at the same index files, and I use
embedded solrj to search the index.

So the first Solr instance is for indexing only; it can be shut down once the
indexing is done.

However, the changes in the index files do not show up through embedded solrj.
That is, once the new index is built, embedded solrj still returns the old
results. Only after I restart the embedded Solr server are the new changes
reflected through solrj. Embedded solrj behaves as if there were a cache that
it always consults first.

Thanks.

JB


-Original Message-
From: Marc Sturlese [mailto:marc.sturl...@gmail.com] 
Sent: Friday, July 22, 2011 1:57 AM
To: solr-user@lucene.apache.org
Subject: RE: embedded solrj doesn't refresh index

Are you indexing with a full import? If so, and the resulting index has a
similar number of docs to the one you had before, try setting reopenReaders
to false in solrconfig.xml.
* You have to send the commit, of course.

--
View this message in context:
http://lucene.472066.n3.nabble.com/embeded-solrj-doesn-t-refresh-index-tp318
4321p3190892.html
Sent from the Solr - User mailing list archive at Nabble.com.
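
For reference, Marc's reopenReaders suggestion corresponds to a one-line setting in solrconfig.xml. A sketch (placement alongside the other index/query settings is assumed, not shown in this thread):

```xml
<!-- sketch: make each new searcher open a completely fresh IndexReader
     instead of reopening the old one; helps when the index files are
     rebuilt underneath a long-lived embedded server -->
<reopenReaders>false</reopenReaders>
```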



Help needed on DataImportHandler to index xml files

2009-05-19 Thread Jianbin Dai

Hi All,
I am new here; thanks for reading my question.
I want to use DataImportHandler to index my large set of XML files (7GB total)
stored on my local disk. My data-config.xml is attached below. It works fine
with one file (abc.xml), but how can I index all the XML files at once? Thanks!
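
For reference, the attached data-config.xml did not survive the archive. A minimal sketch of the usual approach, combining FileListEntityProcessor with XPathEntityProcessor to walk a whole directory (the paths, forEach expression, and field mapping below are placeholders, not the poster's actual config):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- outer entity lists every *.xml file under baseDir -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/path/to/xml" fileName=".*\.xml"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- inner entity parses each listed file -->
      <entity name="x" processor="XPathEntityProcessor"
              url="${f.fileAbsolutePath}" forEach="/record">
        <field column="title" xpath="/record/title" />
      </entity>
    </entity>
  </document>
</dataConfig>
```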



















  


How to index large set data

2009-05-20 Thread Jianbin Dai

Hi,

I have about 45GB of XML files to be indexed. I am using DataImportHandler. I
started the full import 4 hours ago, and it's still running.
My computer has 4GB of memory. Any suggestions?
Thanks!

JB


  



Re: How to index large set data

2009-05-21 Thread Jianbin Dai

Hi Paul,

Thank you so much for answering my questions. It really helped.
After some adjustment, basically raising mergeFactor to 1000 from the default
value of 10, I can finish the whole job in 2.5 hours. I checked that during the
run only around 18% of memory is being used, and VIRT is always 1418m. I am
thinking it may be restricted by the JVM memory setting. But I run the data
import command through the web, i.e.,
http://<host>:<port>/solr/dataimport?command=full-import, so how can I set the
memory allocation for the JVM?
Thanks again!

JB
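
For reference, the heap is set on the servlet container's JVM, not per request. When Solr runs inside Tomcat, this is commonly done through JAVA_OPTS before starting Tomcat; a sketch with illustrative values (not recommendations):

```shell
# illustrative heap settings for the JVM that hosts Solr under Tomcat;
# set before running catalina.sh start
export JAVA_OPTS="-Xms512m -Xmx2048m"
echo "$JAVA_OPTS"
```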

--- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Thursday, May 21, 2009, 9:57 PM
> check the status page of DIH and see
> if it is working properly. and
> if, yes what is the rate of indexing
> 
> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai 
> wrote:
> >
> > Hi,
> >
> > I have about 45GB xml files to be indexed. I am using
> DataImportHandler. I started the full import 4 hours ago,
> and it's still running
> > My computer has 4GB memory. Any suggestion on the
> solutions?
> > Thanks!
> >
> > JB
> >
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 






Re: How to index large set data

2009-05-22 Thread Jianbin Dai

About 2.8M total docs were created, but only the first run finished. In my
second try, it hangs forever at the end of indexing (I guess right before the
commit), with CPU usage at 100%. In total, 5GB of index (2050 files) was
created. Now I have two problems:
1. Why does it hang there and fail?
2. How can I speed up the indexing?


Here is my solrconfig.xml

false
3000
1000
2147483647
1
false
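
The XML tags of the fragment above were stripped by the archive. Only two of the settings can be recovered with confidence from the replies in this thread (the 3G RAM buffer and the mergeFactor of 1000); a sketch of those two lines, with the remaining stripped values (false, 2147483647, 1, false) left unmapped:

```xml
<!-- partial reconstruction; the other stripped values are not recoverable -->
<ramBufferSizeMB>3000</ramBufferSizeMB>
<mergeFactor>1000</mergeFactor>
```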




--- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Thursday, May 21, 2009, 10:39 PM
> what is the total no:of docs created
> ?  I guess it may not be memory
> bound. indexing is mostly an IO bound operation. You may
> be able to
> get a better perf if a SSD is used (solid state disk)
> 
> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
> wrote:
> >
> > Hi Paul,
> >
> > Thank you so much for answering my questions. It
> really helped.
> > After some adjustment, basically setting mergeFactor
> to 1000 from the default value of 10, I can finished the
> whole job in 2.5 hours. I checked that during running time,
> only around 18% of memory is being used, and VIRT is always
> 1418m. I am thinking it may be restricted by JVM memory
> setting. But I run the data import command through web,
> i.e.,
> >
> http://:/solr/dataimport?command=full-import,
> how can I set the memory allocation for JVM?
> > Thanks again!
> >
> > JB
> >
> > --- On Thu, 5/21/09, Noble Paul നോബിള്‍
>  नोब्ळ् 
> wrote:
> >
> >> From: Noble Paul നോബിള്‍
>  नोब्ळ् 
> >> Subject: Re: How to index large set data
> >> To: solr-user@lucene.apache.org
> >> Date: Thursday, May 21, 2009, 9:57 PM
> >> check the status page of DIH and see
> >> if it is working properly. and
> >> if, yes what is the rate of indexing
> >>
> >> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai
> 
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I have about 45GB xml files to be indexed. I
> am using
> >> DataImportHandler. I started the full import 4
> hours ago,
> >> and it's still running
> >> > My computer has 4GB memory. Any suggestion on
> the
> >> solutions?
> >> > Thanks!
> >> >
> >> > JB
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >>
> -
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>
> >
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 





Re: How to index large set data

2009-05-22 Thread Jianbin Dai

I don't know exactly what this 3G RAM buffer is used for. But what I noticed
was that both the index size and the file count kept increasing, and then it
got stuck at the commit.

--- On Fri, 5/22/09, Otis Gospodnetic  wrote:

> From: Otis Gospodnetic 
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Friday, May 22, 2009, 7:26 AM
> 
> Hi,
> 
> Those settings are a little "crazy".  Are you sure you
> want to give Solr/Lucene 3G to buffer documents before
> flushing them to disk?  Are you sure you want to use
> the mergeFactor of 1000?  Checking the logs to see if
> there are any errors.  Look at the index directory to
> see if Solr is actually still writing to it? (file sizes are
> changing, number of files is changing).  kill -QUIT the
> JVM pid to see where things are "stuck" if they are
> stuck...
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
> > From: Jianbin Dai 
> > To: solr-user@lucene.apache.org;
> noble.p...@gmail.com
> > Sent: Friday, May 22, 2009 3:42:04 AM
> > Subject: Re: How to index large set data
> > 
> > 
> > about 2.8 m total docs were created. only the first
> run finishes. In my 2nd try, 
> > it hangs there forever at the end of indexing, (I
> guess right before commit), 
> > with cpu usage of 100%. Total 5G (2050) index files
> are created. Now I have two 
> > problems:
> > 1. why it hangs there and failed?
> > 2. how can i speed up the indexing?
> > 
> > 
> > Here is my solrconfig.xml
> > 
> >     false
> >     3000
> >     1000
> >     2147483647
> >     1
> >     false
> > 
> > 
> > 
> > 
> > --- On Thu, 5/21/09, Noble Paul
> നോബിള്‍  नोब्ळ् wrote:
> > 
> > > From: Noble Paul നോബിള്‍ 
> नोब्ळ् 
> > > Subject: Re: How to index large set data
> > > To: solr-user@lucene.apache.org
> > > Date: Thursday, May 21, 2009, 10:39 PM
> > > what is the total no:of docs created
> > > ?  I guess it may not be memory
> > > bound. indexing is mostly amn IO bound operation.
> You may
> > > be able to
> > > get a better perf if a SSD is used (solid state
> disk)
> > > 
> > > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
> > > wrote:
> > > >
> > > > Hi Paul,
> > > >
> > > > Thank you so much for answering my
> questions. It
> > > really helped.
> > > > After some adjustment, basically setting
> mergeFactor
> > > to 1000 from the default value of 10, I can
> finished the
> > > whole job in 2.5 hours. I checked that during
> running time,
> > > only around 18% of memory is being used, and VIRT
> is always
> > > 1418m. I am thinking it may be restricted by JVM
> memory
> > > setting. But I run the data import command
> through web,
> > > i.e.,
> > > >
> > > http://:/solr/dataimport?command=full-import,
> > > how can I set the memory allocation for JVM?
> > > > Thanks again!
> > > >
> > > > JB
> > > >
> > > > --- On Thu, 5/21/09, Noble Paul
> നോബിള്‍
> > >  नोब्ळ् 
> > > wrote:
> > > >
> > > >> From: Noble Paul നോബിള്‍
> > >  नोब्ळ् 
> > > >> Subject: Re: How to index large set
> data
> > > >> To: solr-user@lucene.apache.org
> > > >> Date: Thursday, May 21, 2009, 9:57 PM
> > > >> check the status page of DIH and see
> > > >> if it is working properly. and
> > > >> if, yes what is the rate of indexing
> > > >>
> > > >> On Thu, May 21, 2009 at 11:48 AM,
> Jianbin Dai
> > > 
> > > >> wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > I have about 45GB xml files to be
> indexed. I
> > > am using
> > > >> DataImportHandler. I started the full
> import 4
> > > hours ago,
> > > >> and it's still running.
> > > >> > My computer has 4GB memory. Any
> suggestion on
> > > the
> > > >> solutions?
> > > >> > Thanks!
> > > >> >
> > > >> > JB
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > >
> -
> > > >> Noble Paul | Principal Engineer| AOL |
> http://aol.com
> > > >>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > 
> > > 
> > > 
> > > -- 
> > >
> -
> > > Noble Paul | Principal Engineer| AOL | http://aol.com
> > > 
> 
> 






Re: How to index large set data

2009-05-22 Thread Jianbin Dai

If I do the XML parsing myself and use an embedded client to do the push, would
it be more efficient than DIH?


--- On Fri, 5/22/09, Grant Ingersoll  wrote:

> From: Grant Ingersoll 
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Friday, May 22, 2009, 5:38 AM
> Can you parallelize this?  I
> don't know that the DIH can handle it,  
> but having multiple threads sending docs to Solr is the
> best  
> performance wise, so maybe you need to look at alternatives
> to pulling  
> with DIH and instead use a client to push into Solr.
> 
> 
> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
> 
> >
> > about 2.8 m total docs were created. only the first
> run finishes. In  
> > my 2nd try, it hangs there forever at the end of
> indexing, (I guess  
> > right before commit), with cpu usage of 100%. Total 5G
> (2050) index  
> > files are created. Now I have two problems:
> > 1. why it hangs there and failed?
> > 2. how can i speed up the indexing?
> >
> >
> > Here is my solrconfig.xml
> >
> >   
> false
> >   
> 3000
> >   
> 1000
> >   
> 2147483647
> >   
> 1
> >   
> false
> >
> >
> >
> >
> > --- On Thu, 5/21/09, Noble Paul
> നോബിള്‍  नो 
> > ब्ळ् 
> wrote:
> >
> >> From: Noble Paul നോബിള്‍ 
> नोब्ळ्  
> >> 
> >> Subject: Re: How to index large set data
> >> To: solr-user@lucene.apache.org
> >> Date: Thursday, May 21, 2009, 10:39 PM
> >> what is the total no:of docs created
> >> ?  I guess it may not be memory
> >> bound. indexing is mostly amn IO bound operation.
> You may
> >> be able to
> >> get a better perf if a SSD is used (solid state
> disk)
> >>
> >> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai
> 
> >> wrote:
> >>>
> >>> Hi Paul,
> >>>
> >>> Thank you so much for answering my questions.
> It
> >> really helped.
> >>> After some adjustment, basically setting
> mergeFactor
> >> to 1000 from the default value of 10, I can
> finished the
> >> whole job in 2.5 hours. I checked that during
> running time,
> >> only around 18% of memory is being used, and VIRT
> is always
> >> 1418m. I am thinking it may be restricted by JVM
> memory
> >> setting. But I run the data import command through
> web,
> >> i.e.,
> >>>
> >>
> http://:/solr/dataimport?command=full-import,
> >> how can I set the memory allocation for JVM?
> >>> Thanks again!
> >>>
> >>> JB
> >>>
> >>> --- On Thu, 5/21/09, Noble Paul
> നോബിള്‍
> >>  नोब्ळ् 
> >> wrote:
> >>>
> >>>> From: Noble Paul നോബിള്‍
> >>  नोब्ळ् 
> >>>> Subject: Re: How to index large set data
> >>>> To: solr-user@lucene.apache.org
> >>>> Date: Thursday, May 21, 2009, 9:57 PM
> >>>> check the status page of DIH and see
> >>>> if it is working properly. and
> >>>> if, yes what is the rate of indexing
> >>>>
> >>>> On Thu, May 21, 2009 at 11:48 AM, Jianbin
> Dai
> >> 
> >>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I have about 45GB xml files to be
> indexed. I
> >> am using
> >>>> DataImportHandler. I started the full
> import 4
> >> hours ago,
> >>>> and it's still running
> >>>>> My computer has 4GB memory. Any
> suggestion on
> >> the
> >>>> solutions?
> >>>>> Thanks!
> >>>>>
> >>>>> JB
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>
> -
> >>>> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> -- 
> >>
> -
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>
> >
> >
> >
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem
> (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 






How to use DIH to index attributes in xml file

2009-05-22 Thread Jianbin Dai

I have an xml file like this 




301.46


In the data-config.xml, I use


but how can I index "id", "mid"?

Thanks.


  


Re: How to index large set data

2009-05-22 Thread Jianbin Dai

Hi Paul, but in your previous post you said "there is already an issue for
writing to Solr in multiple threads  SOLR-1089". Do you think using solrj alone
would be better than DIH?
Thanks, and have a good weekend!
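
For reference, Paul's suggestion (SolrJ with streaming, in multiple threads) follows a standard worker-pool pattern. A self-contained sketch with the SolrJ call stubbed out; in a real client, each task would instead call add() on a shared SolrJ server instance (e.g. the streaming update server Paul alludes to):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    // push numDocs "documents" to Solr using a fixed pool of worker threads;
    // the increment stands in for a per-document SolrJ add() call
    static int pushAll(int numDocs, int threads) {
        AtomicInteger sent = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < numDocs; i++) {
            pool.submit(() -> { sent.incrementAndGet(); }); // stub for server.add(doc)
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all sends
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println("pushed " + pushAll(1000, 4) + " docs");
        // prints: pushed 1000 docs
    }
}
```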

--- On Fri, 5/22/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> no need to use embedded Solrserver.
> you can use SolrJ with streaming
> in multiple threads
> 
> On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai 
> wrote:
> >
> > If I do the xml parsing by myself and use embedded
> client to do the push, would it be more efficient than DIH?
> >
> >
> > --- On Fri, 5/22/09, Grant Ingersoll 
> wrote:
> >
> >> From: Grant Ingersoll 
> >> Subject: Re: How to index large set data
> >> To: solr-user@lucene.apache.org
> >> Date: Friday, May 22, 2009, 5:38 AM
> >> Can you parallelize this?  I
> >> don't know that the DIH can handle it,
> >> but having multiple threads sending docs to Solr
> is the
> >> best
> >> performance wise, so maybe you need to look at
> alternatives
> >> to pulling
> >> with DIH and instead use a client to push into
> Solr.
> >>
> >>
> >> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
> >>
> >> >
> >> > about 2.8 m total docs were created. only the
> first
> >> run finishes. In
> >> > my 2nd try, it hangs there forever at the end
> of
> >> indexing, (I guess
> >> > right before commit), with cpu usage of 100%.
> Total 5G
> >> (2050) index
> >> > files are created. Now I have two problems:
> >> > 1. why it hangs there and failed?
> >> > 2. how can i speed up the indexing?
> >> >
> >> >
> >> > Here is my solrconfig.xml
> >> >
> >> >
> >>
> false
> >> >
> >>
> 3000
> >> >
> >> 1000
> >> >
> >>
> 2147483647
> >> >
> >>
> 1
> >> >
> >>
> false
> >> >
> >> >
> >> >
> >> >
> >> > --- On Thu, 5/21/09, Noble Paul
> >> നോബിള്‍  नो
> >> > ब्ळ् 
> >> wrote:
> >> >
> >> >> From: Noble Paul നോബിള്‍
> >> नोब्ळ्
> >> >> 
> >> >> Subject: Re: How to index large set data
> >> >> To: solr-user@lucene.apache.org
> >> >> Date: Thursday, May 21, 2009, 10:39 PM
> >> >> what is the total no:of docs created
> >> >> ?  I guess it may not be memory
> >> >> bound. indexing is mostly amn IO bound
> operation.
> >> You may
> >> >> be able to
> >> >> get a better perf if a SSD is used (solid
> state
> >> disk)
> >> >>
> >> >> On Fri, May 22, 2009 at 10:46 AM, Jianbin
> Dai
> >> 
> >> >> wrote:
> >> >>>
> >> >>> Hi Paul,
> >> >>>
> >> >>> Thank you so much for answering my
> questions.
> >> It
> >> >> really helped.
> >> >>> After some adjustment, basically
> setting
> >> mergeFactor
> >> >> to 1000 from the default value of 10, I
> can
> >> finished the
> >> >> whole job in 2.5 hours. I checked that
> during
> >> running time,
> >> >> only around 18% of memory is being used,
> and VIRT
> >> is always
> >> >> 1418m. I am thinking it may be restricted
> by JVM
> >> memory
> >> >> setting. But I run the data import
> command through
> >> web,
> >> >> i.e.,
> >> >>>
> >> >>
> >>
> http://:/solr/dataimport?command=full-import,
> >> >> how can I set the memory allocation for
> JVM?
> >> >>> Thanks again!
> >> >>>
> >> >>> JB
> >> >>>
> >> >>> --- On Thu, 5/21/09, Noble Paul
> >> നോബിള്‍
> >> >>  नोब्ळ् 
> >> >> wrote:
> >> >>>
> >> >>>> From: Noble Paul
> നോബിള്‍
> >> >>  नोब्ळ् 
> >> >>>> Subject: Re: How to index large
> set data
> >> >>>> To: solr-u...@lucene.apache.org
> >> >>>> Date: Thursday, May 21, 2009,
> 9:57 PM
> >> >>>> check the status page of DIH and
> see
> >> >>>> if it is working properly. and
> >> >>>> if, yes what is the rate of
> indexing
> >> >>>>
> >> >>>> On Thu, May 21, 2009 at 11:48 AM,
> Jianbin
> >> Dai
> >> >> 
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> I have about 45GB xml files
> to be
> >> indexed. I
> >> >> am using
> >> >>>> DataImportHandler. I started the
> full
> >> import 4
> >> >> hours ago,
> >> >>>> and it's still running.
> >> >>>>> My computer has 4GB memory.
> Any
> >> suggestion on
> >> >> the
> >> >>>> solutions?
> >> >>>>> Thanks!
> >> >>>>>
> >> >>>>> JB
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>>
> >> >>
> >>
> -
> >> >>>> Noble Paul | Principal Engineer|
> AOL | http://aol.com
> >> >>>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >>
> -
> >> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >> >>
> >> >
> >> >
> >> >
> >>
> >> --
> >> Grant Ingersoll
> >> http://www.lucidimagination.com/
> >>
> >> Search the Lucene ecosystem
> >> (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> >> using Solr/Lucene:
> >> http://www.lucidimagination.com/search
> >>
> >>
> >
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 






Re: How to use DIH to index attributes in xml file

2009-05-22 Thread Jianbin Dai

Oh, I guess I didn't say it clearly in my post.
I didn't use wildcards in the xpath. My question was how to index the
attributes "id" and "mid" in the following XML file.




301.46


In the data-config.xml, I use


but what are the xpath for "id" and "mid"?

Thanks again!
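
For reference, XPathEntityProcessor addresses attributes with `@` in a full xpath. A sketch, assuming a hypothetical root element `<products>` around the `<merchantProduct>` records (the real element names were stripped by the archive):

```xml
<!-- sketch: every element name except merchantProduct/price is hypothetical -->
<entity name="product" processor="XPathEntityProcessor"
        forEach="/products/merchantProduct" url="${f.fileAbsolutePath}">
  <field column="id"    xpath="/products/merchantProduct/@id" />
  <field column="mid"   xpath="/products/merchantProduct/@mid" />
  <field column="price" xpath="/products/merchantProduct/price" />
</entity>
```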





--- On Fri, 5/22/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: How to use DIH to index attributes in xml file
> To: solr-user@lucene.apache.org
> Date: Friday, May 22, 2009, 9:03 PM
> wild cards are not supported . u must
> use full xpath
> 
> On Sat, May 23, 2009 at 4:55 AM, Jianbin Dai 
> wrote:
> >
> > I have an xml file like this
> >
> > 
> >                     type="stock-4" />
> >                     type="cond-0" />
> >                  
>  301.46
> > 
> >
> > In the data-config.xml, I use
> >   xpath="/.../merchantProduct/price" />
> >
> > but how can I index "id", "mid"?
> >
> > Thanks.
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 






Re: How to index large set data

2009-05-24 Thread Jianbin Dai

Hi Paul,

Hope you have a great weekend so far.
I still have a couple of questions you might help me out:

1. In your earlier email, you said "if possible , you can setup multiple DIH 
say /dataimport1, /dataimport2 etc and split your files and can achieve 
parallelism"
I am not sure I understand it right. I put two requestHandlers in
solrconfig.xml, like this:



  ./data-config.xml





  ./data-config2.xml




and create data-config.xml and data-config2.xml.
then I run the command
http://host:8080/solr/dataimport?command=full-import

But only one data set (the first one) was indexed. Did I do something wrong?
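
For reference, the stripped configuration presumably declared the two handlers along these lines. Note that each handler gets its own URL, so each has to be triggered with its own full-import request; a request to a single /dataimport path only runs the handler registered under that exact name:

```xml
<!-- sketch: two independent DIH endpoints, following Paul's naming -->
<requestHandler name="/dataimport1"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">./data-config.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport2"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">./data-config2.xml</str>
  </lst>
</requestHandler>
```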


2. I noticed that after Solr indexed about 8M documents (which took around two
hours), it gets very, very slow. Using the "top" command in Linux, I noticed
that RES is at 1g of memory. I ran several experiments, and every time RES
reaches 1g, the indexing process becomes extremely slow. Is this memory limit
set by the JVM? And how can I set the JVM memory when I run DIH through the
web full-import command?

Thanks!


JB




--- On Fri, 5/22/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: How to index large set data
> To: "Jianbin Dai" 
> Date: Friday, May 22, 2009, 10:04 PM
> On Sat, May 23, 2009 at 10:27 AM,
> Jianbin Dai 
> wrote:
> >
> > Hi Pual, but in your previous post, you said "there is
> already an issue for writing to Solr in multiple threads
>  SOLR-1089". Do you think use solrj alone would be better
> than DIH?
> 
> nope
> you will have to do indexing in multiple threads
> 
> if possible , you can setup multiple DIH say /dataimport1,
> /dataimport2 etc and split your files and can achieve
> parallelism
> 
> 
> > Thanks and have a good weekend!
> >
> > --- On Fri, 5/22/09, Noble Paul നോബിള്‍
>  नोब्ळ् 
> wrote:
> >
> >> no need to use embedded Solrserver.
> >> you can use SolrJ with streaming
> >> in multiple threads
> >>
> >> On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai
> 
> >> wrote:
> >> >
> >> > If I do the xml parsing by myself and use
> embedded
> >> client to do the push, would it be more efficient
> than DIH?
> >> >
> >> >
> >> > --- On Fri, 5/22/09, Grant Ingersoll 
> >> wrote:
> >> >
> >> >> From: Grant Ingersoll 
> >> >> Subject: Re: How to index large set data
> >> >> To: solr-user@lucene.apache.org
> >> >> Date: Friday, May 22, 2009, 5:38 AM
> >> >> Can you parallelize this?  I
> >> >> don't know that the DIH can handle it,
> >> >> but having multiple threads sending docs
> to Solr
> >> is the
> >> >> best
> >> >> performance wise, so maybe you need to
> look at
> >> alternatives
> >> >> to pulling
> >> >> with DIH and instead use a client to push
> into
> >> Solr.
> >> >>
> >> >>
> >> >> On May 22, 2009, at 3:42 AM, Jianbin Dai
> wrote:
> >> >>
> >> >> >
> >> >> > about 2.8 m total docs were created.
> only the
> >> first
> >> >> run finishes. In
> >> >> > my 2nd try, it hangs there forever
> at the end
> >> of
> >> >> indexing, (I guess
> >> >> > right before commit), with cpu usage
> of 100%.
> >> Total 5G
> >> >> (2050) index
> >> >> > files are created. Now I have two
> problems:
> >> >> > 1. why it hangs there and failed?
> >> >> > 2. how can i speed up the indexing?
> >> >> >
> >> >> >
> >> >> > Here is my solrconfig.xml
> >> >> >
> >> >> >
> >> >>
> >>
> false
> >> >> >
> >> >>
> >>
> 3000
> >> >> >
> >> >>
> 1000
> >> >> >
> >> >>
> >>
> 2147483647
> >> >> >
> >> >>
> >>
> 1
> >> >> >
> >> >>
> >>
> false
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > --- On Thu, 5/21/09, Noble Paul
> >> >> നോബിള്‍  नो
> >> >> > ब्ळ् 
> >> >> wrote:
> >> >> >
> >> >> >> From: Noble Paul
> നോബിള്‍
> >> >> नोब्ळ्
> >> >> >> 

Is it memory leaking in solr?

2009-05-25 Thread Jianbin Dai

I am using DIH to do indexing. After I indexed about 8M documents (which took
about 1h40m), it used up almost all memory (4GB), and the indexing became
extremely slow. If I delete the whole index and shut down Tomcat, it still
shows over 3GB of memory in use. Is it a memory leak? If it is, is the leak in
Solr indexing or in DIH?  Thanks.


  



Re: Is it memory leaking in solr?

2009-05-25 Thread Jianbin Dai

Again, indexing becomes extremely slow after indexing 8M documents (about 25GB
of original file size). Here is the memory usage info from my computer. Does
this have anything to do with the Tomcat settings? Thanks.


top - 08:09:53 up  7:22,  1 user,  load average: 1.03, 1.01, 1.00
Tasks:  78 total,   2 running,  76 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.9%us,  0.2%sy,  0.0%ni, 49.8%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4044776k total,  3960740k used,    84036k free,    42196k buffers
Swap:  2031608k total,       84k used,  2031524k free,  2729892k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3322 root      21   0 1357m 1.0g  11m S  100 27.0 397:51.74 java



--- On Mon, 5/25/09, Jianbin Dai  wrote:

> From: Jianbin Dai 
> Subject: Is it memory leaking in solr?
> To: solr-user@lucene.apache.org, noble.p...@gmail.com
> Date: Monday, May 25, 2009, 1:27 AM
> 
> I am using DIH to do indexing. After I indexed about 8M
> documents (took about 1hr40m), it used up almost all memory
> (4GB), and the indexing becomes extremely slow. If I delete
> all indexing and shutdown tomcat, it still shows over 3gb
> memory was used. Is it memory leaking? if it is, then the
> leaking is in solr indexing or DIH?  Thanks.
> 
> 
>       
> 
> 






Re: Is it memory leaking in solr?

2009-05-26 Thread Jianbin Dai


Hi Otis,

The slowness was due to the JVM memory limit set by Tomcat. I have solved this
problem. Initially I thought there might be a memory leak because I noticed the
following behavior:
At the peak of indexing, almost all 4GB of memory was used. Once indexing was
done, memory usage was about 3GB. If I delete the whole index and shut down
Solr, I still see about 2GB of memory in use, much more than the initial usage
of about 250M.
I am not sure if my guess is right. Thanks.


--- On Tue, 5/26/09, Otis Gospodnetic  wrote:

> From: Otis Gospodnetic 
> Subject: Re: Is it memory leaking in solr?
> To: solr-user@lucene.apache.org
> Date: Tuesday, May 26, 2009, 10:03 AM
> 
> Jianbin,
> 
> If you connect to that Java process with jconsole, do you
> see a lot of garbage collection activity?
> 
> What makes you think there is a memory leak?  The
> slowness?
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
> > From: Jianbin Dai 
> > To: solr-user@lucene.apache.org
> > Sent: Monday, May 25, 2009 1:05:43 PM
> > Subject: Re: Is it memory leaking in solr?
> > 
> > 
> > Again, indexing becomes extremely slow after indexed
> 8m documents (about 25G of 
> > original file size). Here is the memory usage info of
> my computer. Does this 
> > have anything to do with tomcat setting? Thanks.
> > 
> > 
> > top - 08:09:53 up  7:22,  1 user,  load
> average: 1.03, 1.01, 1.00
> > Tasks:  78 total,   2
> running,  76 sleeping,   0
> stopped,   0 zombie
> > Cpu(s): 49.9%us,  0.2%sy,  0.0%ni,
> 49.8%id,  0.2%wa,  0.0%hi,  0.0%si, 
> 0.0%st
> > Mem:   4044776k total,  3960740k
> used,    84036k free,    42196k buffers
> > Swap:  2031608k total,   
>    84k used,  2031524k free, 
> 2729892k cached
> > 
> >   PID USER      PR 
> NI  VIRT  RES  SHR S %CPU %MEM   
> TIME+  COMMAND           
> 
> >               
>    
> > 3322 root      21   0
> 1357m 1.0g  11m S  100 27.0 397:51.74 java  
> > 
> > 
> > 
> > --- On Mon, 5/25/09, Jianbin Dai wrote:
> > 
> > > From: Jianbin Dai 
> > > Subject: Is it memory leaking in solr?
> > > To: solr-user@lucene.apache.org,
> noble.p...@gmail.com
> > > Date: Monday, May 25, 2009, 1:27 AM
> > > 
> > > I am using DIH to do indexing. After I indexed
> about 8M
> > > documents (took about 1hr40m), it used up almost
> all memory
> > > (4GB), and the indexing becomes extremely slow.
> If I delete
> > > all indexing and shutdown tomcat, it still shows
> over 3gb
> > > memory was used. Is it memory leaking? if it is,
> then the
> > > leaking is in solr indexing or DIH? 
> Thanks.
> > > 
> > > 
> > >       
> > > 
> > > 
> 
> 






how to do exact search with solrj

2009-05-30 Thread Jianbin Dai

Hi,

I want to search for "hello the world" in the "title" field using solrj. I set
the query filter:
query.addFilterQuery("title");
query.setQuery("hello the world");

but it returns inexact matches as well.

I know one way to do it is to set the "title" field to string instead of text,
but is there any other way to do it? If I do the search through the Solr Admin
web interface with title:"hello the world", it returns exact matches.

Thanks.

JB
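
For reference, the fix that emerges later in this thread is to put the quoted phrase inside the query string itself. A self-contained sketch of building that string (the quote-escaping here is a defensive addition, not something the thread requires):

```java
public class ExactSearchQuery {
    // build a field-scoped phrase query string, e.g. title:"hello the world";
    // this is the string you would hand to SolrJ's query.setQuery(...)
    static String phrase(String field, String text) {
        // escape any embedded double quotes so the phrase stays intact
        return field + ":\"" + text.replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(phrase("title", "hello the world"));
        // prints: title:"hello the world"
    }
}
```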


  



Re: how to do exact search with solrj

2009-05-30 Thread Jianbin Dai

I tried, but it seems it's not working right.

--- On Sat, 5/30/09, Avlesh Singh  wrote:

> From: Avlesh Singh 
> Subject: Re: how to do exact search with solrj
> To: solr-user@lucene.apache.org
> Date: Saturday, May 30, 2009, 10:56 PM
> query.setQuery("title:hello the
> world") is what you need.
> 
> Cheers
> Avlesh
> 
> On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai 
> wrote:
> 
> >
> > Hi,
> >
> > I want to search "hello the world" in the "title"
> field using solrj. I set
> > the query filter
> > query.addFilterQuery("title");
> > query.setQuery("hello the world");
> >
> > but it returns not exact match results as well.
> >
> > I know one way to do it is to set "title" field to
> string instead of text.
> > But is there any way i can do it? If I do the search
> through web interface
> > Solr Admin by title:"hello the world", it returns
> exact matches.
> >
> > Thanks.
> >
> > JB
> >
> >
> >
> >
> >
> 


  



Re: how to do exact serch with solrj

2009-05-30 Thread Jianbin Dai

That's correct! Thanks Avlesh.

--- On Sat, 5/30/09, Avlesh Singh  wrote:

> From: Avlesh Singh 
> Subject: Re: how to do exact serch with solrj
> To: solr-user@lucene.apache.org
> Date: Saturday, May 30, 2009, 11:45 PM
> You need exact match for all the
> three tokens?
> If yes, try query.setQuery("title:\"hello the world\"");
> 
> Cheers
> Avlesh
> 
> On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai 
> wrote:
> 
> >
> > I tried, but seems it's not working right.
> >
> > --- On Sat, 5/30/09, Avlesh Singh 
> wrote:
> >
> > > From: Avlesh Singh 
> > > Subject: Re: how to do exact serch with solrj
> > > To: solr-user@lucene.apache.org
> > > Date: Saturday, May 30, 2009, 10:56 PM
> > > query.setQuery("title:hello the
> > > world") is what you need.
> > >
> > > Cheers
> > > Avlesh
> > >
> > > On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai
> 
> > > wrote:
> > >
> > > >
> > > > Hi,
> > > >
> > > > I want to search "hello the world" in the
> "title"
> > > field using solrj. I set
> > > > the query filter
> > > > query.addFilterQuery("title");
> > > > query.setQuery("hello the world");
> > > >
> > > > but it returns not exact match results as
> well.
> > > >
> > > > I know one way to do it is to set "title"
> field to
> > > string instead of text.
> > > > But is there any way i can do it? If I do
> the search
> > > through web interface
> > > > Solr Admin by title:"hello the world", it
> returns
> > > exact matches.
> > > >
> > > > Thanks.
> > > >
> > > > JB
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
> >
> 
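The suggestion confirmed in this thread (wrapping the phrase in escaped quotes so Solr parses it as a phrase query) can be packaged in a small helper. A sketch only; the class and method names are mine, not from the thread:

```java
public class PhraseQueryDemo {
    // Build a Solr phrase query for a field, escaping any quotes
    // embedded in the phrase itself.
    public static String phraseQuery(String field, String phrase) {
        return field + ":\"" + phrase.replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        String q = phraseQuery("title", "hello the world");
        // With SolrJ this string would be passed to query.setQuery(q).
        System.out.println(q); // prints: title:"hello the world"
    }
}
```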


  


Index Comma Separated numbers

2009-06-04 Thread Jianbin Dai

Hi, one of the fields to be indexed is price, which is comma separated, e.g.,
12,034.00. How can I index it as a number?
I am using DIH to pull the data. Thanks.


  



Re: how to do exact serch with solrj

2009-06-04 Thread Jianbin Dai

I still have a problem with exact matching.

query.setQuery("title:\"hello the world\"");

This will return all docs whose title contains "hello the world"; e.g.,
"hello the world, Jack" will also be matched. What I want is exactly "hello the
world". Setting this field to string instead of text doesn't work well either,
because I want something like "Hello, The World" to match as well.
Any idea? Thanks.


> --- On Sat, 5/30/09, Avlesh Singh 
> wrote:
> 
> > From: Avlesh Singh 
> > Subject: Re: how to do exact serch with solrj
> > To: solr-user@lucene.apache.org
> > Date: Saturday, May 30, 2009, 11:45 PM
> > You need exact match for all the
> > three tokens?
> > If yes, try query.setQuery("title:\"hello the
> world\"");
> > 
> > Cheers
> > Avlesh
> > 
> > On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai 
> > wrote:
> > 
> > >
> > > I tried, but seems it's not working right.
> > >
> > > --- On Sat, 5/30/09, Avlesh Singh 
> > wrote:
> > >
> > > > From: Avlesh Singh 
> > > > Subject: Re: how to do exact serch with
> solrj
> > > > To: solr-user@lucene.apache.org
> > > > Date: Saturday, May 30, 2009, 10:56 PM
> > > > query.setQuery("title:hello the
> > > > world") is what you need.
> > > >
> > > > Cheers
> > > > Avlesh
> > > >
> > > > On Sun, May 31, 2009 at 6:23 AM, Jianbin
> Dai
> > 
> > > > wrote:
> > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > I want to search "hello the world" in
> the
> > "title"
> > > > field using solrj. I set
> > > > > the query filter
> > > > > query.addFilterQuery("title");
> > > > > query.setQuery("hello the world");
> > > > >
> > > > > but it returns not exact match results
> as
> > well.
> > > > >
> > > > > I know one way to do it is to set
> "title"
> > field to
> > > > string instead of text.
> > > > > But is there any way i can do it? If I
> do
> > the search
> > > > through web interface
> > > > > Solr Admin by title:"hello the world",
> it
> > returns
> > > > exact matches.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > JB
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > 
> 
> 
>       
> 
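A common way to get whole-field exact matching that still tolerates case and punctuation differences is a field type built on KeywordTokenizerFactory, which keeps the whole value as one token while filters normalize it. A sketch only; the type name, field name, and exact filter list are my assumptions, not from the thread:

```xml
<!-- Sketch only: the entire field value stays a single token, then is
     lowercased and stripped of punctuation, so "Hello, The World"
     normalizes to "hello the world" at both index and query time. -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-z0-9 ]" replacement="" replaceAll="true"/>
  </analyzer>
</fieldType>

<field name="title_exact" type="text_exact" indexed="true" stored="false"/>
```

Queries against such a field (title_exact:"hello the world") match only documents whose whole title normalizes to that value.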





Re: Index Comma Separated numbers

2009-06-05 Thread Jianbin Dai

Hi,

Yes, I put it in data-config.xml, like the following:

<entity dataSource="xmlreader"
        processor="XPathEntityProcessor"
        url="${f.fileAbsolutePath}"
        forEach="/abc/def/gh"
        transformer="NumberFormatTransformer">

--- On Thu, 6/4/09, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: Index Comma Separated numbers
> To: solr-user@lucene.apache.org
> Date: Thursday, June 4, 2009, 9:24 PM
> did you try the
> NumberFormatTransformer ?
> 
> On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai 
> wrote:
> >
> > Hi, One of the fields to be indexed is price which is
> comma separated, e.g., 12,034.00.  How can I indexed it as
> a number?
> > I am using DIH to pull the data. Thanks.
> >
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 






Re: Index Comma Separated numbers

2009-06-05 Thread Jianbin Dai

I forgot to put formatStyle="number" on the field. 
It works now. Thanks!!


--- On Fri, 6/5/09, Jianbin Dai  wrote:

> From: Jianbin Dai 
> Subject: Re: Index Comma Separated numbers
> To: solr-user@lucene.apache.org, noble.p...@gmail.com
> Date: Friday, June 5, 2009, 12:37 PM
> 
> Hi,
> 
> Yes, I put it in data-config.xml, like following
> 
> <entity dataSource="xmlreader"
>         processor="XPathEntityProcessor"
>         url="${f.fileAbsolutePath}"
>         forEach="/abc/def/gh"
>         transformer="NumberFormatTransformer">
> But it's not working on comma separated numbers.
> Did I miss something?
> 
> Thanks.
> 
> 
> 
> 
> 
> --- On Thu, 6/4/09, Noble Paul നോബിള്‍ 
> नोब्ळ् 
> wrote:
> 
> > From: Noble Paul നോബിള്‍ 
> नोब्ळ् 
> > Subject: Re: Index Comma Separated numbers
> > To: solr-user@lucene.apache.org
> > Date: Thursday, June 4, 2009, 9:24 PM
> > did you try the
> > NumberFormatTransformer ?
> > 
> > On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai 
> > wrote:
> > >
> > > Hi, One of the fields to be indexed is price
> which is
> > comma separated, e.g., 12,034.00.  How can I indexed
> it as
> > a number?
> > > I am using DIH to pull the data. Thanks.
> > >
> > >
> > >
> > >
> > >
> > 
> > 
> > 
> > -- 
> > -
> > Noble Paul | Principal Engineer| AOL | http://aol.com
> > 
> 
> 
> 
> 
> 
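The fix described above looks roughly like this inside the DIH entity. A sketch only: the entity name, xpath, and column are placeholders I added; only the transformer and formatStyle="number" come from the thread:

```xml
<!-- Sketch: with formatStyle="number" on the field,
     NumberFormatTransformer parses "12,034.00" into 12034.00. -->
<entity name="item" processor="XPathEntityProcessor"
        url="${f.fileAbsolutePath}" forEach="/abc/def/gh"
        transformer="NumberFormatTransformer">
  <field column="price" xpath="/abc/def/gh/price" formatStyle="number"/>
</entity>
```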






Use DIH with large xml file

2009-06-20 Thread Jianbin Dai

Hi,

I have about 50GB of data to be indexed each day using DIH. Some of the files 
are as large as 6GB. I set the JVM Xmx to be 3GB, but the DIH crashes on those 
big files. Is there any way to handle it?

Thanks.

JB


  



Re: Use DIH with large xml file

2009-06-20 Thread Jianbin Dai

Can DIH read items one by one instead of loading the whole file before indexing?
My biggest file is 6GB, larger than the JVM max heap size.


--- On Sat, 6/20/09, Erik Hatcher  wrote:

> From: Erik Hatcher 
> Subject: Re: Use DIH with large xml file
> To: solr-user@lucene.apache.org
> Date: Saturday, June 20, 2009, 6:52 PM
> How are you configuring DIH to read
> those files?  It is likely that you'll need at least as
> much RAM to the JVM as the largest file you're processing,
> though that depends entirely on how the file is being
> processed.
> 
>     Erik
> 
> On Jun 20, 2009, at 9:23 PM, Jianbin Dai wrote:
> 
> > 
> > Hi,
> > 
> > I have about 50GB of data to be indexed each day using
> DIH. Some of the files are as large as 6GB. I set the JVM
> Xmx to be 3GB, but the DIH crashes on those big files. Is
> there any way to handle it?
> > 
> > Thanks.
> > 
> > JB
> > 
> > 
> > 
> 
> 
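One option worth trying for very large XML files is the XPathEntityProcessor's stream attribute, which (where supported by the DIH version in use) makes the processor emit rows as it parses rather than holding the whole document in memory. A hypothetical sketch; the entity name and forEach path are placeholders:

```xml
<!-- Sketch: stream="true" asks XPathEntityProcessor to process
     records incrementally instead of buffering the 6GB file. -->
<entity name="item" processor="XPathEntityProcessor"
        url="${f.fileAbsolutePath}" forEach="/abc/def/gh"
        stream="true">
  <field column="price" xpath="/abc/def/gh/price"/>
</entity>
```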






weighted search and index

2010-03-03 Thread Jianbin Dai
Hi,

I am trying to use solr for a content match application. 

A content is described by a set of keywords with weights associated, eg.,

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

Those contents would be indexed in solr.
In the search, I also have a set of keywords with weights:

Query: Sports 0.8, golf 0.5

I am trying to find the closest matching contents for this query.

My question is how to index the contents with weighted scores, and how to
write the search query. I tried using boosting, but it doesn't seem to work
right.

Thanks.

Jianbin




RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Thank you very much Erick!

1. I used boosting in search, but I don't know exactly what the best way to
boost is. For Sports 0.8, golf 0.5 in my example, would it be
sports^0.8 AND golf^0.5 ?


2. I cannot use boosting at index time, because the weights are attached to
the values, not to the field. Look at this example again:

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

There is no good way to boost it during indexing.

Thanks.

JB


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, March 03, 2010 5:45 PM
To: solr-user@lucene.apache.org
Subject: Re: weighted search and index

You have to provide some more details to get meaningful help.

You say "I was trying to use boosting". How? At index time?
Search time? Both? Can you provide some code snippets?
What does your schema look like for the relevant field(s)?

You say "but seems not working right". What does that mean? No hits?
Hits not ordered as you expect? Have you tried putting "&debugQuery=on" on
your URL and examined the return values?

Have you looked at your index with the admin page and/or Luke to see if
the data in the index is as you expect?

As far as I know, boosts are multiplicative. So boosting by a value less
than
1 will actually decrease the ranking. But see the Lucene scoring, See:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

And remember, that boosting will *tend* to move a hit up or down in the
ranking, not position it absolutely.

HTH
Erick

On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai  wrote:

> Hi,
>
> I am trying to use solr for a content match application.
>
> A content is described by a set of keywords with weights associated, eg.,
>
> C1: fruit 0.8, apple 0.4, banana 0.2
> C2: music 0.9, pop song 0.6, Britney Spears 0.4
>
> Those contents would be indexed in solr.
> In the search, I also have a set of keywords with weights:
>
> Query: Sports 0.8, golf 0.5
>
> I am trying to find the closest matching contents for this query.
>
> My question is how to index the contents with weighted scores, and how to
> write search query. I was trying to use boosting, but seems not working
> right.
>
> Thanks.
>
> Jianbin
>
>
>



RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Hi Erick,

Each doc contains some keywords that are indexed. However, each keyword is
associated with a weight representing its importance. In my example,
D1: fruit 0.8, apple 0.4, banana 0.2

The keyword fruit is the most important, which means I really want it to be
matched in a search result, while banana is less important (though matching
it would still be good).

Hope that explains.

Thanks.

JB



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, March 03, 2010 6:23 PM
To: solr-user@lucene.apache.org
Subject: Re: weighted search and index

Then I'm totally lost as to what you're trying to accomplish. Perhaps
a higher-level statement of the problem would help.

Because no matter how often I look at your point <2>, I don't see
what relevance the numbers have if you're not using them to
boost at index time. Why are they even there?

Erick

On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai  wrote:

> Thank you very much Erick!
>
> 1. I used boost in search, but I don't know exactly what's the best way to
> boost, for such as Sports 0.8, golf 0.5 in my example, would it be
> sports^0.8 AND golf^0.5 ?
>
>
> 2. I cannot use boost in indexing. Because the weight of the value
changes,
> not the field, look at this example again,
>
> C1: fruit 0.8, apple 0.4, banana 0.2
> C2: music 0.9, pop song 0.6, Britney Spears 0.4
>
> There is no good way to boost it during indexing.
>
> Thanks.
>
> JB
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, March 03, 2010 5:45 PM
> To: solr-user@lucene.apache.org
> Subject: Re: weighted search and index
>
> You have to provide some more details to get meaningful help.
>
> You say "I was trying to use boosting". How? At index time?
> Search time? Both? Can you provide some code snippets?
> What does your schema look like for the relevant field(s)?
>
> You say "but seems not working right". What does that mean? No hits?
> Hits not ordered as you expect? Have you tried putting "&debugQuery=on" on
> your URL and examined the return values?
>
> Have you looked at your index with the admin page and/or Luke to see if
> the data in the index is as you expect?
>
> As far as I know, boosts are multiplicative. So boosting by a value less
> than
> 1 will actually decrease the ranking. But see the Lucene scoring, See:
>
>
> > http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
>
> And remember, that boosting will *tend* to move a hit up or down in the
> ranking, not position it absolutely.
>
> HTH
> Erick
>
> On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai  wrote:
>
> > Hi,
> >
> > I am trying to use solr for a content match application.
> >
> > A content is described by a set of keywords with weights associated,
eg.,
> >
> > C1: fruit 0.8, apple 0.4, banana 0.2
> > C2: music 0.9, pop song 0.6, Britney Spears 0.4
> >
> > Those contents would be indexed in solr.
> > In the search, I also have a set of keywords with weights:
> >
> > Query: Sports 0.8, golf 0.5
> >
> > I am trying to find the closest matching contents for this query.
> >
> > My question is how to index the contents with weighted scores, and how
to
> > write search query. I was trying to use boosting, but seems not working
> > right.
> >
> > Thanks.
> >
> > Jianbin
> >
> >
> >
>
>



RE: weighted search and index

2010-03-04 Thread Jianbin Dai
Thanks! Will try it.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, March 04, 2010 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: weighted search and index

OK, lights are finally dawning. I think what you want is payloads,
see:
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
for
your index-time term boosting. Query time boosting is as you
indicated

HTH
Erick

On Wed, Mar 3, 2010 at 9:34 PM, Jianbin Dai  wrote:

> Hi Erick,
>
> Each doc contains some keywords that are indexed. However each keyword is
> associated with a weight to represent its importance. In my example,
> D1: fruit 0.8, apple 0.4, banana 0.2
>
> The keyword fruit is the most important keyword, which means I really
> really
> want it to be matched in a search result, but banana is less important (It
> would be good to be matched though).
>
> Hope that explains.
>
> Thanks.
>
> JB
>
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, March 03, 2010 6:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: weighted search and index
>
> Then I'm totally lost as to what you're trying to accomplish. Perhaps
> a higher-level statement of the problem would help.
>
> Because no matter how often I look at your point <2>, I don't see
> what relevance the numbers have if you're not using them to
> boost at index time. Why are they even there?
>
> Erick
>
> On Wed, Mar 3, 2010 at 8:54 PM, Jianbin Dai  wrote:
>
> > Thank you very much Erick!
> >
> > 1. I used boost in search, but I don't know exactly what's the best way
> to
> > boost, for such as Sports 0.8, golf 0.5 in my example, would it be
> > sports^0.8 AND golf^0.5 ?
> >
> >
> > 2. I cannot use boost in indexing. Because the weight of the value
> changes,
> > not the field, look at this example again,
> >
> > C1: fruit 0.8, apple 0.4, banana 0.2
> > C2: music 0.9, pop song 0.6, Britney Spears 0.4
> >
> > There is no good way to boost it during indexing.
> >
> > Thanks.
> >
> > JB
> >
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Wednesday, March 03, 2010 5:45 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: weighted search and index
> >
> > You have to provide some more details to get meaningful help.
> >
> > You say "I was trying to use boosting". How? At index time?
> > Search time? Both? Can you provide some code snippets?
> > What does your schema look like for the relevant field(s)?
> >
> > You say "but seems not working right". What does that mean? No hits?
> > Hits not ordered as you expect? Have you tried putting "&debugQuery=on"
> on
> > your URL and examined the return values?
> >
> > Have you looked at your index with the admin page and/or Luke to see if
> > the data in the index is as you expect?
> >
> > As far as I know, boosts are multiplicative. So boosting by a value less
> > than
> > 1 will actually decrease the ranking. But see the Lucene scoring, See:
> >
> >
>
> > http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
> >
> > And remember, that boosting will *tend* to move a hit up or down in the
> > ranking, not position it absolutely.
> >
> > HTH
> > Erick
> >
> > On Wed, Mar 3, 2010 at 8:13 PM, Jianbin Dai  wrote:
> >
> > > Hi,
> > >
> > > I am trying to use solr for a content match application.
> > >
> > > A content is described by a set of keywords with weights associated,
> eg.,
> > >
> > > C1: fruit 0.8, apple 0.4, banana 0.2
> > > C2: music 0.9, pop song 0.6, Britney Spears 0.4
> > >
> > > Those contents would be indexed in solr.
> > > In the search, I also have a set of keywords with weights:
> > >
> > > Query: Sports 0.8, golf 0.5
> > >
> > > I am trying to find the closest matching contents for this query.
> > >
> > > My question is how to index the contents with weighted scores, and how
> to
> > > write search query. I was trying to use boosting, but seems not
working
> > > right.
> > >
> > > Thanks.
> > >
> > > Jianbin
> > >
> > >
> > >
> >
> >
>
>
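The query-time boosting discussed in this thread can be assembled mechanically from the keyword/weight pairs. A sketch under assumptions: the field name `keywords` and the helper are hypothetical, and multi-word terms such as "pop song" would additionally need phrase quoting:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoostedQueryDemo {
    // Build a boosted Solr query string from keyword -> weight pairs,
    // e.g. {sports=0.8, golf=0.5} -> "keywords:sports^0.8 keywords:golf^0.5".
    // The field name "keywords" is an assumption, not from the thread.
    public static String boostedQuery(String field, Map<String, Double> weights) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(field).append(':').append(e.getKey())
              .append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> w = new LinkedHashMap<>();
        w.put("sports", 0.8);
        w.put("golf", 0.5);
        // The result would be passed to SolrJ as query.setQuery(...).
        System.out.println(boostedQuery("keywords", w));
        // prints: keywords:sports^0.8 keywords:golf^0.5
    }
}
```

For the index-time side, the payloads approach linked above attaches the per-term weights to the indexed tokens themselves.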