subject:"Highest frequency terms for a subset of documents"

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort

Ok, thanks On Friday, April 22, 2011, Yonik Seeley wrote: > On Thu, Apr 21, 2011 at 6:50 PM, Ofer Fort wrote: >> Ok, I'll give it a try, as this is a server I am willing to risk. >> How is the competability between solrj of bulkpostings, trunk, 3.1 and 1.4.1? > > bulkpostings, trunk, and 3.1 sho

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley

On Thu, Apr 21, 2011 at 6:50 PM, Ofer Fort wrote: > Ok, I'll give it a try, as this is a server I am willing to risk. > How is the competability between solrj of bulkpostings, trunk, 3.1 and 1.4.1? bulkpostings, trunk, and 3.1 should all be relatively solrj compatible. But the SolrJ javabin form

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort

Ok, I'll give it a try, as this is a server I am willing to risk. How is the SolrJ compatibility between bulkpostings, trunk, 3.1 and 1.4.1? On Friday, April 22, 2011, Yonik Seeley wrote: > On Thu, Apr 21, 2011 at 6:34 PM, Ofer Fort wrote: >> So I'm guessing my best approach now would be to t

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley

On Thu, Apr 21, 2011 at 6:34 PM, Ofer Fort wrote: > So I'm guessing my best approach now would be to test trunk, and hope > that as 3.1 cut the performance in half, trunk will do the same Trunk prob won't be much better... but the bulkpostings branch possibly could be. -Yonik http://www.lucenere

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort

So I'm guessing my best approach now would be to test trunk, and hope that as 3.1 cut the performance in half, trunk will do the same Thanks for the info Ofer On Friday, April 22, 2011, Yonik Seeley wrote: > On Thu, Apr 21, 2011 at 6:25 PM, Ofer Fort wrote: >> Well, it was worth the try;-) >> Bu

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley

On Thu, Apr 21, 2011 at 6:25 PM, Ofer Fort wrote: > Well, it was worth the try;-) > But will using the facet.method=fc, will reducing the subset size > reduce the time and memory? Meaning is it an O( ndocs of the set)? facet.method=fc builds a multi-valued fieldcache like structure (UnInvertedFie

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort

Well, it was worth a try ;-) But when using facet.method=fc, will reducing the subset size reduce the time and memory? Meaning, is it O(ndocs of the set)? Thanks On Thursday, April 21, 2011, Yonik Seeley wrote: > On Thu, Apr 21, 2011 at 11:15 AM, Ofer Fort wrote: >> So if i want to use th

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley

On Thu, Apr 21, 2011 at 11:15 AM, Ofer Fort wrote: > So if i want to use the facet.method=fc, is there a way to speed it up? and > remove the bucket size limitation? Not really - else we would have done it already ;-) We don't really have great methods for faceting on full-text fields (as opposed

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort

So if I want to use facet.method=fc, is there a way to speed it up and remove the bucket size limitation? On Thu, Apr 21, 2011 at 5:58 PM, Yonik Seeley wrote: > On Thu, Apr 21, 2011 at 10:41 AM, Ofer Fort wrote: > > I see, thanks. > > So if I would want to implement something that would fit

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley

On Thu, Apr 21, 2011 at 10:41 AM, Ofer Fort wrote: > I see, thanks. > So if I would want to implement something that would fit my needs, would > going through the subset of documents and counting all the terms in each > one, would be faster? and easier to implement? That's not just your needs, th

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort

I see, thanks. So if I wanted to implement something that would fit my needs, would going through the subset of documents and counting all the terms in each one be faster, and easier to implement? On Thu, Apr 21, 2011 at 5:36 PM, Yonik Seeley wrote: > On Thu, Apr 21, 2011 at 9:44 AM, O
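The approach asked about here, walking only the documents in the subset and counting the terms in each one, can be sketched roughly as follows. This is an illustration only, not code from the thread or from Solr: the in-memory `docs` dictionary, the whitespace tokenizer, and the name `top_terms` are all assumptions. Each term is counted once per document, so the numbers are document counts like those a facet would return.

```python
from collections import Counter

def top_terms(docs, subset_ids, n=10):
    """Return the n terms that occur in the most documents of the subset.

    docs: hypothetical mapping of doc id -> raw text.
    subset_ids: iterable of doc ids that make up the subset.
    """
    counts = Counter()
    for doc_id in subset_ids:
        # Naive whitespace tokenization (an assumption; a real index would
        # use the field's analyzer). set() counts a term once per document.
        counts.update(set(docs[doc_id].lower().split()))
    return counts.most_common(n)

docs = {1: "solr facet solr", 2: "facet cache", 3: "lucene"}
result = top_terms(docs, [1, 2], n=1)
```

In this sketch the work grows with the number of documents in the subset and their length, not with the number of distinct terms in the whole field. Whether that is faster in practice depends on how cheaply each document's terms can be read.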

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley

On Thu, Apr 21, 2011 at 9:44 AM, Ofer Fort wrote: > Not sure i fully understand, > If "facet.method=enum steps over all terms in the index for that field", > than what does setting the q=field:subset do? if i set the q=*:*, than how > do i get the frequency only on my subset? It's an implementati

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort

Not sure I fully understand. If "facet.method=enum steps over all terms in the index for that field", then what does setting q=field:subset do? If I set q=*:*, then how do I get the frequency only on my subset? Ofer On Thu, Apr 21, 2011 at 4:40 PM, Yonik Seeley wrote: > On Thu, Apr 21, 20
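One way to picture the quoted statement is the toy sketch below. It assumes an in-memory inverted index (`postings`, a hypothetical dict of term to set of doc ids) and is not Solr's actual implementation. The loop visits every term in the field, and the query only decides which doc ids each term's set is intersected with.

```python
def enum_style_counts(postings, subset):
    """Count, for every term in the field, how many subset docs contain it.

    postings: hypothetical mapping of term -> set of doc ids containing it.
    subset: set of doc ids matched by the query.
    """
    counts = {}
    for term, doc_ids in postings.items():  # one step per term in the field
        hits = len(doc_ids & subset)        # restrict to the query's docs
        if hits:
            counts[term] = hits
    return counts

postings = {"solr": {1, 2, 3}, "facet": {2, 3}, "cache": {4}}
small = enum_style_counts(postings, {2})
large = enum_style_counts(postings, {1, 2, 3})
```

Both calls iterate over the same three terms. Under this model the number of loop steps depends on the number of terms in the field rather than on the subset size, which is consistent with the roughly constant QTime reported elsewhere in the thread for 20K and 200K matching documents.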

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Yonik Seeley

On Thu, Apr 21, 2011 at 9:24 AM, Ofer Fort wrote: > Another strange behavior is that the Qtime seems pretty stable, no matter > how many object match my query. 200K and 20K both take about 17s. > I would have guessed that since the time is going over all the terms of all > the subset documents, wo

Re: Highest frequency terms for a subset of documents

2011-04-21 Thread Ofer Fort

OK, so I copied my index and ran Solr 3.1 against it. QTime dropped from about 40s to 17s! This is good news, but still longer than I hoped for. I tried to do the same test with 4.0, but I'm getting IndexFormatTooOldException since my index was created using 1.4.1. Is my only chance to test this is

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort

my documents are user entries, so i'm guessing they vary a lot. Tomorrow i'll try 3.1 and also 4.0, and see if they have an improvement. thanks guys! On Thu, Apr 21, 2011 at 3:02 AM, Yonik Seeley wrote: > On Wed, Apr 20, 2011 at 7:45 PM, Ofer Fort wrote: > > Thanks > > but i've disabled the cach

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Yonik Seeley

On Wed, Apr 20, 2011 at 7:45 PM, Ofer Fort wrote: > Thanks > but i've disabled the cache already, since my concern is speed and i'm > willing to pay the price (memory) Then you should not disable the cache. >, and my subset are not fixed. > Does the facet search do any extra work that i don't ne

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort

BTW, i'm using solr 1.4.1, does 3.1 or 4.0 contain any performance improvements that will make a difference as far as facet search? thanks again Ofer On Thu, Apr 21, 2011 at 2:45 AM, Ofer Fort wrote: > Thanks > but i've disabled the cache already, since my concern is speed and i'm > willing to p

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort

Thanks but i've disabled the cache already, since my concern is speed and i'm willing to pay the price (memory), and my subset are not fixed. Does the facet search do any extra work that i don't need, that i might be able to disable (either by a flag or by a code change), Somehow i feel, or rather

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Yonik Seeley

On Wed, Apr 20, 2011 at 7:34 PM, Chris Hostetter wrote: > > : thanks, but that's what i started with, but it took an even longer time and > : threw this: > : Approaching too many values for UnInvertedField faceting on field 'text' : > : bucket size=15560140 > : Approaching too many values for UnIn

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Chris Hostetter

: thanks, but that's what i started with, but it took an even longer time and : threw this: : Approaching too many values for UnInvertedField faceting on field 'text' : : bucket size=15560140 : Approaching too many values for UnInvertedField faceting on field 'text : : bucket size=15619075 : Excep

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort

lds, facet.method=fc is the magic. >> I think facet.method=fc is even the default in Solr 1.4+, if you hadn't >> explicitly set it to enum instead! >> >> Jonathan >> ____________ >> From: Ofer Fort [ofer...@gmail.com] >> Sent: Wednesday, Apri

Re: Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort

___ > From: Ofer Fort [ofer...@gmail.com] > Sent: Wednesday, April 20, 2011 6:49 PM > To: solr-user@lucene.apache.org > Subject: Highest frequency terms for a subset of documents > Hi, > I am looking for the best way to find the terms with the highest frequency

RE: Highest frequency terms for a subset of documents

2011-04-20 Thread Jonathan Rochkind

it to enum instead! Jonathan From: Ofer Fort [ofer...@gmail.com] Sent: Wednesday, April 20, 2011 6:49 PM To: solr-user@lucene.apache.org Subject: Highest frequency terms for a subset of documents Hi, I am looking for the best way to find the terms with the highest f

Highest frequency terms for a subset of documents

2011-04-20 Thread Ofer Fort

Hi, I am looking for the best way to find the terms with the highest frequency for a given subset of documents (terms in the text field). My first thought was to do a count facet search, where the query defines the subset of documents and the facet.field is the text field; this gives me the result
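A request of the kind described here might be assembled as in the sketch below. The host, the `/solr/select` path, the query `user:alice`, and the helper name `build_facet_url` are placeholders for illustration, not values from the thread. The parameters themselves (`q`, `rows`, `facet`, `facet.field`, `facet.limit`, `facet.mincount`, `facet.sort`, `facet.method`) are standard Solr faceting parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_facet_url(base_url, subset_query, field="text", limit=20, method=None):
    """Build a select URL that facets on `field` over the docs matching the query."""
    params = {
        "q": subset_query,      # the query defines the subset of documents
        "rows": 0,              # only the facet counts are needed, not the docs
        "facet": "true",
        "facet.field": field,   # the text field whose terms are counted
        "facet.limit": limit,   # keep only the top terms
        "facet.mincount": 1,
        "facet.sort": "count",  # highest counts first
    }
    if method is not None:
        params["facet.method"] = method  # "enum" or "fc", as discussed in the thread
    return base_url + "?" + urlencode(params)

# Placeholder host and query, for illustration only.
url = build_facet_url("http://localhost:8983/solr/select", "user:alice", method="enum")
```

Setting `rows` to 0 keeps the response down to the facet counts; the `method` argument makes it easy to compare `enum` and `fc` on the same query.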

25 matches
