Re: Facet Performance

2020-06-17 Thread Erick Erickson
queryResultCache doesn’t really help with faceting, even if it’s hit for the main query. That cache only stores a subset of the hits, and to facet properly you need the entire result set…. > On Jun 17, 2020, at 12:47 PM, James Bodkin > wrote: > > We've noticed that the filterCache uses a sig

Re: Facet Performance

2020-06-17 Thread James Bodkin
We've noticed that the filterCache uses a significant amount of memory, as we've assigned 8GB Heap per instance. In total, we have 32 shards with 2 replicas, hence (8*32*2) 512G Heap space alone, further memory is required to ensure the index is always memory mapped for performance reasons. Ide
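The heap figure quoted above can be checked with a few lines of arithmetic. This is only a sketch that restates the numbers from the message (8 GB per instance, 32 shards, 2 replicas); the function name is made up for illustration.

```python
# Figures taken from the message above; nothing else is assumed.
HEAP_GB_PER_INSTANCE = 8
SHARDS = 32
REPLICAS_PER_SHARD = 2


def total_heap_gb(heap_gb, shards, replicas):
    """Total JVM heap across every replica of every shard."""
    return heap_gb * shards * replicas


total = total_heap_gb(HEAP_GB_PER_INSTANCE, SHARDS, REPLICAS_PER_SHARD)
instances = SHARDS * REPLICAS_PER_SHARD
print(f"{total} GB of heap across {instances} instances")
```

This covers heap only; the memory needed to keep the index memory mapped comes on top of it.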

Re: Facet Performance

2020-06-17 Thread Michael Gibney
To expand a bit on what Erick said regarding performance: my sense is that the RefGuide assertion that "docValues=true" makes faceting "faster" could use some qualification/clarification. My take, fwiw: First, to reiterate/paraphrase what Erick said: the "faster" assertion is not comparing to "fac

Re: Facet Performance

2020-06-17 Thread Erick Erickson
Uninvertible is a safety mechanism to make sure that you don’t _unknowingly_ use a docValues=false field for faceting/grouping/sorting/function queries. The primary point of docValues=true is twofold: 1> reduce Java heap requirements by using the OS memory to hold it 2> uninverting can be expen

Re: Facet Performance

2020-06-17 Thread James Bodkin
The large majority of the relevant fields have fewer than 20 unique values. We have two fields over that with 150 unique values and 5300 unique values respectively. At the moment, our filterCache is configured with a maximum size of 8192. From the DocValues documentation (https://lucene.apac

Re: Facet Performance

2020-06-17 Thread Anthony Groves
Ah, interesting! So if the number of possible values is low (like <= 10), it is faster to *not* use docValues on that (indexed) faceted field? Does this hold true even when using faceting techniques like tag and exclusion? Thanks, Anthony On Wed, Jun 17, 2020 at 9:37 AM David Smiley wrote: > I

Re: Facet Performance

2020-06-17 Thread David Smiley
I strongly recommend setting indexed=true on a field you facet on for the purposes of efficient refinement (fq=field:value). But it strictly isn't required, as you have discovered. ~ David On Wed, Jun 17, 2020 at 9:02 AM Michael Gibney wrote: > facet.method=enum works by executing a query (ag

Re: Facet Performance

2020-06-17 Thread Michael Gibney
facet.method=enum works by executing a query (against indexed values) for each indexed value in a given field (which, for indexed=false, is "no values"). So that explains why facet.method=enum no longer works. I was going to suggest that you might not want to set indexed=false on the docValues face
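The behaviour described above (one query per indexed term, and therefore nothing to enumerate when indexed=false) can be pictured with a toy model. This is a simplified sketch in plain Python, not Solr's implementation; the documents, the `format` field, and both function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs, field, indexed=True):
    """Map each term of `field` to the set of doc ids containing it.

    With indexed=False no terms are recorded, mimicking a field that
    has no indexed values to enumerate.
    """
    postings = defaultdict(set)
    if indexed:
        for doc_id, doc in docs.items():
            for term in doc.get(field, []):
                postings[term].add(doc_id)
    return dict(postings)


def facet_enum(postings, result_set, mincount=1):
    """Run one 'term query' per indexed term and intersect it with the hits."""
    counts = {}
    for term, doc_ids in postings.items():
        n = len(doc_ids & result_set)
        if n >= mincount:
            counts[term] = n
    return counts


# Hypothetical toy data: three documents with a single-valued "format" field.
docs = {
    1: {"format": ["book"]},
    2: {"format": ["book"]},
    3: {"format": ["dvd"]},
}
hits = {1, 3}  # doc ids matched by the main query

indexed_counts = facet_enum(build_inverted_index(docs, "format"), hits)
unindexed_counts = facet_enum(
    build_inverted_index(docs, "format", indexed=False), hits
)
print(indexed_counts)
print(unindexed_counts)
```

In this model the cost of the enum approach grows with the number of distinct terms, which is consistent with it doing well on fields that have only a handful of values.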

Re: Facet Performance

2020-06-17 Thread James Bodkin
Thanks, I've implemented some queries that improve the first-hit execution for faceting. Since turning off indexed on those fields, we've noticed that facet.method=enum no longer returns the facets when used. Using facet.method=fc/fcs is significantly slower compared to facet.method=enum for us

Re: Facet Performance

2020-06-16 Thread Erick Erickson
Ok, I see the disconnect... Necessary parts of the index are read from disk lazily. So your newSearcher or firstSearcher query needs to do whatever operation causes the relevant parts of the index to be read. In this case, probably just facet on all the fields you care about. I'd add sorting too if

Re: Facet Performance

2020-06-16 Thread James Bodkin
I've been trying to build a query that I can use in newSearcher based off the information in your previous e-mail. I thought you meant to build a *:* query as per Query 1 in my previous e-mail but I'm still seeing the first-hit execution. Now I'm wondering if you meant to create a *:* query with

Re: Facet Performance

2020-06-16 Thread Erick Erickson
Did you try the autowarming like I mentioned in my previous e-mail? > On Jun 16, 2020, at 10:18 AM, James Bodkin > wrote: > > We've changed the schema to enable docValues for these fields and this led to > an improvement in the response time. We found a further improvement by also > switching

Re: Facet Performance

2020-06-16 Thread James Bodkin
We've changed the schema to enable docValues for these fields and this led to an improvement in the response time. We found a further improvement by also switching off indexed as these fields are used for faceting and filtering only. Since those changes, we've found that the first-execution for q

Re: Facet Performance

2020-06-12 Thread Erick Erickson
I question whether filterCache has anything to do with it; I suspect what’s really happening is that the first time you’re reading the relevant bits from disk into memory. And to double check, you should have docValues enabled for all these fields. The “uninverting” process can be very expensive, and

Re: Facet Performance

2020-06-12 Thread James Bodkin
We've run the performance test after changing the fields to be of the type string. We're seeing improved performance, especially after the first time the query has run. The first run is taking around 1-2 seconds rather than 6-8 seconds and when the filter cache is present, the response time is a

Re: Facet Performance

2020-06-11 Thread James Bodkin
Could you explain why the performance is an issue for points-based fields? I've looked through the referenced issue (which is fixed in the version we are running) but I'm missing the link between the two. Is there an issue to improve this for points-based fields? We're going to change the field

Re: Facet Performance

2020-06-11 Thread Erick Erickson
There’s a lot of confusion about using points-based fields for faceting, see: https://issues.apache.org/jira/browse/SOLR-13227 for instance. Two options you might try: 1> copyField to a string field and facet on that (won’t work, of course, for any kind of interval/range facet) 2> use the deprec

Re: Facet performance problem

2018-02-20 Thread Shawn Heisey
On 2/20/2018 1:18 AM, LOPEZ-CORTES Mariano-ext wrote: We return a facet list of values in "motifPresence" field (person status). Status: [ ] status1 [x] status2 [x] status3 The user then selects 1 or multiple status (It's this step that we called "facet filtering

RE: Facet performance problem

2018-02-20 Thread LOPEZ-CORTES Mariano-ext
solution? -Message d'origine- De : Erick Erickson [mailto:erickerick...@gmail.com] Envoyé : lundi 19 février 2018 18:18 À : solr-user Objet : Re: Facet performance problem I'm confused here. What do you mean by "facet filtering"? Your examples have no facets at all, just

Re: Facet performance problem

2018-02-19 Thread Erick Erickson
I'm confused here. What do you mean by "facet filtering"? Your examples have no facets at all, just a _filter query_. I'll assume you want to use filter query (fq), and faceting has nothing to do with it. This is one of the tricky bits of docValues. While it's _possible_ to search on a field that'

RE: Facet performance

2013-10-23 Thread Lemke, Michael SZ/HZA-ZSW
On Tue, October 22, 2013 5:23 PM Michael Lemke wrote: >On Tue, October 22, 2013 9:23 AM Toke Eskildsen wrote: >>On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote: >>> QTime fc: >>>never returns, webserver restarts itself after 30 min with 100% CPU >>> load >> >>It might be

RE: Facet performance

2013-10-23 Thread Toke Eskildsen
On Tue, 2013-10-22 at 17:25 +0200, Lemke, Michael SZ/HZA-ZSW wrote: > On Tue, October 22, 2013 11:54 AM Andre Bois-Crettez wrote: > >> This is with Solr 1.4. > >Really ? > >This sound really outdated to me. > >Have you tried a tried more recent version, 4.5 just went out ? > > Sorry, can't. Too m

RE: Facet performance

2013-10-22 Thread Lemke, Michael SZ/HZA-ZSW
On Tue, October 22, 2013 11:54 AM Andre Bois-Crettez wrote: > >> This is with Solr 1.4. >Really ? >This sound really outdated to me. >Have you tried a tried more recent version, 4.5 just went out ? Sorry, can't. Too much `grown' stuff. Michael

RE: Facet performance

2013-10-22 Thread Lemke, Michael SZ/HZA-ZSW
On Tue, October 22, 2013 9:23 AM Toke Eskildsen wrote: >On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote: >> QTime fc: >>never returns, webserver restarts itself after 30 min with 100% CPU >> load > >It might be because it dies due to garbage collection. But since more >m

Re: Facet performance

2013-10-22 Thread Andre Bois-Crettez
This is with Solr 1.4. Really? This sounds really outdated to me. Have you tried a more recent version? 4.5 just came out. -- André Bois-Crettez Software Architect Search Developer http://www.kelkoo.com/ Kelkoo SAS Société par Actions Simplifiée Au capital de € 4.168.964,30 Siège socia

RE: Facet performance

2013-10-22 Thread Toke Eskildsen
On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote: > QTime enum: > 1st call: 1200 > subsequent calls: 200 Those numbers seem fine. > QTime fc: >never returns, webserver restarts itself after 30 min with 100% CPU > load It might be because it dies due to garba

RE: Facet performance

2013-10-21 Thread Lemke, Michael SZ/HZA-ZSW
On Mon, October 21, 2013 10:04 AM, Toke Eskildsen wrote: >On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote: >> Toke Eskildsen wrote: >> > Unfortunately the enum-solution is normally quite slow when there >> > are enough unique values to trigger the "too many > values"-exception. >

RE: Facet performance

2013-10-21 Thread Toke Eskildsen
On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote: > Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote: > > Unfortunately the enum-solution is normally quite slow when there > > are enough unique values to trigger the "too many > values"-exception. > > [...] > > [...] And yes

RE: Facet performance

2013-10-18 Thread Chris Hostetter
: >> 1. q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 : >> 2. q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 : > : >> The only difference is am empty facet.prefix in the

Re: Facet performance

2013-10-18 Thread Otis Gospodnetic
DocValues is the new black http://wiki.apache.org/solr/DocValues Otis -- Solr & ElasticSearch Support -- http://sematext.com/ SOLR Performance Monitoring -- http://sematext.com/spm On Fri, Oct 18, 2013 at 12:30 PM, Lemke, Michael SZ/HZA-ZSW wrote: > Toke Eskildsen [mailto:t...@statsbiblioteke

RE: Facet performance

2013-10-18 Thread Lemke, Michael SZ/HZA-ZSW
Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote: >Lemke, Michael SZ/HZA-ZSW [lemke...@schaeffler.com] wrote: >> 1. >> q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 >> 2. >> q=word&facet.field=CONTENT&facet=true&facet.prefix=a&

RE: Facet performance

2013-10-18 Thread Toke Eskildsen
Lemke, Michael SZ/HZA-ZSW [lemke...@schaeffler.com] wrote: > 1. > q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 > 2. > q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0 > T

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
> On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote: >> It seems BOBO-Browse is alternate faceting engine; would be interesting to >> compare performance with SOLR... Distributed? >> >> >> -Original Message- >> From: Jason Rutherglen [mailto:jason.rutherg.

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
h SOLR... Distributed? > > > -Original Message- > From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] > Sent: August-12-09 6:12 PM > To: solr-user@lucene.apache.org > Subject: Re: facet performance tips > > For your fields with many terms you may want to

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
uld be interesting to > compare performance with SOLR... Distributed? > > > -Original Message- > From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] > Sent: August-12-09 6:12 PM > To: solr-user@lucene.apache.org > Subject: Re: facet performance tips > > For

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
Interesting, it has "BoboRequestHandler implements SolrRequestHandler" - easy to try it; and shards support [Fuad Efendi] It seems BOBO-Browse is alternate faceting engine; would be interesting to compare performance with SOLR... Distributed? [Jason Rutherglen] For your fields with many terms

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
It seems BOBO-Browse is alternate faceting engine; would be interesting to compare performance with SOLR... Distributed? -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: August-12-09 6:12 PM To: solr-user@lucene.apache.org Subject: Re: facet

RE: facet performance tips

2009-08-13 Thread Fuad Efendi
vé [mailto:jerome.et...@gmail.com] Sent: August-13-09 5:38 AM To: solr-user@lucene.apache.org Subject: Re: facet performance tips Thanks everyone for your advices. I increased my filterCache, and the faceting performances improved greatly. My faceted field can have at the moment ~4 different t

Re: facet performance tips

2009-08-13 Thread Jérôme Etévé
Thanks everyone for your advice. I increased my filterCache, and the faceting performance improved greatly. My faceted field can have at the moment ~4 different terms, so I did set a filterCache size of 5 and it works very well. However, I'm planning to increase the number of terms to

Re: facet performance tips

2009-08-12 Thread Stephen Duncan Jr
Note that depending on the profile of your field (full text and how many unique terms on average per document), the improvements from 1.4 may not apply, as you may exceed the limits of the new faceting technique in Solr 1.4. -Stephen On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher wrote: > Yes, in

Re: facet performance tips

2009-08-12 Thread Jason Rutherglen
> > > > -Original Message----- > From: Erik Hatcher [mailto:ehatc...@apache.org] > Sent: August-12-09 2:12 PM > To: solr-user@lucene.apache.org > Subject: Re: facet performance tips > > Yes, increasing the filterCache size will help with Solr 1.3 > performance.

RE: facet performance tips

2009-08-12 Thread Fuad Efendi
al Message- From: Erik Hatcher [mailto:ehatc...@apache.org] Sent: August-12-09 2:12 PM To: solr-user@lucene.apache.org Subject: Re: facet performance tips Yes, increasing the filterCache size will help with Solr 1.3 performance. Do note that trunk (soon Solr 1.4) has dramatically improved fac

Re: facet performance tips

2009-08-12 Thread Erik Hatcher
Yes, increasing the filterCache size will help with Solr 1.3 performance. Do note that trunk (soon Solr 1.4) has dramatically improved faceting performance. Erik On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote: Hi everyone, I'm using some faceting on a solr index containing ~ 1

RE: facet performance tips

2009-08-12 Thread Manepalli, Kalyan
Jerome, Yes you need to increase the filterCache size to something close to unique number of facet elements. But also consider the RAM required to accommodate the increase. I did see a significant performance gain by increasing the filterCache size Thanks, Kalyan Manepalli -Origina
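The RAM caveat above can be made concrete with a rough upper bound. The sketch below assumes the worst case in which every cached filter is stored as a bitset with one bit per document in the index; Solr can store small result sets more compactly, so real usage may be lower. The index size and cache size are hypothetical numbers, not figures from this thread.

```python
def filter_cache_upper_bound_bytes(max_doc, entries):
    """Worst case: every cached filter is a bitset with one bit per document."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * entries


# Hypothetical example: 1,000,000 documents and a cache sized for 50,000 terms.
estimate = filter_cache_upper_bound_bytes(max_doc=1_000_000, entries=50_000)
print(f"up to {estimate / 1e9:.2f} GB for a full cache")
```

Sizing the cache to the number of unique facet values therefore only works while that number times the per-entry cost still fits in the heap.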

Re: Facet Performance

2008-07-31 Thread Funtick
Hoss, This is still an extremely interesting area for possible improvements; I simply don't want the topic to die http://www.nabble.com/Facet-Performance-td7746964.html http://issues.apache.org/jira/browse/SOLR-665 http://issues.apache.org/jira/browse/SOLR-667 http://issues.apache.org/jira/browse/

Re: Facet Performance

2006-12-08 Thread Andrew Nagy
Erik Hatcher wrote: On Dec 8, 2006, at 2:15 PM, Andrew Nagy wrote: My data is 492,000 records of book data. I am faceting on 4 fields: author, subject, language, format. Format and language are fairly simple as there are only a few unique terms. Author and subject however are much differe

Re: Facet Performance

2006-12-08 Thread Chris Hostetter
: Unfortunately which strategy will be chosen is currently undocumented : and control is a bit oblique: If the field is tokenized or multivalued : or Boolean, the FilterQuery method will be used; otherwise the : FieldCache method. I expect I or others will improve that shortly. Bear in mind, wh

Re: Facet Performance

2006-12-08 Thread Erik Hatcher
On Dec 8, 2006, at 2:15 PM, Andrew Nagy wrote: My data is 492,000 records of book data. I am faceting on 4 fields: author, subject, language, format. Format and language are fairly simple as there are only a few unique terms. Author and subject however are much different in that there are

Re: Facet Performance

2006-12-08 Thread Andrew Nagy
J.J. Larrea wrote: Unfortunately which strategy will be chosen is currently undocumented and control is a bit oblique: If the field is tokenized or multivalued or Boolean, the FilterQuery method will be used; otherwise the FieldCache method. I expect I or others will improve that shortly.

Re: Facet Performance

2006-12-08 Thread Yonik Seeley
On 12/8/06, J.J. Larrea <[EMAIL PROTECTED]> wrote: Unfortunately which strategy will be chosen is currently undocumented and control is a bit oblique: If the field is tokenized or multivalued or Boolean, the FilterQuery method will be used; otherwise the FieldCache method. If anyone had time

Re: Facet Performance

2006-12-08 Thread J.J. Larrea
Andrew Nagy, ditto on what Yonik said. Here is some further elaboration: I am doing much the same thing (faceting on Author etc.). When my Author field was defined as a solr.TextField, even using solr.KeywordTokenizerFactory so it wasn't actually tokenized, the faceting code chose the QueryFilt

Re: Facet Performance

2006-12-08 Thread Andrew Nagy
Yonik Seeley wrote: Are they multivalued, and do they need to be. Anything that is of type "string" and not multivalued will use the lucene FieldCache rather than the filterCache. The author field is multivalued. Will this be a strong performance issue? I could make multiple author fields as

Re: Facet Performance

2006-12-08 Thread Yonik Seeley
On 12/8/06, Andrew Nagy <[EMAIL PROTECTED]> wrote: Chris Hostetter wrote: >: Could you suggest a better configuration based on this? > >If that's what your stats look like after a single request, then i would >guess you would need to make your cache size at least 1.6 million in order >for it to

Re: Facet Performance

2006-12-08 Thread Andrew Nagy
Chris Hostetter wrote: : Could you suggest a better configuration based on this? If that's what your stats look like after a single request, then i would guess you would need to make your cache size at least 1.6 million in order for it to be of any use in improving your facet speed. Would th

Re: Facet Performance

2006-12-08 Thread Yonik Seeley
On 12/8/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : My data is 492,000 records of book data. I am faceting on 4 fields: : author, subject, language, format. : Format and language are fairly simple as there are only a few unique : terms. Author and subject however are much different in that

Re: Facet Performance

2006-12-08 Thread Chris Hostetter
: Here are the stats, Im still a newbie to SOLR, so Im not totally sure : what this all means: : lookups : 1530036 : hits : 2 : hitratio : 0.00 : inserts : 1530035 : evictions : 1504435 : size : 25600 those numbers are telling you that your cache is capable of holding 25,600 items. you have attem
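The cache statistics quoted above are internally consistent, which a short calculation shows. This sketch uses only the numbers from the message; the helper names are made up for illustration.

```python
# Cache statistics exactly as quoted in the message above.
stats = {
    "lookups": 1530036,
    "hits": 2,
    "inserts": 1530035,
    "evictions": 1504435,
    "size": 25600,
}


def hit_ratio(s):
    """Fraction of lookups answered from the cache."""
    return s["hits"] / s["lookups"]


def resident_entries(s):
    """Entries still in the cache: everything inserted minus everything evicted."""
    return s["inserts"] - s["evictions"]


ratio = hit_ratio(stats)
resident = resident_entries(stats)
print(f"hit ratio: {ratio:.2f}")
print(f"resident entries: {resident} (reported size: {stats['size']})")
```

Inserts minus evictions equals the reported size, so the cache is full and nearly every insert pushes another entry out before it can be reused.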

Re: Facet Performance

2006-12-08 Thread Andrew Nagy
Yonik Seeley wrote: On 12/8/06, Andrew Nagy <[EMAIL PROTECTED]> wrote: I changed the filterCache to the following: However a search that normally takes .04s is taking 74 seconds once I use the facets since I am faceting on 4 fields. The first time or subsequent times? Is your filterCa

Re: Facet Performance

2006-12-08 Thread Yonik Seeley
On 12/8/06, Andrew Nagy <[EMAIL PROTECTED]> wrote: I changed the filterCache to the following: However a search that normally takes .04s is taking 74 seconds once I use the facets since I am faceting on 4 fields. The first time or subsequent times? Is your filterCache big enough yet? Wha

Re: Facet Performance

2006-12-08 Thread Andrew Nagy
Yonik Seeley wrote: 1) facet on single-valued strings if you can 2) if you can't do (1) then enlarge the fieldcache so that the number of filters (one per possible term in the field you are filtering on) can fit. I changed the filterCache to the following: However a search that normally t

Re: Facet Performance

2006-12-07 Thread Chris Hostetter
: > This seems like a poor choice for an element : > name. Why not just name the element what is in the "name" attribute? : > It would make parsing much easier! : : When the XML was first conceived, there was a preference for limiting : the number of tags. : The structure could have been inverted

Re: Facet Performance

2006-12-07 Thread Yonik Seeley
On 12/7/06, Andrew Nagy <[EMAIL PROTECTED]> wrote: One complaint about the faceting though: Why is the element that is returned called "1st". I think maybe you are seeing lst (it starts with an L, not a one). It is short for NamedList, an ordered list whose elements are named. This seems like

Re: Facet Performance

2006-12-07 Thread Andrew Nagy
Yonik Seeley wrote: 1) facet on single-valued strings if you can 2) if you can't do (1) then enlarge the fieldcache so that the number of filters (one per possible term in the field you are filtering on) can fit. I will try this out. 3) facet counts are limited to the results of the query, fi

Re: Facet Performance

2006-12-07 Thread Yonik Seeley
On 12/7/06, Andrew Nagy <[EMAIL PROTECTED]> wrote: In September there was a thread [1] on this list about heterogeneous facets and their performance. I am having a similar issue and am unclear as the resolution of this thread. I performed a search against my dataset (492,000 records) and got th

Re: Facet performance with heterogeneous 'facets'?

2006-09-22 Thread Yonik Seeley
On 9/22/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Excellent news; as you guessed, my schema was (for some reason) set to version 1.0. Yeah, I just realized that having "version" right next to "name" would lead people to think it's "their" version number, when it's really Solr's version nu

Re: Facet performance with heterogeneous 'facets'?

2006-09-22 Thread Michael Imbeault
Excellent news; as you guessed, my schema was (for some reason) set to version 1.0. This also caused some of the problems I had with the original SolrPHP (parsing the wrong response). But better yet, the 800 seconds query is now running in 0.5-2 seconds! Amazing optimization! I can now do face

Re: Facet performance with heterogeneous 'facets'?

2006-09-22 Thread Yonik Seeley
On 9/22/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: I upgraded to the most recent Solr build (9-22) and sadly it's still really slow. 800 seconds query with a single facet on first_author, 15 millions documents total, the query return 180. Maybe i'm doing something wrong? Also, this is on my

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Michael Imbeault
I upgraded to the most recent Solr build (9-22) and sadly it's still really slow. An 800-second query with a single facet on first_author, 15 million documents total, and the query returns 180. Maybe I'm doing something wrong? Also, this is on my personal desktop; not on a server. Still, I'm getting

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Yonik Seeley
On 9/21/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: Hang in there Michael, a fix is on the way for your scenario (and subscribe to solr-dev if you want to stay on the bleeding edge): OK, the optimization has been checked in. You can checkout from svn and build Solr, or wait for the 9-22 nightl

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Yonik Seeley
On 9/21/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Btw, Any plans for a facets cache? Maybe a partial one (like caching top terms to implement some other optimizations). My general philosophy on caching in Solr has been to cache things the client can't: elemental things, or *parts* of req

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Michael Imbeault
Dude, stop being so awesome (and the whole Solr team). Seriously! Every problem / request (MoreLikeThis class, change AND/OR preference programmatically, etc) I've submitted to this mailing list has received a quick, more-than-I-ever-expected answer. I'll subscribe to the dev list (been reading

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Yonik Seeley
On 9/21/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: It turns out that journal_name has 17038 different tokens, which is manageable, but first_author has > 400 000. I don't think this will ever yield good performance, so i might only do journal_name facets. Hang in there Michael, a fix is on

Re: Facet performance with heterogeneous 'facets'?

2006-09-21 Thread Michael Imbeault
Thanks for all the great answers. Quick Question: did you say you are faceting on the first name field separately from the last name field? ... why? You misunderstood. I'm doing faceting on first author, and last author of the list. Life science papers have author lists, and the first one is u

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Chris Hostetter
: I just updated the comments in solrconfig.xml: I've tweaked the SolrCaching wiki page to include some of this info as well, feel free to add any additional info you think would be helpful to other people (or ask any questions about it if any of it still doesn't seem clear to you)... htt

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Chris Hostetter
: > when we facet on the authors, we start with : > that list and go in order, generating their facet constraint count using : > the DocSet intersection just like we currently do ... if we reach our : > facet.limit before we reach the end of the list and the lowest constraint : > count is higher t

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Yonik Seeley
On 9/19/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: Quick Question: did you say you are faceting on the first name field separately from the last name field? ... why? You'll probably see a sharp increase in performance if you have a single untokenized author field containing the full name an

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Chris Hostetter
Quick Question: did you say you are faceting on the first name field separately from the last name field? ... why? You'll probably see a sharp increase in performance if you have a single untokenized author field containing the full name and you facet on that -- there will be far fewer unique te

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Yonik Seeley
I just updated the comments in solrconfig.xml: On 9/18/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Another followup: I bumped all the caches in solrconfig.xml to size="1600384" initialSize="400096" autowarmCount="400096" It seemed to fix the problem on a very smal

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Joachim Martin
Michael Imbeault wrote: Also, is there any plans to add an option not to run a facet search if the result set is too big? To avoid 40 seconds queries if the docset is too large... You could run one query with facet=false, check the result size and then run it again (should be fast because i
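The two-step approach suggested above could be sketched as follows. This only builds the query strings; `q`, `rows`, `facet` and `facet.field` are standard Solr request parameters, while the threshold, the field name, and the function names are hypothetical choices for illustration.

```python
from urllib.parse import urlencode


def count_only_query(q):
    """First pass: no facets, no rows, just enough to read the hit count."""
    return urlencode([("q", q), ("rows", 0), ("facet", "false")])


def faceted_query(q, num_found, max_hits_for_facets, facet_fields):
    """Second pass: return a faceted query string, or None if the result set is too big."""
    if num_found > max_hits_for_facets:
        return None
    params = [("q", q), ("rows", 0), ("facet", "true")]
    params.extend(("facet.field", field) for field in facet_fields)
    return urlencode(params)


first = count_only_query("word")
small = faceted_query("word", num_found=180,
                      max_hits_for_facets=10_000, facet_fields=["first_author"])
large = faceted_query("word", num_found=5_000_000,
                      max_hits_for_facets=10_000, facet_fields=["first_author"])
print(first)
print(small)
print(large)
```

The caller would read the hit count from the first response and send the second query only when `faceted_query` returns a string.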

Re: Facet performance with heterogeneous 'facets'?

2006-09-19 Thread Yonik Seeley
On 9/18/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Yonik Seeley wrote: > For cases like "author", if there is only one value per document, then > a possible fix is to use the field cache. If there can be multiple > occurrences, there doesn't seem to be a good way that preserves exact > coun

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Michael Imbeault
Another followup: I bumped all the caches in solrconfig.xml to size="1600384" initialSize="400096" autowarmCount="400096" It seemed to fix the problem on a very small index (facets on last and first author fields, + 12 range date facets, sub 0.3 seconds for queries). I'll check

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Michael Imbeault
Yonik Seeley wrote: I noticed this too, and have been thinking about ways to fix it. The root of the problem is that Lucene, like all full-text search engines, uses inverted indices. It's fast and easy to get all documents for a particular term, but getting all terms for a document is
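The asymmetry described above can be illustrated with a toy model: the index maps terms to documents, so counting facets per document first requires walking every term once to build the reverse mapping. This is a simplified sketch for a single-valued field, not Lucene's actual data structures; the terms and doc ids are hypothetical.

```python
from collections import Counter

# Toy inverted index: term -> doc ids (the direction the index stores natively).
inverted = {
    "smith": {0, 2},
    "jones": {1},
    "lee": {3},
}


def uninvert(inverted_index, max_doc):
    """Build a doc -> term array for a single-valued field by visiting every term."""
    doc_to_term = [None] * max_doc
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            doc_to_term[doc_id] = term
    return doc_to_term


def facet_by_doc(doc_to_term, result_set):
    """Count terms by looking up each matching doc, instead of one query per term."""
    return Counter(
        doc_to_term[d] for d in result_set if doc_to_term[d] is not None
    )


doc_to_term = uninvert(inverted, max_doc=4)
counts = facet_by_doc(doc_to_term, {0, 2, 3})
print(doc_to_term)
print(counts)
```

Building the array costs a full pass over the field's terms, but afterwards the work per request scales with the number of hits rather than the number of distinct terms.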

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Yonik Seeley
On 9/18/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Just a little follow-up - I did a little more testing, and the query takes 20 seconds no matter what - If there's one document in the results set, or if I do a query that returns all 13 documents. Yes, currently the same strategy is al

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Yonik Seeley
On 9/18/06, Michael Imbeault <[EMAIL PROTECTED]> wrote: Been playing around with the new 'facets search' and it works very well, but it's really slow for some particular applications. I've been trying to use it to display the most frequent authors of articles I noticed this too, and have been

Re: Facet performance with heterogeneous 'facets'?

2006-09-18 Thread Michael Imbeault
Just a little follow-up - I did a little more testing, and the query takes 20 seconds no matter what - If there's one document in the results set, or if I do a query that returns all 13 documents. It seems something isn't right... it looks like solr is doing faceted search on the whole ind