queryResultCache doesn’t really help with faceting, even if it’s hit for the
main query.
That cache only stores a subset of the hits, and to facet properly you need
the entire result set….
> On Jun 17, 2020, at 12:47 PM, James Bodkin
> wrote:
>
> We've noticed that the filterCache uses a sig
We've noticed that the filterCache uses a significant amount of memory, as
we've assigned 8GB Heap per instance.
In total, we have 32 shards with 2 replicas, hence (8*32*2) 512G Heap space
alone, further memory is required to ensure the index is always memory mapped
for performance reasons.
Ide
To expand a bit on what Erick said regarding performance: my sense is
that the RefGuide assertion that "docValues=true" makes faceting
"faster" could use some qualification/clarification. My take, fwiw:
First, to reiterate/paraphrase what Erick said: the "faster" assertion
is not comparing to "fac
Uninvertible is a safety mechanism to make sure that you don’t _unknowingly_
use a docValues=false
field for faceting/grouping/sorting/function queries. The primary point of
docValues=true is twofold:
1> reduce Java heap requirements by using the OS memory to hold it
2> uninverting can be expen
The large majority of the relevant fields have fewer than 20 unique values. We
have two fields over that with 150 unique values and 5300 unique values
respectively.
At the moment, our filterCache is configured with a maximum size of 8192.
From the DocValues documentation
(https://lucene.apac
Ah, interesting! So if the number of possible values is low (like <= 10),
it is faster to *not* use docValues on that (indexed) faceted field?
Does this hold true even when using faceting techniques like tag and
exclusion?
Thanks,
Anthony
On Wed, Jun 17, 2020 at 9:37 AM David Smiley
wrote:
> I
I strongly recommend setting indexed=true on a field you facet on for the
purposes of efficient refinement (fq=field:value). But it strictly isn't
required, as you have discovered.
~ David
On Wed, Jun 17, 2020 at 9:02 AM Michael Gibney
wrote:
> facet.method=enum works by executing a query (ag
facet.method=enum works by executing a query (against indexed values)
for each indexed value in a given field (which, for indexed=false, is
"no values"). So that explains why facet.method=enum no longer works.
I was going to suggest that you might not want to set indexed=false on
the docValues face
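As a rough illustration of why facet.method=enum depends on indexed terms, here is a toy Python sketch. It is not Solr's implementation: the `postings` mapping and the `enum_facet` name are assumptions made for this example. The sketch counts each term by intersecting its posting list with the set of documents matching the main query, so a field with no indexed terms produces no facet counts at all.

```python
def enum_facet(postings, result_docs):
    """Toy enum-style faceting: one set intersection per indexed term.

    postings: dict mapping term -> set of doc IDs (a hypothetical stand-in
    for the indexed terms of a single field).
    result_docs: set of doc IDs matching the main query.
    """
    counts = {}
    for term, docs in postings.items():
        n = len(docs & result_docs)
        if n > 0:
            counts[term] = n
    return counts


# Made-up data, purely for illustration.
indexed = {"red": {1, 2, 5}, "blue": {3, 4}, "green": {6}}
hits = {1, 2, 3, 6, 7}

print(enum_facet(indexed, hits))  # counts when the field has indexed terms
print(enum_facet({}, hits))       # indexed=false: nothing to enumerate
```

The work grows with the number of terms rather than the number of hits, which fits the observation elsewhere in these threads that enum suits fields with few unique values.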
Thanks, I've implemented some queries that improve the first-hit execution for
faceting.
Since turning off indexed on those fields, we've noticed that facet.method=enum
no longer returns the facets when used.
Using facet.method=fc/fcs is significantly slower compared to facet.method=enum
for us
Ok, I see the disconnect... Necessary parts of the index are read from disk
lazily. So your newSearcher or firstSearcher query needs to do whatever
operation causes the relevant parts of the index to be read. In this case,
probably just facet on all the fields you care about. I'd add sorting too
if
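A warming query along these lines could be assembled as in the sketch below. The field names (`category`, `brand`), the sort clause, and the helper name `warming_query` are hypothetical; substitute the fields you actually facet and sort on. The resulting parameters would go into a newSearcher or firstSearcher listener in solrconfig.xml.

```python
from urllib.parse import urlencode


def warming_query(facet_fields, sort=None):
    """Build the parameter string for a warming query that facets on every
    field of interest, so the relevant index data is read up front."""
    params = [("q", "*:*"), ("facet", "true")]
    # One facet.field parameter per field we want warmed.
    params += [("facet.field", field) for field in facet_fields]
    if sort:
        params.append(("sort", sort))
    return urlencode(params)


# Hypothetical field names, for illustration only.
print(warming_query(["category", "brand"], sort="price asc"))
```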
I've been trying to build a query that I can use in newSearcher based off the
information in your previous e-mail. I thought you meant to build a *:* query
as per Query 1 in my previous e-mail but I'm still seeing the first-hit
execution.
Now I'm wondering if you meant to create a *:* query with
Did you try the autowarming like I mentioned in my previous e-mail?
> On Jun 16, 2020, at 10:18 AM, James Bodkin
> wrote:
>
> We've changed the schema to enable docValues for these fields and this led to
> an improvement in the response time. We found a further improvement by also
> switching
We've changed the schema to enable docValues for these fields and this led to
an improvement in the response time. We found a further improvement by also
switching off indexed as these fields are used for faceting and filtering only.
Since those changes, we've found that the first-execution for q
I question whether filterCache has anything to do with it. I suspect what’s
really happening is that the first time, you’re reading the relevant bits from disk
into memory. And to double check, you should have docValues enabled for all these
fields. The “uninverting” process can be very expensive, and
We've run the performance test after changing the fields to be of the type
string. We're seeing improved performance, especially after the first time the
query has run. The first run is taking around 1-2 seconds rather than 6-8
seconds and when the filter cache is present, the response time is a
Could you explain why the performance is an issue for points-based fields? I've
looked through the referenced issue (which is fixed in the version we are
running) but I'm missing the link between the two. Is there an issue to improve
this for points-based fields?
We're going to change the field
There’s a lot of confusion about using points-based fields for faceting, see:
https://issues.apache.org/jira/browse/SOLR-13227 for instance.
Two options you might try:
1> copyField to a string field and facet on that (won’t work, of course, for
any kind of interval/range facet)
2> use the deprec
On 2/20/2018 1:18 AM, LOPEZ-CORTES Mariano-ext wrote:
We return a facet list of values in "motifPresence" field (person status).
Status:
[ ] status1
[x] status2
[x] status3
The user then selects one or more statuses (it's this step that we called "facet
filtering
solution?
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, 19 February 2018 18:18
To: solr-user
Subject: Re: Facet performance problem
I'm confused here. What do you mean by "facet filtering"? Your examples have no
facets at all, just
I'm confused here. What do you mean by "facet filtering"? Your
examples have no facets at all, just a _filter query_.
I'll assume you want to use filter query (fq), and faceting has
nothing to do with it. This is one of the tricky bits of docValues.
While it's _possible_ to search on a field that'
On Tue, October 22, 2013 5:23 PM Michael Lemke wrote:
>On Tue, October 22, 2013 9:23 AM Toke Eskildsen wrote:
>>On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
>>> QTime fc:
>>>never returns, webserver restarts itself after 30 min with 100% CPU
>>> load
>>
>>It might be
On Tue, 2013-10-22 at 17:25 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
> On Tue, October 22, 2013 11:54 AM Andre Bois-Crettez wrote:
> >> This is with Solr 1.4.
> >Really ?
> >This sounds really outdated to me.
> >Have you tried a more recent version? 4.5 just came out.
>
> Sorry, can't. Too m
On Tue, October 22, 2013 11:54 AM Andre Bois-Crettez wrote:
>
>> This is with Solr 1.4.
>Really ?
>This sounds really outdated to me.
>Have you tried a more recent version? 4.5 just came out.
Sorry, can't. Too much `grown' stuff.
Michael
On Tue, October 22, 2013 9:23 AM Toke Eskildsen wrote:
>On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
>> QTime fc:
>>never returns, webserver restarts itself after 30 min with 100% CPU
>> load
>
>It might be because it dies due to garbage collection. But since more
>m
This is with Solr 1.4.
Really ?
This sounds really outdated to me.
Have you tried a more recent version? 4.5 just came out.
--
André Bois-Crettez
Software Architect
Search Developer
http://www.kelkoo.com/
Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège socia
On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
> QTime enum:
> 1st call: 1200
> subsequent calls: 200
Those numbers seem fine.
> QTime fc:
>never returns, webserver restarts itself after 30 min with 100% CPU
> load
It might be because it dies due to garba
On Mon, October 21, 2013 10:04 AM, Toke Eskildsen wrote:
>On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
>> Toke Eskildsen wrote:
>> > Unfortunately the enum-solution is normally quite slow when there
>> > are enough unique values to trigger the "too many > values"-exception.
>
On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
> Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote:
> > Unfortunately the enum-solution is normally quite slow when there
> > are enough unique values to trigger the "too many > values"-exception.
> > [...]
>
> [...] And yes
: >> 1.
q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
: >> 2.
q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
: >
: >> The only difference is an empty facet.prefix in the
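To confirm mechanically which parameters differ between two requests like these, a small comparison can help. The sketch below uses only the Python standard library; the helper name `diff_params` is made up for this example. Note that `keep_blank_values=True` is needed, otherwise the empty `facet.prefix=` would be dropped during parsing.

```python
from urllib.parse import parse_qs

# The two query strings quoted above.
q1 = ("q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10"
      "&facet.mincount=1&facet.method=enum&rows=0")
q2 = ("q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10"
      "&facet.mincount=1&facet.method=enum&rows=0")


def diff_params(a, b):
    """Return {param: (values_in_a, values_in_b)} for parameters that differ."""
    pa = parse_qs(a, keep_blank_values=True)
    pb = parse_qs(b, keep_blank_values=True)
    keys = sorted(set(pa) | set(pb))
    return {k: (pa.get(k), pb.get(k)) for k in keys if pa.get(k) != pb.get(k)}


print(diff_params(q1, q2))
```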
DocValues is the new black
http://wiki.apache.org/solr/DocValues
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
SOLR Performance Monitoring -- http://sematext.com/spm
On Fri, Oct 18, 2013 at 12:30 PM, Lemke, Michael SZ/HZA-ZSW
wrote:
> Toke Eskildsen [mailto:t...@statsbiblioteke
Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote:
>Lemke, Michael SZ/HZA-ZSW [lemke...@schaeffler.com] wrote:
>> 1.
>> q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
>> 2.
>> q=word&facet.field=CONTENT&facet=true&facet.prefix=a&
Lemke, Michael SZ/HZA-ZSW [lemke...@schaeffler.com] wrote:
> 1.
> q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
> 2.
> q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
> T
> On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
>> It seems BOBO-Browse is an alternative faceting engine; would be interesting to
>> compare performance with SOLR... Distributed?
>>
>>
>> -Original Message-
>> From: Jason Rutherglen [mailto:jason.rutherg.
h SOLR... Distributed?
>
>
> -Original Message-
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: August-12-09 6:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> For your fields with many terms you may want to
uld be interesting to
> compare performance with SOLR... Distributed?
>
>
> -Original Message-
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: August-12-09 6:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> For
Interesting, it has "BoboRequestHandler implements SolrRequestHandler"
- easy to try it; and shards support
[Fuad Efendi] It seems BOBO-Browse is an alternative faceting engine; would be
interesting to
compare performance with SOLR... Distributed?
[Jason Rutherglen] For your fields with many terms
It seems BOBO-Browse is an alternative faceting engine; would be interesting to
compare performance with SOLR... Distributed?
-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: August-12-09 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet
vé [mailto:jerome.et...@gmail.com]
Sent: August-13-09 5:38 AM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips
Thanks everyone for your advice.
I increased my filterCache, and the faceting performance improved greatly.
My faceted field can have at the moment ~4 different t
Thanks everyone for your advice.
I increased my filterCache, and the faceting performance improved greatly.
My faceted field can have at the moment ~4 different terms, so I
did set a filterCache size of 5 and it works very well.
However, I'm planning to increase the number of terms to
Note that depending on the profile of your field (full text and how many
unique terms on average per document), the improvements from 1.4 may not
apply, as you may exceed the limits of the new faceting technique in Solr
1.4.
-Stephen
On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher wrote:
> Yes, in
>
>
>
> -Original Message-----
> From: Erik Hatcher [mailto:ehatc...@apache.org]
> Sent: August-12-09 2:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> Yes, increasing the filterCache size will help with Solr 1.3
> performance.
al Message-
From: Erik Hatcher [mailto:ehatc...@apache.org]
Sent: August-12-09 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips
Yes, increasing the filterCache size will help with Solr 1.3
performance.
Do note that trunk (soon Solr 1.4) has dramatically improved fac
Yes, increasing the filterCache size will help with Solr 1.3
performance.
Do note that trunk (soon Solr 1.4) has dramatically improved faceting
performance.
Erik
On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
Hi everyone,
I'm using some faceting on a solr index containing ~ 1
Jerome,
Yes you need to increase the filterCache size to something close to
unique number of facet elements. But also consider the RAM required to
accommodate the increase.
I did see a significant performance gain by increasing the filterCache size
Thanks,
Kalyan Manepalli
-Origina
Hoss,
This is still extremely interesting area for possible improvements; I simply
don't want the topic to die
http://www.nabble.com/Facet-Performance-td7746964.html
http://issues.apache.org/jira/browse/SOLR-665
http://issues.apache.org/jira/browse/SOLR-667
http://issues.apache.org/jira/browse/
Erik Hatcher wrote:
On Dec 8, 2006, at 2:15 PM, Andrew Nagy wrote:
My data is 492,000 records of book data. I am faceting on 4 fields:
author, subject, language, format.
Format and language are fairly simple as there are only a few unique
terms. Author and subject however are much differe
: Unfortunately which strategy will be chosen is currently undocumented
: and control is a bit oblique: If the field is tokenized or multivalued
: or Boolean, the FilterQuery method will be used; otherwise the
: FieldCache method. I expect I or others will improve that shortly.
Bear in mind, wh
On Dec 8, 2006, at 2:15 PM, Andrew Nagy wrote:
My data is 492,000 records of book data. I am faceting on 4
fields: author, subject, language, format.
Format and language are fairly simple as there are only a few
unique terms. Author and subject however are much different in
that there are
J.J. Larrea wrote:
Unfortunately which strategy will be chosen is currently undocumented and
control is a bit oblique: If the field is tokenized or multivalued or Boolean,
the FilterQuery method will be used; otherwise the FieldCache method. I expect
I or others will improve that shortly.
On 12/8/06, J.J. Larrea <[EMAIL PROTECTED]> wrote:
Unfortunately which strategy will be chosen is currently undocumented and
control is a bit oblique: If the field is tokenized or multivalued or Boolean,
the FilterQuery method will be used; otherwise the FieldCache method.
If anyone had time
Andrew Nagy, ditto on what Yonik said. Here is some further elaboration:
I am doing much the same thing (faceting on Author etc.). When my Author field
was defined as a solr.TextField, even using solr.KeywordTokenizerFactory so it
wasn't actually tokenized, the faceting code chose the QueryFilt
Yonik Seeley wrote:
Are they multivalued, and do they need to be?
Anything that is of type "string" and not multivalued will use the
lucene FieldCache rather than the filterCache.
The author field is multivalued. Will this be a strong performance issue?
I could make multiple author fields as
On 12/8/06, Andrew Nagy <[EMAIL PROTECTED]> wrote:
Chris Hostetter wrote:
>: Could you suggest a better configuration based on this?
>
>If that's what your stats look like after a single request, then i would
>guess you would need to make your cache size at least 1.6 million in order
>for it to
Chris Hostetter wrote:
: Could you suggest a better configuration based on this?
If that's what your stats look like after a single request, then i would
guess you would need to make your cache size at least 1.6 million in order
for it to be of any use in improving your facet speed.
Would th
On 12/8/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: My data is 492,000 records of book data. I am faceting on 4 fields:
: author, subject, language, format.
: Format and language are fairly simple as there are only a few unique
: terms. Author and subject however are much different in that
: Here are the stats, I'm still a newbie to SOLR, so I'm not totally sure
: what this all means:
: lookups : 1530036
: hits : 2
: hitratio : 0.00
: inserts : 1530035
: evictions : 1504435
: size : 25600
those numbers are telling you that your cache is capable of holding 25,600
items. you have attem
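The arithmetic behind that reading of the stats can be checked directly. The sketch below uses only the numbers quoted above; the variable names are chosen for this example.

```python
# Cache statistics as quoted in the message above.
lookups = 1530036
hits = 2
inserts = 1530035
evictions = 1504435
size = 25600

hit_ratio = hits / lookups      # fraction of lookups served from the cache
retained = inserts - evictions  # entries still held after all evictions

print(f"hit ratio: {hit_ratio:.2f}")
print(f"entries retained: {retained}")
print(f"cache is full: {retained == size}")
```

Nearly every lookup missed and led to an insert, and almost every insert was later evicted, so a cache of this size gives essentially no benefit for this request.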
Yonik Seeley wrote:
On 12/8/06, Andrew Nagy <[EMAIL PROTECTED]> wrote:
I changed the filterCache to the following:
However a search that normally takes .04s is taking 74 seconds once I
use the facets since I am faceting on 4 fields.
The first time or subsequent times?
Is your filterCa
On 12/8/06, Andrew Nagy <[EMAIL PROTECTED]> wrote:
I changed the filterCache to the following:
However a search that normally takes .04s is taking 74 seconds once I
use the facets since I am faceting on 4 fields.
The first time or subsequent times?
Is your filterCache big enough yet? Wha
Yonik Seeley wrote:
1) facet on single-valued strings if you can
2) if you can't do (1) then enlarge the fieldcache so that the number
of filters (one per possible term in the field you are filtering on)
can fit.
I changed the filterCache to the following:
However a search that normally t
: > This seems like a poor choice for an element
: > name. Why not just name the element what is in the "name" attribute?
: > It would make parsing much easier!
:
: When the XML was first conceived, there was a preference for limiting
: the number of tags.
: The structure could have been inverted
On 12/7/06, Andrew Nagy <[EMAIL PROTECTED]> wrote:
One complaint about the faceting though: Why is the element that is
returned called "1st"?
I think maybe you are seeing lst (it starts with an L, not a one).
It is short for NamedList, an ordered list whose elements are named.
This seems like
Yonik Seeley wrote:
1) facet on single-valued strings if you can
2) if you can't do (1) then enlarge the fieldcache so that the number
of filters (one per possible term in the field you are filtering on)
can fit.
I will try this out.
3) facet counts are limited to the results of the query, fi
On 12/7/06, Andrew Nagy <[EMAIL PROTECTED]> wrote:
In September there was a thread [1] on this list about heterogeneous
facets and their performance. I am having a similar issue and am
unclear as to the resolution of this thread.
I performed a search against my dataset (492,000 records) and got th
On 9/22/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
Excellent news; as you guessed, my schema was (for some reason) set to
version 1.0.
Yeah, I just realized that having "version" right next to "name" would
lead people to think it's "their" version number, when it's really
Solr's version nu
Excellent news; as you guessed, my schema was (for some reason) set to
version 1.0. This also caused some of the problems I had with the
original SolrPHP (parsing the wrong response).
But better yet, the 800-second query is now running in 0.5-2 seconds!
Amazing optimization! I can now do face
On 9/22/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
I upgraded to the most recent Solr build (9-22) and sadly it's still
really slow. An 800-second query with a single facet on first_author, 15
million documents total, the query returns 180. Maybe I'm doing
something wrong? Also, this is on my
I upgraded to the most recent Solr build (9-22) and sadly it's still
really slow. An 800-second query with a single facet on first_author, 15
million documents total, the query returns 180. Maybe I'm doing
something wrong? Also, this is on my personal desktop; not on a server.
Still, I'm getting
On 9/21/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
Hang in there Michael, a fix is on the way for your scenario (and
subscribe to solr-dev if you want to stay on the bleeding edge):
OK, the optimization has been checked in. You can checkout from svn
and build Solr, or wait for the 9-22 nightl
On 9/21/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
Btw, Any plans for a facets cache?
Maybe a partial one (like caching top terms to implement some other
optimizations). My general philosophy on caching in Solr has been to
cache things the client can't: elemental things, or *parts* of
req
Dude, stop being so awesome (and the whole Solr team). Seriously! Every
problem / request (MoreLikeThis class, change AND/OR preference
programmatically, etc) I've submitted to this mailing list has received a
quick, more-than-I-ever-expected answer.
I'll subscribe to the dev list (been reading
On 9/21/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
It turns out that journal_name has 17038 different tokens, which is
manageable, but first_author has > 400 000. I don't think this will ever
yield good performance, so i might only do journal_name facets.
Hang in there Michael, a fix is on
Thanks for all the great answers.
Quick Question: did you say you are faceting on the first name field
separately from the last name field? ... why?
You misunderstood. I'm doing faceting on first author, and last author
of the list. Life science papers have authors list, and the first one is
u
: I just updated the comments in solrconfig.xml:
I've tweaked the SolrCaching wiki page to include some of this info as
well, feel free to add any additional info you think would be helpful to
other people (or ask any questions about it if any of it still doesn't seem
clear to you)...
htt
: > when we facet on the authors, we start with
: > that list and go in order, generating their facet constraint count using
: > the DocSet intersection just like we currently do ... if we reach our
: > facet.limit before we reach the end of the list and the lowest constraint
: > count is higher t
On 9/19/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
Quick Question: did you say you are faceting on the first name field
separately from the last name field? ... why?
You'll probably see a sharp increase in performance if you have a single
untokenized author field containing the full name an
Quick Question: did you say you are faceting on the first name field
separately from the last name field? ... why?
You'll probably see a sharp increase in performance if you have a single
untokenized author field containing the full name and you facet on that --
there will be a lot less unique te
I just updated the comments in solrconfig.xml:
On 9/18/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
Another followup: I bumped all the caches in solrconfig.xml to
size="1600384"
initialSize="400096"
autowarmCount="400096"
It seemed to fix the problem on a very smal
Michael Imbeault wrote:
Also, are there any plans to add an option not to run a facet search if
the result set is too big? To avoid 40-second queries if the docset
is too large...
You could run one query with facet=false, check the result size and then
run it again (should be fast because i
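The two-step approach suggested here could look roughly like the following sketch. The `search` callable, the `numFound` key in its return value, and the threshold are assumptions for illustration; `fake_search` merely stands in for a real request to Solr.

```python
def search_with_optional_facets(search, params, max_hits_for_facets):
    """Run the query without facets first; facet only if few enough hits.

    search: hypothetical callable taking a dict of parameters and returning
    a dict that contains at least a "numFound" entry.
    """
    first = search({**params, "facet": "false"})
    if first["numFound"] > max_hits_for_facets:
        return first  # result set too large: skip the expensive facet pass
    return search({**params, "facet": "true"})


def fake_search(params):
    # Stand-in for a real Solr request, used only to exercise the logic.
    return {"numFound": 180, "faceted": params["facet"] == "true"}


print(search_with_optional_facets(fake_search, {"q": "word"}, 1000))
print(search_with_optional_facets(fake_search, {"q": "word"}, 100))
```

The cost is one extra round trip for queries that do end up being faceted, which the original message expects to be cheap on the second run.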
On 9/18/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
Yonik Seeley wrote:
> For cases like "author", if there is only one value per document, then
> a possible fix is to use the field cache. If there can be multiple
> occurrences, there doesn't seem to be a good way that preserves exact
> coun
Another followup: I bumped all the caches in solrconfig.xml to
size="1600384"
initialSize="400096"
autowarmCount="400096"
It seemed to fix the problem on a very small index (facets on last and
first author fields, + 12 range date facets, sub 0.3 seconds for
queries). I'll check
Yonik Seeley wrote:
I noticed this too, and have been thinking about ways to fix it.
The root of the problem is that lucene, like all full-text search
engines, uses inverted indices. It's fast and easy to get all
documents for a particular term, but getting all terms for a document
documents is
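The asymmetry described here can be shown with a toy inverted index. This is only a sketch in Python with made-up data, not how Lucene stores its index: looking up the documents for a term is a single dictionary access, while finding the terms for a document means scanning every posting list unless the index is first "uninverted" into a document-to-terms map.

```python
from collections import defaultdict

# Toy inverted index: term -> list of doc IDs (made-up data).
inverted = {"smith": [1, 4], "jones": [2], "lee": [1, 3]}


def docs_for_term(index, term):
    """Cheap direction: a single lookup."""
    return index.get(term, [])


def terms_for_doc(index, doc_id):
    """Expensive direction: every posting list has to be scanned."""
    return sorted(term for term, docs in index.items() if doc_id in docs)


def uninvert(index):
    """Build a doc -> terms map once, trading memory for fast lookups."""
    forward = defaultdict(list)
    for term, docs in index.items():
        for doc_id in docs:
            forward[doc_id].append(term)
    return dict(forward)


print(docs_for_term(inverted, "smith"))
print(terms_for_doc(inverted, 1))
print(uninvert(inverted))
```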
On 9/18/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
Just a little follow-up - I did a little more testing, and the query
takes 20 seconds no matter what - If there's one document in the results
set, or if I do a query that returns all 13 documents.
Yes, currently the same strategy is al
On 9/18/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
Been playing around with the new 'facets search' and it works very
well, but it's really slow for some particular applications. I've been
trying to use it to display the most frequent authors of articles
I noticed this too, and have been
Just a little follow-up - I did a little more testing, and the query
takes 20 seconds no matter what - If there's one document in the results
set, or if I do a query that returns all 13 documents.
It seems something isn't right... it looks like solr is doing faceted
search on the whole ind