DocValues changed with schema version 1.6
(https://issues.apache.org/jira/browse/SOLR-8220). Have you checked that the
same number of fields are returned for the two setups?
- Toke Eskildsen
03/docvalues-vs-stored-fields-apache-solr-features-and-performance-smackdown.html
BTW: The documentation should definitely mention that stored preserves
order & duplicates. It is not obvious.
- Toke Eskildsen, Royal Danish Library
to be processed, it indicates that
the cluster is overloaded. Increasing the timeout is just a band-aid.
- Toke Eskildsen, Royal Danish Library
hash:00* OR hash:01* OR hash:02* OR hash:03* OR hash:04*
-> Facets for 1950K documents (100M/256 * 5)
Prefix queries might prove to be too expensive, so you could also
create fields with random values from 0-9, 0-99, 0-999 etc. and do
exact match filtering on those to get the number of hits down.
- Toke Eskildsen, Royal Danish Library
eeding-up-core-search/
and there is https://issues.apache.org/jira/browse/LUCENE-8875 which
takes care of the Sentinel thing in Solr 8.2.
- Toke Eskildsen, Royal Danish Library
e problem.
- Toke Eskildsen, Royal Danish Library
On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
...
> Is there a way to block certain solr queries based on url pattern?
> i.e. ignore the stats.calcdistinct request in this case.
It sounds like it is possible f
e shard? Single shard
indexes maximize throughput at the possible cost of latency, so that
seems fitting for your requirements.
- Toke Eskildsen, Royal Danish Library
cessorFactory that is mentioned:
http://lucene.apache.org/solr/7_2_1/solr-core/org/apache/solr/schema/DatePointField.html
- Toke Eskildsen
te to read up on that and respond
in that thread, to avoid hi-jacking this one. It probably won't be this
week as Real Work is heating up.
- Toke Eskildsen, Royal Danish Library
last year we experienced similar
> problems.
The iterator-based DocValues implementation in Solr 7 has a performance issue
with large segments, with symptoms akin to SOLR-8096. If you have not already
solved your problems, Solr 8 (with an upgraded index) might help.
- Toke Eskildsen
try disabling grouping fully.
It does not explain the difference between Solr 4 & 8, but I agree with David
that we need to isolate what causes the overall slowdown first, before we can
attempt to fix the Solr 4 vs 8 thing.
- Toke Eskildsen
a very large
result set requires more CPU power to uncompress in Solr 8 (but less
IO))
* Do you have any response-related defaults in your solrconfig.xml,
such as faceting or grouping?
(You might be doing heavy aggregation even if you don't explicitly ask
for it)
- Toke Eskildsen, Royal Danish Library
ome special
high-performance setup with a budget for tuning: Matching terms and joining
filters is core Solr (Lucene really) functionality. Plain query &
filter-matching time tends to be dwarfed by aggregations (grouping, faceting,
stats).
- Toke Eskildsen
very large documents? How big is your index in bytes?
- Toke Eskildsen
now and keep
> going back and forth on whether we should preserve accent marks.
Going with what we do, my answer would be: Yes, do preserve and also remove
:-). You could even have 3 or more levels of normalisation, depending on how
much time you have for polishing.
- Toke Eskildsen
getSorted for
each collect call? Could you share your code somewhere?
- Toke Eskildsen
lues in Solr, so
the safe (best performance) solution would be to implement something
like the pseudo code I wrote earlier.
- Toke Eskildsen, Royal Danish Library
&&
isValid(dv.binaryValue().utf8ToString())
in your collect method.
https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/index/DocValues.html#getSorted-org.apache.lucene.index.LeafReader-java.lang.String
- Toke Eskildsen
If you want to speed it up further, you can use BytesRefs as keys in
your c
ts knows which cluster to use? Can it be divided
further?
- Toke Eskildsen
obs. Scaling this specialized setup to your corpus size would require
about 3TB of SSD, 64GB RAM and 4 CPU-cores, divided among 4 shards. You are
likely to need quite a lot more than that, so this is just to say that at this
scale the use of the index matters _a lot_.
- Toke Eskildsen
as worst case
for storage usage during optimize is a total of 3*index size.
- Toke Eskildsen, Royal Danish Library
fference between
evaluating a graph query (any query really) and asking for 1M results
to be returned. With that in mind, what do you set rows to?
- Toke Eskildsen, Royal Danish Library
ld-value-faceting-parameters
- Toke Eskildsen, Royal Danish Library
pache.org/jira/browse/SOLR-13013
If it is easy for you to test, you could try Solr 8 as that should work
better for random access of DocValues.
- Toke Eskildsen, Royal Danish Library
e indexes and/or setups where performance
is very important.
- Toke Eskildsen, Royal Danish Library
regression for
DocValues that is very visible when using export. See
https://issues.apache.org/jira/browse/SOLR-13013), so I would expect it to be
slower than Solr 5. You could try with Solr 8 where this regression should be
mitigated somewhat.
- Toke Eskildsen
t well with
that. Instead you can look at Common Grams, where your high-frequency
words get concatenated with surrounding words. This only works with
phrases though. There's a nice article at
https://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
- Toke Eskildsen, Royal Danish Library
sparing me the work.
- Toke Eskildsen, Royal Danish Library
f possible) will be for a later Solr version. Currently it is
not possible to tweak the docValues indexing parameters outside of code
changes.
Do note that we're still operating on guesses here. The cause for your
regression might easily be elsewhere.
- Toke Eskildsen, Royal Danish Library
e tiny, but mistakes happen. With that in mind, do you have
DocValues enabled for a lot of your fields?
Performance issues like this one are notoriously hard to debug remotely.
Is it possible for you to share your setup and your test data?
- Toke Eskildsen, Royal Danish Library
, the query
> doesn't fetch results.
You need to tell Solr which fields it should search: df=cfield
https://lucene.apache.org/solr/guide/7_7/the-standard-query-parser.html#standard-query-parser-parameters
- Toke Eskildsen, Royal Danish Library
due to stop-the-world garbage collections.
Try dialing Xmx _way_ down: If your batches are only 5MB each, try
Xmx=20g or less. I know that the stats above say that Solr uses 111GB,
but the JVM has a tendency to expand the heap quite a lot when it is
getting hammered. If you want to check beforehand, you can see how much
memory is freed from full GCs in the GC-log.
- Toke Eskildsen, Royal Danish Library
On Thu, 2019-03-14 at 13:16 +0100, jim ferenczi wrote:
> http://lucene.apache.org/solr/8_0_0/changes/Changes.html
Thank you for the hard work of rolling the release!
Looking forward to upgrading.
- Toke Eskildsen, Royal Danish Library
ent retrieval) doc values performance for indexes with many
documents.
- Toke Eskildsen, Royal Danish Library
hat you are unsure
of.
- Toke Eskildsen
hey do add up.
For most practical purposes (URL-lookup & grouping, following links between
archived pages, resolving embedded resources from pages) we use the heavily
normalised URL.
- Toke Eskildsen
Arunan Sugunakumar wrote:
> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html
We (also?) prefer to keep our stage/build setup separate from production.
Backup + restore works well for us. It is very fast, as it is basically just
copying the segment files.
- Toke Eskildsen
dea: Issue a query with debug=sanity and
get a report from checks on both the underlying index and the issued
query for indicators of problems:
https://github.com/tokee/lucene-solr/issues/54
- Toke Eskildsen, Royal Danish Library
I'll just note that faceting on a
DocValues=true indexed=false field on a multi-shard index also has a
performance penalty as the field will be slow-searched (using the
DocValues) in the secondary fine-counting phase.
- Toke Eskildsen, Royal Danish Library
On Wed, 2018-11-14 at 17:53 +0530, Anil wrote:
> I don't see median aggregation in JSON facet api documentation.
It's the 50th percentile:
https://lucene.apache.org/solr/guide/7_5/json-facet-api.html#metrics-example
- Toke Eskildsen, Royal Danish Library
-8374
* Experiment with different amounts of concurrent requests to see what
gives the optimum throughput. This also tells you how much extra
hardware you need, if you decide you need to expand.
- Toke Eskildsen, Royal Danish Library
Hopefully that would unearth very
few problematic parts, such as regexp, function or prefix-wildcard
queries. There might be ways to replace or tune those.
- Toke Eskildsen, Royal Danish Library
k check if it is the resolving of
specific field values that is the problem. If using fl=id speeds up
substantially, the next step would be to add fields gradually until
(hopefully) there is a sharp performance decrease.
- Toke Eskildsen, Royal Danish Library
e bottleneck.
Are you looking at overall CPU usage or single-core? When we run force
merge, we have a single core at 100% while the rest are idle.
NB: There is currently a thread "Static index, fastest way to do
forceMerge" in the Lucene users mailinglist, which seem to be quite
parallel t
t runs,
where you change different components and tell us roughly how that
affects performance?
1) Only request simple sorting by score
2) Reduce rows to 0
3) Increase rows to 100
4) Set fl=id only
- Toke Eskildsen, Royal Danish Library
measuring (which of course also takes resources, this
time in the form of work hours). My rough suggestion of a factor 10 for
your system is guesswork erring on the side of a high number.
- Toke Eskildsen, Royal Danish Library
with a max amount of concurrent connections
and a sensible queue. Preferably after a bit of testing to locate where the
highest throughput is. It won't make you hit your overall goal, but it can move
you closer to it.
- Toke Eskildsen
lues=true, Solr
treats all existing documents as having docValues enabled for that field. As
there is no docValue content, DocValues-aware functionality such as sorting and
faceting will not work for that field, until the documents have been re-indexed.
- Toke Eskildsen
d help with the patch.
- Toke Eskildsen
-8374).
With that in mind, could you tell me
* How many documents you have in your index?
* Whether you use stored or docValues for the fields that you retrieve
as part of the search result?
* If you perform heavy faceting, grouping or stats?
Maybe provide a sample query, if you are able?
Thanks.
simple answer there. If you have an index that you update very rarely, it
can save memory and processing power. If you have a live index where you add
and delete documents, it will probably be a bad idea. One strategy used with
time series data is to have old and immutable data in dedicated collections,
which can then be optimized.
- Toke Eskildsen
up, so I would
expect streaming to do the same. I would not expect a 30% increase to
cause something serious on that account though. How many documents in
your index?
- Toke Eskildsen, Royal Danish Library
for excessive traffic in short bursts, not
for a sustained high traffic level.
This advice is independent of Shawn's BTW. You could increase your
server capabilities 10-fold and it would still apply.
- Toke Eskildsen, Royal Danish Library
ind, have you
considered posting a write-up of your hard work somewhere? It seems a
shame only to have it as an input on this mailing list.
- Toke Eskildsen, Royal Danish Library
fields is an outlier
in Solr Land and as such warrants caution and consideration.
- Toke Eskildsen, Royal Danish Library
the result set.
I would argue your OOM with small result sets and huge rows is a good
thing: You encounter the problem immediately, instead of hitting it at
some random time when a match-a-lot query is issued by a user.
- Toke Eskildsen, Royal Danish Library
The relevant JIRA seems to be https://issues.apache.org/jira/browse/SOLR-8988
Try setting facet.distrib.mco=true
- Toke Eskildsen, Royal Danish Library
rCache or
faceting on a high-cardinality field.
If the query above is representative of your general queries, I'll guess it's
the many docs + large filterCache one. It's fairly easy to check:
* What is your Xmx?
* How many documents in your index?
* What is your filterCache size?
- Toke Eskildsen
ring queries from the same
user and then blacklisting the user? But what if the query is a link
shared on a forum? And so forth.
Hardening by blacklisting is a game that is hard to win. So to
paraphrase Shawn: Make sure your users cannot issue OOMing queries.
- Toke Eskildsen, Royal Danish Library - Aarhus
Dominique Bejean wrote:
> Hi, Thank you for the explanations about faceting. I was thinking the hit
> count had a bigger impact on facet memory lifecycle.
Only if you have a very high facet.limit. Could you provide us with a typical
query, including all the parameters?
- Toke Eskildsen
0-12:00.
If you cannot share, please check if you have excessive traffic around that
time or if there is a lot of UnInverting going on (triggered by faceting on
non-DocValues String fields). I know your post implies that you have already
done so, so this is more of a sanity check.
- Toke Eskildsen
ll use Threads (wrapped as Futures) as they are easy to work with. Getting
into thousands of connections in Solr seems like a danger sign to me, whether
they are done async or not.
- Toke Eskildsen
ve ~20 shards in your cloud?
The issue of the default 10K limit is an old one:
https://issues.apache.org/jira/browse/SOLR-7344
I suggest you put a proxy in front of your Solr-cloud to handle queueing of
incoming requests.
- Toke Eskildsen
lter-queries for all the different groups so that the
users do not pay the first-call penalty. This requires your filter-
cache to be large enough to hold all the author lists.
- Toke Eskildsen, Royal Danish Library
check if you have any "Overlapping onDeckSearchers" in your solr.log?
- Toke Eskildsen
ueries have
finished or not? If it is the latter, one explanation could be that your Solr 7
setup is simply slower on average to respond than your Solr 4 setup, to the
point where it cannot keep up with the influx of queries.
- Toke Eskildsen
ncurrent requests and a queue to hold
the rest? Even with an overprovisioning of 4 requests/CPU-core to get them
running close to 100% we're talking 1000 CPU-cores in your system.
- Toke Eskildsen
ent search criteria:
Do they all take ~1 minute or just the first?
- Toke Eskildsen, Royal Danish Library
idea at
https://sbdevel.wordpress.com/2014/03/17/fast-faceting-with-high-cardinality-and-small-result-set/
- Toke Eskildsen
r, it is a design decision. In order to provide
pagination without recomputing the result set, you would need a
guaranteed server-side state. Solr does not implement that pattern and
thanks for that.
- Toke Eskildsen, Royal Danish Library
false since they are multi-valued). Debug info below.
docValues works fine with multi-values (at least for Strings).
- Toke Eskildsen
't find it very usable for
observing and tweaking heap size. The GC-log is better.
- Toke Eskildsen, Royal Danish Library
complicated syntax Solr
> uses. I think V2 APIs are coming to address this, but they did come a
> bit late in the game.
I guess you mean JSON APIs? Anyway, I fully agree that the old Solr
syntax is extremely clunky as soon as we move beyond the simple "just
supply a few search terms"
h-dates.html#WorkingwithDates-DateMath
Your query would be something like
mydate:[* TO NOW/DAY] AND mydate:[NOW+1DAY/DAY TO *]
- Toke Eskildsen, Royal Danish Library
g much further ahead, the whole caching system would benefit from
having constraints that encompass all the shards & collections served
in the same Solr. Unfortunately it is a daunting task just to figure
out the overall principles in this.
- Toke Eskildsen, Royal Danish Library
ed to 32.
Best solution: Use maxSizeMB (if it works)
Second best solution: Reduce to 32 or less
Third best, but often used, solution: Hope that most of the entries are
sparse and will remain so
- Toke Eskildsen, Royal Danish Library
Are you indexing while you search? If so, you need to set auto-warm or
state a few explicit warmup-queries. If not, your measuring will not be
representative as it will be on first-searches, which are always slower
than warmed-searches.
- Toke Eskildsen, Royal Danish Library
ly ask for the number you need.
Same goes for rows BTW.
- Toke Eskildsen
> I hope the heap size will continue to sustain for the index size.
You can check the memory usage in the admin GUI.
- Toke Eskildsen, Royal Danish Library
ith a 2GB JVM or something like that.
One of the symptoms for having too large a memory allocation for the
JVM is occasional long pauses due to garbage collection. However, you
should not lose anything - it is just a pause. Can you describe in more
detail what you mean by freeze and losing data
aking a shot at that. A fairly easy optimization would
be to replace the BytesRef[] indexedTermsArray with a BytesRefArray.
- Toke Eskildsen, Royal Danish Library
memory? What I am aiming at is whether this is primarily a "many
relatively slow random access"-thing or more due to the way DocValues
are represented in the segments (the codec).
- Toke Eskildsen, Royal Danish Library
n-trivial overhead going from 1 to more than 1 shard. If
your collections are not too large, chances are that you will lower
your hardware requirements (and/or improve response times) by using
only 1 shard/collection.
- Toke Eskildsen, Royal Danish Library
d for Solr, but even then
you might want to have a hard limit, just to avoid the occasional "cat
steps on F5 and the browser issues a gazillion requests"-scenario.
--
Toke Eskildsen, Royal Danish Library
garbage
collections can take a long time.
We have a setup with 25 nodes per physical server, each with 8GB of heap.
Running that as a single node per physical machine would mean ~200GB heap. I am
sure it is possible to wrangle such a beast, but I'd rather spend my energy on
Solr instead.
- Toke Eskildsen
could
say.
Out-of-the-box Solr is pure relevance ranked. By the definition in the
Wikipedia-article, it is already Organic Search. I think you need to go
back to your client and ask what the client thinks "Organic Search" is.
--
Toke Eskildsen, Royal Danish Library
single physical machine that could be an explanation.
What is your hardware-setup?
--
Toke Eskildsen, Royal Danish Library
an expert in
segment merge mechanics).
We're also using a 1 Solr/shard setup, but with SolrCloud. Our initial
rationale for 1 Solr/shard was to avoid long GC-pauses due to large
heaps, but that does not seem to be a problem here. Now we stick to it
as it works fine and makes for simple lo
Nawab Zada Asad Iqbal wrote:
> @Toke, I stumbled upon your page last week but it seems that your huge
> index doesn't receive a lot of query traffic.
It switches between two kinds of usage:
Everyday use is very low traffic by researchers using it interactively: 1-2
simultaneous queries, with fa
Shawn Heisey wrote:
> On 5/24/2017 3:44 AM, Toke Eskildsen wrote:
>> It is relatively easy to downgrade to an earlier release within the
>> same major version. We have not switched to 6.5.1 simply because we
>> have no pressing need for it - Solr 6.3 works well for us.
&
works well for us.
I guess it depends quite a bit on your need for stability. We are a
library and uptime is only "best effort".
--
Toke Eskildsen, Royal Danish Library
he filter-cache (secondarily the other caches, but the filter-cache tends to
be the large one). A heap of 10GB might very well be fine for handling your
whole 50GB index. If that is on a 64GB machine, the remaining 54GB of RAM
(minus the other stuff that is running) ought to ensure a fully cached
Shawn Heisey wrote:
> Adding more shards as Toke suggested *might* help,[...]
I seem to have phrased my suggestion poorly. What I meant to suggest was a
switch to a single shard (with 4 replicas) setup, instead of the current 2
shards (with 2 replicas).
- Toke
Why don't you use q instead of fq for the
part of your request that changes?
--
Toke Eskildsen, Royal Danish Library
Chetas Joshi wrote:
> Thanks for the insights into the memory requirements. Looks like cursor
> approach is going to require a lot of memory for millions of documents.
Sorry, that is a premature conclusion from your observations.
> If I run a query that returns only 500K documents still keeping
? Does it mean solr will serve stale data( i.e.
> send stale data to the slaves) ignoring the changes from the second
> commit? [...]
Sorry, I am not that familiar with the details of master-slave-setups.
--
Toke Eskildsen, Royal Danish Library
wo problems may be linked.
Quick sanity check: Look for "Overlapping onDeckSearchers" in your
solr.log to see if your memory problems are caused by multiple open
searchers:
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
--
Toke Eskildsen, Royal Danish Library
tand the expected gain of adding replicas, if the data are
remote. Why can't the replica Solrs run on the nodes with the data? Do you have
very CPU-intensive search?
- Toke Eskildsen
e.
You can get a detailed breakdown by doing VisualVM profiling with a
snapshot instead of sampling, but be prepared to restart your Solr
afterwards as that is quite intrusive.
Another (and simpler) option would be to check how much IO-wait there
is with 'top' from a shell.
- Toke Eskildsen