Re: How large is your solr index?

2014-12-29 Thread Alexandre Rafalovitch
On 29 December 2014 at 21:42, Shawn Heisey wrote:
> I believe it would be useful to organize a session at Lucene Revolution, possibly more interactive than a straight presentation, where users with very large indexes are encouraged to attend. The point of this session would be to exchange w

Re: no replication using commitWithin via curl?

2014-12-29 Thread Brendan Humphreys
Thanks for the reply, Shawn. Yes, I am using 4.10.2; I should have mentioned that in my original post. I can confirm there are not multiple versions of Solr on the classpath; our SolrCloud nodes are built programmatically in AWS using the download package of a specific Solr version as a starting po

Re: How large is your solr index?

2014-12-29 Thread Shawn Heisey
On 12/29/2014 2:30 PM, Toke Eskildsen wrote:
> At Lucene/Solr Revolution 2014, Grant Ingersoll also asked for user stories and pointed to https://wiki.apache.org/solr/SolrUseCases - sadly it has not caught on. The only entry is for our (State and University Library, Denmark) setup with 21T

Re: poor performance when connecting to CloudSolrServer(zkHosts) using solrJ

2014-12-29 Thread Shawn Heisey
On 12/29/2014 6:52 PM, zhangjia...@dcits.com wrote:
> I set up a SolrCloud and wrote a simple SolrJ program to query Solr data as below, but it takes about 40 seconds to create a new CloudSolrServer instance; less than 100 milliseconds is acceptable. What is going on when a new CloudSolrServer is created? And

Re: no replication using commitWithin via curl?

2014-12-29 Thread Shawn Heisey
On 12/29/2014 4:11 PM, Brendan Humphreys wrote:
> We've noticed that when we send deletes to our SolrCloud cluster via curl with the param commitWithin=1 specified, the deletes are applied and are visible to the leader node, but aren't replicated to other nodes.
>
> The problem can be work

poor performance when connecting to CloudSolrServer(zkHosts) using solrJ

2014-12-29 Thread zhangjianad
Hi, I set up a SolrCloud and wrote a simple SolrJ program to query Solr data as below, but it takes about 40 seconds to create a new CloudSolrServer instance; less than 100 milliseconds is acceptable. What is going on when a new CloudSolrServer is created, and how can I fix this issue? String zkHost = "bice

Re: no replication using commitWithin via curl?

2014-12-29 Thread Brendan Humphreys
I've confirmed this also happens with deletes via SolrJ with commitWithin: the document is deleted from the leader but the delete is not replicated to other nodes. Document updates are replicated fine. Any help in debugging this behaviour would be much appreciated. Cheers, -Brendan On 30 Dec

Re: How to implement multi-set in a Solr schema.

2014-12-29 Thread Meraj A. Khan
Thanks, Jack. In order not to affect query time, what options are available to handle this at index time? That is, group all the similar books at index time by placing them in some kind of set, and retrieve all the contents of the set at query time if any one of them matches the query. On D

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Alexandre Rafalovitch
On 29 December 2014 at 18:07, Jonathan Rochkind wrote:
> I do not understand what separate query/index analysis you are suggesting to accomplish what I wanted.
I am sure you do know that, but just in case. At the moment, you have only one analyzer chain, so it applies at both index and query ti

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Erick Erickson
Jonathan: Well, it works if you set splitOnCaseChange="0" in just the query part of the analysis chain. I probably misled you a bit months ago; WDFF is intended for this case iff you expect the case change to generate _tokens_ that are individually meaningful. And unfortunately "significant" in

no replication using commitWithin via curl?

2014-12-29 Thread Brendan Humphreys
Hi, We've noticed that when we send deletes to our SolrCloud cluster via curl with the param commitWithin=1 specified, the deletes are applied and are visible to the leader node, but aren't replicated to other nodes. The problem can be worked around by issuing an explicit (hard) "commit". Is
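
For anyone reproducing this outside curl, the two requests in the post (a delete with commitWithin, then the explicit hard commit used as a workaround) can be sketched with the JDK's java.net.http client (Java 11+). This is a minimal sketch under stated assumptions: the endpoint URL and collection name are hypothetical placeholders, the bodies use Solr's JSON update syntax, and the code only builds the requests without sending them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SolrDeleteRequests {
    // Hypothetical update endpoint; substitute your own host and collection.
    static final String UPDATE_URL = "http://localhost:8983/solr/collection1/update";

    // Delete one document by id and ask Solr to commit within commitWithinMs milliseconds.
    // The id is inserted verbatim, so it must not contain quotes or backslashes.
    static HttpRequest deleteById(String id, int commitWithinMs) {
        String body = "{\"delete\":{\"id\":\"" + id + "\"}}";
        return HttpRequest.newBuilder(URI.create(UPDATE_URL + "?commitWithin=" + commitWithinMs))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // The workaround described in the post: an explicit hard commit.
    static HttpRequest hardCommit() {
        return HttpRequest.newBuilder(URI.create(UPDATE_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"commit\":{}}"))
                .build();
    }

    public static void main(String[] args) {
        // Only print the target URIs; nothing is sent.
        System.out.println(deleteById("doc1", 1000).uri());
        System.out.println(hardCommit().uri());
    }
}
```

To actually send them, each request can be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).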

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jonathan Rochkind
On 12/29/14 5:24 PM, Jack Krupansky wrote:
> WDF is powerful, but it is not magic. In general, the indexed data is expected to be clean while the query might be sloppy. You need to separate the index and query analyzers and they need to respect that distinction
I do not understand what separate q

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Alexandre Rafalovitch
> splitOnCaseChange="1"
So, it does not get split during indexing because there is no case change. But it does get split during search, and now you are looking for partial tokens against a combined single token in the index. And not matching. The WordDelimiterFilterFactory is more for product IDs tha
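
The mismatch described above can be reproduced with a toy model. This is a hypothetical, heavily simplified stand-in for WordDelimiterFilter with splitOnCaseChange="1", assuming a lowercase filter runs after it; the real filter also splits on digits and delimiters and has catenate/preserveOriginal options that this ignores. The tokens "mcdonalds" and "McDonalds" are made-up examples, not the ones from Jonathan's report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CaseSplitDemo {
    // Simplified splitOnCaseChange: break where a lowercase letter is followed by an uppercase one.
    static List<String> splitOnCaseChange(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            if (Character.isLowerCase(token.charAt(i - 1)) && Character.isUpperCase(token.charAt(i))) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        parts.add(token.substring(start));
        return parts;
    }

    // Split, then lowercase each part (assumed chain: WDF followed by a lowercase filter).
    static List<String> analyze(String token) {
        List<String> out = new ArrayList<>();
        for (String part : splitOnCaseChange(token)) {
            out.add(part.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("mcdonalds")); // [mcdonalds]   (indexed form: no case change)
        System.out.println(analyze("McDonalds")); // [mc, donalds] (query form: split, so no match)
    }
}
```

With splitOnCaseChange="0" on the query side only, as Erick suggests, the query token would stay whole and lowercase to the same single token that was indexed.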

RE: Solr performance issues

2014-12-29 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge instances (30GB RAM, and 600 General Purpose SSD). The size of the index is about 1.5TB, with many updates every 5 minutes, complex queries and faceting with resp

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jack Krupansky
WDF is powerful, but it is not magic. In general, the indexed data is expected to be clean while the query might be sloppy. You need to separate the index and query analyzers and they need to respect that distinction - the index analyzer would index as you have indicated, indexing both the unitary

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-29 Thread Jonathan Rochkind
Okay, some months later I've come back to this with an isolated reproduction case. Thanks very much for any advice or debugging help you can give. The WordDelimiter filter is making a mixed-case query NOT match the single-case source, when it ought to. I am on Solr 4.3 (sorry, that's what we

RE: How large is your solr index?

2014-12-29 Thread Toke Eskildsen
Bram Van Dam [bram.van...@intix.eu] wrote:
> I'm trying to get a feel for how large Solr can grow without slowing down too much. We're looking into a use-case with up to 100 billion documents (SolrCloud), and we're a little afraid that we'll end up requiring 100 servers to pull it off.
One re

Re: How large is your solr index?

2014-12-29 Thread Jack Krupansky
And that Lucene index document limit includes deleted and updated documents, so even if your actual document count stays under 2^31-1, deleting and updating documents can push the apparent document count over the limit unless you very aggressively merge segments to expunge deleted documents. -- Ja
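
The arithmetic behind Jack's warning can be sketched as follows, using the 2^31 - 1 figure from the thread. In Lucene an update is a delete plus an add, so both deletes and updates leave deleted documents that keep occupying ids until a merge expunges them. The shard sizes in main are hypothetical.

```java
class DocLimitCheck {
    // Per-index limit as quoted in the thread: 2^31 - 1 document ids.
    static final long MAX_DOC_IDS = Integer.MAX_VALUE;

    // Deleted documents keep their ids until merged away,
    // so the count that matters is live + deleted (Lucene's maxDoc).
    static long maxDoc(long liveDocs, long deletedDocs) {
        return liveDocs + deletedDocs;
    }

    static boolean exceedsLimit(long liveDocs, long deletedDocs) {
        return maxDoc(liveDocs, deletedDocs) > MAX_DOC_IDS;
    }

    public static void main(String[] args) {
        // Hypothetical shard: 2.0B live documents, with and without 200M not-yet-merged deletes.
        System.out.println(exceedsLimit(2_000_000_000L, 0L));           // false
        System.out.println(exceedsLimit(2_000_000_000L, 200_000_000L)); // true
    }
}
```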

Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 12:07 PM, Mahmoud Almokadem wrote:
> What do you mean by "important parts of index"? And how do I calculate their size?
I have no formal education in what's important when it comes to doing a query, but I can make some educated guesses. Starting with this as a reference: http://

[ANNOUNCE] Apache Solr 4.10.3 released

2014-12-29 Thread Mark Miller
December 2014, Apache Solr™ 4.10.3 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.10.3.
Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: How large is your solr index?

2014-12-29 Thread ralph tice
Like all things, it really depends on your use case. We have >160B documents in our largest SolrCloud and doing a *:* to get that count takes ~13-14 seconds. Doing a text:happy query only takes ~3.5-3.6 seconds cold; subsequent queries for the same terms take <500ms. We have a little over 3TB of

Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks, Shawn. What do you mean by "important parts of index"? And how do I calculate their size? Thanks, Mahmoud
> On Dec 29, 2014, at 8:19 PM, Shawn Heisey wrote:
>
>> On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
>> I have the same index with a slightly different schema and

Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
> I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge instances (30GB RAM, and 600 General Purpose SSD). The size of the index is about 1.5TB, with many updates every 5 minutes, complex queries and faceting with respon

Re: Loading data to FieldValueCache

2014-12-29 Thread Erick Erickson
bq: There will be no updates to my index. So, no worries about ageing out or garbage collection
This is irrelevant to aging out filterCache entries; this is purely query time.
bq: Each having 64 GB of RAM, out of which I am allocating 45 GB to Solr.
It's usually a mistake to give Solr so much ra

Re: How large is your solr index?

2014-12-29 Thread Erick Erickson
When you say 2B docs on a single Solr instance, are you talking about only one shard? Because if you are, you're very close to the absolute upper limit of a shard: internally the doc id is an int, so the limit is 2^31 - 1, and going past it will cause all sorts of problems. But yeah, your 100B documents are going to use up a lot
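
As a rough floor for the 100B-document question, the per-shard id limit gives a minimum shard count by ceiling division. This is a back-of-the-envelope sketch: shards are not servers (several shards can share a machine), the floor leaves no room for deleted documents, and the 500M-per-shard target is a hypothetical comfort margin, not a recommendation from the thread.

```java
class ShardMath {
    // Hard ceiling per shard: a Lucene doc id is an int, so at most 2^31 - 1 documents.
    static final long MAX_DOC_IDS = Integer.MAX_VALUE;

    // Ceiling division: the smallest shard count keeping every shard at or below docsPerShard.
    static long minShards(long totalDocs, long docsPerShard) {
        return (totalDocs + docsPerShard - 1) / docsPerShard;
    }

    public static void main(String[] args) {
        long total = 100_000_000_000L; // the 100B documents from the original question
        System.out.println(minShards(total, MAX_DOC_IDS));  // 47: the absolute floor
        System.out.println(minShards(total, 500_000_000L)); // 200: with a hypothetical 500M-per-shard target
    }
}
```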

Re: Highlighting do not show for some solr results

2014-12-29 Thread Erick Erickson
Two things:
1> Attachments rarely make it through the e-mail system; you have to put things like screenshots on a different server and provide a link.
2> I did see the attachment in my moderator role and it's not clear what your problem really is. I'm _guessing_ that your complaint is that the

Highlighting do not show for some solr results

2014-12-29 Thread Volel, Andre
Hello, I turned on highlighting and some records do not have highlight text (See image below): Does anyone know why this is happening and how I can fix it? Here is the querystring I am using "&wt=json&json.wrf=?&indent=true&hl=true&hl.fl=title,content&hl

Re: Loading data to FieldValueCache

2014-12-29 Thread Yonik Seeley
On Fri, Dec 26, 2014 at 12:26 PM, Erick Erickson wrote:
> I don't know the complete algorithm, but if the number of docs that satisfy the fq is "small enough", then just the internal Lucene doc IDs are stored rather than a bitset.
If smaller than maxDoc/64 ids are collected, a sorted int set
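
The memory trade-off behind that cutoff can be sketched as follows: a bitset costs one bit per document in the index regardless of how many match, while a sorted int set costs 4 bytes per matching document. This is an approximation that ignores Java object overhead; the maxDoc/64 cutoff is taken from the message above (Solr's code may differ slightly), and the 100M-document index is hypothetical.

```java
class FilterEntrySize {
    // Bitset: one bit per document id up to maxDoc, stored in 64-bit words.
    static long bitsetBytes(long maxDoc) {
        return (maxDoc + 63) / 64 * 8;
    }

    // Sorted int set: one 4-byte int per matching document.
    static long sortedIntSetBytes(long matches) {
        return matches * 4;
    }

    // Cutoff as stated in the thread: fewer than maxDoc/64 matches -> sorted int set.
    static boolean usesSortedIntSet(long maxDoc, long matches) {
        return matches < maxDoc / 64;
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L; // hypothetical index size
        System.out.println(bitsetBytes(maxDoc));            // 12500000 (about 12.5 MB per entry)
        System.out.println(sortedIntSetBytes(maxDoc / 64)); // 6250000 (half the bitset, at the cutoff)
    }
}
```

At the cutoff the int set is already half the size of the bitset, and for small filters it is far smaller, which is why small filterCache entries are cheap.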

How large is your solr index?

2014-12-29 Thread Bram Van Dam
Hi folks, I'm trying to get a feel for how large Solr can grow without slowing down too much. We're looking into a use-case with up to 100 billion documents (SolrCloud), and we're a little afraid that we'll end up requiring 100 servers to pull it off. The largest index we currently have is ~2

Re: SolrCloud & Paging on large indexes

2014-12-29 Thread Bram Van Dam
On 12/23/2014 04:07 PM, Toke Eskildsen wrote:
> The beauty of the cursor is that it has little to no overhead, relative to a standard top-X sorted search. A standard search uses a sliding window over the full result set, as does a cursor search. Same amount of work. It is just a question of l
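
Toke's point can be illustrated with a toy model: both strategies visit every matching document, and what differs is the size of the priority queue they maintain (start + rows for start/rows paging, just rows for a cursor). This is a simplified sketch over an in-memory array of unique sort values, not Solr's implementation; with real cursorMark paging the sort must include the uniqueKey field as a tie-breaker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class PagingSketch {
    // Keep the `size` smallest values seen, using a max-heap so the largest is evicted first.
    private static List<Long> smallest(long[] values, int size, Long after) {
        PriorityQueue<Long> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (long v : values) { // every match is visited in both strategies
            if (after != null && v <= after) continue; // cursor: skip what earlier pages returned
            heap.offer(v);
            if (heap.size() > size) heap.poll();
        }
        List<Long> out = new ArrayList<>(heap);
        Collections.sort(out);
        return out;
    }

    // start/rows paging: the queue grows to start + rows, then all but the last `rows` are dropped.
    static List<Long> deepPage(long[] values, int start, int rows) {
        List<Long> window = smallest(values, start + rows, null);
        return window.subList(Math.min(start, window.size()), window.size());
    }

    // Cursor paging: the queue stays at `rows`; `after` is the last sort value of the previous page.
    static List<Long> cursorPage(long[] values, Long after, int rows) {
        return smallest(values, rows, after);
    }

    public static void main(String[] args) {
        long[] values = {5, 3, 9, 1, 7, 2, 8}; // hypothetical sort values of matching documents
        System.out.println(deepPage(values, 2, 2));    // [3, 5]
        System.out.println(cursorPage(values, 2L, 2)); // [3, 5]
    }
}
```

The scan cost is identical; only the queue size differs, which is what makes deep paging with start/rows expensive and the cursor cheap.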

Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks, all. I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge instances (30GB RAM, and 600 General Purpose SSD). The size of the index is about 1.5TB, with many updates every 5 minutes, complex queries, and faceting, with a response time of 100ms that is acceptable for us.