On 29 December 2014 at 21:42, Shawn Heisey wrote:
> I believe it would be useful to organize a session at Lucene Revolution,
> possibly more interactive than a straight presentation, where users with
> very large indexes are encouraged to attend. The point of this session
> would be to exchange w
Thanks for the reply Shawn.
Yes, I am using 4.10.2 - I should have mentioned that in my original post. I
can confirm there are not multiple versions of Solr on the classpath; our
SolrCloud nodes are built programmatically in AWS using the download
package of a specific Solr version as a starting point.
On 12/29/2014 2:30 PM, Toke Eskildsen wrote:
> At Lucene/Solr Revolution 2014, Grant Ingersoll also asked for user stories
> and pointed to https://wiki.apache.org/solr/SolrUseCases - sadly, it has not
> caught on. The only entry is for our (State and University Library, Denmark)
> setup with 21TB
On 12/29/2014 6:52 PM, zhangjia...@dcits.com wrote:
> I set up a SolrCloud cluster and wrote a simple SolrJ program to query Solr
> data (see below), but it takes about 40 seconds to create a new
> CloudSolrServer instance; less than 100 milliseconds would be acceptable.
> What is going on when a new CloudSolrServer is created, and
On 12/29/2014 4:11 PM, Brendan Humphreys wrote:
> We've noticed that when we send deletes to our SolrCloud cluster via curl
> with the param commitWithin=1 specified, the deletes are applied and
> are visible to the leader node, but aren't replicated to other nodes.
>
> The problem can be worked around by issuing an explicit (hard) "commit".
Hi,
I set up a SolrCloud cluster and wrote a simple SolrJ program to query Solr
data (see below), but it takes about 40 seconds to create a new
CloudSolrServer instance; less than 100 milliseconds would be acceptable.
What is going on when a new CloudSolrServer is created, and how can I fix
this issue?
String zkHost = "bice
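For reference, here is a minimal sketch of how such a client is typically
constructed with SolrJ 4.x. The ZooKeeper host string and collection name
are placeholders, not the poster's values. Long hangs at construction time
often trace back to ZooKeeper connectivity, which explicit timeouts like
these surface much sooner.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQueryDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble; substitute your own host:port list.
        String zkHost = "zk1:2181,zk2:2181,zk3:2181";
        CloudSolrServer server = new CloudSolrServer(zkHost);
        server.setDefaultCollection("collection1"); // placeholder collection
        server.setZkConnectTimeout(10000);          // fail fast if ZK is unreachable
        server.setZkClientTimeout(10000);
        server.connect();                           // does the ZooKeeper work up front
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("numFound: " + rsp.getResults().getNumFound());
        server.shutdown();
    }
}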
I've confirmed this also happens with deletes via SolrJ with
commitWithin - the document is deleted from the leader but the delete is
not replicated to other nodes. Document updates are replicated fine.
Any help in debugging this behaviour would be much appreciated.
Cheers,
-Brendan
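A sketch of the SolrJ calls involved, for anyone reproducing this (the
ZooKeeper address, collection, and document id are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class DeleteWithCommitWithin {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181"); // placeholder
        server.setDefaultCollection("collection1");               // placeholder

        // Delete with commitWithin=1 ms: per this thread, the delete shows up
        // on the shard leader but is not replicated to the other nodes.
        server.deleteById("doc-42", 1);

        // The workaround described above: issue an explicit hard commit.
        server.commit();

        server.shutdown();
    }
}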
On 30 Dec
Thanks Jack. In order not to affect query time, what options are
available to handle this at index time? That way I could group all the
similar books at index time by placing them in some kind of set, and
retrieve all the contents of the set at query time if any one of them
matches the query.
On D
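One common way to do what is being asked here (not necessarily what Jack
had in mind) is to stamp all related books with a shared key at index time
and use result grouping at query time. A sketch, where the field name
book_set_id and the rest of the setup are hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedBookQuery {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181"); // placeholder
        server.setDefaultCollection("books");                     // placeholder

        SolrQuery q = new SolrQuery("title:hamlet");
        q.set("group", true);
        q.set("group.field", "book_set_id"); // hypothetical key shared by similar books
        q.set("group.limit", 10);            // members returned per matching set

        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getGroupResponse().getValues());
        server.shutdown();
    }
}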
On 29 December 2014 at 18:07, Jonathan Rochkind wrote:
> I do not understand what separate query/index analysis you are suggesting to
> accomplish what I wanted.
I am sure you already know this, but just in case: at the moment, you have
only one analyzer chain, so it applies at both index and query time.
Jonathan:
Well, it works if you set splitOnCaseChange="0" in just the query part
of the analysis chain. I probably misled you a bit months ago; WDFF
is intended for this case iff you expect the case change to generate
_tokens_ that are individually meaningful. And unfortunately
"significant" in
Hi,
We've noticed that when we send deletes to our SolrCloud cluster via curl
with the param commitWithin=1 specified, the deletes are applied and
are visible to the leader node, but aren't replicated to other nodes.
The problem can be worked around by issuing an explicit (hard) "commit".
Is
On 12/29/14 5:24 PM, Jack Krupansky wrote:
> WDF is powerful, but it is not magic. In general, the indexed data is
> expected to be clean while the query might be sloppy. You need to separate
> the index and query analyzers and they need to respect that distinction
I do not understand what separate query/index analysis you are suggesting
to accomplish what I wanted.
> splitOnCaseChange="1"
So, it does not get split during indexing because there is no case
change. But it does get split during search, and now you are looking for
partial tokens against a combined single token in the index. And not
matching.
The WordDelimiterFilterFactory is more for product IDs tha
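To see the mismatch concretely, here is a small sketch against the Lucene
4.x classes behind this factory. The inputs and the bare GENERATE_WORD_PARTS
configuration are illustrative, not a full Solr analysis chain:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class WdfCaseChangeDemo {
    // Prints the tokens WordDelimiterFilter emits for the given input.
    static void dump(String input, int flags) throws IOException {
        TokenStream ts = new WordDelimiterFilter(
                new WhitespaceTokenizer(Version.LUCENE_43, new StringReader(input)),
                flags, null); // null = no protected words
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.print("[" + term.toString() + "] ");
        }
        ts.end();
        ts.close();
        System.out.println();
    }

    public static void main(String[] args) throws IOException {
        int base = WordDelimiterFilter.GENERATE_WORD_PARTS;
        // Indexed form: no case change, so it stays one token -> [macbook]
        dump("macbook", base | WordDelimiterFilter.SPLIT_ON_CASE_CHANGE);
        // Query with splitOnCaseChange=1: [Mac] [Book] - no match against
        // the single indexed token.
        dump("MacBook", base | WordDelimiterFilter.SPLIT_ON_CASE_CHANGE);
        // Query with splitOnCaseChange=0: [MacBook] stays whole.
        dump("MacBook", base);
    }
}

A real chain would normally lowercase after WDF, which is why the unsplit
query form can still match the indexed one.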
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
> I have the same index with a slightly different schema and 200M documents,
> installed on 3 r3.xlarge instances (30GB RAM and 600GB General Purpose SSD).
> The size of the index is about 1.5TB; we have many updates every 5 minutes,
> complex queries, and faceting, with a response time of 100ms that is
> acceptable for us.
WDF is powerful, but it is not magic. In general, the indexed data is
expected to be clean while the query might be sloppy. You need to separate
the index and query analyzers and they need to respect that distinction -
the index analyzer would index as you have indicated, indexing both the
unitary
Okay, some months later I've come back to this with an isolated
reproduction case. Thanks very much for any advice or debugging help you
can give.
The WordDelimiter filter is making a mixed-case query NOT match the
single-case source, when it ought to.
I am in Solr 4.3 (sorry, that's what we
Bram Van Dam [bram.van...@intix.eu] wrote:
> I'm trying to get a feel of how large Solr can grow without slowing down
> too much. We're looking into a use-case with up to 100 billion documents
> (SolrCloud), and we're a little afraid that we'll end up requiring 100
> servers to pull it off.
One re
And that Lucene index document limit includes deleted and updated
documents, so even if your actual document count stays under 2^31-1,
deleting and updating documents can push the apparent document count over
the limit unless you very aggressively merge segments to expunge deleted
documents.
-- Jack Krupansky
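Lucene reports both counts, so it is easy to check how much deleted-document
overhead a given shard is carrying. A sketch against the Lucene 4.x API (the
index path is a placeholder):

import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.FSDirectory;

public class DocCountCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder path: point this at one shard's Lucene index directory.
        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(new File("/path/to/index")))) {
            System.out.println("maxDoc  (live + deleted): " + reader.maxDoc());
            System.out.println("numDocs (live only):      " + reader.numDocs());
            System.out.println("deleted:                  " + reader.numDeletedDocs());
            // It is maxDoc, not numDocs, that must stay below 2^31 - 1.
        }
    }
}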
On 12/29/2014 12:07 PM, Mahmoud Almokadem wrote:
> What do you mean with "important parts of index"? and how to calculate their
> size?
I have no formal education in what's important when it comes to doing a
query, but I can make some educated guesses.
Starting with this as a reference:
http://
December 2014, Apache Solr™ 4.10.3 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.10.3
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted
Like all things it really depends on your use case. We have >160B
documents in our largest SolrCloud and doing a *:* to get that count takes
~13-14 seconds. Doing a text:happy query only takes ~3.5-3.6 seconds cold;
subsequent queries for the same terms take <500ms. We have a little over
3TB of
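Counts like that can be gathered without pulling any documents back. A
sketch (SolrJ, with the same placeholder setup as elsewhere in this digest)
that asks for rows=0 so only numFound comes over the wire:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class CountQuery {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181"); // placeholder
        server.setDefaultCollection("collection1");               // placeholder
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0); // we only want numFound, not documents
        long count = server.query(q).getResults().getNumFound();
        System.out.println("documents: " + count);
        server.shutdown();
    }
}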
Thanks Shawn.
What do you mean with "important parts of index"? and how to calculate their
size?
Thanks,
Mahmoud
Sent from my iPhone
> On Dec 29, 2014, at 8:19 PM, Shawn Heisey wrote:
>
>> On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
>> I have the same index with a slightly different schema and
On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
> I have the same index with a slightly different schema and 200M documents,
> installed on 3 r3.xlarge instances (30GB RAM and 600GB General Purpose SSD).
> The size of the index is about 1.5TB; we have many updates every 5 minutes,
> complex queries, and faceting, with a response time of 100ms that is
> acceptable for us.
bq: There will be no updates to my index. So, no worries about ageing
out or garbage collection
This is irrelevant to aging out filterCache entries; this is purely query time.
bq: Each having 64 GB of RAM, out of which I am allocating 45 GB to Solr.
It's usually a mistake to give Solr so much RAM
When you say 2B docs on a single Solr instance, are you talking only one shard?
Because if you are, you're very close to the absolute upper limit of a
shard: internally
the doc id is an int, so the limit is 2^31 - 1, and exceeding it will cause
all sorts of problems.
But yeah, your 100B documents are going to use up a lot
Two things:
1> Attachments rarely make it through the e-mail system; you have to put
things like screenshots out on different servers and provide a link.
2> I did see the attachment in my moderator role and it's not clear what
your problem really is. I'm _guessing_ that your complaint is that the
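For a complaint like the one this replies to (the original post appears
below), the highlighting response is easy to inspect from SolrJ. A sketch -
the hl.fl field names are taken from the post below, everything else is a
placeholder:

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class HighlightCheck {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181"); // placeholder
        server.setDefaultCollection("collection1");               // placeholder

        SolrQuery q = new SolrQuery("some query");                // placeholder
        q.setHighlight(true);
        q.addHighlightField("title");
        q.addHighlightField("content");

        QueryResponse rsp = server.query(q);
        Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
        for (SolrDocument doc : rsp.getResults()) {
            String id = (String) doc.getFieldValue("id");
            // A document that matched only outside hl.fl (title, content)
            // shows up here with an empty snippet map - one common cause of
            // "missing" highlight text.
            System.out.println(id + " -> " + hl.get(id));
        }
        server.shutdown();
    }
}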
Hello,
I turned on highlighting and some records do not have highlight text (see
image below):
[inline image: image001.png - not included]
Does anyone know why this is happening and how I can fix it?
Here is the query string I am using:
"&wt=json&json.wrf=?&indent=true&hl=true&hl.fl=title,content&hl
On Fri, Dec 26, 2014 at 12:26 PM, Erick Erickson
wrote:
> I don't know the complete algorithm, but if the number of docs that
> satisfy the fq is "small enough",
> then just the internal Lucene doc IDs are stored rather than a bitset.
If fewer than maxDoc/64 ids are collected, a sorted int set is used
rather than a bitset.
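To put rough numbers on that trade-off, a worked example with a hypothetical
shard size:

public class FilterCacheSizing {
    public static void main(String[] args) {
        long maxDoc = 100_000_000L;    // hypothetical shard size
        long bitsetBytes = maxDoc / 8; // one bit per doc in the index
        long cutoff = maxDoc / 64;     // most hits the sorted-int-set form holds
        System.out.println("bitset entry:      " + bitsetBytes + " bytes (~12.5 MB)");
        System.out.println("sorted-set cutoff: " + cutoff + " matching docs");
        // Below the cutoff, an entry costs roughly 4 bytes per matching doc
        // instead of a fixed maxDoc/8 bytes.
    }
}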
Hi folks,
I'm trying to get a feel for how large Solr can grow without slowing down
too much. We're looking into a use-case with up to 100 billion documents
(SolrCloud), and we're a little afraid that we'll end up requiring 100
servers to pull it off.
The largest index we currently have is ~2 billion documents on a single
Solr instance
On 12/23/2014 04:07 PM, Toke Eskildsen wrote:
The beauty of the cursor is that it has little to no overhead, relative to a
standard top-X sorted search. A standard search uses a sliding window over the
full result set, as does a cursor-search. Same amount of work. It is just a
question of l
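For anyone who has not tried it, a sketch of cursor paging with SolrJ (the
API is available from Solr 4.7 on; the sort field and client setup are
placeholders, and the sort must end on the uniqueKey field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPaging {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181"); // placeholder
        server.setDefaultCollection("collection1");               // placeholder

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(500);
        q.setSort(SolrQuery.SortClause.asc("id")); // uniqueKey keeps the order total

        String cursor = CursorMarkParams.CURSOR_MARK_START;
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = server.query(q);
            // ... process rsp.getResults() - one fixed-size window per request ...
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) break; // cursor stopped advancing: done
            cursor = next;
        }
        server.shutdown();
    }
}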
Thanks all.
I have the same index with a slightly different schema and 200M documents,
installed on 3 r3.xlarge instances (30GB RAM and 600GB General Purpose SSD).
The size of the index is about 1.5TB; we have many updates every 5 minutes,
complex queries, and faceting, with a response time of 100ms that is
acceptable for us.