On Tue, 2015-11-03 at 11:09 +0530, Modassar Ather wrote:
> It is around 90GB of index (around 8 million documents) on one shard and
> there are 12 such shards. As per my understanding the sharding is required
> for this case. Please help me understand if it is not required.
Except for an internal
Hi
When I do an atomic update (a "set" on the content field and on another
field as well), the language field becomes generic. That is, language
detection only works on the first insert, not on the set update. Even if the
language was detected the first time, it becomes generic after the update.
Any idea?
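For reference, a set-style atomic update against the JSON update handler
looks roughly like the sketch below; the collection and field names are only
placeholders:

  curl -X POST 'http://localhost:8983/solr/mycollection/update?commit=true' \
    -H 'Content-Type: application/json' \
    -d '[{"id":"doc1",
         "content":{"set":"new body text"},
         "title":{"set":"new title"}}]'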
One rule of thumb for Solr is to shard after you reach 100 million documents.
With large documents, you might want to shard sooner.
We are running an unsharded index of 7 million documents (55GB) without
problems.
The EdgeNGramFilter generates a set of prefix terms for each term in the
document.
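As an illustration, a minimal (untested) fieldType using the edge n-gram
filter at index time might look like this; the type name and gram sizes are
arbitrary:

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- "network" is indexed as ne, net, netw, netwo, networ, network -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>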
The information given is not sufficient to say anything definite. You can
check the Solr log to find the reason for the log replay.
You can also check whether the index is as expected, e.g. the number of
documents indexed.
Regards,
Modassar
On Tue, Nov 3, 2015 at 11:11 AM, Midas A wrote:
> Thanks Modassar for replying
Thanks Modassar for replying,
could you please elaborate on what would have happened when we were getting
this kind of warning?
Regards,
Abhishek Tiwari
On Mon, Nov 2, 2015 at 6:00 PM, Modassar Ather
wrote:
> Normally the tlog is replayed in case the Solr server crashes for some reason,
> and when restarted it tries to recover from the crash gracefully.
Thanks Walter for your response,
It is around 90GB of index (around 8 million documents) on one shard and
there are 12 such shards. As per my understanding the sharding is required
for this case. Please help me understand if it is not required.
We have requirements where we need full wildcard support.
I just had a thought that perhaps Complex Phrase parser could be useful here:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
You still need to mark that full name to search against a specific
field, so it may or may not fit in a more general stream o
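A quick sketch of what such a query could look like with the ComplexPhrase
parser; the field name here is hypothetical:

  q={!complexphrase inOrder=true}full_name:"john smi*"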
Thanks Erick, that did it. I had thought the -z option was only for external
zookeepers. Using port 9983 allowed me to upload a config.
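For anyone else hitting this, a typical config upload against the embedded
ZooKeeper on port 9983 looks something like the following; the config name
and path are placeholders:

  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 \
    -cmd upconfig -confname myconfig -confdir /path/to/myconfig/conf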
If you've recently downloaded Solr 5.x and are trying to figure out which
example creates a Solr home where, and why the example creation command uses
a configset directory rather than a configset URL parameter, you may find
this useful:
http://blog.outerthoughts.com/2015/11/oh-solr-home-where-art-thou/
Regards,
The "new way" of doing things is to use the start
scripts, which is outlined at the start of the page I linked below.
You probably want to bite the bullet and get used to that
way of doing things, as it's likely going to be where ongoing
work is done.
If you still want to approach it the way you a
Without more data, I'd guess one of two things:
1> you're seeing stop-the-world GC pauses that cause Zookeeper to
think the node is unresponsive, which puts a node into recovery and
things go bad from there.
2> Somewhere in your solr logs you'll see OutOfMemory errors which can
also cascade a bun
I'm trying to plan a migration from a standalone solr instance to the
solrcloud. I understand the basic steps but am getting tripped up just
trying to create a new collection. For simplicity, I'm testing this on a
single machine, so I was trying to use the embedded zookeeper. I can't
figure out how
Hey there,
we are running a SolrCloud cluster with 4 nodes, same config. Each node
has 8 GB of memory, 6 GB assigned to the JVM. This is maybe too much, but it
worked for a long time.
We currently run with 2 shards, 2 replicas and 11 collections. The
complete data-dir is about 5.3 GB.
I think we should mov
Let's say we're trying to do document to document matching (not with
MLT). We have a shingling analysis chain. The query is a document, which
is itself shingled. We then look up those shingles in the index. The %
of shingles found is in some sense a marker as to the extent to which
the documents ar
NP. I've occasionally taken to changing to another window and
refreshing the contributor page; it seems to come back a lot faster than
waiting, which is very weird.
On Mon, Nov 2, 2015 at 9:01 AM, Steve Rowe wrote:
> Yes, sorry, the wiki took so long to come back after changing it to include
> Alex’s
Or a really simple-minded approach: just use the frequency
as a ratio of numFound to estimate terms.
Doesn't work, of course, if you need precise counts.
On Mon, Nov 2, 2015 at 9:50 AM, Doug Turnbull
wrote:
> How precise do you need to be?
>
> I wonder if you could efficiently approximate "numbe
How precise do you need to be?
I wonder if you could efficiently approximate "number of matches" by
getting the document frequency of each term. I realize this is an
approximation, but the highest document frequency would be your floor.
Let's say you have terms t1, t2, and t3 ... tn. t1 has highe
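One rough way to get a per-term document count in a single request is a set
of facet queries, along these lines; the field and term names are made up:

  curl 'http://localhost:8983/solr/mycollection/select' \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'rows=0' \
    --data-urlencode 'facet=true' \
    --data-urlencode 'facet.query=text:t1' \
    --data-urlencode 'facet.query=text:t2' \
    --data-urlencode 'facet.query=text:t3'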
I have a scenario where I want to search for documents that contain many
terms (maybe 100s or 1000s), and then know the number of terms that
matched. I'm happy to implement this as a query object/parser.
I understand that Lucene isn't well suited to this scenario. Any
suggestions as to how to make
On 2 November 2015 at 11:30, Gora Mohanty wrote:
> As per my last
> follow-up, there is currently no way to have DIH automatically pick up
> different data-config files without manually editing the DIH
> configuration each time.
I missed previous discussions, but the DIH config file is given in a
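For what it's worth, one common pattern is to declare one DIH handler per
config file in solrconfig.xml and trigger each one separately; the handler
and file names below are invented:

  <requestHandler name="/dataimport-products"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config-products.xml</str>
    </lst>
  </requestHandler>
  <requestHandler name="/dataimport-users"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config-users.xml</str>
    </lst>
  </requestHandler>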
Yes, sorry, the wiki took so long to come back after changing it to include
Alex’s username that I forgot to send notification… Thanks Erick.
> On Oct 31, 2015, at 11:27 PM, Erick Erickson wrote:
>
> Looks like Steve added you today, you should be all set.
>
> On Sat, Oct 31, 2015 at 12:50 P
To back up a bit, how many documents are in this 90GB index? You might not need
to shard at all.
Why are you sending a query with a trailing wildcard? Are you matching the
prefix of words, for query completion? If so, look at the suggester, which is
designed to solve exactly that. Or you can us
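If you go the suggester route, a minimal SuggestComponent configuration
(untested, names and field are placeholders) looks roughly like:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>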
On 2 November 2015 at 21:50, fabigol wrote:
> Hi,
> I have many DataImport config files.
> I want to start them all at once instead of launching DataImport for each file.
> Is it possible?
Not to be antagonistic, but did you not ask this before, and have
various people not tried to help you?
With all
Hi,
I have many DataImport config files.
I want to start them all at once instead of launching DataImport for each file.
Is it possible?
I think (not tested) that it should be safe to select Tomcat from the
dropdown, as both use keytool (bundled with JDK) to generate the CSR.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 2 November 2015 at 09:53, davidphilip
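A rough sketch of generating a keystore and a CSR with keytool; the alias,
password and file names are placeholders:

  keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
          -keystore solr-ssl.keystore.jks -storepass secret
  keytool -certreq -alias solr-ssl -file solr-ssl.csr \
          -keystore solr-ssl.keystore.jks -storepass secret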
The input for the title field is user-provided, so a wide range of things
can be entered there. Quoting the title is not what I'm looking for. I also
checked, and q.op is AND and mm is 100%. In addition to the title field the
user can also use general keywords, so setting local params (df) to
somethin
On Mon, Nov 2, 2015 at 1:38 PM, fabigol wrote:
> Thanks,
> everything works.
> I have 2 last questions:
> How can I make "clean" default to 0 (false) during an indexation?
>
> To conclude, I want to understand:
>
>
> Requests: 7 (1/s), Fetched: 452447 (45245/s), Skipped: 0, Processed: 17433
> (1743/s)
>
> What is
The doc[1] in the reference guide provides steps for setting up SSL with a
self-signed certificate. My employer wants me to set up and test with a
CA-signed certificate.
When I go to buy[2] an SSL certificate (just for testing), it asks for a
specific web server name, and Jetty is not listed on it.
Is t
For the second question, I tried:
false
and
true
false
in solrconfig.xml, but without success.
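In case the XML elements got stripped by the mail archive: the usual place
to make clean default to false is the DIH handler's defaults in
solrconfig.xml, something along these lines (handler and file names are just
examples):

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <str name="clean">false</str>
    </lst>
  </requestHandler>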
Hi solr fans,
Are there ways to affect the strategy
behind SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite?
As it seems, at the moment the rewrite method loads at most N terms that
maximize term score. How can this be changed to loading top terms by
frequency, for example?
--
Dmitry Kan
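For context, the default usage on the Lucene side looks roughly like the
fragment below (the field, term and the size of 50 are arbitrary); as far as
I know, selecting expansions by raw frequency would mean writing a custom
rewrite method:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.WildcardQuery;
  import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;

  WildcardQuery wildcard = new WildcardQuery(new Term("text", "net*"));
  SpanMultiTermQueryWrapper<WildcardQuery> span =
      new SpanMultiTermQueryWrapper<>(wildcard);
  // Keep only the 50 top-scoring expansions of the wildcard.
  span.setRewriteMethod(
      new SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite(50));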
On Mon, 2015-11-02 at 14:17 +0100, Toke Eskildsen wrote:
> http://rosalind:52300/solr/collection1/select?q=%22der+se*%
> 22&wt=json&indent=true&facet=false&group=true&group.field=domain
>
> gets expanded to
>
> parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" |
> author:kan svane*
Thanks,
everything works.
I have 2 last questions:
How can I make "clean" default to 0 (false) during an indexation?
To conclude, I want to understand:
Requests: 7 (1/s), Fetched: 452447 (45245/s), Skipped: 0, Processed: 17433
(1743/s)
What is "Requests"?
What is "Fetched"?
What is "Processed"?
Thanks again
On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote:
> The query q=network se* is quick enough in our system too. It takes
> around 3-4 seconds for around 8 million records.
>
> The problem is with the same query as phrase. q="network se*".
I misunderstood your query then. I tried replicatin
Hi Shruti,
If you are looking to index images to make them searchable (Image Search)
then you will have to look at LIRE (Lucene Image Retrieval)
http://www.lire-project.net/ and can follow Lire Solr Plugin at this site
https://bitbucket.org/dermotte/liresolr.
Thanks,
Susheel
On Sat, Oct 31, 201
Well it seems that doing q="network se*" is working but not in the way you
expect. Doing q="network se*" would not trigger a prefix query; the
"*" character would be treated like any other character. I suspect that your
query is in fact "network se" (assuming you're using a StandardTokenizer) and
t
The problem is with the same query as phrase. q="network se*".
The last '.' is the full stop of the sentence, and the query is
q=field:"network se*"
Best,
Modassar
On Mon, Nov 2, 2015 at 6:10 PM, jim ferenczi wrote:
> Oops, I did not read the thread carefully.
> *The problem is with the same query a
Oops, I did not read the thread carefully.
*The problem is with the same query as phrase. q="network se*".*
I was not aware that you could do that with Solr ;). I would say this is
expected because in such case if the number of expansions for "se*" is big
then you would have to check the positions
*I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. And the
remaining heap will be used for activities other than Solr. Please help me
understand.*
Well those 28GB of heap are the memory "reserved" for your
Normally the tlog is replayed in case the Solr server crashes for some
reason, and when restarted it tries to recover from the crash gracefully.
You can look into the following documentation, which explains transaction
logs and related Solr internals:
http://lucidworks.com/blog/2013/08/23/understanding-
I monitored swap activity for the query using vmstat. The *so* and *si*
columns showed 0 until the query completed. top also showed 0 against swap.
This means there was no scarcity of physical memory; swap activity does not
seem to be the bottleneck.
Kindly note that I ran this on an 8-node cluster with 30
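For anyone wanting to reproduce the check, the command was roughly the
following (the interval in seconds is arbitrary); the si and so columns are
swap-in and swap-out per interval:

  vmstat 5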
Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
excessive garbage collection. You could turn GC-logging on to check
that. With a bit of luck GC would be the cause of the slow down.
Yes, it is with the top command.
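If it helps, these are the typical JDK 8 flags for turning GC logging on;
the log path is a placeholder, and I believe the stock bin/solr script
already sets something similar:

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/solr/logs/solr_gc.log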
On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote:
> The remaining size after you removed the heap usage should be reserved for
> the index (not only the other system activities).
> I am not able to get the above point. So when I start Solr with 28g RAM,
> for all the activities related to S
On Mon, 2015-11-02 at 14:34 +0530, Modassar Ather wrote:
> No! This is a single big machine with 12 shards on it.
> Around 370 gb on the single machine.
Okay. I guess your observation of 400% for a single core is with top and
looking at that core's entry? If so, the 400% can be explained by
exces
Thanks Jim for your response.
The remaining size after you removed the heap usage should be reserved for
the index (not only the other system activities).
I am not able to get the above point. So when I start Solr with 28g RAM,
for all the activities related to Solr it should not go beyond 28g. A
*if it correlates with the bad performance you're seeing. One important
thing to notice is that a significant part of your index needs to be in RAM
(especially if you're using SSDs) in order to achieve good performance.*
Especially if you're not using SSDs, sorry ;)
2015-11-02 11:38 GMT+01:00 jim
12 shards with 28GB for the heap and 90GB for each index means that you
need at least 336GB for the heap (assuming you're using all of it, which may
easily be the case considering the way the GC handles memory) and ~1TB for
the index. Let's say that you don't need your entire index in RAM;
the
Just to add one more point that one external Zookeeper instance is also
running on this particular machine.
Regards,
Modassar
On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather
wrote:
> Hi Toke,
> Thanks for your response. My comments in-line.
>
> That is 12 machines, running a shard each?
> No! Th
Hi Toke,
Thanks for your response. My comments in-line.
That is 12 machines, running a shard each?
No! This is a single big machine with 12 shards on it.
What is the total amount of physical memory on each machine?
Around 370 GB on the single machine.
Well, se* probably expands to a great deal o
On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> I have a setup of 12 shard cluster started with 28gb memory each on a
> single server. There are no replica. The size of index is around 90gb on
> each shard. The Solr version is 5.2.1.
That is 12 machines, running a shard each?
What is t