Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Tue, 2015-11-03 at 11:09 +0530, Modassar Ather wrote: > It is around 90GB of index (around 8 million documents) on one shard and > there are 12 such shards. As per my understanding the sharding is required > for this case. Please help me understand if it is not required. Except for an internal

RE: language plugin

2015-11-02 Thread Chaushu, Shani
Hi, when I do an atomic update (set field) on the content field and also on another field, the language field becomes generic. That is, language detection doesn't work for the set-field update, only on the first insert. Even if the language was detected the first time, it becomes generic after the update. Any id

Re: Very high memory and CPU utilization.

2015-11-02 Thread Walter Underwood
One rule of thumb for Solr is to shard after you reach 100 million documents. With large documents, you might want to shard sooner. We are running an unsharded index of 7 million documents (55GB) without problems. The EdgeNgramFilter generates a set of prefix terms for each term in the documen
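
A sketch of the kind of analysis chain Walter is describing: edge n-grams generated at index time so that prefix matching becomes a plain term lookup (the field type name and gram sizes are illustrative, not from the thread):

    <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- indexes "network" as n, ne, net, ... network -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>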

Re: warning

2015-11-02 Thread Modassar Ather
The information is not sufficient to say anything. You can refer to the Solr log to find the reason for the log replay. You can also check whether the index is as expected, e.g. the number of documents indexed. Regards, Modassar On Tue, Nov 3, 2015 at 11:11 AM, Midas A wrote: > Thanks Modassar for replying

Re: warning

2015-11-02 Thread Midas A
Thanks Modassar for replying. Could you please elaborate... what would have happened when we were getting this kind of warning? Regards, Abhishek Tiwari On Mon, Nov 2, 2015 at 6:00 PM, Modassar Ather wrote: > Normally the tlog is replayed in case the Solr server crashes for some reason > and when r

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Thanks Walter for your response. It is around 90GB of index (around 8 million documents) on one shard, and there are 12 such shards. As per my understanding, sharding is required in this case; please help me understand if it is not. We have requirements where we need full wildcard su

Re: Kate Winslet vs Winslet Kate

2015-11-02 Thread Alexandre Rafalovitch
I just had a thought that perhaps the Complex Phrase parser could be useful here: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser You still need to mark that full name to search against a specific field, so it may or may not work in a more general stream o
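
A minimal example of the kind of query the ComplexPhraseQParser enables (the field name is illustrative); setting inOrder=false would also match the reversed "winslet kate":

    q={!complexphrase inOrder=true}name:"kate winslet"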

Re: creating collection with solr5 - missing config data

2015-11-02 Thread tedsolr
Thanks Erick, that did it. I had thought the -z option was only for external zookeepers. Using port 9983 allowed me to upload a config. -- View this message in context: http://lucene.472066.n3.nabble.com/creating-collection-with-solr5-missing-config-data-tp4237802p4237811.html Sent from the Sol
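
A sketch of the commands tedsolr's fix amounts to, assuming a Solr 5.x layout and the embedded ZooKeeper (which listens on the Solr port + 1000, hence 9983; paths and names are illustrative):

    # upload a config directory to the embedded ZooKeeper
    server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 \
      -cmd upconfig -confdir /path/to/conf -confname myconf

    # create a collection using that named config
    bin/solr create -c mycollection -n myconf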

[ANN]: Blog article: every Solr home and example in Solr 5.3

2015-11-02 Thread Alexandre Rafalovitch
If you've recently downloaded Solr 5.x and are trying to figure out which example creates its home directory where, and why the example-creation command uses a configset directory but not a configset URL parameter, you may find this useful: http://blog.outerthoughts.com/2015/11/oh-solr-home-where-art-thou/ Regards,

Re: creating collection with solr5 - missing config data

2015-11-02 Thread Erick Erickson
The "new way" of doing things is to use the start scripts, which is outlined at the start of the page I linked below. You probably want to bite the bullet and get used to that way of doing things, as it's likely going to be where ongoing work is done. If you still want to approach it the way you a

Re: SolrCloud breaks and does not recover

2015-11-02 Thread Erick Erickson
Without more data, I'd guess one of two things: 1> you're seeing stop-the-world GC pauses that cause Zookeeper to think the node is unresponsive, which puts a node into recovery and things go bad from there. 2> Somewhere in your solr logs you'll see OutOfMemory errors which can also cascade a bun

creating collection with solr5 - missing config data

2015-11-02 Thread tedsolr
I'm trying to plan a migration from a standalone solr instance to the solrcloud. I understand the basic steps but am getting tripped up just trying to create a new collection. For simplicity, I'm testing this on a single machine, so I was trying to use the embedded zookeeper. I can't figure out how

SolrCloud breaks and does not recover

2015-11-02 Thread Björn Häuser
Hey there, we are running a SolrCloud cluster of 4 nodes, same config. Each node has 8GB memory, 6GB assigned to the JVM. This is maybe too much, but it worked for a long time. We currently run with 2 shards, 2 replicas and 11 collections. The complete data-dir is about 5.3 GB. I think we should mov

Re: Queries for many terms

2015-11-02 Thread Upayavira
Let's say we're trying to do document to document matching (not with MLT). We have a shingling analysis chain. The query is a document, which is itself shingled. We then look up those shingles in the index. The % of shingles found is in some sense a marker as to the extent to which the documents ar

Re: contributor request

2015-11-02 Thread Erick Erickson
NP. I've occasionally taken to changing to another window and refreshing the contributor page; it seems to come back a lot faster than waiting, which is very weird. On Mon, Nov 2, 2015 at 9:01 AM, Steve Rowe wrote: > Yes, sorry, the wiki took so long to come back after changing it to include > Alex’s

Re: Queries for many terms

2015-11-02 Thread Erick Erickson
Or a really simple-minded approach: just use the frequency as a ratio of numFound to estimate terms. Doesn't work, of course, if you need precise counts. On Mon, Nov 2, 2015 at 9:50 AM, Doug Turnbull wrote: > How precise do you need to be? > > I wonder if you could efficiently approximate "numbe

Re: Queries for many terms

2015-11-02 Thread Doug Turnbull
How precise do you need to be? I wonder if you could efficiently approximate "number of matches" by getting the document frequency of each term. I realize this is an approximation, but the highest document frequency would be your floor. Let's say you have terms t1, t2, and t3 ... tn. t1 has highe
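
A sketch of Doug's approximation at the Lucene level (field name, terms and index path are illustrative): the largest per-term document frequency is a lower bound on the number of documents matching the ORed terms, and the sum of document frequencies, capped at maxDoc, is an upper bound:

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;

    public class MatchEstimate {
      public static void main(String[] args) throws Exception {
        try (IndexReader reader =
            DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
          String[] terms = {"t1", "t2", "t3"};
          long floor = 0, sum = 0;
          for (String t : terms) {
            int df = reader.docFreq(new Term("body", t)); // docs containing t
            floor = Math.max(floor, df);
            sum += df;
          }
          long ceiling = Math.min(sum, reader.maxDoc());
          System.out.println("floor=" + floor + ", ceiling=" + ceiling);
        }
      }
    }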

Queries for many terms

2015-11-02 Thread Upayavira
I have a scenario where I want to search for documents that contain many terms (maybe 100s or 1000s), and then know the number of terms that matched. I'm happy to implement this as a query object/parser. I understand that Lucene isn't well suited to this scenario. Any suggestions as to how to make

Re: Many files /dataImport in same project

2015-11-02 Thread Alexandre Rafalovitch
On 2 November 2015 at 11:30, Gora Mohanty wrote: > As per my last > follow-up, there is currently no way to have DIH automatically pick up > different data-config files without manually editing the DIH > configuration each time. I missed previous discussions, but the DIH config file is given in a

Re: contributor request

2015-11-02 Thread Steve Rowe
Yes, sorry, the wiki took so long to come back after changing it to include Alex’s username that I forgot to send notification… Thanks Erick. > On Oct 31, 2015, at 11:27 PM, Erick Erickson wrote: > > Looks like Steve added you today, you should be all set. > > On Sat, Oct 31, 2015 at 12:50 P

Re: Very high memory and CPU utilization.

2015-11-02 Thread Walter Underwood
To back up a bit, how many documents are in this 90GB index? You might not need to shard at all. Why are you sending a query with a trailing wildcard? Are you matching the prefix of words, for query completion? If so, look at the suggester, which is designed to solve exactly that. Or you can us
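
A sketch of the suggester setup Walter points to, assuming Solr 5.x's SuggestComponent (field and analyzer names are illustrative):

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">default</str>
        <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">title</str>
        <str name="suggestAnalyzerFieldType">text_general</str>
      </lst>
    </searchComponent>

    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">default</str>
        <str name="suggest.count">10</str>
      </lst>
      <arr name="components"><str>suggest</str></arr>
    </requestHandler>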

Re: Many files /dataImport in same project

2015-11-02 Thread Gora Mohanty
On 2 November 2015 at 21:50, fabigol wrote: > Hi, > i have many files of config dataImport > I want to start at once instead of launching DataImport for each file. > is it possible?? Not to be antagonistic, but did you not ask this before, and have various people not tried to help you? With all

Many files /dataImport in same project

2015-11-02 Thread fabigol
Hi, I have many dataImport config files. I want to start them all at once instead of launching DataImport for each file. Is it possible? -- View this message in context: http://lucene.472066.n3.nabble.com/Many-files-dataImport-in-same-project-tp4237731.html Sent from the Solr - User mailing list arc

Re: SSL on Solr with CA signed certificate

2015-11-02 Thread Alexandre Rafalovitch
I think (not tested) that it should be safe to select Tomcat from the dropdown, as both use keytool (bundled with JDK) to generate the CSR. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 2 November 2015 at 09:53, davidphilip
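
A sketch of the keytool steps Alexandre means; the same JDK keytool produces the keystore and the CSR regardless of which server the CA's dropdown names (alias and filenames are illustrative):

    # generate a key pair in a JKS keystore
    keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
            -keystore solr-ssl.keystore.jks

    # generate the CSR to submit to the CA
    keytool -certreq -alias solr-ssl -file solr-ssl.csr \
            -keystore solr-ssl.keystore.jks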

Re: Solr Keyword query on a specific field.

2015-11-02 Thread Aaron Gibbons
The input for the title field is user based so a wide range of things can be entered there. Quoting the title is not what I'm looking for. I also checked and q.op is AND and MM is 100%. In addition to the Title field the user can also use general keywords so setting local params (df) to somethin

Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-11-02 Thread Tom Evans
On Mon, Nov 2, 2015 at 1:38 PM, fabigol wrote: > Thanks, > all works. > I have two last questions: > How can I make "clean" default to 0 during an indexation? > > To conclude, I want to understand: > > > Requests: 7 (1/s), Fetched: 452447 (45245/s), Skipped: 0, Processed: 17433 > (1743/s) > > What is

SSL on Solr with CA signed certificate

2015-11-02 Thread davidphilip cherian
The doc[1] on the reference guide provides steps for setting up SSL with a self-signed certificate. My employer wants me to set up and test with a CA-signed certificate. When I go to buy[2] an SSL certificate (just for testing), it asks for a specific web server name, and Jetty is not listed. Is t

Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-11-02 Thread fabigol
For the second question, I tried setting it to false, and also to true and then false, in solrConfig.xml, but without success. -- View this message in context: http://lucene.472066.n3.nabble.com/org-apache-solr-common-Sol
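
The XML fabigol tried was stripped by the archive; a sketch of the kind of defaults block the question is about, making clean=false the default for the DIH handler (handler name and config file are illustrative):

    <requestHandler name="/dataimport"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
        <str name="clean">false</str>
      </lst>
    </requestHandler>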

ways to affect on SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite

2015-11-02 Thread Dmitry Kan
Hi Solr fans, Are there ways to affect the strategy behind SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite? As it seems, at the moment, the rewrite method loads the max N words that maximize term score. How can this be changed to loading top terms by frequency, for example? -- Dmitry Kan
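
For reference, a sketch of the rewrite Dmitry is asking about (field, prefix and the cap of 50 are illustrative); changing the ordering to document frequency would mean writing a custom SpanRewriteMethod rather than configuring this one:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;

    // wrap a prefix query as a span query, keeping only the top 50
    // expanded terms by score
    PrefixQuery prefix = new PrefixQuery(new Term("body", "se"));
    SpanMultiTermQueryWrapper<PrefixQuery> spanPrefix =
        new SpanMultiTermQueryWrapper<>(prefix);
    spanPrefix.setRewriteMethod(
        new SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite(50));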

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 14:17 +0100, Toke Eskildsen wrote: > http://rosalind:52300/solr/collection1/select?q=%22der+se*% > 22&wt=json&indent=true&facet=false&group=true&group.field=domain > > gets expanded to > > parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" | > author:kan svane*

Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-11-02 Thread fabigol
Thanks, all works. I have two last questions: How can I make "clean" default to 0 during an indexation? And to conclude, I want to understand: Requests: 7 (1/s), Fetched: 452447 (45245/s), Skipped: 0, Processed: 17433 (1743/s). What is "Requests"? What is "Fetched"? What is "Processed"? Thank aga

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote: > The query q=network se* is quick enough in our system too. It takes > around 3-4 seconds for around 8 million records. > > The problem is with the same query as phrase. q="network se*". I misunderstood your query then. I tried replicatin

Re: Problem with the Content Field during Solr Indexing

2015-11-02 Thread Susheel Kumar
Hi Shruti, If you are looking to index images to make them searchable (Image Search) then you will have to look at LIRE (Lucene Image Retrieval) http://www.lire-project.net/ and can follow Lire Solr Plugin at this site https://bitbucket.org/dermotte/liresolr. Thanks, Susheel On Sat, Oct 31, 201

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
Well, it seems that doing q="network se*" is working, but not in the way you expect. q="network se*" would not trigger a prefix query; the "*" character would be treated like any other character. I suspect that your query is in fact "network se" (assuming you're using a StandardTokenizer) and t

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
The problem is with the same query as a phrase, q="network se*". The trailing "." is just the full stop of the sentence; the query is q=field:"network se*". Best, Modassar On Mon, Nov 2, 2015 at 6:10 PM, jim ferenczi wrote: > Oops, I did not read the thread carefully. > *The problem is with the same query a

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
Oops, I did not read the thread carefully. *The problem is with the same query as phrase. q="network se*".* I was not aware that you could do that with Solr ;). I would say this is expected, because in such a case, if the number of expansions for "se*" is big, then you would have to check the positions
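
One aside worth noting: the standard lucene parser does not treat a "*" inside quotes as a wildcard, but the ComplexPhraseQParser does support prefix terms inside phrases, with exactly the expansion cost Jim describes. A minimal sketch (the field name is illustrative):

    q={!complexphrase}field:"network se*"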

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
*I am not able to get the above point. So when I start Solr with 28g RAM, for all the activities related to Solr it should not go beyond 28g. And the remaining heap will be used for activities other than Solr. Please help me understand.* Well those 28GB of heap are the memory "reserved" for your

Re: warning

2015-11-02 Thread Modassar Ather
Normally the tlog is replayed in case the Solr server crashes for some reason; when restarted, it tries to recover from the crash gracefully. You can look into the following documentation, which explains transaction logs and related Solr internals. http://lucidworks.com/blog/2013/08/23/understanding-
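
Related, a sketch of the solrconfig.xml settings that keep tlog replay short, in line with the article above (values are illustrative): a hard commit rolls over the transaction log, so an automatic hard commit bounds how much has to be replayed after a crash:

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>

    <autoCommit>
      <maxTime>60000</maxTime>           <!-- hard commit at most every 60s -->
      <openSearcher>false</openSearcher> <!-- durability only, no new searcher -->
    </autoCommit>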

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
I monitored swap activity during the query using vmstat. The *si* and *so* columns show 0 until the query completes. Also, top showed 0 against swap. This means there was no scarcity of physical memory; swap activity does not seem to be the bottleneck. Kindly note that I ran this on an 8-node cluster with 30
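
For reference, the kind of check being described (the interval in seconds is arbitrary):

    vmstat 5    # watch the si/so columns; sustained non-zero
                # values mean the box is actively swapping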

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Okay. I guess your observation of 400% for a single core is with top and looking at that core's entry? If so, the 400% can be explained by excessive garbage collection. You could turn GC logging on to check that. With a bit of luck, GC would be the cause of the slowdown. Yes, it is with the top command
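
A sketch of turning GC logging on, assuming a Solr 5.x solr.in.sh (the variable is picked up by bin/solr, which typically writes the log under $SOLR_LOGS_DIR; the flags are the standard HotSpot ones for Java 7/8):

    GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -XX:+PrintGCApplicationStoppedTime"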

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote: > The remaining size after you removed the heap usage should be reserved for > the index (not only the other system activities). > I am not able to get the above point. So when I start Solr with 28g RAM, > for all the activities related to S

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 14:34 +0530, Modassar Ather wrote: > No! This is a single big machine with 12 shards on it. > Around 370 gb on the single machine. Okay. I guess your observation of 400% for a single core is with top and looking at that core's entry? If so, the 400% can be explained by exces

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Thanks Jim for your response. The remaining size after you removed the heap usage should be reserved for the index (not only the other system activities). I am not able to get the above point. So when I start Solr with 28g RAM, for all the activities related to Solr it should not go beyond 28g. A

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
*if it correlates with the bad performance you're seeing. One important thing to notice is that a significant part of your index needs to be in RAM (especially if you're using SSDs) in order to achieve good performance.* Especially if you're not using SSDs, sorry ;) 2015-11-02 11:38 GMT+01:00 jim

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
12 shards with 28GB for the heap and 90GB for each index means that you need at least 336GB for the heap (assuming you're using all of it, which may easily be the case considering the way the GC handles memory) and ~1TB for the index. Let's say that you don't need your entire index in RAM; the

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Just to add one more point that one external Zookeeper instance is also running on this particular machine. Regards, Modassar On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather wrote: > Hi Toke, > Thanks for your response. My comments in-line. > > That is 12 machines, running a shard each? > No! Th

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Hi Toke, Thanks for your response. My comments in-line. That is 12 machines, running a shard each? No! This is a single big machine with 12 shards on it. What is the total amount of physical memory on each machine? Around 370GB on the single machine. Well, se* probably expands to a great deal o

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote: > I have a setup of 12 shard cluster started with 28gb memory each on a > single server. There are no replica. The size of index is around 90gb on > each shard. The Solr version is 5.2.1. That is 12 machines, running a shard each? What is t