shard query with duplicated documents causes inaccurate pagination
When we have duplicated documents (same unique ID) among the shards, the query results can be non-deterministic; this is a known issue. The consequence when we display the search results on our UI page with pagination is: if the user clicks 'last page', it can display an empty page, since the total doc count returned by the query with dups is not accurate (it apparently includes the dups). Is there a known workaround for this problem? We tried the following 2 approaches, but each of them has a problem:

1) use a query like:

curl -d "q=*:*&fl=message_id&rows=1&start=1999" http://[hostname]:8080/mywebapp/shards/[coreid]/select?

Since I am using a very large number for 'rows', it will return the accurate doc count, but it takes about 20 seconds to run this query on an average customer with a little over 1 million rows returned, so the performance is not acceptable.

2) use a facet query:

curl -d "q=*:*&fl=message_id&facet=true&facet.mincount=2&rows=0&facet.field=message_id&indent=on" http://[hostname]:8080/[mywebapp]/shards/[coreid]/select?

Our tests show this might not return accurate doc counts from time to time.

Any suggestions on the best workaround to get an accurate doc count from a sharded query with dups, one that is also efficient on a large data set? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/shard-query-with-duplicated-documents-cause-inaccuate-paginating-tp4133666.html Sent from the Solr - User mailing list archive at Nabble.com.
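For what it is worth, here is the arithmetic behind approach 2, sketched in Java; it assumes (hypothetically) that the facet counts on message_id have already been parsed into a map. The distinct count is numFound minus the surplus copies of every value whose facet count is 2 or more:

```java
import java.util.Map;

public class DedupCount {
    // Estimate the distinct doc count for a sharded query: take the
    // numFound reported by Solr (which includes duplicates) and subtract
    // the surplus copies of every uniqueKey value with facet count >= 2.
    static long dedupedCount(long numFound, Map<String, Long> facetCounts) {
        long surplus = 0;
        for (long count : facetCounts.values()) {
            if (count >= 2) {
                surplus += count - 1;
            }
        }
        return numFound - surplus;
    }
}
```

This still inherits whatever inaccuracy the facet query itself has on sharded data, so it is only as good as the facet counts it is fed.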
Re: SolrCloud: how to index documents into a specific core and how to search against that core?
Yandong, have you figured out whether it works for you to use one collection per customer? We have a similar use case to yours: customer IDs are used as core names. That was the reason our company did not upgrade to SolrCloud... I might remember it wrong, but I vaguely remember looking into using a collection for each customer, and it seems the number of collections in the current release is fixed, isn't it? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-tp3985262p4078210.html Sent from the Solr - User mailing list archive at Nabble.com.
solr 3.5 core rename issue
We just tried to use .../solr/admin/cores?action=RENAME&core=core0&other=core5 to rename a core 'old' to 'new'. After the request is done, solr.xml has the new core name, and the Solr admin shows the new core name in the list. But the index dir still has the old name as the directory name. I looked into the Solr 3.5 code, and this is what the code does. However, if I bounce Tomcat/Solr, when Solr starts up it creates a new index dir named 'new', and now of course no documents are returned if you search the core. Is this a bug, or did I miss anything? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 3.5 core rename issue
Hi Shawn, I do have persistent="true" in my solr.xml: ... the command I ran was to rename from '413' to '413a'. When I debugged through the Solr CoreAdminHandler, I noticed the persistent flag only controls whether the new data gets persisted to solr.xml or not; thus, as you can see, it did change my solr.xml, so there is no problem there. But the index dir ends up with no change at all (still '413'). I guess SWAP will have a similar issue; I bet your 's0_0' directory actually holds data for core s0build, and s0_1 holds data for s0live after you swap them, because I don't see anywhere in the CoreAdminHandler and CoreContainer code that actually renames the index directory. I might be wrong, but you can test and find out. Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425p4056435.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 3.5 core rename issue
thanks Shawn for filing the issue. By the way, my solrconfig.xml has: ${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name} For now I will have to shut down Solr and write a script that modifies solr.xml manually and renames the core data directory to the new name. By the way, when I try to remove a core using UNLOAD (I am using Solr 3.5): .../solr/admin/cores?action=UNLOAD&core=4130&deleteIndex=true it removes the core from solr.xml, but it leaves the data directory '413'; the index subfolder under 413 is removed, however spellchecker1 and spellchecker2 still remain. Do you know why? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425p4056865.html Sent from the Solr - User mailing list archive at Nabble.com.
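In case it helps anyone else, here is roughly what my cleanup script will do, sketched in Java. The name="..." attribute format and the data directory layout are assumptions from my own solr.xml (dataDir ends in ${solr.core.name}), and Solr must be fully shut down first:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OfflineCoreRename {
    // With Solr stopped: rewrite the core's name attribute in solr.xml,
    // then rename the matching data directory so a dataDir like
    // .../data/${solr.core.name} still resolves after restart.
    // Naive string replace; assumes the old name appears only as name="...".
    static void rename(Path solrXml, Path dataRoot, String oldName, String newName)
            throws IOException {
        String xml = new String(Files.readAllBytes(solrXml), StandardCharsets.UTF_8);
        xml = xml.replace("name=\"" + oldName + "\"", "name=\"" + newName + "\"");
        Files.write(solrXml, xml.getBytes(StandardCharsets.UTF_8));
        Files.move(dataRoot.resolve(oldName), dataRoot.resolve(newName));
    }
}
```

Obviously back up solr.xml before letting anything like this touch it.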
Re: solr 3.5 core rename issue
yeah, I realize using ${solr.core.name} for dataDir must be the cause of the issue we see... it is fair to say SWAP and RENAME just create an alias that still points to the old dataDir. If they cannot fix it, then it is not a bug :-) At least we understand exactly what is going on there. Thanks so much for your help! Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425p4057037.html Sent from the Solr - User mailing list archive at Nabble.com.
shard query returns 500 on large data set
Hi - when I execute a shard query like:

[myhost]:8080/solr/mycore/select?q=type:message&rows=14&...&qt=standard&wt=standard&explainOther=&hl.fl=&shards=solrserver1:8080/solr/mycore,solrserver2:8080/solr/mycore,solrserver3:8080/solr/mycore

everything works fine until I query against a large data set (> 100k documents), when the number of rows returned exceeds about 50k. By the way, I am using the HttpClient GET method to send the Solr shard query over. In the above scenario, the query fails with a 500 server error as the returned status code. I am using Solr 3.5. I encountered a 404 before: when one of the shard servers does not have the core (404), the whole shard query returns 404 to me; so I would expect that if one of the servers encounters a timeout (408?), the shard query should return a timeout status code? I guess I am not sure what the shard query results will be in various error scenarios... I guess I could look into the Solr code, but if you have any input, it will be appreciated. thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/shard-query-return-500-on-large-data-set-tp4057038.html Sent from the Solr - User mailing list archive at Nabble.com.
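One workaround I am considering is to fetch in smaller pages, staying well below the point (around 50k rows here) where the 500 appears; a sketch of the (start, rows) schedule, with the page size an arbitrary assumption:

```java
import java.util.ArrayList;
import java.util.List;

public class PageSchedule {
    // Break one huge request (e.g. rows=50000) into a series of
    // (start, rows) pairs that each stay under a safer page size.
    static List<int[]> pages(int totalWanted, int pageSize) {
        List<int[]> pages = new ArrayList<>();
        for (int start = 0; start < totalWanted; start += pageSize) {
            pages.add(new int[] { start, Math.min(pageSize, totalWanted - start) });
        }
        return pages;
    }
}
```

The caveat is that deep start offsets are themselves expensive on sharded queries in 3.5, so this spreads the cost out rather than removing it.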
RE: numFound changes on changing start and rows
any update on this? Will this be addressed/fixed? In our system, our UI lets the user paginate through search results. As my in-depth testing found out: if rows=0, the result size is consistently the total sum of the documents on all shards, regardless of whether there are any duplicates; if rows is larger than the expected merged document count, the resulting numFound is accurate and consistent; however, if rows is smaller than the expected merged result size, it is non-deterministic. Unfortunately, in our system it is not easy to work around this problem. We have to issue a query whenever the user clicks the Next button, and rows is 20 in our case, which in most cases is smaller than the merged result size, so we get a different number each time. If we do rows=0 up front, it won't work either, since we want an accurate number and others may be indexing new documents at the same time. Especially when the user hits the last page, sometimes we see numFound off by hundreds; this won't work. Please advise. thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/numFound-changes-on-changing-start-and-rows-tp3999752p4061628.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: numFound changes on changing start and rows
OK, now that my head has cooled down, I remember this old-school issue... I have been dealing with it myself, so I do not expect it can be straightened out or fixed in any way. Basically, when you have two sorted result sets that you need to merge and paginate through, it is never an easy job (if possible at all) to figure out the exact count if you only request a portion of the results. For example, if one set has 40,000 rows returned, the other set has 50,000, and you want start=440 and rows=20 (paginating on the UI), the typical algorithm will sort both sets and return the near portion of both, tossing away the duplicates in that range (20 rows). So even if you account for the duplicates prior to that start point, you have no way to tell how many duplicates come after that point, so you really do not know the exact/accurate numFound unless you return the whole thing. And that is why, when I give a huge rows number, it gives me the accurate count each time. However, a Solr shard query throws a 500 server error when the returned set is around 50k, which is reasonable. So finding a workaround in this context is the only solution. Check the Google search pattern; it may give some fuzzy ideas :-) thanks jie -- View this message in context: http://lucene.472066.n3.nabble.com/numFound-changes-on-changing-start-and-rows-tp3999752p4061633.html Sent from the Solr - User mailing list archive at Nabble.com.
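A toy version of the merge described above (string IDs standing in for sort keys): dedup can only happen inside the fetched window, so a partial fetch cannot yield an exact overall distinct count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ShardMerge {
    // Merge the fetched window from each shard, dropping duplicate IDs.
    // Duplicates that sort beyond the window are never seen by the merger,
    // so no exact overall distinct count can be derived from a partial fetch.
    static List<String> mergeWindow(List<String> shardA, List<String> shardB, int window) {
        TreeSet<String> merged = new TreeSet<>();
        merged.addAll(shardA);
        merged.addAll(shardB);
        List<String> page = new ArrayList<>(merged);
        return page.subList(0, Math.min(window, page.size()));
    }
}
```

Here the merger sees only what was fetched: ask for a window of 3 from shards holding [a,b,c] and [b,c,d] and you get [a,b,c] back, with no way to know how many more duplicates lie past the window.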
Re: rename a core to same name of existing core
Did anyone verify that the following is true?

> The description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:
>
> *quote*
> If a core with the same name exists, while the "new" created core is
> initializing, the "old" one will continue to accept requests. Once it
> has finished, all new requests will go to the "new" core, and the "old"
> core will be unloaded.
> */quote*

step 1 - I have a core 'abc' with 30 documents in it:
http://myhost.com:8080/solr/abc/select/?q=type%3Amessage&version=2.2&start=0&rows=10&indent=on

step 2 - then I create a new core with the same name 'abc':
http://myhost.com:8080/solr/admin/cores?action=create&name=abc&instanceDir=./ 0303abc/mxl/var/solr/solr.xml

step 3 - I cleared out my browser cache

step 4 - I did the same query as in step 1 and got the same results (30 documents):
http://myhost.com:8080/solr/abc/select/?q=type%3Amessage&version=2.2&start=0&rows=10&indent=on

I thought the old core should be unloaded? Did I misunderstand anything here? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/rename-a-core-to-same-name-of-existing-core-tp3090960p4063008.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: rename a core to same name of existing core
thanks for the information; you are right, I was using the same instance dir. I agree with you: I would like to see an error if I create a core with the name of an existing core. Right now I have to do a ping first and check whether the returned code is 404 or not. Jie -- View this message in context: http://lucene.472066.n3.nabble.com/rename-a-core-to-same-name-of-existing-core-tp3090960p4063047.html Sent from the Solr - User mailing list archive at Nabble.com.
programmatically get dataDir setting from solrconfig.xml
I am trying to get the value of 'dataDir' that was set in solrconfig.xml. Other than querying Solr with http://[host]:8080/solr/default/admin/file/?contentType=text/xml;charset=utf-8&file=solrconfig.xml, parsing the dataDir element with some XML parser, and then resolving all possible environment variables and system properties (essentially the same thing the Solr core manager does) to get the value in my Java program... is there an admin URL or Java API I can use to just get a setting defined in solrconfig.xml? Eventually what I am trying to do is find the size of the index of a core: I am trying to reconstruct the path to the core and do a 'du' on the file system. So the second question is: is there a better way to do this? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/programmatically-get-dataDir-setting-from-solrconfig-xml-tp4023108.html Sent from the Solr - User mailing list archive at Nabble.com.
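For the second part, lacking a better way, this is the 'du' I plan to do from Java once the dataDir path has been reconstructed (the path resolution itself is still the open question above):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class IndexSize {
    // Sum the sizes of all regular files under the core's data directory,
    // i.e. a recursive 'du -sb' equivalent done in-process.
    static long dirSize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }
}
```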
suggestion how to handle a highly repetitive valued field
Hi - our indexed documents currently store Solr fields like 'digest' or 'type', where most of our documents end up with the same value (such as 'sha1' for the field 'digest', or 'message' for the field 'type', etc.). On each Solr server we usually have hundreds of millions of documents indexed with the same value in these fields (the fields are stored and indexed). Any suggestion on the best approach if we suspect this is very inefficient in disk space usage, or is it? thanks! Jie -- View this message in context: http://lucene.472066.n3.nabble.com/suggestion-howto-handle-highly-repetitive-valued-field-tp4026104.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: suggestion howto handle highly repetitive valued field
thank you David! -- View this message in context: http://lucene.472066.n3.nabble.com/suggestion-howto-handle-highly-repetitive-valued-field-tp4026104p4026163.html Sent from the Solr - User mailing list archive at Nabble.com.
how to understand this benchmark test results (compare index size after schema change)
I cleaned up the Solr schema by changing a small portion of the stored fields to stored="false". For 5000 documents (about 500M total size of original documents), I ran a benchmark comparing the Solr index size under the schema before and after the cleanup. The first run showed about a 40% reduction in index size (with the old schema the index size is 52M, with the new schema it is 30M). However, the second time I added another 5000 documents (similar data but different documents) to the index. This time, for the total of 10,000 documents, the index size with the old schema is 57M, but the index size with the new schema grew to 54M. How should I explain what I see? Could it be that the second group of 5000 documents has very different data sizes in the fields that were changed to not stored? Or is it because Solr/Lucene's indexing strategy or implementation produces smaller differences in index size as the number of documents grows? Any input will be appreciated. thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-understand-this-benchmark-test-results-compare-index-size-after-schema-change-tp4026674.html Sent from the Solr - User mailing list archive at Nabble.com.
if I only need exact search, does frequency/score matter?
this is related to my previous post, where I have not gotten any feedback yet... I am going through an exercise to reduce the disk usage of the Solr index files. The first step I took was to change some fields from stored to not stored; this reduced the size of .fdt by 30-60%. Very promising... However, I notice the .frq files are taking almost as much disk space as the .fdt files. It seems .frq keeps the term frequency information. In our application, we only care about exact search (for legal purposes); we do not care about ranking search results by relevance (score) at all. Does this mean I can omit the frequencies? Is it feasible in Solr to turn the frequency off? I do need phrase search, so I will have to keep the .prx, which is also huge, similar to the .fdt files. Any suggestions or input? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/if-I-only-need-exact-search-does-frequency-score-matter-tp4026893.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: if I only need exact search, does frequency/score matter?
thanks for the information... I did come across that discussion. I guess I will try to write a customized Similarity class and disable tf; I hope this is not a totally odd thing to do... I do notice about 10GB of .frq files in cores that have 10-30GB of .fdt files in total. I hope the benchmark shows enough disk usage reduction to make this worth it. If in the future we bring back relevance search, I believe we will have to re-index everything... thanks again! -- View this message in context: http://lucene.472066.n3.nabble.com/if-I-only-need-exact-search-does-frequency-score-matter-tp4026893p4027327.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to understand this benchmark test results (compare index size after schema change)
thanks Erik ... I did run optimize on both indices to get rid of the deleted data before comparing them. (My benchmark tests were just indexing 5000 new documents, without duplicates, into a new core... but I optimized just to make sure.) I think one result is consistent: the .fdt/.fdx files are reduced by 30-60% after the stored= changes, so that is a very promising result for my purpose. I am also trying to get rid of the .frq (which is the 3rd largest set of segment files in my production); I have some discussion about this in another topic. thanks! Jie -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-understand-this-benchmark-test-results-compare-index-size-after-schema-change-tp4026674p4027544.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: if I only need exact search, does frequency/score matter?
thanks, this is very helpful -- View this message in context: http://lucene.472066.n3.nabble.com/if-I-only-need-exact-search-does-frequency-score-matter-tp4026893p4027559.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: if I only need exact search, does frequency/score matter?
Hi Otis, do you think I should customize both tf and idf to disable the term frequency? i.e. something like:

public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }
public float idf(int docFreq, int numDocs) { return docFreq > 0 ? 1.0f : 0.0f; }

thanks! Jie -- View this message in context: http://lucene.472066.n3.nabble.com/if-I-only-need-exact-search-does-frequency-score-matter-tp4026893p4027578.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: if I only need exact search, does frequency/score matter?
Hi Otis, I customized the Similarity class and added it at the end of schema.xml:

... ...

and mypackage.NoTfSimilarity.java is like:

public class NoTfSimilarity extends DefaultSimilarity {
    public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }
    public float idf(int docFreq, int numDocs) { return docFreq > 0 ? 1.0f : 0.0f; }
}

I deployed the class at .../tomcat/webapps/solr/WEB-INF/classes/mypackage/NoTfSimilarity.class and restarted Tomcat. I ran the benchmark indexing the same set of data; compared with the results prior to the change, the .frq file size remains the same. Also, the query still shows scores being calculated:

... ... 0.8838835 ... ...

Any idea what I am missing here? It seems it is not using my customized Similarity class. thanks jie -- View this message in context: http://lucene.472066.n3.nabble.com/if-I-only-need-exact-search-does-frequency-score-matter-tp4026893p4028125.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: if I only need exact search, does frequency/score matter?
Hi Otis, here is the debug output for the query... it seems tf and idf do indeed return 1.0f as I customized... I did not override queryNorm or weight etc.; see below. But the bottom line is that if my purpose is to reduce the .frq file size, customizing the Similarity seems not to help with that. I guess the term frequency is still stored no matter what the similarity algorithm is, correct? thanks Jie

type:message AND subject_eng:Resources
+type:message +subject_eng:resources
0.92807764 = (MATCH) sum of:
  0.70710677 = (MATCH) weight(type:message in 596), product of:
    0.70710677 = queryWeight(type:message), product of:
      1.0 = idf(docFreq=10247, maxDocs=10247)
      0.70710677 = queryNorm
    1.0 = (MATCH) fieldWeight(type:message in 596), product of:
      1.0 = tf(termFreq(type:message)=1)
      1.0 = idf(docFreq=10247, maxDocs=10247)
      1.0 = fieldNorm(field=type, doc=596)
  0.22097087 = (MATCH) weight(subject_eng:resources in 596), product of:
    0.70710677 = queryWeight(subject_eng:resources), product of:
      1.0 = idf(docFreq=20, maxDocs=10247)
      0.70710677 = queryNorm
    0.3125 = (MATCH) fieldWeight(subject_eng:resources in 596), product of:
      1.0 = tf(termFreq(subject_eng:resources)=1)
      1.0 = idf(docFreq=20, maxDocs=10247)
      0.3125 = fieldNorm(field=subject_eng, doc=596)

-- View this message in context: http://lucene.472066.n3.nabble.com/if-I-only-need-exact-search-does-frequency-score-matter-tp4026893p4028131.html Sent from the Solr - User mailing list archive at Nabble.com.
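For reference, the index-side switch I am now looking at instead is the omitTermFreqAndPositions field option in schema.xml; the field name and attributes here are just placeholders from my schema:

```xml
<!-- Placeholder field definition: omitTermFreqAndPositions="true" stops
     frequencies (and positions) from being written for this field. -->
<field name="subject_eng" type="text" indexed="true" stored="false"
       omitTermFreqAndPositions="true"/>
```

The caveat is that this drops positions as well, so phrase queries stop working on that field (which conflicts with my need to keep .prx), and it requires re-indexing to take effect.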
POST query with non-ASCII to solr using httpclient won't work
When I use HttpClient and its PostMethod to post a query containing some Chinese, Solr either fails to return any record or returns everything.

... ...
method = new PostMethod(solrReq);
method.getParams().setContentCharset("UTF-8");
method.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
... ...

I used tcpdump and found that the query my application sent is a URL-encoded query string (see the "q=xxx" part):

../SPOST /solr/413/select HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Accept: */*
User-Agent: Jakarta Commons-HttpClient/3.1
Host: 172.20.73.142:8080
Content-Length: 192

q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+&hl.fl=&qt=standard&wt=standard&rows=20

17:09:55.592527 IP xxx > yyy.webcache: tcp 0 ... ...

I found this URL encoding is what causes the Solr query to fail. I verified this by copying the URL-encoded query above to a file and running the curl command; I got the same error, but if I replace the query with the decoded string, it works with Solr:

curl -v -H 'Content-type:application/x-www-form-urlencoded; charset=utf-8' http://localhost:8080/solr/413/select --data @/tmp/chinese_query

When /tmp/chinese_query has the following, it works with Solr:

q=type:message+AND+customer_id:413+AND+subject_zhs:能力+&hl.fl=&qt=standard&wt=standard&rows=20

But if I switch /tmp/chinese_query to the URL-encoded string, it fails again with the same error:

q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+&hl.fl=&qt=standard&wt=standard&rows=20

So, my conclusions: 1) Solr (I am using 3.5) only accepts the decoded query string; it fails with the URL-encoded query. 2) HttpClient sends out a URL-encoded string no matter what (there seems to be no way to make it send a POST request without URL-encoding the body).

Am I missing something, or do you have any suggestions about what I am doing wrong?
thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957.html Sent from the Solr - User mailing list archive at Nabble.com.
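As a sanity check on the client side: the percent-encoded bytes in the tcpdump above are in fact the correct UTF-8 encoding of 能力 (能 = U+80FD = E8 83 BD, 力 = U+529B = E5 8A 9B), which plain-Java URLEncoder confirms. So my guess is the failure is in how the body gets decoded on the server side rather than in the encoding itself:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class EncodingCheck {
    // Percent-encode a string as UTF-8, the same form HttpClient puts on the wire.
    static String encode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    // Decode it back; a server decoding with the right charset recovers the original.
    static String decode(String s) throws UnsupportedEncodingException {
        return URLDecoder.decode(s, "UTF-8");
    }
}
```

If the container decodes the POST body as ISO-8859-1 instead of UTF-8 (worth checking how Tomcat handles the request charset for POST bodies), each of those three bytes becomes a separate garbage character and the terms never match.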
Re: POST query with non-ASCII to solr using httpclient won't work
:-) Otis, I also looked at the SolrJ source code; it seems to do exactly what I am doing here... but I will probably do what you suggested... thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957p4032973.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: POST query with non-ASCII to solr using httpclient won't work
unfortunately SolrJ is not an option here... we will have to make a quick fix with a patch out in production. I am still unable to make Solr (3.5) take the URL-encoded query. Again, passing the non-URL-encoded query string works with non-ASCII (Chinese), but it fails to return anything when sending the request URL-encoded with Chinese. Any suggestions? thanks jie -- View this message in context: http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957p4033262.html Sent from the Solr - User mailing list archive at Nabble.com.
queryResultWindowSize vs rows
What will happen if in my query I specify a greater number for rows than the queryResultWindowSize in my solrconfig.xml? For example, if queryResultWindowSize=100, but I need to process a batch query from Solr with rows=1000 each time, varying the start as I move on... what will happen? If I do not turn off the queryResultCache: I looked into your code a bit, and it seems to me it computes supersetMaxDoc = ((maxDocRequested - 1)/queryResultWindowSize + 1)*queryResultWindowSize and that 'supersetMaxDoc' number of docs will be cached? I hope it does not; otherwise we should turn off the cache and sacrifice the performance of UI paging. thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/queryResultWindowSize-vs-rows-tp401.html Sent from the Solr - User mailing list archive at Nabble.com.
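To spell out the rounding I am reading in the code (assuming I have the formula right), supersetMaxDoc just rounds the requested doc count up to the next multiple of queryResultWindowSize:

```java
public class WindowMath {
    // Round maxDocRequested up to the next multiple of the window size,
    // mirroring supersetMaxDoc = ((maxDocRequested-1)/window + 1)*window.
    static int supersetMaxDoc(int maxDocRequested, int queryResultWindowSize) {
        return ((maxDocRequested - 1) / queryResultWindowSize + 1) * queryResultWindowSize;
    }
}
```

So with queryResultWindowSize=100, a request for 1000 docs caches a superset of exactly 1000, while a request for 101 caches 200.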
Re: queryResultWindowSize vs rows
any suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/queryResultWindowSize-vs-rows-tp401p4012336.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: queryResultWindowSize vs rows
Hi Erik, no, I don't have any evidence; it was just a precautionary question. So according to your explanation, this cache only keeps the document IDs, so if the client pages to the next group of documents in the window, there will be another query to the Solr server to retrieve those docs, correct? OK, that is good to know, because in production we are sharing the same Solr server between the UI search and the batch query I mentioned in the original question. thanks! Jie -- View this message in context: http://lucene.472066.n3.nabble.com/queryResultWindowSize-vs-rows-tp401p4012340.html Sent from the Solr - User mailing list archive at Nabble.com.
CheckIndex question
Hi - with a corrupted core: 1. If I run CheckIndex with -fix, it will drop the reference to the corrupted segment, but the segment files are still there. When we have a lot of corrupted segments, we have to manually pick them out and remove them; is there a way the tool can suffix or prefix them so they are easier to clean out? 2. We know the doc count in the corrupted segment; is it easy to also output the doc IDs of those docs? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/CheckIndex-question-tp4014366.html Sent from the Solr - User mailing list archive at Nabble.com.
[/solr] memory leak prevent tomcat shutdown
Very often when we try to shut down Tomcat, we get the following error in catalina.out indicating a Solr thread cannot be stopped; Tomcat ends up hanging, and we have to kill -9, which we think leads to some core corruptions in our production environment. Please help...

catalina.out:
... ...
Oct 19, 2012 10:17:22 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/solr] appears to have started a thread named [pool-69-thread-1] but has failed to stop it. This is very likely to create a memory leak.

Then I used kill -3 to trigger a thread dump; here is what I get (note the thread [pool-69-thread-1] is hanging):

2012-10-19 10:18:39
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.2-b06 mixed mode):

"DestroyJavaVM" prio=10 tid=0x55b39800 nid=0x7e82 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"pool-69-thread-1" prio=10 tid=0x2aaabcb41800 nid=0x19fa waiting on condition [0x4205e000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x0006de699d80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(Unknown Source)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
	at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

"JDWP Transport Listener: dt_socket" daemon prio=10 tid=0x578aa000 nid=0x19f9 runnable [0x]
   java.lang.Thread.State: RUNNABLE
... ...

-- View this message in context: http://lucene.472066.n3.nabble.com/solr-memory-leak-prevent-tomcat-shutdown-tp4014788.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [/solr] memory leak prevent tomcat shutdown
By the way, I am running Tomcat 6 and Solr 3.5 on Red Hat: 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux -- View this message in context: http://lucene.472066.n3.nabble.com/solr-memory-leak-prevent-tomcat-shutdown-tp4014788p4014792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [/solr] memory leak prevent tomcat shutdown
I found a Solr/Lucene bug: "TimeLimitingCollector starts thread in static {} with no way to stop them" https://issues.apache.org/jira/browse/LUCENE-2822 Is this the same issue? It is fixed in Lucene 3.5, but I am using Solr 3.5 with Lucene 2.9.3 (the matched Lucene version). Can anyone shed some light on whether this means I need to upgrade to Lucene 3.5? thanks jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-memory-leak-prevent-tomcat-shutdown-tp4014788p4014833.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [/solr] memory leak prevent tomcat shutdown
any input on this? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-memory-leak-prevent-tomcat-shutdown-tp4014788p4015265.html Sent from the Solr - User mailing list archive at Nabble.com.
solr replication against active indexing on master
I have a question about Solr replication (master/slaves). While indexing activity is ongoing on the master, a slave sends the filelist command to get a version (to my understanding, actually a snapshot in time) of all files with their sizes, timestamps, etc.; then the slave decides which files need to be pulled and sends another request. If the master has ongoing indexing activity, especially if a commit happens between the 2 slave commands (filelist and pull), then the replication will fail, correct? How does this work correctly? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-against-active-indexing-on-master-tp4017696.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr replication against active indexing on master
thanks ... could you please point me to some more detailed explanation online, or will I have to read the code to find out? I would like to understand a little more about how this is achieved. thanks! Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-against-active-indexing-on-master-tp4017696p4017707.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr replication against active indexing on master
thanks... I just read the related code ... now I understand: it seems the master keeps replicable snapshots (versions), so they should be static. thank you Otis! -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-against-active-indexing-on-master-tp4017696p4017743.html Sent from the Solr - User mailing list archive at Nabble.com.
load balance with SolrCloud
We are using Solr 3.5 in production and we deal with customer data in the terabytes. We use shards for large customers and write our own replica management in our software. Now, with the rapid growth of data, we are looking into SolrCloud for its robust sharding and replication. I understand from reading some documents online that there is no SPOF with SolrCloud, so any instance in the cluster can serve queries/indexing. However, is it true that we need to write our own load balancer in front of SolrCloud? For example, we want to implement a model similar to Loggly's: each customer starts indexing into a small shard of its own; then, if a customer grows beyond the small shard's limit, we switch to indexing into another small shard (we call it a front-end shard) and meanwhile merge the just-released small shard into the next-level larger shard. Since the merge can happen between two instances on different servers, we probably end up syncing the index files of the merging shards and then using a Solr merge. I am curious whether Solr provides anything to help with this kind of strategy for dealing with unevenly growing, large customer data (a core), or do we have to write all of this in our software layer from scratch? thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: load balance with SolrCloud
thanks for your feedback, Erick. I am also aware of the current limitation that the number of shards in a collection is fixed; changing the number requires re-configuring and re-indexing. Let's say that limitation gets lifted in a near-future release. I would then consider setting up a collection for each customer, which would include a varying number of shards and their replicas (depending on the customer's size, and it should grow dynamically). This would lead to having multiple collections on one Solr server instance... I assume setting up n collections on one server is not an issue, or is it? I am skeptical; see the example on the Solr wiki below, which seems to start a Solr instance with one specific collection and its config:

cd example
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar

thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367p4018659.html Sent from the Solr - User mailing list archive at Nabble.com.