When we have duplicated documents (same uniqueID) among the shards, the query
results can be non-deterministic; this is a known issue.
The consequence when we display the search results on our UI page with
pagination is: if the user clicks the 'last page', it can display an empty page
since the to
Yandong,
have you figured out whether using one collection per customer works for you?
We have a similar use case to yours: customer IDs are used as core names.
That was the reason our company did not upgrade to SolrCloud ... I might
remember it wrong, but I vaguely remember I looked into using
We just tried to use
.../solr/admin/cores?action=RENAME&core=core0&other=core5
to rename a core 'old' to 'new'.
After the request is done, solr.xml has the new core name, and the Solr
admin shows the new core name in the list. But the index dir still has the
old name as the directory name. I loo
Hi Shawn,
I do have persistent="true" in my solr.xml:
...
the command I ran was to rename from '413' to '413a'.
When I debug through the Solr CoreAdminHandler, I notice the persistent flag
only controls whether the new data will be persisted to solr.xml or not, thus as
you can se
thanks Shawn for filing the issue.
by the way my solrconfig.xml has:
${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}
For now I will have to shut down Solr and write a script to modify
solr.xml manually and rename the core data directory to the new one.
by the way when I try to re
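A minimal sketch of that offline workaround, assuming Solr is fully stopped first. The directory layout, file names, and the naive solr.xml text substitution here are illustrative assumptions, not taken from any actual Solr tooling:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: with Solr stopped, rename a core's data directory
// and rewrite the core name in solr.xml. Layout and names are assumptions.
public class RenameCoreDir {
    static void renameCore(Path solrRoot, String oldName, String newName) throws IOException {
        // 1. Rename the on-disk data directory.
        Files.move(solrRoot.resolve(oldName), solrRoot.resolve(newName));
        // 2. Rewrite the core name in solr.xml (naive text substitution;
        //    a real script should edit the XML with a proper parser).
        Path solrXml = solrRoot.resolve("solr.xml");
        String xml = Files.readString(solrXml);
        Files.writeString(solrXml,
            xml.replace("name=\"" + oldName + "\"", "name=\"" + newName + "\""));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("solrroot");
        Files.createDirectory(root.resolve("413"));
        Files.writeString(root.resolve("solr.xml"),
            "<solr><cores><core name=\"413\" instanceDir=\"413\"/></cores></solr>");
        renameCore(root, "413", "413a");
        System.out.println(Files.exists(root.resolve("413a"))); // prints true
    }
}
```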
Yeah, I realize using ${solr.core.name} for dataDir must be the cause of the
issue we see... it is fair to say SWAP and RENAME just create an alias
that still points to the old dataDir.
If they cannot fix it then it is not a bug :-) at least we understand
exactly what is going on there.
than
Hi -
when I execute a shard query like:
[myhost]:8080/solr/mycore/select?q=type:message&rows=14&...&qt=standard&wt=standard&explainOther=&hl.fl=&shards=solrserver1:8080/solr/mycore,solrserver2:8080/solr/mycore,solrserver3:8080/solr/mycore
everything works fine until I query against a large
any update on this?
will this be addressed/fixed?
In our system, our UI will allow users to paginate through search results.
As my in-depth testing found out, if rows=0, the result size is consistently
the total sum of the documents on all shards regardless of whether there are
any duplicates; if the rows
OK, now that my head has cooled down, I remember this old-school issue... I
have been dealing with it myself.
So I do not expect this can be straightened out or fixed in any way.
Basically, when you have two sorted result sets you need to merge and
paginate through, it is never an easy job (if all i
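A rough sketch of why the merge comes up short when a uniqueID exists on more than one shard. This is illustrative coordinator-side logic, not Solr's actual implementation; the Doc class and field names are made up for the example:

```java
import java.util.*;

// Illustrative coordinator-side merge, not Solr's actual code: each shard
// returns a list sorted by descending score; duplicates (same uniqueID)
// collapse to one entry, so the merged total can be smaller than the sum
// of per-shard counts -- which is what makes the last page come up empty.
public class ShardMerge {
    static class Doc {
        final String uniqueId;
        final float score;
        Doc(String uniqueId, float score) { this.uniqueId = uniqueId; this.score = score; }
    }

    static List<Doc> mergePage(List<List<Doc>> shardResults, int start, int rows) {
        // Global order by descending score across all shards.
        PriorityQueue<Doc> pq = new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
        for (List<Doc> shard : shardResults) pq.addAll(shard);
        // Keep only the first occurrence of each uniqueID.
        Set<String> seen = new HashSet<>();
        List<Doc> merged = new ArrayList<>();
        while (!pq.isEmpty()) {
            Doc d = pq.poll();
            if (seen.add(d.uniqueId)) merged.add(d);
        }
        int from = Math.min(start, merged.size());
        int to = Math.min(start + rows, merged.size());
        return merged.subList(from, to);
    }

    public static void main(String[] args) {
        List<List<Doc>> shards = List.of(
            List.of(new Doc("a", 3f), new Doc("b", 2f)),
            List.of(new Doc("a", 3f), new Doc("c", 1f))); // "a" lives on both shards
        // Per-shard counts sum to 4, but only 3 distinct docs survive the merge.
        System.out.println(mergePage(shards, 0, 10).size()); // prints 3
    }
}
```

With rows=0 nothing is merged, which is consistent with the observation above that the reported total is just the sum over shards.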
Did anyone verify the following is true?
> the Description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:
>
> *quote*
> If a core with the same name exists, while the "new" created core is
> initalizing, the "old" one will continue to accept requests. Once it
> has finished, all new request
Thanks for the information, you are right, I was using the same instance dir.
I agree with you; I would like to see an error if I am creating a core with
the name of an existing core.
Right now I have to ping first, and check whether the returned code is 404
or not.
Jie
--
View this message
I am trying to get the value of 'dataDir' that was set in solrconfig.xml,
other than querying Solr with
http://[host]:8080/solr/default/admin/file/?contentType=text/xml;charset=utf-8&file=solrconfig.xml
and parsing the dataDir element using some XML parser, then resolving all
possible environment vari
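If parsing solrconfig.xml by hand turns out to be the only route, the placeholder resolution itself is manageable. Here is a hedged sketch of expanding Solr-style ${name:default} variables; the lookup map stands in for system properties and per-core info, and none of this is a Solr API:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of resolving ${name:default} placeholders in a value such as
// ${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name} --
// the kind of dataDir string read straight out of solrconfig.xml.
public class PropertyResolver {
    static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String value, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String def = m.group(2) == null ? "" : m.group(2);  // fallback after ':'
            String replacement = props.getOrDefault(name, def);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String dataDir = "${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}";
        // MYSOLRROOT is unset, so its default applies; the core name is supplied.
        System.out.println(resolve(dataDir, Map.of("solr.core.name", "413")));
        // prints /mysolrroot/messages/solr/data/413
    }
}
```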
Hi -
our indexed documents currently store Solr fields like 'digest' or 'type',
for which most of our documents end up with the same value (such as 'sha1' for
the field 'digest', or 'message' for the field 'type', etc.).
On each Solr server, we usually have hundreds of millions of documents indexed,
and with the sam
thank you David!
I cleaned up the Solr schema by changing a small portion of the stored fields
to stored="false".
For 5000 documents (about 500 MB total size of original documents), I ran a
benchmark comparing the Solr index size between the schema before and after
the cleanup.
The first run showed about a 40% redu
This is related to my previous post, on which I have not gotten any feedback
yet... I am going through an exercise to reduce the disk usage of the Solr
index files.
The first step I took was to move some fields from stored to not stored; this
reduced the size of the .fdt files by 30-60%.
Very promising... however, I notice
thanks for the information...
I did come across that discussion; I guess I will try to write a customized
Similarity class and disable tf.
I hope this is not a totally odd thing to do ... I do notice about 10 GB of
.frq files in cores that have 10-30 GB of .fdt files in total. I hope the
benchmark will show me
Thanks Erik ... I did run optimize on both indices to get rid of the deleted
data before comparing them to each other. (And my benchmark tests were just
indexing 5000 new documents, without duplicates, into a new core... but I did
optimize just to make sure.)
I think one result is consistent: the .f
thanks, this is very helpful
Hi Otis,
do you think I should customize both tf and idf to disable the term
frequency?
i.e. something like:
public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
}

public float idf(int docFreq, int numDocs) {
    return docFreq > 0 ? 1.0f : 0.0f;
}
t
Hi Otis,
I customized the Similarity class and added it at the end of schema.xml:
... ...
and mypackage.NoTfSimilarity.java is like:
public class NoTfSimilarity extends DefaultSimilarity
{
    public float tf(float freq)
    {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public flo
Hi Otis,
Here is the debug output on the query... it seems all tf and idf indeed return
1.0f as I customized... I did not override queryNorm or weight, etc... see
below.
But the bottom line is that if my purpose is to reduce the .frq file size,
customizing the Similarity won't help with that. I guess th
When I use HttpClient and its PostMethod to post a query with some Chinese,
Solr either fails to return any records, or returns everything.
... ...
method = new PostMethod(solrReq);
method.getParams().setContentCharset("UTF-8");
method.setRequestHeader("Conten
:-) Otis, I also looked at the SolrJ source code; it seems to do exactly what
I am doing here... but I will probably do what you suggested ... thanks
Jie
Unfortunately SolrJ is not an option here...
We will have to make a quick fix with a patch out in production.
I am still unable to make Solr (3.5) take a URL-encoded query. Again,
passing a non-URL-encoded query string works with non-ASCII (Chinese), but
fails to return anything when sending the request wi
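For what it is worth, the usual checklist here is to URL-encode parameter values as UTF-8 on the client and make sure the container decodes with the same charset (for GET parameters on Tomcat, that is URIEncoding="UTF-8" on the connector in server.xml). A minimal sketch of the encoding side:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Minimal sketch: percent-encode a non-ASCII query value as UTF-8
// before putting it in a URL or form body.
public class QueryEncoding {
    static String encode(String q) throws UnsupportedEncodingException {
        return URLEncoder.encode(q, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Each Chinese character becomes three %XX bytes in UTF-8.
        System.out.println(encode("中文")); // prints %E4%B8%AD%E6%96%87
    }
}
```

If both the client and a layer underneath it encode, the server decodes the wrong bytes; that double-encoding mismatch is a common reason a URL-encoded query fails while the raw string appears to work.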
What will happen if, in my query, I specify a greater number for rows than the
queryResultWindowSize in my solrconfig.xml?
For example, if queryResultWindowSize=100, but I need to process a batch query
from Solr with rows=1000 each time, advancing the start as I go... what will
happen? if I do not turn o
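For reference, the setting lives in the <query> section of solrconfig.xml; my understanding (worth verifying) is that the window only controls how large a superset of document IDs is collected and cached per query, so rows=1000 against a window of 100 should still work, just without the cache-window benefit. The values below are illustrative:

```xml
<!-- solrconfig.xml, <query> section; values are illustrative -->
<query>
  <!-- Number of document IDs collected and cached as a block per query;
       paging within this window can be served from the queryResultCache. -->
  <queryResultWindowSize>100</queryResultWindowSize>
  <!-- Upper bound on documents cached per cache entry. -->
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
</query>
```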
any suggestions?
Hi Erik,
No, I don't have any evidence; it is just a precautionary question.
So according to your explanation, this cache only keeps the document IDs, so
if the client pages to the next group of documents in the window, there will
be another query to the Solr server to retrieve those docs, correct?
OK, that is good to kno
Hi -
With a corrupted core:
1. if I run CheckIndex with -fix, it will drop the reference to the corrupted
segment, but the segment files are still there. When we have a lot of
corrupted segments, we have to pick them out and remove them manually; is
there a way the tool can suffix them or prefix them
Very often when we try to shut down Tomcat, we get the following error in
catalina.out indicating a Solr thread cannot be stopped; Tomcat ends up
hanging and we have to kill -9 it, which we think leads to some core
corruption in our production environment. please help ...
catalina.out:
... ...
Oct 19,
By the way, I am running Tomcat 6, Solr 3.5 on Red Hat 2.6.18-274.el5 #1 SMP
Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Found a Solr/Lucene bug: TimeLimitingCollector starts threads in a static {}
block with no way to stop them:
https://issues.apache.org/jira/browse/LUCENE-2822
Is this the same issue? It is fixed in Lucene 3.5, but I am using Solr 3.5
with Lucene 2.9.3 (the matched Lucene version).
Can anyone shed some light
any input on this?
thanks
Jie
I have a question about Solr replication (master/slaves).
While indexing activity is ongoing on the master, the slave sends in a
filelist command to get a version (actually, to my understanding, a
point-in-time snapshot) of all files and their size/timestamp, etc.
Then the slave will decide which files need
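My mental model of that decision step, as a hedged sketch: the slave diffs the master's file list for the advertised version against what it has locally and fetches anything missing or different. The class and the name-to-size map are illustrative, not Solr's actual replication structures:

```java
import java.util.*;

// Illustrative sketch, not Solr's replication code: given the master's file
// list for a committed index version (name -> size), fetch files the slave
// lacks or whose sizes differ.
public class ReplicationDiff {
    static List<String> filesToFetch(Map<String, Long> master, Map<String, Long> slave) {
        List<String> fetch = new ArrayList<>();
        for (Map.Entry<String, Long> e : master.entrySet()) {
            Long localSize = slave.get(e.getKey());
            if (localSize == null || !localSize.equals(e.getValue())) {
                fetch.add(e.getKey()); // missing locally, or size mismatch
            }
        }
        Collections.sort(fetch);
        return fetch;
    }

    public static void main(String[] args) {
        Map<String, Long> master = Map.of("_1.fdt", 100L, "_1.frq", 50L, "segments_2", 10L);
        Map<String, Long> slave  = Map.of("_1.fdt", 100L, "segments_1", 10L);
        System.out.println(filesToFetch(master, slave)); // prints [_1.frq, segments_2]
    }
}
```

Because Lucene segment files are write-once, a same-name same-size file can be assumed unchanged, which is what makes this simple diff workable even while the master keeps indexing.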
thanks ...
Could you please point me to some more detailed explanation online, or will I
have to read the code to find out? I would like to understand a little more
about how this is achieved. thanks!
Jie
Thanks... I just read the related code... now I understand: it seems the
master keeps replicable snapshots (versions), so the file list should be
static. Thank you Otis!
We are using Solr 3.5 in production and we deal with terabytes of customer
data.
We use shards for large customers and wrote our own replica management in
our software.
Now, with the rapid growth of data, we are looking into SolrCloud for its
robust sharding and replication.
I unde
thanks for your feedback Erick.
I am also aware of the current limitation that the number of shards in a
collection is fixed; changing the number requires re-configuring and
re-indexing. If that limitation gets lifted in a near-future release, I would
then consider setting up a collection for each customer, whi