Re: SolrCloud DIH issue

2015-09-20 Thread Upayavira
It is worth noting that the ref guide page on configsets refers to non-cloud mode (a useful new feature), whereas people may confuse this with configsets in cloud mode, which use ZooKeeper. Upayavira On Sun, Sep 20, 2015, at 04:59 AM, Ravi Solr wrote: > Can't thank you enough for clarifying it at

Re: SolrCloud DIH issue

2015-09-20 Thread Ravi Solr
Yes Upayavira, that's exactly what prompted me to ask Erick as soon as I read https://cwiki.apache.org/confluence/display/solr/Config+Sets. Erick, regarding my delta-import not working: I do see the dataimport.properties in ZooKeeper after I "upconfig" and "linkconfig" my conf files into ZK...see b
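For anyone wanting to confirm what actually landed in ZooKeeper after an upconfig/linkconfig, a minimal SolrJ sketch along these lines can read the file back directly. The ZooKeeper hosts and the "myconf" config name are placeholders, not values from the thread.

```java
import org.apache.solr.common.cloud.SolrZkClient;

public class CheckDihProps {
    public static void main(String[] args) throws Exception {
        // Connect to the same ZooKeeper ensemble the SolrCloud cluster uses
        // (the zkHost string and "myconf" config name are placeholders).
        SolrZkClient zkClient = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 30000);
        try {
            String path = "/configs/myconf/dataimport.properties";
            if (zkClient.exists(path, true)) {
                // Print the stored last-index timestamps that delta-import relies on
                byte[] data = zkClient.getData(path, null, null, true);
                System.out.println(new String(data, "UTF-8"));
            } else {
                System.out.println("No dataimport.properties under " + path);
            }
        } finally {
            zkClient.close();
        }
    }
}
```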

Questions regarding indexing JSON data

2015-09-20 Thread Kevin Vasko
I am new to Apache Solr and have been struggling with indexing some JSON files. I have several TB of Twitter data in JSON format that I am having trouble posting/indexing. I am trying to use schemaless mode so I don't have to add 200+ record fields manually. 1. The first issue is none of
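As a starting point for this kind of question, a minimal SolrJ sketch that posts a JSON file to the /update/json/docs handler is shown below; with schemaless mode enabled, unknown fields are guessed and added automatically. The "tweets" collection name, the tweet.json file, and the split setting are assumptions for illustration, not details from the original message.

```java
import java.io.File;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PostJson {
    public static void main(String[] args) throws Exception {
        // "tweets" collection and tweet.json are placeholders.
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/tweets");

        // /update/json/docs maps arbitrary JSON into Solr documents.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/json/docs");
        req.addFile(new File("tweet.json"), "application/json");
        req.setParam("split", "/");  // one Solr document per top-level JSON object
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        client.request(req);
        client.close();
    }
}
```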

Re: Does more shards in core improve performance?

2015-09-20 Thread Zheng Lin Edwin Yeo
I didn't find any increase in indexing throughput by adding shards on the same machine. However, I've managed to feed the index to Solr from more than one thread at a time. It can take up to 3 threads without affecting the indexing speed. Anything more than that and the CPU will hit 100%, and the ind
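For multi-threaded feeding of the kind described here, one common approach is SolrJ's ConcurrentUpdateSolrClient, which buffers documents and flushes them from background threads. This is a sketch only; the URL, queue size, and document fields are placeholders, and the thread count of 3 simply mirrors the number mentioned above.

```java
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class MultiThreadFeed {
    public static void main(String[] args) throws Exception {
        // URL, queue size (10000 docs) and 3 background sender threads are placeholders.
        ConcurrentUpdateSolrClient client =
                new ConcurrentUpdateSolrClient("http://localhost:8983/solr/mycollection", 10000, 3);

        for (int i = 0; i < 1_000_000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("title_s", "document " + i);
            client.add(doc);  // buffered and sent by the background threads
        }

        client.blockUntilFinished();  // wait for queued updates to drain
        client.commit();
        client.close();
    }
}
```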

Cost of using group.cache.percent parameters in Result Grouping

2015-09-20 Thread Zheng Lin Edwin Yeo
Hi, I've been trying to improve the speed of my Result Grouping, and I've found that setting the parameter group.cache.percent to 100 does improve the speed, especially for longer query strings. But I would like to find out whether there is any cost in doing so, like in terms of
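For reference, the parameter in question can be set per query; a minimal SolrJ sketch is below. The collection name, the grouping field "category_s", and the query string are illustrative assumptions, not taken from the thread.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedQuery {
    public static void main(String[] args) throws Exception {
        // Collection name and grouping field are placeholders.
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");

        SolrQuery q = new SolrQuery("some longer query string");
        q.set("group", true);
        q.set("group.field", "category_s");
        // Enable the grouping cache (threshold expressed as a percent of maxDoc);
        // this can speed up expensive queries at the cost of extra memory.
        q.set("group.cache.percent", 100);

        QueryResponse rsp = client.query(q);
        System.out.println(rsp.getGroupResponse().getValues());
        client.close();
    }
}
```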

How can I get a monotonically increasing field value for docs?

2015-09-20 Thread Gili Nachum
I've implemented a custom solr2solr ongoing unidirectional replication mechanism. A Replicator (acting as a SolrJ client) crawls documents from SolrCloud1 and writes them to SolrCloud2 in batches. The replicator crawl logic is to read documents with a time greater than or equal to the time of the last rep
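A rough sketch of one such crawl pass is shown below, assuming a date field on every document (e.g. one populated by TimestampUpdateProcessorFactory) that can serve as the checkpoint. The ZooKeeper hosts, collection name, "timestamp_dt" field, and checkpoint value are all placeholders, and this is not the poster's actual Replicator code.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReplicatorPass {
    public static void main(String[] args) throws Exception {
        CloudSolrClient source = new CloudSolrClient("zk-source:2181");
        CloudSolrClient target = new CloudSolrClient("zk-target:2181");
        source.setDefaultCollection("collection1");
        target.setDefaultCollection("collection1");

        // "timestamp_dt" and the checkpoint are placeholders; the field is assumed
        // to be stamped on every update on the source cluster.
        String lastCheckpoint = "2015-09-20T00:00:00Z";

        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("timestamp_dt:[" + lastCheckpoint + " TO *]");
        q.addSort("timestamp_dt", ORDER.asc);
        q.setRows(500);

        QueryResponse rsp = source.query(q);
        for (SolrDocument d : rsp.getResults()) {
            SolrInputDocument copy = new SolrInputDocument();
            for (String f : d.getFieldNames()) {
                if (!"_version_".equals(f)) {  // let the target assign its own version
                    copy.addField(f, d.getFieldValue(f));
                }
            }
            target.add(copy);
        }
        target.commit();

        source.close();
        target.close();
    }
}
```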