How to do parallel indexing on files (not on HDFS)

2018-05-20 Thread Raymond Xie
I know how to index a single file or a folder on the file system, but how do I do that in parallel? The data I need to index is huge and can't be put on HDFS. Thank you. Sincerely yours, Raymond

Re: about solr reduce shard nums

2018-05-20 Thread Erick Erickson
Simplest would be to host multiple shards on the same machine. Use ADDREPLICA/DELETEREPLICA (Collections API calls) to move the replicas off the nodes you want to use for another purpose; when all replicas have been moved, you can repurpose those machines. Another option would be to create a _n
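The replica move described above could be sketched roughly as follows. This only builds the two Collections API request URLs and does not send them. The base URL, collection, shard, node, and replica names are placeholders invented for illustration, not values from this thread.

```python
from urllib.parse import urlencode

# Assumed base URL for illustration only; substitute your own Solr node.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def move_replica_urls(collection, shard, target_node, old_replica):
    """Return the (ADDREPLICA, DELETEREPLICA) URLs for moving one replica.

    The new replica should be active before the old one is deleted;
    this helper (a hypothetical name) only prepares the two requests.
    """
    add_url = collections_api_url(
        "ADDREPLICA", collection=collection, shard=shard, node=target_node
    )
    delete_url = collections_api_url(
        "DELETEREPLICA", collection=collection, shard=shard, replica=old_replica
    )
    return add_url, delete_url


if __name__ == "__main__":
    # Placeholder names: "mycoll", "shard1", "host2:8983_solr", "core_node3".
    for url in move_replica_urls("mycoll", "shard1", "host2:8983_solr", "core_node3"):
        print(url)
```

Repeating this for every replica on a node you want to free leaves that node empty, at which point it can be repurposed.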

Re: Index filename while indexing JSON file

2018-05-20 Thread Raymond Xie
Would you consider including the filename as another metadata field to be indexed? I think your downstream Python can do that easily. Sincerely yours, Raymond On Fri, May 18, 2018 at 3:47 PM, S.Ashwath wrote: > Hello, > > I have 2
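A minimal sketch of that suggestion, assuming each JSON file holds either one document or a list of documents. The field name `filename_s` and the helper name are hypothetical; use whatever field your schema actually defines.

```python
import json
from pathlib import Path


def docs_with_filename(path, field="filename_s"):
    """Load JSON documents from a file and tag each with the file's name.

    `field` is a hypothetical field name; it must exist in (or match a
    dynamic field of) your schema before the documents are indexed.
    """
    path = Path(path)
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Accept either a single JSON object or a list of objects.
    docs = data if isinstance(data, list) else [data]
    return [{**doc, field: path.name} for doc in docs]
```

The returned list can then be serialized again and sent to Solr by whatever indexing step you already use.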

Re: Caching Solr Grouping Results

2018-05-20 Thread Yasufumi Mizoguchi
Hi, I know little about the grouping component, but I think it is very hard, because the query result cache has a {query and conditions} -> {DocList} structure. ( https://github.com/apache/lucene-solr/blob/e30264b31400a147507aabd121b1152020b8aa6d/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#
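To illustrate the {query and conditions} -> {DocList} shape mentioned above, here is a toy model in Python. It is not Solr's implementation; the class and method names are invented for illustration. It only shows that each entry maps a query key to a flat list of document ids, with no slot for per-group structure.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Toy cache key: (query string, set of filter queries, sort spec).
CacheKey = Tuple[str, FrozenSet[str], str]


class ToyQueryResultCache:
    """Illustrative only: maps a query key to a flat list of document ids."""

    def __init__(self) -> None:
        self._entries: Dict[CacheKey, List[int]] = {}

    @staticmethod
    def key(query: str, filters: Iterable[str] = (), sort: str = "score desc") -> CacheKey:
        # Filters are stored as a set, so their order does not affect the key.
        return (query, frozenset(filters), sort)

    def get(self, key: CacheKey) -> Optional[List[int]]:
        return self._entries.get(key)

    def put(self, key: CacheKey, doc_ids: Iterable[int]) -> None:
        # The cached value is a flat, ordered list of ids, nothing more.
        self._entries[key] = list(doc_ids)
```

A grouped response is nested (groups, each with its own document list), so it does not fit this flat value without extra work.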

Re: Caching Solr Grouping Results

2018-05-20 Thread rubi.hali
Hi Yasufumi, thanks for the reply. Yes, you are correct. I also checked the code and it seems the same. We are facing performance issues due to grouping, so we wanted to be sure that we are not leaving out any possibility of caching the results in the Query Result Cache. I was just exploring field collapsing