Re: Solr use with Cloudera HDFS failed creating directory
Has anyone been able to sort this one out? I'm hitting the same error. Is there a way to fix this by copying the right version of the jars? I tried copying an older version of the jar into the Solr lib directory but get the same error. Solr: 4.6.1 Hadoop: 2.0.0..CDH -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-use-with-Cloudera-HDFS-failed-creating-directory-tp4109143p4123082.html Sent from the Solr - User mailing list archive at Nabble.com.
Intercept updates and cascade loading of Index.
I would like to know whether there is a way to intercept, or get a callback, whenever new documents are added to a collection. My use case: I have one collection split across shards, and whenever a new document arrives in that collection I would like a callback with the document itself, so that I can take some of the fields from the document being added and pass them on to another collection. It is a kind of cascading addition. The reason is that we would like to create a lightweight index (2-3 fields) that is used only for autosuggest and is replicated across all the web servers where our app is running; for search we would direct requests to the other collection, since our search is not as latency sensitive as our autosuggest. Any pointers would be helpful. -- View this message in context: http://lucene.472066.n3.nabble.com/Intercept-updates-and-cascade-loading-of-Index-tp4115833.html
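The projection step described above (taking a few fields from each added document and passing them on to a lightweight autosuggest collection) can be sketched as follows. This is a minimal sketch only: it assumes some hook hands over each added document as a plain dictionary, and the function name `to_autosuggest_doc` and the field names are hypothetical, not part of Solr.

```python
from typing import Any, Dict, Iterable

# Hypothetical field names for the lightweight autosuggest index (assumption).
AUTOSUGGEST_FIELDS = ("id", "name", "category")


def to_autosuggest_doc(doc: Dict[str, Any],
                       fields: Iterable[str] = AUTOSUGGEST_FIELDS) -> Dict[str, Any]:
    """Copy only the selected fields that are present in the incoming document."""
    return {name: doc[name] for name in fields if name in doc}


# Example document as it might arrive in the main collection (made-up data).
full_doc = {
    "id": "42",
    "name": "acme anvil",
    "category": "tools",
    "description": "long text that autosuggest does not need",
    "price": 19.99,
}
light_doc = to_autosuggest_doc(full_doc)
print(light_doc)
```

The resulting dictionary is what would be sent to the second collection; how the document is intercepted and how it is sent are left open here.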
Re: Intercept updates and cascade loading of Index.
Thanks for the reply. Does that mean I need to compile Solr with my custom request processor, or is there a way to extend the existing framework and plug in a new implementation? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Intercept-updates-and-cascade-loading-of-Index-tp4115833p4115951.html
Re: Intercept updates and cascade loading of Index.
Thanks for the insights. This helps indeed; however, I'm not sure how to get the delta on commit. I guess I need to run a custom query to find what has changed since the last update, or something like that. I will experiment with that; if anyone has done this, please share. -- View this message in context: http://lucene.472066.n3.nabble.com/Intercept-updates-and-cascade-loading-of-Index-tp4115833p4116010.html
SolrCloud how to spread out to multiple nodes
Since the amount of data we index will increase over time (read 100-200 GB and more), we would like to use SolrCloud. I have been reading posts and wiki pages, and trying things on my own to test. To simplify, I would create a collection with n shards, where n is, let's say, 10. I would start everything on a single machine, but as my index grows I would like to spread those shards out to multiple machines. My question is: how do I spread the shards of one collection from one machine to multiple machines? It would be a great help if someone could provide steps to test this. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-how-to-spread-out-to-multiple-nodes-tp4116326.html
Re: SolrCloud how to spread out to multiple nodes
Thanks, I'm going to give this a try. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-how-to-spread-out-to-multiple-nodes-tp4116326p4117728.html
Grouping performance improvement
I'm facing slow performance for a query where I group on a field. The index holds 57 million records, and we are targeting 100 million+. I'm using grouping to create a category-based autosuggest: when the user presses "a", I search for "a" and group by a field, say products. I have noticed that query performance gets really bad with the group-by clause. I'm at an experimental stage, so I can change the schema or try other alternatives. Please let me know if there is a way to design the schema cleverly to improve performance, or if I'm missing some option for fine-tuning. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549.html
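For reference, the kind of grouped request described above can be sketched as a query string. This is a minimal sketch, not the poster's actual query: `group`, `group.field` and `group.limit` are Solr's result-grouping parameters, the field name `products` comes from the post, and the per-group limit of 5 and the helper name `grouped_suggest_params` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def grouped_suggest_params(prefix: str,
                           group_field: str = "products",
                           per_group: int = 5) -> str:
    """Build the query string for a grouped autosuggest request."""
    params = [
        ("q", prefix),                    # what the user has typed so far
        ("group", "true"),                # turn on result grouping
        ("group.field", group_field),     # one group per distinct field value
        ("group.limit", str(per_group)),  # documents returned inside each group
        ("wt", "json"),
    ]
    return urlencode(params)


query_string = grouped_suggest_params("a")
print(query_string)
```

The string would be appended to the collection's select handler URL, which is deployment specific and therefore not shown.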
RE: Grouping performance improvement
Thanks, Alexey, for giving some really good points. Just to make sure I get it right, are you suggesting: 1. Do facets on category first; let's say I get 10 distinct categories. 2. Do another query where q=search query and fq=facet category values. Maybe I'm missing something, but I'm not sure how to get facets along with, let's say, 5 documents under each facet value. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549p4118844.html
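One possible reading of the two steps above, sketched as query strings: a facet request to discover the categories, followed by one filtered request per category with `rows=5`. This is an assumption about what was suggested, not a confirmed recipe; the helper names and sample category values are hypothetical, while `facet`, `facet.field`, `facet.limit`, `fq` and `rows` are standard Solr request parameters.

```python
from typing import List
from urllib.parse import parse_qs, urlencode


def facet_params(query: str, facet_field: str = "category", limit: int = 10) -> str:
    """Step 1: ask only for facet counts (rows=0 returns no documents)."""
    return urlencode([
        ("q", query),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.limit", str(limit)),
    ])


def per_category_params(query: str, facet_field: str,
                        values: List[str], rows: int = 5) -> List[str]:
    """Step 2: one request per category value, each limited to `rows` documents."""
    requests = []
    for value in values:
        # Quote the value as a phrase; escape backslashes and embedded quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        requests.append(urlencode([
            ("q", query),
            ("fq", f'{facet_field}:"{escaped}"'),
            ("rows", str(rows)),
        ]))
    return requests


step1 = facet_params("a")
step2 = per_category_params("a", "category", ["tools", "garden toys"])
print(step1)
for request in step2:
    print(parse_qs(request))
```

The trade-off is one extra round trip per category, in exchange for plain filtered queries instead of a single grouped query.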
RE: Grouping performance improvement
OK, so I cannot move forward with this. If I use a format like q=a&fq=category:(value1 value2 value3), this gives me results from the first category. What I want is the top n results per filter category, and I don't want to use grouping, as performance seems to be very bad for groups. My observation is that group queries don't use the cache. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549p4120093.html
Group query not cached in SOLR
I noticed that group queries are not getting cached in Solr. Is that normal? I would like to enable caching if possible; any quick pointers would be helpful. -- View this message in context: http://lucene.472066.n3.nabble.com/Group-query-not-cached-in-SOLR-tp4120159.html
Re: Group query not cached in SOLR
Any pointers on this would be helpful. Is there a way to avoid group-by queries and achieve similar results, or a way to enable caching for group-by queries? -- View this message in context: http://lucene.472066.n3.nabble.com/Group-query-not-cached-in-SOLR-tp4120159p4120547.html
Schema Design:Multiple document vs less number of document with large number of values in multivalue field
Hi, I would like to know which approach is better when designing an index. Approach-1: a large number of documents (100 million+) with 5-10 values per document in one multivalued field. Approach-2: fewer documents with 50-100 values per document in one multivalued field. Right now I use Approach-1, with around 100 million documents, and some group-by queries perform really slowly. I have the freedom to reduce the number of documents by grouping multiple documents into one and putting the identifiers into the multivalued field, as I always search on the multivalued field. -- View this message in context: http://lucene.472066.n3.nabble.com/Schema-Design-Multiple-document-vs-less-number-of-document-with-large-number-of-values-in-multivalued-tp4121093.html
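To make the second approach concrete, here is a rough sketch of collapsing many small documents into fewer documents whose multivalued field holds the original identifiers. It is only one interpretation of the description above: the grouping key `category`, the field name `member_ids` and the helper `merge_by_key` are hypothetical, and the sample data is made up.

```python
from collections import defaultdict
from typing import Any, Dict, List


def merge_by_key(docs: List[Dict[str, Any]],
                 key_field: str = "category",
                 id_field: str = "id",
                 multi_field: str = "member_ids") -> List[Dict[str, Any]]:
    """Collapse documents that share `key_field` into one document each."""
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:
        # Collect the identifier of every original document under its key.
        grouped[doc[key_field]].append(doc[id_field])
    # One merged document per key, with the identifiers as a multivalued field.
    return [{key_field: key, multi_field: ids} for key, ids in grouped.items()]


# Approach-1 style input: many small documents (made-up data).
small_docs = [
    {"id": "1", "category": "tools"},
    {"id": "2", "category": "toys"},
    {"id": "3", "category": "tools"},
]
merged_docs = merge_by_key(small_docs)
print(merged_docs)
```

Whether the merged form actually performs better would need to be measured against the real index; the sketch only shows the shape of the data.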