Re: Solr use with Cloudera HDFS failed creating directory

2014-03-12 Thread soodyogesh
Was anyone able to sort this one out? I'm hitting the same error.

Is there a way to fix this by copying the right version of the jars? I tried
copying an older version of the jar into the Solr lib directory but get the
same error.

Solr: 4.6.1
Hadoop: 2.0.0..CDH



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-use-with-Cloudera-HDFS-failed-creating-directory-tp4109143p4123082.html
Sent from the Solr - User mailing list archive at Nabble.com.


Intercept updates and cascade loading of Index.

2014-02-06 Thread soodyogesh
I would like to know whether there is a way to intercept, or get a callback,
whenever new documents are added to a collection.

My use case: I have one collection split across shards, and whenever a new
document arrives in that collection I would like to get a callback with the
document itself, so that I can take some of its fields and pass them on to
another collection.

It is a kind of cascading addition. The reason is that we would like to
create a lightweight index (2-3 fields) that is used only for autosuggest and
is replicated across all the web servers where our app is running. For search
we would direct queries to the other collection, since our search is not as
latency-sensitive as our autosuggest.

Any pointers would be helpful.
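
To make the cascade concrete, here is a minimal sketch done client-side in Python rather than as a Solr-side hook: it copies only the fields autosuggest needs out of each incoming document. The field names (`id`, `name`, `category`) and the function name are hypothetical and not taken from any real schema.

```python
# Hypothetical field names; only these are copied into the autosuggest index.
SUGGEST_FIELDS = ("id", "name", "category")


def to_suggest_doc(doc, fields=SUGGEST_FIELDS):
    """Return a lightweight copy of doc holding only the autosuggest fields."""
    return {f: doc[f] for f in fields if f in doc}


full_doc = {"id": "42", "name": "apple pie", "category": "food", "body": "long text"}
suggest_doc = to_suggest_doc(full_doc)
# full_doc would be sent to the search collection,
# suggest_doc to the lightweight autosuggest collection.
```

Doing the projection in the indexing client avoids any server-side plugin, at the cost of every indexing client having to know about both collections.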



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Intercept-updates-and-cascade-loading-of-Index-tp4115833.html


Re: Intercept updates and cascade loading of Index.

2014-02-06 Thread soodyogesh
Thanks for the reply.

Does that mean I need to compile Solr with my custom request processor, or is
there a way to extend the existing framework and plug in a new implementation?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Intercept-updates-and-cascade-loading-of-Index-tp4115833p4115951.html


Re: Intercept updates and cascade loading of Index.

2014-02-07 Thread soodyogesh
Thanks for the insights. This helps indeed; however, I'm not sure how to get
the delta on commit. I guess I need to run a custom query to find what has
changed since the last update, or something along those lines.
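
One possible shape for such a query, as a rough sketch: filter on a timestamp field with a range that starts at the time of the last sync. This assumes the schema has a date field that is set when a document is indexed; the field name `last_modified` and the function name are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def delta_query_params(last_sync, field="last_modified"):
    """Build select params matching docs stamped at or after last_sync.

    last_sync must be a timezone-aware datetime; `field` is a hypothetical
    date field assumed to be populated at index time.
    """
    stamp = last_sync.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"q": "*:*", "fq": "%s:[%s TO *]" % (field, stamp)}


params = delta_query_params(datetime(2014, 2, 7, 10, 30, 0, tzinfo=timezone.utc))
query_string = urlencode(params)  # append to the collection's /select URL
```

The caller would have to remember the time of the previous sync itself; nothing in this sketch tracks it.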

I will experiment with that. If anyone has done this, please share.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Intercept-updates-and-cascade-loading-of-Index-tp4115833p4116010.html


SolrCloud how to spread out to multiple nodes

2014-02-09 Thread soodyogesh


Since the amount of data we index will grow over time (think 100-200 GB and
more), we would like to use SolrCloud.

I have been reading posts and wiki pages, and trying things out on my own to
test.

To keep it simple, I would create a collection with n shards, where n is,
say, 10. I would start everything on a single machine, but as my index grows
I would like to spread those shards out to multiple machines.

My question is: how do I spread the shards of one collection from one machine
to multiple machines? It would be a great help if someone could provide steps
to test this.
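
For reference, one approach commonly described for Solr 4.x is to create a new core for an existing shard on the new machine through the CoreAdmin API, wait for it to sync, and then unload the old core. The sketch below only builds the two request URLs. The host names, collection name, and core names are hypothetical, and the exact steps are worth verifying against the documentation for the Solr version in use.

```python
from urllib.parse import urlencode


def core_admin_url(base_url, **params):
    """Build a CoreAdmin request URL for the Solr instance at base_url."""
    return "%s/admin/cores?%s" % (base_url.rstrip("/"), urlencode(params))


# Step 1 (hypothetical names): add a replica of shard3 on the new machine.
add_replica = core_admin_url(
    "http://newhost:8983/solr",
    action="CREATE",
    name="mycoll_shard3_replica2",
    collection="mycoll",
    shard="shard3",
)

# Step 2: once the new replica is active, unload the core on the old machine.
unload_old = core_admin_url(
    "http://oldhost:8983/solr",
    action="UNLOAD",
    core="mycoll_shard3_replica1",
)
```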







--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-spread-out-to-multiple-nodes-tp4116326.html


Re: SolrCloud how to spread out to multiple nodes

2014-02-17 Thread soodyogesh
Thanks, I'm going to give this a try.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-spread-out-to-multiple-nodes-tp4116326p4117728.html


Grouping performance improvement

2014-02-20 Thread soodyogesh
I'm seeing slow performance for queries where I group on a field.

The index holds 57 million records, and we are targeting 100 million+.

I'm using grouping to build a category-based autosuggest.

So when the user presses "a", I search for "a" and group by a field, say
products. I have noticed that query performance gets really bad with the
group-by clause.
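
For clarity, the kind of request described above might look like the following sketch. The `products` field comes from the description; the per-group limit, the number of groups, and the function name are assumed values for illustration.

```python
from urllib.parse import urlencode


def grouped_suggest_params(prefix, group_field="products", per_group=5, max_groups=10):
    """Build result-grouping params: up to max_groups groups, per_group docs each."""
    return {
        "q": prefix,
        "group": "true",
        "group.field": group_field,
        "group.limit": per_group,  # documents returned inside each group
        "rows": max_groups,        # with grouping on, rows counts groups
    }


params = grouped_suggest_params("a")
query_string = urlencode(params)  # append to the collection's /select URL
```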

I'm at an experimental stage, so I can change the schema or try other
alternatives.

Please let me know if there is a way to cleverly design the schema to improve
performance, or if I'm missing some option for fine-tuning.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549.html


RE: Grouping performance improvement

2014-02-21 Thread soodyogesh
Thanks Alexey for giving some really good points.

Just to make sure I get it right, are you suggesting:

1. Do facets on category first; let's say I get 10 distinct categories.
2. Do another query where q=search query and fq=facet category values.

Maybe I'm missing something, but I'm not sure how to get facets along with,
let's say, 5 documents under each facet value.
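
One reading of the two steps, sketched under assumptions: a facet request returns the distinct category values, and then one small request per value, each with its own fq, returns the top documents for that category. The `category` field, the limits, and the function names are hypothetical, and this issues several requests instead of one.

```python
def facet_params(query, field="category", max_values=10):
    """Step 1: ask only for facet counts on the category field (no documents)."""
    return {
        "q": query,
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.limit": max_values,
        "facet.mincount": 1,
    }


def per_category_params(query, values, field="category", top_n=5):
    """Step 2: one request per facet value, each returning its top_n documents."""
    requests = []
    for value in values:
        # Escape backslashes and quotes so the value is safe inside a phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        requests.append({"q": query, "fq": '%s:"%s"' % (field, escaped), "rows": top_n})
    return requests


step_one = facet_params("a")
requests_to_send = per_category_params("a", ["value1", "value2", "value3"])
```

Because each request carries a single-category fq, those filters can be served from Solr's filter cache on repeat queries.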





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549p4118844.html


RE: Grouping performance improvement

2014-02-27 Thread soodyogesh
OK, so I cannot move forward with this.

If I use a format like q=a&fq=category:(value1 value2 value3)

this gives me results from the first category.

What I want is the top n results per filter category, and I don't want to use
grouping, as its performance seems to be very bad. My observation is that
group queries don't use the cache.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-performance-improvement-tp4118549p4120093.html


Group query not cached in SOLR

2014-02-27 Thread soodyogesh
I noticed that group queries are not getting cached in Solr. Is that normal?

I would like to enable caching if possible; any quick pointers would be
helpful.
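
One option that may be worth testing: the result grouping documentation describes a `group.cache.percent` request parameter (0-100, default 0, meaning off) that caches part of the grouping work. The sketch below just adds it to an existing set of grouping params. The value 50 and the function name are arbitrary assumptions, and whether it helps will depend on the query type.

```python
def with_group_cache(params, percent=50):
    """Return a copy of params with group.cache.percent set (0 disables it)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    updated = dict(params)
    updated["group.cache.percent"] = percent
    return updated


base = {"q": "a", "group": "true", "group.field": "products"}
cached = with_group_cache(base)
```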



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-query-not-cached-in-SOLR-tp4120159.html


Re: Group query not cached in SOLR

2014-02-28 Thread soodyogesh
Any pointers on this would be helpful. Is there a way to avoid using group-by
queries and achieve similar results, or a way to enable caching for group-by
queries?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-query-not-cached-in-SOLR-tp4120159p4120547.html


Schema Design:Multiple document vs less number of document with large number of values in multivalue field

2014-03-04 Thread soodyogesh
Hi,

I would like to know which approach is better when designing an index:


Approach-1
A large number of documents (100 million+) with 5-10 values per document in
one multi-valued field.

Approach-2
Fewer documents with 50-100 values per document in one multi-valued
field.


Right now I have approach-1, with around 100 million documents, and some
group-by queries perform really slowly. I have the freedom to reduce the
number of docs by grouping multiple documents into one and putting the
identifiers into the multi-valued field, since I always search on the
multi-valued field.
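
To illustrate what moving from approach-1 to approach-2 could mean at indexing time, here is a rough sketch that merges documents sharing a key into one document whose multi-valued field holds the original identifiers. The field names (`category`, `id`, `member_ids`) and the choice of merge key are hypothetical; the real grouping criterion depends on the data.

```python
from collections import defaultdict


def merge_docs(docs, key_field="category", id_field="id", multi_field="member_ids"):
    """Collapse docs that share key_field into one doc with a multi-valued id list."""
    merged = defaultdict(list)
    for doc in docs:
        merged[doc[key_field]].append(doc[id_field])
    return [{key_field: key, multi_field: ids} for key, ids in merged.items()]


docs = [
    {"id": "1", "category": "food"},
    {"id": "2", "category": "food"},
    {"id": "3", "category": "tools"},
]
merged_docs = merge_docs(docs)
```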



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Design-Multiple-document-vs-less-number-of-document-with-large-number-of-values-in-multivalued-tp4121093.html