What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Hi, I have many XML message files formatted like this: https://wiki.apache.org/solr/UpdateXmlMessages These files are generated by my index builder daily. Currently I am sending these files to Solr through HTTP POST, but sometimes I hit an OOM exception or end up with too many pending tlogs. Do you have a better way t

Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Alexandre Rafalovitch
When are you doing commit? You can issue one manually, have one with timeout parameter (commitWithin), or you can configure it to happen automatically (in solrconfig.xml). Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating y
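
A minimal sketch of the first two options mentioned above, expressed as request parameters on Solr's update handler. The `commit` and `commitWithin` parameters are standard; the helper name and the base URL in the usage note are hypothetical, chosen only for illustration. Automatic commits are configured in solrconfig.xml and need no request parameter.

```python
from urllib.parse import urlencode


def build_update_url(base_url, commit_within_ms=None, commit=False):
    """Build a Solr /update URL (hypothetical helper, not part of any library).

    commit_within_ms -> adds commitWithin=<ms>, asking Solr to commit within that time
    commit           -> adds commit=true, forcing an explicit commit with this request
    """
    params = {}
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    if commit:
        params["commit"] = "true"
    query = urlencode(params)
    return f"{base_url}/update" + (f"?{query}" if query else "")
```

For example, `build_update_url("http://localhost:8983/solr/collection1", commit_within_ms=30000)` asks Solr to make the posted documents searchable within 30 seconds, without the client issuing its own commit after every batch.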

Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Thank you Alex. I'm doing a commit every 100 files. Maybe there is a better way to do this job, something like DIH (possible?). Sometimes I have a much bigger XML file (2MB), and posting it to Solr (Jetty enabled) may be slow or exceed a limit. Floyd 2014-06-15 16:48 GMT+08:00 Alexandre Rafalovitch

Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Mikhail Khludnev
Hello Floyd, Did you consider disabling the tlog? Does a file consist of many docs? Do you have SolrCloud? Do you use just sh/curl or have a Java program? DIH is not really performant so far. Submitting roughly ten huge files in parallel is a way to get good performance. Once again, nuke the tlog. On Sun, Jun
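
A rough sketch of the "roughly ten files in parallel" idea, assuming a plain HTTP client rather than SolrJ. The function names, the worker count default, and the injectable `send` callable are illustrative assumptions; only the standard-library calls are real.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def post_xml(url, path):
    """POST one update-XML file to the given Solr update URL and return the HTTP status."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def post_all(url, paths, workers=10, send=post_xml):
    """Send every file in `paths` using at most `workers` concurrent requests.

    Results come back in the same order as `paths`. `send` can be swapped
    out (for a dry run, for instance) without touching the rest of the code.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: send(url, p), paths))
```

Capping the number of workers keeps the number of in-flight request bodies bounded, which matters when individual files are large.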

Cursor deep paging new behavior

2014-06-15 Thread Eyal Zaidman
Hi, I have a quick question about this new implementation - in the old implementation AFAIK, in a real-time indexing scenario, the results gathered from paging would not be consecutive. Meaning you would ask for 50 docs, new docs arrive, when you ask for the next 50 docs - you get an arbitrary
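
For reference, a minimal sketch of the client-side loop that cursor-based paging implies, assuming Solr 4.7 or later. The `cursorMark` and `nextCursorMark` names are Solr's; the `fetch` callable (anything that takes a parameter dict and returns the parsed JSON response) and the `id` sort field are assumptions made for illustration. The sort must include the uniqueKey field for cursors to work.

```python
def iterate_cursor(fetch, rows=50, sort="id asc"):
    """Yield every matching document by following Solr cursor marks.

    `fetch` is a caller-supplied function: params dict -> parsed JSON response.
    """
    cursor = "*"  # "*" is the documented starting cursor
    while True:
        resp = fetch({
            "q": "*:*",
            "rows": rows,
            "sort": sort,
            "cursorMark": cursor,
            "wt": "json",
        })
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there is nothing further to read.
            break
        cursor = next_cursor
```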

Re: SOLR Cloud Rebuild core

2014-06-15 Thread Shawn Heisey
On 6/14/2014 1:29 PM, Branham, Jeremy [HR] wrote: > We are looking to move from legacy master/slave configuration to the cloud > configuration. > > In the past we have handled rebuilding cores by using a 'live' core and a > core for performing the rebuild on. > When a rebuild is complete, we swa

Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Erick Erickson
A couple of things: > Consider indexing them with SolrJ, here's a place to get started: > http://searchhub.org/2012/02/14/indexing-with-solrj/. Especially if you use a > SAX-based parser you have more control over memory consumption, it's on the > client after all. And, you can rack together as
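
A sketch of the memory-control idea on the client side, using Python's streaming `iterparse` in place of SolrJ and a SAX parser (a swapped-in technique, not the approach from the linked article). It assumes each input file is an `<add>` message containing flat, non-nested `<doc>` elements; the function name and batch size are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_doc_batches(source, batch_size=1000):
    """Stream <doc> elements from an <add> message and regroup them into batches.

    Yields strings of the form "<add>...</add>", each holding at most
    `batch_size` docs, so that neither the whole file nor one huge request
    body has to be built in memory at once.
    """
    batch = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "doc":
            batch.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # drop the parsed fields once they are serialized
            if len(batch) >= batch_size:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:
        yield "<add>" + "".join(batch) + "</add>"
```

Each yielded string can then be posted to the update handler as one request, which lets the client choose the batch size independently of how the files were generated.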

Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Shawn Heisey
On 6/15/2014 2:54 AM, Floyd Wu wrote: > Thank you Alex. > I'm doing a commit every 100 files. > Maybe there is a better way to do this job, something like DIH (possible?). > Sometimes I have a much bigger XML file (2MB), and posting it to Solr (Jetty enabled) > may be slow or exceed a limit. If you ar

Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Hi Mikhail, What are the pros of disabling the tlog? Each of my XML files contains two docs: one is the main content and the other is the ACL. Currently I'm not using SolrCloud due to my poor understanding of this architecture and its pros/cons. The main system is developed using .Net C# so using SolrJ won't be a solu

Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Hi Erick, Thanks for your advice. autoCommit is configured to 30 sec in my environment. I'm using C# to develop the main system, with Solr as a service, so using SolrJ is considered impossible (for now). I'm seeking a better way to directly import the offline-generated XML to build the index. Curren

Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Hi Shawn, I've tried setting a 4GB heap for Solr; the OOM exceptions were really reduced and performance also improved. Floyd 2014-06-16 0:00 GMT+08:00 Shawn Heisey : > On 6/15/2014 2:54 AM, Floyd Wu wrote: > > Thank you Alex. > > I'm doing a commit every 100 files. > > Maybe there is a better way to

Re: Implementing Hive query in Solr

2014-06-15 Thread Vivekanand Ittigi
Hi Erick, We are actually comparing the speed of search. We are trying to run these few Hive queries in Solr. If we can implement this in Solr, we can definitely migrate our system to Solr. Can you please look at this issue also http://stackoverflow.com/questions/24202798/sum-and-groupby-i