Re: Problem running Solr indexing in Amazon EMR

2013-08-13 Thread Michael Della Bitta
If you do end up figuring it out, would you mind letting me know? Right now, our solution is to use an older version of SolrJ, but that means we miss out on some of the improvements/bugfixes around aliases. Thanks, Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7

Re: Problem running Solr indexing in Amazon EMR

2013-08-12 Thread Dmitriy Shvadskiy
Michael, We replaced the Lucene jars but ran into a problem with an incompatible version of Apache HttpComponents. Still figuring it out. Dmitriy -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-running-Solr-indexing-in-Amazon-EMR-tp4083636p4084121.html Sent from the Solr

Re: Problem running Solr indexing in Amazon EMR

2013-08-12 Thread Michael Della Bitta
Hi Dmitriy, Just out of curiosity, have you tried replacing the Lucene jars with a bootstrap action? Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions

Re: Problem running Solr indexing in Amazon EMR

2013-08-12 Thread Dmitriy Shvadskiy
Michael, The Amazon Hadoop distribution has Lucene 2.9.4 jars in the /lib directory, and they conflict with the Solr 4.4 we are using. Once we get past that problem, we run into the conflict with Apache HttpComponents that you describe. I think the best bet would be for us to build our own AMI to avoid these dependencies.
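
A conflict like the one described above can be confirmed by asking the JVM which jar a given class was actually loaded from. The following is a minimal sketch, not something posted in the thread: the class name `JarLocator` and its helper methods are hypothetical, and the two default class names are only examples of classes that ship in Lucene and Apache HttpComponents. It would need to run with the same classpath as the failing map/reduce task to be meaningful.

```java
import java.security.CodeSource;

public class JarLocator {
    /** Returns the location a class was loaded from, or a note if none is recorded. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no CodeSource.
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** Looks a class up by name without running its static initializers. */
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            return locationOf(cls);
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Example class names only; pass others as arguments to check them instead.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.lucene.index.IndexWriter",
            "org.apache.http.client.HttpClient"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

If the printed location for a Lucene class points into the Hadoop /lib directory rather than the job jar, the older jar is the one winning on the classpath.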

Re: Problem running Solr indexing in Amazon EMR

2013-08-12 Thread Michael Della Bitta
Dmitriy, I don't believe that EMR includes Solr or Lucene in its AMIs. But there was a recent AMI update that ruined some things for us. Have you tried using an older AMI? One headache for us has been that the EMR AMI uses an older version of Apache HttpComponents than that of Solr 4.3,

Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Dmitriy Shvadskiy
Erick, There is actually supposed to be just one version of Solr, the one bundled with our map/reduce jar. To be clear: the Map/Reduce job is generating a new index, not reading an existing one. But it fails even before that, as an instance of EmbeddedSolrServer is created at the first line of the following code.

Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Erick Erickson
Have you checked the luceneMatchVersion in all your solrconfig.xml files? I'm guessing it's set to 40 somewhere in the process as evidenced by the line: org.apache.lucene.codecs.lucene40.Lucene40FieldInfosFormat.( Lucene40FieldInfosFormat.java:99) so it looks like somehow a Lucene 4.0 codec is bein
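
Checking the luceneMatchVersion across several solrconfig.xml files can be scripted. The sketch below is an illustration rather than anything from the thread: the class `MatchVersionCheck` and its method are hypothetical names, and it assumes each file is plain well-formed XML with a `luceneMatchVersion` element (it does not resolve XInclude or property substitution).

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class MatchVersionCheck {
    /** Returns the text of the first luceneMatchVersion element, or null if there is none. */
    static String readMatchVersion(InputStream solrConfigXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(solrConfigXml);
        NodeList nodes = doc.getElementsByTagName("luceneMatchVersion");
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Each argument is a path to a solrconfig.xml file to inspect.
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                System.out.println(path + " -> " + readMatchVersion(in));
            }
        }
    }
}
```

Running it over every solrconfig.xml that ends up in the job jar shows at a glance whether any of them still names an older version than the Solr release being built against.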

Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Dmitriy Shvadskiy
Erick, Thank you for the reply. The Cloudera image includes Solr 4.3. I'm not sure what version Amazon EMR includes. We are not directly referencing or using their version of Solr but instead build our jar against Solr 4.4 and include all dependencies in our jar file. Also, the error occurs not while read

Re: Problem running Solr indexing in Amazon EMR

2013-08-11 Thread Erick Erickson
What version of Solr is Cloudera's CDH built on? Looks to me like the Solr you're using to read the M/R produced index is different than the one used to build it. Or the version specified in the Solr configs, evidenced by the LUCENE40 in the error message. See in solrconfig.xml. But probably a be