Another progress report. I 'flattened' all the folders that contained the PDF files with Fileboss and then moved the PDF files to the directory where I found the post.jar file (solr-4.2.1\solr-4.2.1\example\exampledocs). I then ran "java -Ddata=files -jar post.jar *.pdf", and in the command window it seemed to be working fine. These are just academic articles in PDF format that I downloaded with Zotero from EBSCO:

    04/10/2013  12:20 AM       159,224 Vorontsov - 2012 - The Korea- Russia Gas Pipeline Project Past, Pres.pdf
    04/10/2013  12:12 AM     3,885,056 Walker - 2012 - Asia competes for energy security.pdf
    04/10/2013  12:45 AM        66,195 Whitmill - 2012 - Is UK Energy Policy Driving Energy Innovation - or.pdf
    04/10/2013  12:29 AM     2,208,367 Wietfeld - 2011 - Understanding Middle East Gas Exporting Behavior.pdf
    04/10/2013  12:59 AM     3,011,185 Wiseman - 2011 - Expanding Regional Renewable Governance.pdf
    04/10/2013  12:38 AM       180,692 Woudhuysen - 2012 - Innovation in Energy Expressions of a Crisis, and.pdf
    04/10/2013  12:49 AM       229,991 Yergin - 2012 - How Is Energy Remaking the World.pdf
    04/10/2013  12:40 AM     3,397,328 Young - 2012 - Industrial Gases. (cover story).pdf
    04/10/2013  01:36 AM        73,125 Zimmerer - 2011 - New Geographies of Energy Introduction to the Spe.pdf

... and so on, some 300 articles altogether.
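For anyone who would rather script the flattening step than use Fileboss, here is a minimal sketch in Python (3.9+). It assumes the PDFs sit somewhere under one source folder (for example a Zotero storage folder) and copies them all into a single target folder such as exampledocs. The function name flatten_pdfs and the "name (1).pdf" renaming scheme are hypothetical choices for illustration, not part of Solr or Zotero.

```python
import shutil
import sys
from pathlib import Path


def flatten_pdfs(src: Path, dest: Path) -> list[Path]:
    """Copy every *.pdf found anywhere under src into the single folder dest.

    Hypothetical helper. If two PDFs share a file name, later copies get a
    numeric suffix ("name (1).pdf") instead of overwriting the earlier one.
    Matching is on the lowercase ".pdf" extension only.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() collects the full list first, so files copied into dest
    # are never picked up by the same walk.
    for pdf in sorted(src.rglob("*.pdf")):
        target = dest / pdf.name
        n = 1
        while target.exists():
            target = dest / f"{pdf.stem} ({n}){pdf.suffix}"
            n += 1
        shutil.copy2(pdf, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Usage: python flatten_pdfs.py <source folder> <target folder>
    flatten_pdfs(Path(sys.argv[1]), Path(sys.argv[2]))
```

Copying rather than moving leaves the original folder tree untouched, so nothing is lost if the indexing has to be redone.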
But then when I looked in Solr, I saw the following:

    04:34:41 SEVERE SolrCore org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)
    04:34:41 SEVERE SolrCore org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)

... and a lot more of those. I'd like to think I made SOME progress, but it also seems like I'm still not close to being there. Any suggestions from the experts here on what I am doing wrong?

Thanks!
-Stephan

--
View this message in context: http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054920.html
Sent from the Solr - User mailing list archive at Nabble.com.