Another progress report. I 'flattened' all the folders that contained the
PDF files with Fileboss and then moved the PDF files to the directory where
I found the post.jar file (solr-4.2.1\solr-4.2.1\example\exampledocs). I
then ran "java -Ddata=files -jar post.jar *.pdf", and in the command window
it seemed to be working fine (these are just academic articles in PDF format
that I downloaded from EBSCO with Zotero):
04/10/2013  12:20 AM           159,224 Vorontsov - 2012 - The Korea- Russia Gas Pipeline Project Past, Pres.pdf
04/10/2013  12:12 AM         3,885,056 Walker - 2012 - Asia competes for energy security.pdf
04/10/2013  12:45 AM            66,195 Whitmill - 2012 - Is UK Energy Policy Driving Energy Innovation - or.pdf
04/10/2013  12:29 AM         2,208,367 Wietfeld - 2011 - Understanding Middle East Gas Exporting Behavior.pdf
04/10/2013  12:59 AM         3,011,185 Wiseman - 2011 - Expanding Regional Renewable Governance.pdf
04/10/2013  12:38 AM           180,692 Woudhuysen - 2012 - Innovation in Energy Expressions of a Crisis, and.pdf
04/10/2013  12:49 AM           229,991 Yergin - 2012 - How Is Energy Remaking the World.pdf
04/10/2013  12:40 AM         3,397,328 Young - 2012 - Industrial Gases. (cover story).pdf
04/10/2013  01:36 AM            73,125 Zimmerer - 2011 - New Geographies of Energy Introduction to the Spe.pdf
... and so on, some 300 articles altogether.

But then, when I looked in Solr, I saw the following:
04:34:41  SEVERE  SolrCore  org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)
04:34:41  SEVERE  SolrCore  org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)

... and a lot more of those.
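
One possible reading of that error, sketched in Python. The assumption here (not something I have verified against these particular files) is that the PDFs start with the usual "%PDF-1.x" line followed by a short comment line of high-bit bytes, commonly E2 E3 CF D3, which many PDF writers emit to mark the file as binary. The header constant and the function name below are made up for illustration. If raw PDF bytes are being read as UTF-8 text, decoding should fail right around the tenth character, with 0xe3 as the byte that cannot continue the sequence started by 0xe2.

```python
# Assumed header: "%PDF-1.4" plus a binary-marker comment line.
# The exact bytes are an assumption; real files may differ.
ASSUMED_PDF_HEADER = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def first_utf8_error(data: bytes):
    """Return (offset, reason) of the first UTF-8 decoding error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the 0-based offset where decoding broke down.
        return err.start, err.reason
    return None


if __name__ == "__main__":
    problem = first_utf8_error(ASSUMED_PDF_HEADER)
    if problem is not None:
        offset, reason = problem
        following = ASSUMED_PDF_HEADER[offset + 1]
        print(f"decode fails at offset {offset} ({reason}); "
              f"next byte is {following:#04x}")
```

This lines up with "middle byte 0xe3 (at char #10", though it does not prove it. If the reading is right, the PDFs are being parsed as text (probably XML) rather than handed to a PDF extractor, so the content-type options in post.jar's usage text ("java -jar post.jar -h") may be worth checking.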

I'd like to think I made SOME progress, but it also seems like I'm still not
close to being there. Any suggestions from the experts here on what I am
doing wrong? 

Thanks!

-Stephan



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Pushing-a-whole-set-of-pdf-files-to-solr-tp4025256p4054920.html
Sent from the Solr - User mailing list archive at Nabble.com.
