Help indexing PDF files
Hi, I am new to Solr. I would like to index some PDF files. How can I do this using the example schema from version 1.4.0? Regards, Leo
Re: Help indexing PDF files
I am using this page, but my downloaded version has no site directory.

Thanks

2010/5/7 Markus Jelsma
> Hi,
>
> The wiki page [1] on this subject will get you started.
>
> [1]: http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Cheers
Re: Help indexing PDF files
I have Solr running on machine A.

On machine B I run the command below:
curl "http://10.33.19.201:8983/solr/update/extract?&extractOnly=true" --data-binary @VPSX_V1_R10.pdf

and I get the response:
java.lang.IllegalStateException: Form too large

What am I doing wrong?
Is this the right or best way to send PDF files to be indexed?

Regards,
Leo

2010/5/7 caman
> Take a look at Tika library
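One possible reading of the "Form too large" response, offered as a guess rather than a confirmed diagnosis: when no Content-Type header is given, curl sends a --data-binary body as application/x-www-form-urlencoded, so the servlet container may try to parse the whole PDF as a form and reject it for size. The sketch below sends the same raw body with an explicit PDF content type. The helper name extract_only and the SOLR_URL and CURL variables are hypothetical, introduced only for illustration; setting CURL=echo prints the command instead of contacting the server.

```shell
# Hypothetical helper, not part of Solr. SOLR_URL reuses the host from the
# thread; the CURL override exists only so the command can be dry-run.
SOLR_URL="${SOLR_URL:-http://10.33.19.201:8983/solr}"
CURL="${CURL:-curl}"

# extract_only FILE
# Sends FILE as the raw request body with an explicit content type, so the
# body is not treated as URL-encoded form data (assumed cause of the error).
extract_only() {
    $CURL "$SOLR_URL/update/extract?extractOnly=true" \
        --data-binary "@$1" \
        -H "Content-Type: application/pdf"
}
```

Usage would be `extract_only VPSX_V1_R10.pdf`, run from the directory that holds the file.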
Re: Help indexing PDF files
Hi,

Sorry, I am a newbie. Both of these commands work:

curl "http://10.33.19.201:8983/solr/update/extract?stream.file=C:\\temp\\VPSX_V1_R10.pdf&stream.contentType=application/pdf&literal.id=M4968\\C$\\temp\\VPSX_V1_R10.pdf&commit=true"

curl 'http://10.33.19.201:8983/solr/update/extract?literal.id=doc1&commit=true' -F "te...@vpsx_v1_r10.pdf"

Thanks for all the help.

Going forward, what is the best way to index a Windows share? Should I use stream.file or not? Should I index all files every time, or check whether a file has changed and index it only if so?

Regards,
Leo
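On the question of re-indexing everything versus only changed files, one common approach is to keep a timestamp file and post only the PDFs modified since the last run. The following is a minimal sketch of that idea, not an answer given in the thread. It assumes the share is reachable as a directory from a POSIX shell (for example mounted, or via Cygwin) and that file paths contain no characters that need URL encoding. The names changed_pdfs, index_changed, SOLR_URL, and CURL, and the multipart field name file, are hypothetical.

```shell
# Sketch under stated assumptions; none of these names come from Solr.
SOLR_URL="${SOLR_URL:-http://10.33.19.201:8983/solr}"
CURL="${CURL:-curl}"   # set CURL=echo for a dry run

# changed_pdfs DIR STAMP
# Prints PDFs under DIR modified after the STAMP file.
# If STAMP does not exist yet, prints every PDF (first full run).
changed_pdfs() {
    if [ -f "$2" ]; then
        find "$1" -type f -name '*.pdf' -newer "$2"
    else
        find "$1" -type f -name '*.pdf'
    fi
}

# index_changed DIR STAMP
# Posts each changed PDF as a multipart upload with its path as literal.id,
# commits once at the end, then advances the stamp.
index_changed() {
    dir=$1
    stamp=$2
    # Taken before the scan, so files edited during the run are picked up next time.
    touch "$stamp.next"
    changed_pdfs "$dir" "$stamp" | while IFS= read -r pdf; do
        # Assumes $pdf needs no URL encoding.
        $CURL "$SOLR_URL/update/extract?literal.id=$pdf" -F "file=@$pdf"
    done
    $CURL "$SOLR_URL/update" -H "Content-Type: text/xml" --data-binary "<commit/>"
    mv "$stamp.next" "$stamp"
}
```

Committing once per run rather than once per file keeps the number of commits low. The trade-off of a timestamp check is that deleted or renamed files are not detected, so a periodic full re-index may still be needed.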