Help indexing PDF files

2010-05-07 Thread Leonardo Azize Martins
Hi,

I am new to Solr.
I would like to index some PDF files.

How can I do this using the example schema from version 1.4.0?

Regards,
Leo


Re: Help indexing PDF files

2010-05-07 Thread Leonardo Azize Martins
I am following this page, but the version I downloaded has no site
directory.

Thanks

2010/5/7 Markus Jelsma 

> Hi,
>
> The wiki page [1] on this subject will get you started.
>
> [1]: http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Cheers


Re: Help indexing PDF files

2010-05-07 Thread Leonardo Azize Martins
I have Solr running on machine A.

On machine B I run the command below:
curl "http://10.33.19.201:8983/solr/update/extract?&extractOnly=true" --data-binary @VPSX_V1_R10.pdf

and I get the response:
java.lang.IllegalStateException: Form too large

What am I doing wrong?
Is this the right, or the best, way to send PDF files to be indexed?

Regards,
Leo
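
A likely cause of the "Form too large" error: curl's --data-binary sends "Content-Type: application/x-www-form-urlencoded" unless another type is given, so the servlet container probably tries to parse the PDF as form data and rejects it for exceeding its form size limit. Below is a minimal sketch of two alternatives, assuming the same host and file as in the message. The helper names, the DRY_RUN switch and the form field name "myfile" are illustrative assumptions, not part of Solr or curl.

```shell
#!/bin/sh
# Sketch only: SOLR_URL, DRY_RUN and the helper names are assumptions.
SOLR_URL="${SOLR_URL:-http://10.33.19.201:8983/solr}"

# Run the given command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Option 1: raw request body with an explicit content type, so the body
# is not treated as URL-encoded form data.
extract_raw() {
  run curl "$SOLR_URL/update/extract?extractOnly=true" \
      -H "Content-Type: application/pdf" \
      --data-binary "@$1"
}

# Option 2: multipart/form-data upload; "myfile" is an arbitrary field name.
extract_multipart() {
  run curl "$SOLR_URL/update/extract?extractOnly=true" \
      -F "myfile=@$1"
}
```

Usage would be, for example, `extract_raw VPSX_V1_R10.pdf`; setting DRY_RUN=1 first prints the curl command without contacting the server.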



2010/5/7 caman 

>
> Take a look at Tika library


Re: Help indexing PDF files

2010-05-07 Thread Leonardo Azize Martins
Hi,

Sorry, I am a newbie.

It works using these two commands.

 curl "http://10.33.19.201:8983/solr/update/extract?stream.file=C:\\temp\\VPSX_V1_R10.pdf&stream.contentType=application/pdf&literal.id=M4968\\C$\\temp\\VPSX_V1_R10.pdf&commit=true"

 curl 'http://10.33.19.201:8983/solr/update/extract?literal.id=doc1&commit=true' -F "te...@vpsx_v1_r10.pdf"

Thanks for all the help.



Going forward, what is the best way to index a Windows share?
Should I use stream.file or not?
Should I index all files every time, or check whether a file has changed
and index it only if so?

Regards,
Leo
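
On the last question, one common approach is to keep a timestamp marker and resend only the files modified since the previous run. Note also that stream.file makes Solr read the path itself, so the path has to be visible from the Solr machine. The following is a minimal sketch, assuming a POSIX shell with find and the share reachable at a local path. The function name, the paths and the marker file are hypothetical, and the sketch only lists candidate files; posting them is left to one of the curl commands above.

```shell
#!/bin/sh
# Sketch only: directory layout, marker file and function name are assumptions.

# Print PDF files under directory $1 that were modified after marker file $2.
# If the marker does not exist yet (first run), print every PDF.
changed_pdfs() {
  dir=$1
  marker=$2
  if [ -f "$marker" ]; then
    find "$dir" -type f -name '*.pdf' -newer "$marker"
  else
    find "$dir" -type f -name '*.pdf'
  fi
}

# Example driver, commented out so the sketch has no side effects:
#   touch /var/tmp/index.marker.new
#   changed_pdfs /mnt/share /var/tmp/index.marker | while IFS= read -r f; do
#     echo "would index: $f"   # replace with a curl upload of "$f"
#   done
#   mv /var/tmp/index.marker.new /var/tmp/index.marker
```

Creating the new marker before the scan, and swapping it in afterwards, avoids missing files that change while the run is in progress. Modification time is only a heuristic; comparing checksums would be a stricter alternative.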


