Re: Indexing PDF

Michael McCandless Wed, 05 Oct 2011 05:37:25 -0700

Hmm, no attachment; maybe it's too large?

Can you send it directly to me?


Mike McCandless

http://blog.mikemccandless.com

2011/10/5 Héctor Trujillo <[email protected]>:
> This is the file that gives me errors.
>
> 2011/10/5 Michael McCandless <[email protected]>
>>
>> Can you attach this PDF to an email & send to the list?  Or is it too
>> large for that?
>>
>> Or, you can try running Tika directly on the PDF to see if it's able
>> to extract the text.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> 2011/10/5 Héctor Trujillo <[email protected]>:
>> > Sorry, you're right. This file was indexed with a .NET web service
>> > client that calls a Java application (a web service), which in turn
>> > calls Solr using SolrJ.
>> >
>> > I will try to index this in a different way; maybe that will resolve
>> > the problem.
>> >
>> > Thanks
>> >
>> > Best regards
>> >
>> >
>> >
>> > On 5 October 2011 at 08:42, Héctor Trujillo
>> > <[email protected]> wrote:
>> >
>> >> It seems unreasonable that if I want to index a local file, I have to
>> >> reference this local file by a URL.
>> >>
>> >> This isn't a strange file: it's a file called "Starting a Search
>> >> Application.pdf" that I downloaded from the Lucid web portal.
>> >>
>> >> This may be an encoding or character-set problem. I can open this file
>> >> with a PDF reader without any problems, and I don't know why
>> >> referencing the file by a URL would fix this. Can you help me?
>> >>
>> >> I'm working with SolrJ from Java. Has anyone else had the same problem
>> >> with SolrJ?
>> >>
>> >>
>> >>
>> >> Thanks, Paul Libbrecht, for your suggestion.
>> >>
>> >>
>> >>
>> >> Best regards
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> 2011/10/4 Paul Libbrecht <[email protected]>
>> >>
>> >>> It's full of boxes for me.
>> >>> Héctor, you need another way to reference these!
>> >>> (e.g. a URL)
>> >>>
>> >>> paul
>> >>>
>> >>>
>> >>> On 4 Oct. 2011 at 16:49, Héctor Trujillo wrote:
>> >>>
>> >>> > Hi all, I'm indexing PDF files with SolrJ, and most of them work. But
>> >>> > with some files I've got problems because strange characters get
>> >>> > stored. This is the content that got stored:
>> >>> > +++++++
>> >>> >
>> >>> > Starting a Search Application
>> >>> >
>> >>>
>> >>> 
>> >>> > Abstract
>> >>> >
>> >>>
>> >>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i
>> >>> >
>> >>>
>> >>> 
>> >>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
>> >>> > Page ii Do You Need Full-text Search?
>> >>> >
>> >>>
>> >>> ∞
>> >>> >
>> >>>
>> >>> ∞
>> >>> > ∞
>> >>> >
>> >>>
>> >>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
>> >>> >
>> >>>
>> >>> > Identifying Ideal Results
>> >>> >
>> >>>
>> >>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 2
>> >>> >
>> >>>
>> >>> 
>> >>> > Starting a Search Application A Lucid Imagination White Paper
>> >>> >
>> >>> >
>> >>> > +++++++
>> >>> >
>> >>> > But if I open the PDF file, I can see the content correctly.
>> >>> >
>> >>> > I think this is a charset encoding issue, but I don't know whether I
>> >>> > can avoid this behaviour by applying a different analyzer or tokenizer
>> >>> > at indexing time.
>> >>> >
>> >>> > I've had this problem with some documents downloaded from Lucid's
>> >>> > website.
>> >>> >
>> >>> >
>> >>> >
>> >>> > Has anyone had the same problem and found a way to solve it?
>> >>> >
>> >>> > Thanks
>> >>> >
>> >>> > Best regards
>> >>>
>> >>>
>> >>
>> >
>
>
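
On the analyzer/tokenizer question: in Lucene and Solr a stored field keeps the original value it was given, and the analyzer only affects the indexed terms, so changing the analyzer or tokenizer will not change what comes back as stored content. That points back at the extraction step, which is what running Tika directly on the PDF would test. One rough way to judge extracted text is to count characters that usually signal unmapped glyphs or a failed decode. The stdlib-only Java sketch below does that. The class name ExtractedTextCheck, its methods, the sample strings, and the choice of which code points count as suspicious (U+FFFD, the Private Use Area U+E000 to U+F8FF, and control characters other than newline, carriage return and tab) are assumptions for illustration, not part of Solr, SolrJ or Tika.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Solr, SolrJ or Tika): a heuristic check
// for "strange characters" in text produced by a PDF text extractor.
class ExtractedTextCheck {

    // Returns true for code points that usually indicate damaged or unmapped text.
    static boolean isSuspicious(int cp) {
        if (cp == 0xFFFD) {
            return true; // Unicode replacement character: a decoder gave up
        }
        if (cp >= 0xE000 && cp <= 0xF8FF) {
            return true; // BMP Private Use Area: often glyphs with no Unicode mapping
        }
        // Control characters other than newline, carriage return and tab
        return Character.isISOControl(cp) && cp != '\n' && cp != '\r' && cp != '\t';
    }

    // Counts each suspicious code point, keyed as "U+XXXX" in sorted order.
    static Map<String, Integer> suspiciousCounts(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        text.codePoints()
            .filter(ExtractedTextCheck::isSuspicious)
            .forEach(cp -> counts.merge(String.format("U+%04X", cp), 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample strings: one clean, one with a Private Use Area
        // character and a replacement character mixed in.
        String clean = "Starting a Search Application";
        String damaged = "Page ii \uF0B7 Do You Need Full-text Search? \uFFFD";
        System.out.println(suspiciousCounts(clean));   // {}
        System.out.println(suspiciousCounts(damaged)); // {U+F0B7=1, U+FFFD=1}
    }
}
```

If the counts are already high for text taken straight from Tika, the problem lies in the extraction step itself; if they are zero there but the stored field is still garbled, the hops between extraction and Solr are the next place to look.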

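Since the text in this case passed through a .NET client and a Java web service before reaching SolrJ, each hop is another place where bytes can be decoded with the wrong charset. The minimal Java sketch below (the class name CharsetRoundTrip, the transfer helper and the sample string are hypothetical) shows what happens when UTF-8 bytes are decoded as ISO-8859-1: every byte still decodes, so no error is raised, but the text silently changes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: one hop encodes text with one charset and the
// next hop decodes the bytes with another.
class CharsetRoundTrip {

    // Encodes text as the sender would and decodes it as the receiver would.
    static String transfer(String text, Charset sender, Charset receiver) {
        byte[] wire = text.getBytes(sender);
        return new String(wire, receiver);
    }

    public static void main(String[] args) {
        // Sample text with an accented e, a bullet (U+2022) and an accented a,
        // written with Unicode escape sequences so the source encoding does not matter.
        String original = "H\u00E9ctor \u2022 P\u00E1gina";

        String same = transfer(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = transfer(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(same.equals(original));       // true
        System.out.println(mismatched.equals(original)); // false
        // The UTF-8 bytes C3 A9 for U+00E9 come back as two characters, U+00C3 U+00A9
        System.out.println(mismatched.startsWith("H\u00C3\u00A9ctor")); // true
    }
}
```

Sending a known non-ASCII test string through the real chain and comparing it with the original on the Java side is a quick way to tell whether one of the hops is responsible.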