It seems unreasonable that, if I want to index a local file, I have to reference this local file by a URL.
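For reference, a local file can be given a URL without any web server, through the file: scheme. Below is a minimal sketch using only the Java standard library. The class name `LocalFileUrl`, the helper `toFileUrl`, and the `/tmp` path are made up for illustration, and nothing here implies that the URL form changes how the PDF text is extracted.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;

public class LocalFileUrl {

    // Hypothetical helper: converts a local path into a file: URL.
    // Path.toUri() percent-encodes characters such as spaces.
    static URL toFileUrl(String path) throws MalformedURLException {
        return Paths.get(path).toAbsolutePath().toUri().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Assumed location of the downloaded PDF; adjust to the real path.
        URL url = toFileUrl("/tmp/Starting a Search Application.pdf");
        System.out.println(url);
    }
}
```

The spaces in the file name come out as `%20`, so the resulting URL can be handed to any API that expects a URL string.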
This isn't a strange file; it is a file downloaded from the Lucid web portal, called "Starting a Search Application.pdf". This may be an encoding problem or a charset problem. I can open this file with a PDF reader without any problems, and I don't know why referencing this file with a URL would fix this problem. Can you help me?

I'm working with SolrJ, from Java. Has anyone had the same problem with SolrJ?

Thanks to Paul Libbrecht for your suggestion.

Best regards

2011/10/4 Paul Libbrecht <p...@hoplahup.net>

> full of boxes for me.
> Héctor, you need another way to reference these!
> (e.g. a URL)
>
> paul
>
>
> On 4 Oct 2011, at 16:49, Héctor Trujillo wrote:
>
> > Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with
> > some files I've got problems because they are stored with strange characters.
> > This is the content that got stored:
> > +++++++
> >
> > Starting a Search Application
> >
> > Abstract
> >
> > Starting
> > a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i
> >
> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
> > Page ii Do You Need Full-text Search?
> >
> > ∞
> >
> > ∞
> > ∞
> >
> > Starting
> > a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
> >
> > Identifying
> > Ideal Results
> >
> > Starting
> > a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 2
> >
> > Starting
> > a Search Application A Lucid Imagination White Paper
> >
> > +++++++
> >
> > But if I open the PDF file, I have no problem seeing the content correctly.
> >
> > I think this is a question of charset encoding, but I don't know if I
> > can avoid this behaviour with a different analyzer or tokenizer applied
> > at indexing time, maybe.
> >
> > I've got this problem with some documents downloaded from Lucid's web site.
> >
> > I don't know if anyone has had the same problem and knows how to solve this.
> >
> > Thanks
> >
> > Best regards