full of boxes for me. Héctor, you need another way to reference these! (e.g. a URL)
paul

On 4 Oct 2011, at 16:49, Héctor Trujillo wrote:

> Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with
> some files I have problems because strange characters get stored. This is
> the content that was stored:
> +++++++
>
> Starting a Search Application
>
> Abstract
> Starting
> a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i
>
> Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
> Page ii Do You Need Full-text Search?
> ∞
> ∞
> ∞
> Starting
> a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
> Identifying
> Ideal Results
> Starting
> a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 2
> Starting
> a Search Application A Lucid Imagination White Paper
>
>
> +++++++
>
> But if I open the PDF file, I have no problem seeing the content correctly.
>
> I think this is a charset encoding issue, but I don't know whether I
> can avoid this behaviour by applying a different analyzer or tokenizer
> at indexing time.
>
> I've got this problem with some documents downloaded from Lucid's website.
>
> I don't know if anyone has had the same problem and knows how to solve it.
>
> Thanks
>
> Best regards