Sorry, you are right. This file was indexed with a .NET web service client that calls a Java application (a web service), which in turn calls Solr using SolrJ.
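One way a chain like this could damage a file, sketched under assumptions (nothing in this thread confirms it is the actual cause): if any hop handles the PDF bytes as text instead of binary, bytes that are not valid in the chosen charset get replaced. The class and method names below are hypothetical, and the sample bytes only imitate the start of a PDF; the sketch uses the Java standard library only, not SolrJ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch: compares two ways of passing binary data through a text-based layer.
class BinaryTransportSketch {

    // Lossy path: decode the raw bytes as UTF-8 text and encode them again.
    // Byte sequences that are not valid UTF-8 are replaced during decoding.
    public static byte[] viaString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Safe path: Base64 turns any byte sequence into ASCII text and back unchanged.
    public static byte[] viaBase64(byte[] raw) {
        String encoded = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Assumed sample: "%PDF" followed by high bytes that are not valid UTF-8.
        byte[] raw = {'%', 'P', 'D', 'F', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        System.out.println("string round trip intact: " + Arrays.equals(raw, viaString(raw)));
        System.out.println("base64 round trip intact: " + Arrays.equals(raw, viaBase64(raw)));
    }
}
```

If the bytes survive only on the Base64 path, the text-based hop is a candidate for the corruption; if the file arrives at Solr byte-for-byte identical, the cause lies elsewhere (for example in the PDF text extraction itself).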
I will try to index this in a different way; maybe that will resolve the problem.

Thanks

Best regards

On 5 October 2011 at 08:42, Héctor Trujillo <hecto...@gmail.com> wrote:

> It seems unreasonable that, if I want to index a local file, I have to
> reference this local file by a URL.
>
> This isn't a strange file; it is a file downloaded from the Lucid web
> portal called: Starting a Search Application.pdf
>
> This may be an encoding or charset problem. I can open this file with a
> PDF reader without any problems, and I don't know why referencing this
> file with a URL would fix the problem. Can you help me?
>
> I'm working with SolrJ, from Java. Has anyone had the same problem with
> SolrJ?
>
> Thanks to Paul Libbrecht for your suggestion.
>
> Best regards
>
> 2011/10/4 Paul Libbrecht <p...@hoplahup.net>
>
>> full of boxes for me.
>> Héctor, you need another way to reference these!
>> (e.g. a URL)
>>
>> paul
>>
>> On 4 Oct 2011 at 16:49, Héctor Trujillo wrote:
>>
>> > Hi all, I'm indexing PDF files with SolrJ, and most of them work. But
>> > with some files I've got problems because strange characters are
>> > stored. This is the content that got stored:
>> > +++++++
>> >
>> > Starting a Search Application
>> >
>> > Abstract
>> >
>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i
>> >
>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search?
>> >
>> > ∞
>> >
>> > ∞
>> > ∞
>> >
>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
>> >
>> > Identifying Ideal Results
>> >
>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 2
>> >
>> > Starting a Search Application A Lucid Imagination White Paper
>> >
>> > +++++++
>> >
>> > But if I open the PDF file, I can see the content correctly without
>> > any problem.
>> >
>> > I think this is a question of charset encoding, but I don't know
>> > whether I can avoid this behaviour, perhaps with a different analyzer
>> > or tokenizer applied at indexing time.
>> >
>> > I've got this problem with some documents downloaded from Lucid's
>> > website.
>> >
>> > I don't know if anyone has had the same problem and knows how to
>> > solve it.
>> >
>> > Thanks
>> >
>> > Best regards