Hmm, no attachment; maybe it's too large? Can you send it directly to me?
Mike McCandless

http://blog.mikemccandless.com

2011/10/5 Héctor Trujillo <hecto...@gmail.com>:
> This is the file that gives me errors.
>
> 2011/10/5 Michael McCandless <luc...@mikemccandless.com>
>>
>> Can you attach this PDF to an email & send it to the list? Or is it too
>> large for that?
>>
>> Or, you can try running Tika directly on the PDF to see if it's able
>> to extract the text.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> 2011/10/5 Héctor Trujillo <hecto...@gmail.com>:
>>> Sorry, you're right: this file was indexed by a .NET web service
>>> client that calls a Java application (a web service), which in turn
>>> calls Solr using SolrJ.
>>>
>>> I will try to index it a different way; maybe that will resolve the
>>> problem.
>>>
>>> Thanks
>>>
>>> Best regards
>>>
>>> On 5 October 2011 at 08:42, Héctor Trujillo <hecto...@gmail.com> wrote:
>>>
>>>> It seems unreasonable that if I want to index a local file, I have to
>>>> reference it by a URL.
>>>>
>>>> This isn't a strange file; it's a file downloaded from the Lucid web
>>>> portal called "Starting a Search Application.pdf".
>>>>
>>>> This may be an encoding or charset problem. I can open this file in a
>>>> PDF reader without any problems, and I don't know why referencing the
>>>> file by a URL would fix this. Can you help me?
>>>>
>>>> I'm working with SolrJ from Java. Does anyone else have the same
>>>> problem with SolrJ?
>>>>
>>>> Thanks to Paul Libbrecht for your suggestion.
>>>>
>>>> Best regards
>>>>
>>>> 2011/10/4 Paul Libbrecht <p...@hoplahup.net>
>>>>
>>>>> Full of boxes for me.
>>>>> Héctor, you need another way to reference these!
>>>>> (e.g. a URL)
>>>>>
>>>>> paul
>>>>>
>>>>> On 4 Oct. 2011 at 16:49, Héctor Trujillo wrote:
>>>>>
>>>>>> Hi all, I'm indexing PDF files with SolrJ, and most of them work.
>>>>>> But with some files I've got problems because strange characters
>>>>>> are stored. This is the content that got stored:
>>>>>> +++++++
>>>>>>
>>>>>> Starting a Search Application
>>>>>>
>>>>>> Abstract
>>>>>>
>>>>>> Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i
>>>>>>
>>>>>> Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search?
>>>>>>
>>>>>> ∞
>>>>>>
>>>>>> ∞
>>>>>> ∞
>>>>>>
>>>>>> Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
>>>>>>
>>>>>> Identifying Ideal Results
>>>>>>
>>>>>> Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 2
>>>>>>
>>>>>> Starting a Search Application A Lucid Imagination White Paper
>>>>>>
>>>>>> +++++++
>>>>>>
>>>>>> But if I open the PDF file, I can see the content correctly.
>>>>>>
>>>>>> I think this is a charset encoding issue, but I don't know whether I
>>>>>> can avoid this behaviour with a different analyzer or tokenizer
>>>>>> applied at indexing time.
>>>>>>
>>>>>> I've got this problem with some documents downloaded from Lucid's
>>>>>> website.
>>>>>>
>>>>>> I don't know if anyone has had the same problem and knows how to
>>>>>> solve it.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Best regards
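Mike's suggestion to run Tika directly on the PDF can be sketched in Java, the language Héctor already uses with SolrJ. This is a minimal sketch, not code from the thread. It assumes the tika-app jar (or tika-core plus the Tika parsers) is on the classpath, ideally a Tika version close to the one bundled with Solr. The class name `TikaCheck` is hypothetical. It relies on Tika's `Tika` facade, which detects the file type and returns the extracted plain text.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical helper class: runs Tika on a file outside Solr so the
// extracted text can be inspected directly. Assumes Tika jars are on the classpath.
public class TikaCheck {

    // Extracts plain text using Tika's facade, which auto-detects the file type.
    public static String extractText(File file) throws IOException, TikaException {
        Tika tika = new Tika();
        // parseToString truncates its result by default; -1 disables that limit
        // so the whole document is returned.
        tika.setMaxStringLength(-1);
        return tika.parseToString(file);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TikaCheck <file.pdf>");
            System.exit(1);
        }
        // Print as UTF-8 so the console's default charset does not add
        // its own garbling on top of whatever Tika produced.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        // If the strange characters already show up here, they come from
        // text extraction, not from SolrJ or the web service in between.
        out.println(extractText(new File(args[0])));
    }
}
```

For example: `java -cp tika-app-<version>.jar:. TikaCheck "Starting a Search Application.pdf"` (use `;` instead of `:` as the classpath separator on Windows). If the output already contains the strange characters, the problem most likely lies in PDF text extraction rather than in the .NET client, the web service, or SolrJ; if the output is clean, the chain that forwards the file to Solr becomes the more likely suspect. Note that an analyzer or tokenizer only affects the indexed terms, not the stored field value, so changing it would not remove the characters from the stored content.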