Hi Chris, thanks for looking at this.

I'm using Solr 1.4.0, including the Tika that ships in the tgz file, which means Tika 0.4.
I've now discovered that only two letters are required. A single line with XE will crash it.

This fails:

r...@gamma:/home/ross# hexdump -C test.txt
00000000  58 45 0a                                          |XE.|
00000003
r...@gamma:/home/ross#

This works:

r...@gamma:/home/ross# hexdump -C test.txt
00000000  58 46 0a                                          |XF.|
00000003
r...@gamma:/home/ross#

XA, XB, XC, XD and XF all work okay. There's just something special about XE.

The command I use is:

curl "http://localhost:8080/solr-example/update/extract?literal.id=doc1&fmap.content=body&commit=true" -F "myfi...@test.txt"

I filed a bug at https://issues.apache.org/jira/browse/TIKA-397 but I guess 0.4 is an old version, so I wouldn't expect it to get much attention. It looks like I should upgrade Tika to 0.6. I don't really know how to do that, or whether Solr 1.4 works with Tika 0.6. The Tika pages talk about using Maven to build it. Sorry, I'm no Linux expert.

Ross

On Thu, Apr 1, 2010 at 1:07 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : Yes, please report this to the Tika project.
>
> except that when i run "tika-app-0.6.jar" on a text file like the one Ross
> describes, i don't get the error he describes, which means it may be
> something off in how Solr is using Tika.
>
> Ross: I can't reproduce this error on the trunk using the example solr
> configs and the text file below. can you verify exactly which version of
> Solr you are using (and which version of tika you are using inside solr)
> and the exact byte contents of your simplest problematic text file?
>
> hoss...@brunner:~/tmp$ cat tmp.txt
> x
> x
> XXBLE
> hoss...@brunner:~/tmp$ hexdump -C tmp.txt
> 00000000  78 0a 78 0a 58 58 42 4c  45 0a                    |x.x.XXBLE.|
> 0000000a
> hoss...@brunner:~/tmp$ curl
> "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F
> "myfi...@tmp.txt"
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int
> name="QTime">66</int></lst>
> </response>
>
>
> -Hoss
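
P.S. In case it helps anyone reproduce this, here is a minimal sketch of how the two three-byte files can be created exactly. The file names test_xe.txt and test_xf.txt are made up for illustration (the dumps above all came from a single test.txt), and od is used only as a portable stand-in for hexdump -C.

```shell
# Hypothetical file names; the original report used a single test.txt.
printf 'XE\n' > test_xe.txt   # bytes 58 45 0a, the failing case
printf 'XF\n' > test_xf.txt   # bytes 58 46 0a, a working case

# Print the raw bytes of each file to confirm nothing else is in them
# (no BOM, no carriage return).
od -An -tx1 test_xe.txt
od -An -tx1 test_xf.txt
```

Either file can then be posted with the same curl command as above by substituting the file name.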