Jayendra,
I cleared out my local repository and replayed all of my steps from
Friday, and now it works. The only difference (or the only one that's
obvious to me) was that I applied the patch before doing a full
compile/test/dist. But I assumed that given I was seeing my new log
entries
Hi Gary,
I tried the patch on the 3.1 source code (@
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/)
as well and it worked fine.
@Patch - https://issues.apache.org/jira/browse/SOLR-2416, which deals
with the Solr Cell module.
You may want to verify the contents from the
Hello again. Unfortunately, I'm still getting nowhere with this. I
have checked out the 3.1 source and applied Jayendra's patches (see
below) and it still appears that the contents of the files in the
zipfile are not being indexed, only the filenames of those contained files.
I'm using a sim
Awesome. Thanks Jayendra. I hadn't caught these patches yet.
I applied the SOLR-2416 patch to the solr-3.1 release tag. This resolved the
problem of archive files not being unpacked and indexed with Solr CELL.
Thanks for the FYI.
https://issues.apache.org/jira/browse/SOLR-2416
On Mon, Apr 11, 2011 a
Jayendra,
Thanks for the info - been keeping an eye on this list in case this
topic cropped up again. It's currently a background task for me, so
I'll try and take a look at the patches and re-test soon.
Joey - glad you brought this issue up again. I haven't progressed any
further with it.
The migration of Tika to the latest 0.8 version seems to have
reintroduced the issue.
I was able to get this working again with the following patches. (Solr
Cell and Data Import handler)
https://issues.apache.org/jira/browse/SOLR-2416
https://issues.apache.org/jira/browse/SOLR-2332
You can try t
Hi Gary,
I have been experiencing the same problem... Unable to extract content from
archive file formats. I just tried again with a clean install of Solr 3.1.0
(using Tika 0.8) and continue to experience the same results. Did you have
any success with this problem with Solr 1.4.1 or 3.1.0 ?
I'
Can anyone shed any light on this, and whether it could be a config
issue? I'm now using the latest SVN trunk, which includes the Tika 0.8
jars.
When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt)
to the ExtractingRequestHandler, I get the following log entry
(formatted
Hi Gary,
The latest Solr Trunk was able to extract and index the contents of the zip
file using the ExtractingRequestHandler.
The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
worked pretty well.
Tested again with sample url and works fine -
curl "
http://localhost:8080/solr
OK, got past the schema.xml problem, but now I'm back to square one.
I can index the contents of binary files (Word, PDF etc...), as well as
text files, but it won't index the content of files inside a zip.
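A minimal sketch for reproducing this kind of zip test, in Python. It builds a small archive in memory and prepares a POST for the ExtractingRequestHandler. The `/update/extract` path, the `literal.id` and `commit` parameters, the base URL, and the helper names `build_test_zip` and `build_extract_request` are all assumptions for illustration; adjust them to your own solrconfig.xml.

```python
import io
import urllib.parse
import urllib.request
import zipfile

# Assumed values: adjust to your own Solr install.
SOLR_BASE = "http://localhost:8080/solr"  # assumed host, port, and context
EXTRACT_PATH = "/update/extract"          # assumed ExtractingRequestHandler mapping


def build_test_zip(files):
    """Return the bytes of a zip archive holding the given {name: text} files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buf.getvalue()


def build_extract_request(zip_bytes, doc_id):
    """Build (but do not send) a POST request for the extract handler."""
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{SOLR_BASE}{EXTRACT_PATH}?{params}"
    return urllib.request.Request(
        url,
        data=zip_bytes,
        headers={"Content-Type": "application/zip"},
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen(...)` and then query Solr for a word that appears only inside one of the zipped files; a hit suggests the archive contents were indexed, not just the file names.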
As an example, I have two txt files - doc1.txt and doc2.txt. If I index
either of the
Thanks Erlend.
Not used SVN before, but have managed to download and build latest trunk
code.
Now I'm getting an error when trying to access the admin page (via
Jetty) because I specify HTMLStripStandardTokenizerFactory in my
schema.xml, but this appears to be no longer supplied as part of t
On 25.01.11 11.30, Erlend Garåsen wrote:
Tika version 0.8 is not included in the latest release/trunk from SVN.
Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry.
And to clarify, by "content" I mean the main content of a Word file.
Title and other kinds of metadata are succes
There seems to be a bug with the current 1.4.1 release. You cannot
extract any content at all, regardless of content type.
Try to get a fresh version from the SVN repository. I did that earlier
today and can verify that Tika now will extract the content. I'm not
sure about zip files.
Tika