So it's something related to BinFileDataSource and TikaEntityProcessor.
Thanks,
Gary.
On 26/02/2015 14:24, Gary Taylor wrote:
Alex,
That's great. Thanks for the pointers. I'll try and get more info on
this and file a JIRA issue.
Kind regards,
Gary.
On 26/02/2015 14:16, Alexandre Rafalovitch wrote:
On 26 February 2015 at 08:32, Gary Taylor wrote:
Alex,
Same results on recursive=true / recursive=false.
I also tried importing plain text files instead of epub (still using
TikaEntityProcessor though) and get exactly the same result - i.e. all
files fetched, but only one document indexed in Solr.
With verbose output, I get a row for each f
Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 25 February 2015 at 11:14, Gary Taylor wrote:
I can't get the FileListEntityProcessor and TikaEntityProcessor to correctly
add a Solr document for each epub file in my local directory.
I've just downloaded
ed over 58!
No errors are reported in the logs.
I can search on the contents of that first epub document, so it's
extracting OK in Tika, but there's a problem somewhere in my config
that's causing only 1 document to be indexed in Solr.
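For reference, a minimal data-config sketch of the kind of setup being
described (the entity names, baseDir path and field mappings here are
assumptions for illustration, not the poster's actual config):

```xml
<dataConfig>
  <!-- BinFileDataSource streams each file's raw bytes to Tika -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- rootEntity="false" on the outer entity makes each inner
         Tika entity produce its own Solr document -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/epubs" fileName=".*\.epub"
            rootEntity="false" dataSource="null">
      <entity name="doc" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text"
              dataSource="bin">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

When all files are fetched but only one document is indexed, the usual
suspects are a missing rootEntity="false" on the file-list entity, or every
extracted document ending up with the same uniqueKey value so each one
overwrites the last.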
Thanks for any assistance / pointers.
Regards,
to the end user, how can I limit
the number of characters while querying the search results ... is there any
feature where we can achieve this ... the concept of snippet kind of thing
...
Thanks
Naveen
On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor wrote:
Naveen,
For indexing Zip files with Tika, take a look at the following thread :
http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html
I got it to work with the 3.1 source and a couple of patches.
Hope this helps.
Regards,
Gary.
On 08/
Jayendra,
I cleared out my local repository, and replayed all of my steps from
Friday and now it works. The only difference (or the only one that's
obvious to me) was that I applied the patch before doing a full
compile/test/dist. But I assumed that given I was seeing my new log
entries
so I know I'm
running those patched files in the build.
If anyone can shed any light on what's happening here, I'd be very grateful.
Thanks and kind regards,
Gary.
On 11/04/2011 11:12, Gary Taylor wrote:
Jayendra,
Thanks for the info - been keeping an eye on this list in case
chive files. Based on the email chain associated with your first message,
some people have been able to get this functionality to work as desired.
--
Gary Taylor
INOVEM
Tel +44 (0)1488 648 480
Fax +44 (0)7092 115 933
gary.tay...@inovem.com
www.inovem.com
INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
As an example, I run this in the same directory as the msword1.doc file:
curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&literal.type=5" -F "file=@msword1.doc"
The "type" literal is just part of my schema.
Gary.
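(For anyone following along: a literal.* parameter like that just maps onto
an ordinary field in schema.xml - a sketch, with field names and types
assumed rather than taken from the poster's schema:)

```xml
<!-- schema.xml: fields populated via literal.docid / literal.type
     on requests to /update/extract -->
<field name="docid" type="string" indexed="true" stored="true" required="true"/>
<field name="type"  type="int"    indexed="true" stored="true"/>
```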
On 03/03/2011 11:45, Ken Foskey wrote:
On Thu, 2011-0
Can anyone shed any light on this, and whether it could be a config
issue? I'm now using the latest SVN trunk, which includes the Tika 0.8
jars.
When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt)
to the ExtractingRequestHandler, I get the following log entry
(formatted
cess the zip file when used
standalone with the tika-app jar - it outputs both the filenames and
contents. Should I be able to index the contents of files stored in a
zip by using extract ?
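A sketch of the two commands being compared, assuming a core named core0
and a file named docs.zip (both illustrative):

```shell
# Standalone Tika: prints filenames and contents from inside the zip
java -jar tika-app.jar -t docs.zip

# Same file sent to Solr's ExtractingRequestHandler
curl "http://localhost:8983/solr/core0/update/extract?literal.docid=99&commit=true" \
  -F "file=@docs.zip"
```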
Thanks and kind regards,
Gary.
On 25/01/2011 15:32, Gary Taylor wrote:
Thanks Erlend.
Not used SVN before, but have managed to download and build latest trunk
code.
Now I'm getting an error when trying to access the admin page (via
Jetty) because I specify HTMLStripStandardTokenizerFactory in my
schema.xml, but this appears to be no longer supplied as part of t
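(For anyone hitting the same error: on trunk the usual replacement for that
tokenizer was to strip markup with a char filter in front of the standard
tokenizer - a sketch, with the field type name assumed:)

```xml
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- strip HTML/XML markup before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```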
Hi,
I posted a question in November last year about indexing content from
multiple binary files into a single Solr document and Jayendra responded
with a simple solution to zip them up and send that single file to Solr.
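That workflow can be sketched as follows (the file names, docid and core
name are assumptions for illustration):

```shell
# Bundle the attachments for one object into a single archive...
zip attachments.zip report.doc summary.pdf

# ...then send the archive to Solr as one document
curl "http://localhost:8983/solr/core0/update/extract?literal.docid=42&commit=true" \
  -F "file=@attachments.zip"
```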
I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't
curre
to the ExtractingRequestHandler for
indexing and included as a part of single Solr document.
Regards,
Jayendra
On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor <g...@inovem.com> wrote:
Hi,
We're trying to use Solr to replace a custom Lucene server. One
requirement we have is to be able to index the content of multiple
binary files into a single Solr document. For example, a uniquely named
object in our app can have multiple attached-files (eg. Word, PDF etc.),
and we want