Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-27 Thread Gary Taylor
e="bin"> So it's something related to BinFileDataSource and TikaEntityProcessor. Thanks, Gary. On 26/02/2015 14:24, Gary Taylor wrote: Alex, That's great. Thanks for the pointers. I'll try and get more info on this and

Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-26 Thread Gary Taylor
Alex, That's great. Thanks for the pointers. I'll try and get more info on this and file a JIRA issue. Kind regards, Gary. On 26/02/2015 14:16, Alexandre Rafalovitch wrote: On 26 February 2015 at 08:32, Gary Taylor wrote: Alex, Same results on recursive=true / recursive=fals

Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-26 Thread Gary Taylor
Alex, Same results on recursive=true / recursive=false. I also tried importing plain text files instead of epub (still using TikaEntityProcessor though) and get exactly the same result - i.e. all files fetched, but only one document indexed in Solr. With verbose output, I get a row for each f

Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-25 Thread Gary Taylor
Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 25 February 2015 at 11:14, Gary Taylor wrote: I can't get the FileListEntityProcessor and TikaEntityProcessor to correctly add a Solr document for each epub file in my local directory. I've just downloaded

Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-25 Thread Gary Taylor
ed over 58! No errors are reported in the logs. I can search on the contents of that first epub document, so it's extracting OK in Tika, but there's a problem somewhere in my config that's causing only 1 document to be indexed in Solr. Thanks for any assistance / pointers. Regards,

Re: tika integration exception and other related queries

2011-06-09 Thread Gary Taylor
o the end user, how can i limit the number of characters while querying the search results ... is there any feature where we can achieve this ... the concept of snippet kind of thing ... Thanks Naveen On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor wrote: Naveen, For indexing Zip files with Tika,

Re: tika integration exception and other related queries

2011-06-08 Thread Gary Taylor
Naveen, For indexing Zip files with Tika, take a look at the following thread : http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html I got it to work with the 3.1 source and a couple of patches. Hope this helps. Regards, Gary. On 08/

Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-23 Thread Gary Taylor
Jayendra, I cleared out my local repository, and replayed all of my steps from Friday and now it works. The only difference (or the only one that's obvious to me) was that I applied the patch before doing a full compile/test/dist. But I assumed that given I was seeing my new log entries

Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-20 Thread Gary Taylor
so I know I'm running those patched files in the build. If anyone can shed any light on what's happening here, I'd be very grateful. Thanks and kind regards, Gary. On 11/04/2011 11:12, Gary Taylor wrote: Jayendra, Thanks for the info - been keeping an eye on this list in case

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-11 Thread Gary Taylor
chive files. Based on the email chain associated with your first message, some people have been able to get this functionality to work as desired. -- Gary Taylor INOVEM Tel +44 (0)1488 648 480 Fax +44 (0)7092 115 933 gary.tay...@inovem.com www.inovem.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE

Re: adding a document using curl

2011-03-03 Thread Gary Taylor
As an example, I run this in the same directory as the msword1.doc file: curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&literal.type=5" -F "file=@msword1.doc" The "type" literal is just part of my schema. Gary. On 03/03/2011 11:45, Ken Foskey wrote: On Thu, 2011-0
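
For anyone scripting this rather than typing the curl command, here is a minimal Python sketch of how the request URL in that example is put together. It assumes the same host, core name (core0) and literal fields (docid, type) as the curl command; the helper name build_extract_url is hypothetical and not part of Solr. It only builds the URL; the file itself would still have to be posted as multipart form data, which is what curl's -F option does.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, core, literals):
    """Build an /update/extract URL (hypothetical helper, not a Solr API)."""
    # Each schema field value is passed as a "literal.<field>" request parameter.
    params = {f"literal.{name}": value for name, value in literals.items()}
    return f"{base_url}/{core}/update/extract?{urlencode(params)}"


# Values taken from the curl example; "type" is just a field in that schema.
url = build_extract_url("http://localhost:8983/solr", "core0", {"docid": 74, "type": 5})
print(url)
```

Using urlencode means literal values containing spaces or ampersands are escaped, which is easy to get wrong when the URL is typed by hand in a shell.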

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-31 Thread Gary Taylor
Can anyone shed any light on this, and whether it could be a config issue? I'm now using the latest SVN trunk, which includes the Tika 0.8 jars. When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt) to the ExtractingRequestHandler, I get the following log entry (formatted

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
cess the zip file when used standalone with the tika-app jar - it outputs both the filenames and contents. Should I be able to index the contents of files stored in a zip by using extract ? Thanks and kind regards, Gary. On 25/01/2011 15:32, Gary Taylor wrote: Thanks Erlend. Not used SVN before

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
Thanks Erlend. Not used SVN before, but have managed to download and build latest trunk code. Now I'm getting an error when trying to access the admin page (via Jetty) because I specify HTMLStripStandardTokenizerFactory in my schema.xml, but this appears to be no longer supplied as part of t

Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
Hi, I posted a question in November last year about indexing content from multiple binary files into a single Solr document and Jayendra responded with a simple solution to zip them up and send that single file to Solr. I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't curre
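
The "zip them up" step mentioned in this message can be sketched in Python as below. This is only an illustration under stated assumptions: the helper name bundle_files is hypothetical, and doc1.txt / doc2.txt are stand-in files like the ones used later in the thread. It covers the client-side bundling only; whether Solr then extracts the archive's contents depends on the Tika and Solr versions in use, which is what the rest of the thread is about.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_files(paths, archive_path):
    """Write the given files into one ZIP archive (hypothetical helper)."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file under its bare name, without directory components.
            archive.write(path, arcname=Path(path).name)
    return archive_path


# Demo with two throwaway text files standing in for the attached files.
with tempfile.TemporaryDirectory() as tmp:
    sources = []
    for name in ("doc1.txt", "doc2.txt"):
        source = Path(tmp) / name
        source.write_text(f"contents of {name}")
        sources.append(source)
    archive_file = bundle_files(sources, Path(tmp) / "attachments.zip")
    with zipfile.ZipFile(archive_file) as archive:
        names = archive.namelist()

print(names)
```

The resulting single archive is what would be sent to the ExtractingRequestHandler in place of the individual files.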

Re: Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor
to the ExtractingRequestHandler for indexing and included as a part of single Solr document. Regards, Jayendra On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor <g...@inovem.com> wrote: > Hi, > > We're trying to use Solr to replace a custom Lucene server. One > require

Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor
Hi, We're trying to use Solr to replace a custom Lucene server. One requirement we have is to be able to index the content of multiple binary files into a single Solr document. For example, a uniquely named object in our app can have multiple attached-files (eg. Word, PDF etc.), and we want