Re: Solr 6.4. Can't index MS Visio vsdx files

2017-07-03 Thread Gytis Mikuciunas
collections4 (which > is new in POI w Tika 1.14). (I assume you have already added curvesapi?) > > -Original Message- > From: Gytis Mikuciunas [mailto:gyt...@gmail.com] > Sent: Saturday, June 3, 2017 5:39 AM > To: solr-user@lucene.apache.org > Subject: RE: Solr 6.4. Ca

RE: Solr 6.4. Can't index MS Visio vsdx files

2017-06-03 Thread Gytis Mikuciunas
ox 2.0.5 and 2.0.6-SNAPSHOT on ~500k pdfs, see: http://162.242.228.174/reports/reports_pdfbox_2_0_6.tar.gz -----Original Message- From: Gytis Mikuciunas [mailto:gyt...@gmail.com] Sent: Tuesday, May 9, 2017 7:17 AM To: solr-user@lucene.apache.org Subject: Re: Solr 6.4. Can't index MS V

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-05-09 Thread Gytis Mikuciunas
here: https://builds.apache.org/ > > Please ask on the POI or Tika users lists for how to get the latest/latest > running, and thank you, again, for opening the issue on POI's Bugzilla. > > Best, > >Tim > > -Original Message- > From: Gytis

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
e that POI shouldn't throw a Runtime > exception. Perhaps open an issue in POI, or maybe we should catch this > special example at the Tika level? > > For "Caused by: java.lang.ArrayIndexOutOfBoundsException:", the POI team > _might_ be able to modify the parser to ignore

RE: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
Thanks for your responses. Is there any possibility to ignore parsing errors and continue indexing? Right now Solr/Tika stops parsing the whole document as soon as it hits any exception. On Apr 11, 2017 19:51, "Allison, Timothy B." wrote: > You might want to drop a note to the dev or user's list on Apac
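
One client-side way to keep a batch going when a single document fails to parse is to submit each file on its own and record the failures. The snippet below is only a minimal sketch of that idea, not a Solr or Tika setting: `index_all` and the `index_one` callable are hypothetical names, and `index_one` stands for whatever code actually posts one file to Solr and raises when Solr reports an error.

```python
from typing import Callable, Iterable, List, Tuple


def index_all(
    paths: Iterable[str],
    index_one: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Index each file separately so one failure does not stop the rest.

    index_one is a hypothetical callable that sends a single file to Solr
    and raises an exception if the server reports an error.
    """
    indexed: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in paths:
        try:
            index_one(path)
            indexed.append(path)
        except Exception as exc:
            # Record the failure and move on to the next file.
            failed.append((path, f"{type(exc).__name__}: {exc}"))
    return indexed, failed
```

The `failed` list can then be logged or retried later, for example after upgrading the parser libraries. Note that this skips the whole failing file; it does not recover partial content from it.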

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-04-11 Thread Gytis Mikuciunas
", "org.apache.solr.common.SolrException", "root-error-class", "java.lang.ArrayIndexOutOfBoundsException" ] } } Regards, Gytis On Mon, Feb 6, 2017 at 6:54 PM, Allison, Timothy B. wrote: > Shouldn't have take

Re: how to get modified field data if it doesn't exist in meta

2017-02-13 Thread Gytis Mikuciunas
re/src/java/org/apache/solr/update/processor/TemplateUpdateProcessorFactory.java > > Alternatively, you could implement your URP in JavaScript, but I am > not sure that has an API to check file dates. > > Regards, >Alex. > > http://www.solr-start.com/ - Resources

Re: how to get modified field data if it doesn't exist in meta

2017-02-12 Thread Gytis Mikuciunas
work. Regards, Alex On 10 Feb 2017 2:39 AM, "Gytis Mikuciunas" wrote: Hi, We have started to use Solr for indexing our documents (vsd, vsdx, xls, xlsx, doc, docx, pdf, txt). A modified date value is needed for each file. MS Office files and PDFs have this value. The problem is with txt

Re: how to get modified field data if it doesn't exist in meta

2017-02-12 Thread Gytis Mikuciunas
M, Alexandre Rafalovitch > wrote: > > Custom update request processor that looks up a file from the name and > gets > > the date should work. > > > > Regards, > > Alex > > > > On 10 Feb 2017 2:39 AM, "Gytis Mikuciunas" wrote: > >
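
The suggestion above is a custom update request processor running inside Solr. As a rough illustration of the same idea done on the client before posting, the sketch below reads the file system's last-modified time and formats it in Solr's UTC date syntax. It assumes the indexing client can see the original file; the function names and the `last_modified` field name are hypothetical and should be matched to the actual schema.

```python
import os
from datetime import datetime, timezone


def solr_modified_date(path: str) -> str:
    """Return the file's last-modified time as a Solr date string (UTC)."""
    mtime = os.path.getmtime(path)
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


def with_modified_date(doc: dict, path: str, field: str = "last_modified") -> dict:
    """Fill in the date field from the file system only when it is missing.

    'last_modified' is an assumed field name, not taken from the thread.
    """
    if doc.get(field):
        return doc
    return {**doc, field: solr_modified_date(path)}
```

Documents whose metadata already carries a modified date (Office files, PDFs) are left untouched; only those without one, such as txt files, get the file system value.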

how to get modified field data if it doesn't exist in meta

2017-02-09 Thread Gytis Mikuciunas
Hi, We have started to use Solr for indexing our documents (vsd, vsdx, xls, xlsx, doc, docx, pdf, txt). A modified date value is needed for each file. MS Office files and PDFs have this value. The problem is with txt files, as they don't have this value in their metadata. Is there any possibility to get it

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-02-06 Thread Gytis Mikuciunas
3 > > See also [1] > > [1] http://apache-poi.1045710.n5.nabble.com/support-for-reading-Microsoft-Visio-2013-vsdx-format-td5721500.html > > -----Original Message- > From: Gytis Mikuciunas [mailto:gyt...@gmail.com] > Sent: Monday, February 6, 2017 8:19 AM > To:

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-02-06 Thread Gytis Mikuciunas
che Tika that is > > shipped with Solr. However Solr does not ship every possible parser > > with its installation. So, I think you are hitting Tika where it > > manages to figure out what type of content you have, but does not have > > (Apache POI - another O/S project) library insta

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-02-05 Thread Gytis Mikuciunas
a/POI's > project/download and make it visible to Solr (probably as an extension jar > in a lib folder somewhere - I am a bit hazy on that for latest Solr). > > The version of Tika that Solr uses is part of the changes notes. For 6.4, > it is https://github.com/apache/lucene-so

RE: Solr 6.4. Can't index MS Visio vsdx files

2017-02-03 Thread Gytis Mikuciunas
s an extension jar > in a lib folder somewhere - I am a bit hazy on that for latest Solr). > > The version of Tika that Solr uses is part of the changes notes. For 6.4, > it is https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/solr/CHANGES.txt >

Solr 6.4. Can't index MS Visio vsdx files

2017-02-03 Thread Gytis Mikuciunas
Hi, I'm using a single-core Solr 6.4 instance on Windows Server (Windows Server 2012 R2 Standard), Java 8 (build 1.8.0_121-b13). Everything works more or less OK, except indexing of MS Visio vsdx files. Every time it throws an error (no matter whether it tries to index a vsdx file or, for example, a docx with vis