solr tika extraction video creation date problem (hours ahead)

2019-04-03 Thread Where is Where
Hello, I was following the instructions at
https://lucene.apache.org/solr/guide/7_1/uploading-data-with-solr-cell-using-apache-tika.html
to upload files with their metadata stored and indexed in Solr. I checked
the extracted creation date (attr_meta_creation_date): for images (JPG
etc.) the creation dates are correct, but all creation dates for videos are
11 hours ahead of the actual creation date. (The dates are correct when
viewed in other applications.) This inconsistency causes problems with
searching. Any ideas are much appreciated. Thanks!
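
One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: MP4/MOV containers store their creation time as seconds since 1904-01-01 in UTC, while Exif timestamps in JPEGs usually carry no time zone. A constant gap equal to the local UTC offset would point in that direction. The sketch below measures the gap between two Solr-style dates; the function names and the sample values are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# MP4/MOV movie headers count seconds from this epoch, in UTC.
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)

# Date format used in the Solr output (for example 2019-03-29T04:36:39Z).
SOLR_DATE = "%Y-%m-%dT%H:%M:%SZ"


def mp4_seconds_to_utc(seconds):
    """Convert a raw MP4 creation_time value to an aware UTC datetime."""
    return MP4_EPOCH + timedelta(seconds=seconds)


def offset_hours(indexed, expected):
    """Hours by which the indexed date is ahead of the expected date."""
    a = datetime.strptime(indexed, SOLR_DATE)
    b = datetime.strptime(expected, SOLR_DATE)
    return (a - b).total_seconds() / 3600


# Hypothetical pair of values: indexed date versus the date seen elsewhere.
print(offset_hours("2019-03-29T15:36:39Z", "2019-03-29T04:36:39Z"))
```

If the printed gap is the same for every video and matches the time zone of the recording device, the difference is more likely a time zone interpretation issue than a corrupted date.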


Re: solr tika extraction video creation date problem (hours ahead)

2019-04-07 Thread Where is Where
Thank you very much Alex for the great suggestion.

On Fri, Apr 5, 2019 at 7:25 PM Alexandre Rafalovitch wrote:

> Well, Tika would use different libraries to extract different formats.
> So maybe there is a bug. I would just get a standalone tika (of
> matching version to the one in Solr) and see what the output from two
> sample files are. Then, I would check with the latest Tika, just in
> case.
>
> I would also use some non-Tika way to check what the dates are, just
> in case the date is wrong during encoding rather than during indexing.
> A low-probability chance, but just covering all the bases.
>
> Regards,
>Alex.
>
> On Fri, 5 Apr 2019 at 01:39, whisere wrote:
> >
> > Thanks Alex. The problem is that the image creation date is correct, but
> > the video creation date is wrong (hours behind). If I set the time_zone, I
> > think the image creation date will be wrong then. I wonder what the
> > difference is between image and video extraction in Tika.
>
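
The standalone check suggested above could be sketched roughly as follows. This assumes a tika-app jar is available locally and that its `--json` option prints the metadata as JSON; the jar path, the file paths and the helper names are placeholders, not something taken from this thread.

```python
import json
import subprocess


def tika_metadata(jar_path, file_path):
    """Run the tika-app jar on one file and return its metadata as a dict."""
    result = subprocess.run(
        ["java", "-jar", jar_path, "--json", file_path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def date_fields(metadata):
    """Keep only the metadata entries whose name looks date-related."""
    words = ("date", "created", "modified")
    return {name: value for name, value in metadata.items()
            if any(word in name.lower() for word in words)}


def report(jar_path, file_paths):
    """Print the date-related metadata of each file for side-by-side reading."""
    for path in file_paths:
        print(path, date_fields(tika_metadata(jar_path, path)))
```

Calling `report` with one image and one video, first with the Tika version bundled in Solr and then with the latest release, would show whether the date difference already exists outside Solr.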


problem indexing GPS metadata for video upload

2019-05-01 Thread Where is Where
I am uploading video to Solr via Tika, following
https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
The index contains no GPS metadata for videos, although it is extracted and
indexed for images such as JPEG. I have checked both MP4 and MOV files; the
files I checked all have GPS Exif data embedded in the same fields as the
images. Any ideas? Thanks!


Re: problem indexing GPS metadata for video upload

2019-05-02 Thread Where is Where
Thank you Alex and Tim.
I have looked at the solrconfig.xml file (I am trying the techproducts demo
config); the only related place I can find is the extract handler:

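<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <!-- <str name="uprefix">ignored_</str> -->

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>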

I am using this command:

  bin/post -c techproducts example/exampledocs/1.mp4 -params "literal.id=mp4_1&uprefix=attr_"
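
For comparing what Tika returns for an image and a video without indexing anything, the same parameters can be sent to the extract handler together with Solr Cell's `extractOnly=true`. The sketch below only builds such a URL; the host and port are assumed defaults and the helper name is made up.

```python
from urllib.parse import urlencode


def extract_url(base, core, params):
    """Build an /update/extract URL for the given core and request parameters."""
    return f"{base}/solr/{core}/update/extract?{urlencode(params)}"


url = extract_url(
    "http://localhost:8983",  # assumed default Solr address
    "techproducts",
    {"literal.id": "mp4_1", "uprefix": "attr_", "extractOnly": "true"},
)
print(url)
```

Posting the file to that URL should return the extracted content and metadata in the response instead of adding a document, which makes it easier to see whether the GPS fields are missing before Solr maps any field names.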

I have tried commenting out ignored_ and changing it to div, but it is still
not working. I don't quite get why images get GPS and other metadata while
videos behave differently, given that both use the same solrconfig and the
GPS metadata are in the same fields. Nothing in the solrconfig settings
differentiates between image and video.

Tim yes this is related to the TIKA link. Thank you!

Here is the output in solr for mp4.

{
"attr_meta":["stream_size",
  "5721559",
  "date",
  "2019-03-29T04:36:39Z",
  "X-Parsed-By",
  "org.apache.tika.parser.DefaultParser",
  "X-Parsed-By",
  "org.apache.tika.parser.mp4.MP4Parser",
  "stream_content_type",
  "application/octet-stream",
  "meta:creation-date",
  "2019-03-29T04:36:39Z",
  "Creation-Date",
  "2019-03-29T04:36:39Z",
  "tiff:ImageLength",
  "1080",
  "resourceName",
  "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
  "dcterms:created",
  "2019-03-29T04:36:39Z",
  "dcterms:modified",
  "2019-03-29T04:36:39Z",
  "Last-Modified",
  "2019-03-29T04:36:39Z",
  "Last-Save-Date",
  "2019-03-29T04:36:39Z",
  "xmpDM:audioSampleRate",
  "1000",
  "meta:save-date",
  "2019-03-29T04:36:39Z",
  "modified",
  "2019-03-29T04:36:39Z",
  "tiff:ImageWidth",
  "1920",
  "xmpDM:duration",
  "2.64",
  "Content-Type",
  "video/mp4"],
"id":"mp4_4",
"attr_stream_size":["5721559"],
"attr_date":["2019-03-29T04:36:39Z"],
"attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
  "org.apache.tika.parser.mp4.MP4Parser"],
"attr_stream_content_type":["application/octet-stream"],
"attr_meta_creation_date":["2019-03-29T04:36:39Z"],
"attr_creation_date":["2019-03-29T04:36:39Z"],
"attr_tiff_imagelength":["1080"],
"resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
"attr_dcterms_created":["2019-03-29T04:36:39Z"],
"attr_dcterms_modified":["2019-03-29T04:36:39Z"],
"last_modified":"2019-03-29T04:36:39Z",
"attr_last_save_date":["2019-03-29T04:36:39Z"],
"attr_xmpdm_audiosamplerate":["1000"],
"attr_meta_save_date":["2019-03-29T04:36:39Z"],
"attr_modified":["2019-03-29T04:36:39Z"],
"attr_tiff_imagewidth":["1920"],
"attr_xmpdm_duration":["2.64"],
"content_type":["video/mp4"],
"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n \n   "],
"_version_":1632383499325407232}]
  }}
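
The attr_meta field above is a flat list that alternates metadata names and values, so it is easier to inspect once it is grouped by name. A minimal sketch, using a few values copied from the output above; the helper names are made up.

```python
from collections import defaultdict


def pair_meta(flat):
    """Group a flat [name, value, name, value, ...] list by name."""
    grouped = defaultdict(list)
    for name, value in zip(flat[0::2], flat[1::2]):
        grouped[name].append(value)  # names such as X-Parsed-By can repeat
    return dict(grouped)


def gps_names(meta):
    """Return the metadata names that mention GPS."""
    return sorted(name for name in meta if "gps" in name.lower())


mp4_meta = pair_meta([
    "stream_size", "5721559",
    "X-Parsed-By", "org.apache.tika.parser.DefaultParser",
    "X-Parsed-By", "org.apache.tika.parser.mp4.MP4Parser",
    "Content-Type", "video/mp4",
])
print(gps_names(mp4_meta))
```

Running the same check on the attr_meta list of a JPEG document should list names such as "GPS Latitude", which makes the difference between the two parsers explicit.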

JPEG is getting these:
"attr_meta":[
"GPS Latitude",
  "37° 47' 41.99\"",

"attr_gps_latitude":["37° 47' 41.99\""],
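
As a side note, the GPS value for JPEG is indexed as a degrees/minutes/seconds string, which is awkward for range or spatial searches. A minimal sketch for converting it to decimal degrees, assuming the string always has the three-number form shown above; hemisphere letters such as N or W are not handled, and the function name is made up.

```python
import re


def dms_to_decimal(text):
    """Convert a string like 37° 47' 41.99" to decimal degrees."""
    parts = re.findall(r"-?\d+(?:\.\d+)?", text)
    if len(parts) != 3:
        raise ValueError(f"expected degrees, minutes, seconds: {text!r}")
    degrees, minutes, seconds = (float(part) for part in parts)
    # A leading minus sign applies to the whole coordinate.
    sign = -1.0 if text.strip().startswith("-") else 1.0
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)


print(round(dms_to_decimal("37° 47' 41.99\""), 6))
```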




Re: problem indexing GPS metadata for video upload

2019-05-03 Thread Where is Where
Thank you very much Tim. I wonder how to make the Tika change apply to
Solr? I saw the Tika core, parsers and XML jar files (tika-core.jar,
tika-parsers.jar, tika-xml.jar) in the Solr contrib/extraction/lib folder.
Do we just replace these files? Thanks!
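
Before swapping any jars, it may help to record which Tika artifacts and versions are currently in that folder. The sketch below reads the version from each file name; it assumes the jars are named like tika-core-&lt;version&gt;.jar, and the helper names are made up. Whether replacing only these jars is sufficient is not confirmed in this thread, since the parsers may also depend on other libraries in the same folder.

```python
import re
from pathlib import Path

# Matches names like tika-core-<version>.jar; the version part is optional.
JAR_NAME = re.compile(r"^(tika-[a-z0-9-]+?)(?:-(\d[\w.]*))?\.jar$")


def tika_jars(lib_dir):
    """Map each tika-* artifact in lib_dir to the version in its file name."""
    found = {}
    for jar in sorted(Path(lib_dir).glob("tika-*.jar")):
        match = JAR_NAME.match(jar.name)
        if match:
            found[match.group(1)] = match.group(2)  # None if no version
    return found


def mixed_versions(jars):
    """True if the listed Tika artifacts do not all share one version."""
    return len(set(jars.values())) > 1
```

Calling `tika_jars("contrib/extraction/lib")` from the Solr directory before and after a change gives a quick way to confirm that all Tika artifacts ended up on the same version.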



Re: problem indexing GPS metadata for video upload

2019-05-29 Thread Where is Where
Sorry Tim! I missed your last message about this issue! Thank you very much
for the information.
Does the latest Tika release, 1.21, already include the change? And what
about Solr?

Thanks!
