solr tika extraction video creation date problem (hours ahead)
Hello, I was following the instructions at https://lucene.apache.org/solr/guide/7_1/uploading-data-with-solr-cell-using-apache-tika.html to upload files with their metadata stored and indexed in Solr. I checked the extracted creation date (attr_meta_creation_date): for images (JPG etc.) the creation dates are correct, but all creation dates for videos are 11 hours ahead of the actual creation date. (The dates are correct when viewed in other applications.) This inconsistency causes problems with searching. Any ideas would be much appreciated. Thanks!
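One way to narrow this down is to measure the offset between the indexed value and the instant you know to be correct. Below is a minimal Python sketch of that check. The sample timestamps and the UTC+11 zone are made up for illustration, and the idea that an offset equal to the local UTC offset points to time-zone handling is an assumption, not something confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone


def parse_solr_date(value: str) -> datetime:
    """Parse a UTC timestamp in the form Solr returns, e.g. 2019-03-29T04:36:39Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def offset_hours(indexed: str, actual: datetime) -> float:
    """Hours by which the indexed value is ahead of the true instant."""
    delta = parse_solr_date(indexed) - actual.astimezone(timezone.utc)
    return delta.total_seconds() / 3600


# Hypothetical clip recorded at 15:36:39 local time in a UTC+11 zone.
local_zone = timezone(timedelta(hours=11))
actual = datetime(2019, 3, 29, 15, 36, 39, tzinfo=local_zone)

# Local wall-clock time stored as if it were UTC: the offset equals the zone offset.
print(offset_hours("2019-03-29T15:36:39Z", actual))

# A correctly converted value: no offset.
print(offset_hours("2019-03-29T04:36:39Z", actual))
```

If the offset is the same whole number of hours for every video and matches the recording device's zone, that would suggest the problem lies in how the timestamp is written or read rather than in Solr's indexing.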
Re: solr tika extraction video creation date problem (hours ahead)
Thank you very much Alex for the great suggestion.

On Fri, Apr 5, 2019 at 7:25 PM Alexandre Rafalovitch wrote:

> Well, Tika would use different libraries to extract different formats.
> So maybe there is a bug. I would just get a standalone tika (of
> matching version to the one in Solr) and see what the output from two
> sample files are. Then, I would check with the latest Tika, just in
> case.
>
> I would also use some non-Tika way to check what the dates are, just
> in case the date is wrong during encoding rather than during indexing.
> A low-probability chance, but just covering all the bases.
>
> Regards,
>    Alex.
>
> On Fri, 5 Apr 2019 at 01:39, whisere wrote:
> >
> > Thanks Alex. The problem is image creation date is correct, but the video
> > creation date is wrong (hours behind), if I set the time_zone I think the
> > image creation date will be wrong then. wonder what the difference between
> > image and video extraction in tika.
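For the "non-Tika way to check" suggested above, one option is to read the creation time straight out of the file. The Python sketch below walks the top-level boxes of an MP4/MOV file and reads the creation time from the `mvhd` box inside `moov`. It assumes the standard ISO base media file format layout (seconds since 1904-01-01, which the specification defines as UTC) and that the file fits in memory. The function names are mine, not part of any library.

```python
import struct
from datetime import datetime, timedelta, timezone

# MP4/QuickTime timestamps count seconds from 1904-01-01 00:00:00.
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def iter_boxes(data: bytes, start: int, end: int):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                return
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of its container
            size = end - pos
        if size < header or pos + size > end:
            return  # malformed or truncated box: stop rather than guess
        yield box_type, pos + header, pos + size
        pos += size


def mvhd_creation_time(data: bytes):
    """Return the moov/mvhd creation time as a UTC datetime, or None if absent."""
    for box_type, p_start, p_end in iter_boxes(data, 0, len(data)):
        if box_type != b"moov":
            continue
        for inner_type, i_start, i_end in iter_boxes(data, p_start, p_end):
            if inner_type != b"mvhd":
                continue
            version = data[i_start]  # 1 byte version, then 3 bytes of flags
            fmt = ">Q" if version == 1 else ">I"
            if i_end - i_start < 4 + struct.calcsize(fmt):
                return None
            seconds = struct.unpack_from(fmt, data, i_start + 4)[0]
            return MP4_EPOCH + timedelta(seconds=seconds)
    return None
```

Usage would be along the lines of `mvhd_creation_time(open("1.mp4", "rb").read())`. If this value already differs from the real recording time, the discrepancy was introduced when the file was written, not by Tika or Solr.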
problem indexing GPS metadata for video upload
I am uploading video to Solr via Tika, following https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html. The index contains no GPS metadata for videos, although it is extracted and indexed for images such as JPEG. I have checked both MP4 and MOV files; all the files I checked have GPS Exif data embedded in the same fields as the images. Any ideas? Thanks!
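A quick way to tell "Tika did not extract the GPS fields" apart from "Solr dropped them during field mapping" is Solr Cell's `extractOnly=true` parameter, which returns what Tika extracted without indexing anything. A small Python sketch follows. The base URL `http://localhost:8983` and the `techproducts` collection are assumptions for a default local install, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_only_url(base_url: str, collection: str) -> str:
    """Build the Solr Cell URL that extracts a document without indexing it."""
    query = urlencode({"extractOnly": "true", "extractFormat": "text", "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{collection}/update/extract?{query}"


def extract_only(path: str, url: str, content_type: str = "video/mp4") -> str:
    """POST the raw file to Solr Cell and return the response body as text."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Assumed default host/port and the demo collection; adjust for your setup.
url = extract_only_url("http://localhost:8983", "techproducts")
print(url)
```

Calling `extract_only("example/exampledocs/1.mp4", url)` against a running Solr would show the metadata names Tika produced for the video. If no GPS names appear there, no change to the field mapping in solrconfig.xml will bring them into the index.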
Re: problem indexing GPS metadata for video upload
Thank you Alex and Tim.

I have looked at the solrconfig.xml file (I am trying the techproducts demo config); the only related place I can find is the extract handler:

    startup="lazy" class="solr.extraction.ExtractingRequestHandler"
    true
    true
    links
    ignored_

I am using this command:

    bin/post -c techproducts example/exampledocs/1.mp4 -params "literal.id=mp4_1&uprefix=attr_"

I have tried commenting out ignored_ and changing to div, but it is still not working. I don't quite get why images get GPS etc. metadata but video behaves differently, even though it uses the same solrconfig and the GPS metadata are in the same fields. There is no differentiation in the solrconfig settings between image and video.

Tim, yes, this is related to the TIKA link. Thank you!

Here is the output in Solr for the MP4:

    {
      "attr_meta":["stream_size", "5721559",
        "date", "2019-03-29T04:36:39Z",
        "X-Parsed-By", "org.apache.tika.parser.DefaultParser",
        "X-Parsed-By", "org.apache.tika.parser.mp4.MP4Parser",
        "stream_content_type", "application/octet-stream",
        "meta:creation-date", "2019-03-29T04:36:39Z",
        "Creation-Date", "2019-03-29T04:36:39Z",
        "tiff:ImageLength", "1080",
        "resourceName", "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
        "dcterms:created", "2019-03-29T04:36:39Z",
        "dcterms:modified", "2019-03-29T04:36:39Z",
        "Last-Modified", "2019-03-29T04:36:39Z",
        "Last-Save-Date", "2019-03-29T04:36:39Z",
        "xmpDM:audioSampleRate", "1000",
        "meta:save-date", "2019-03-29T04:36:39Z",
        "modified", "2019-03-29T04:36:39Z",
        "tiff:ImageWidth", "1920",
        "xmpDM:duration", "2.64",
        "Content-Type", "video/mp4"],
      "id":"mp4_4",
      "attr_stream_size":["5721559"],
      "attr_date":["2019-03-29T04:36:39Z"],
      "attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
        "org.apache.tika.parser.mp4.MP4Parser"],
      "attr_stream_content_type":["application/octet-stream"],
      "attr_meta_creation_date":["2019-03-29T04:36:39Z"],
      "attr_creation_date":["2019-03-29T04:36:39Z"],
      "attr_tiff_imagelength":["1080"],
      "resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
      "attr_dcterms_created":["2019-03-29T04:36:39Z"],
      "attr_dcterms_modified":["2019-03-29T04:36:39Z"],
      "last_modified":"2019-03-29T04:36:39Z",
      "attr_last_save_date":["2019-03-29T04:36:39Z"],
      "attr_xmpdm_audiosamplerate":["1000"],
      "attr_meta_save_date":["2019-03-29T04:36:39Z"],
      "attr_modified":["2019-03-29T04:36:39Z"],
      "attr_tiff_imagewidth":["1920"],
      "attr_xmpdm_duration":["2.64"],
      "content_type":["video/mp4"],
      "content":[" \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n "],
      "_version_":1632383499325407232}]
    }}

JPEG is getting these:

    "attr_meta":[
      "GPS Latitude", "37° 47' 41.99\"",

    "attr_gps_latitude":["37° 47' 41.99\""],
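The `attr_meta` field in the output above is a flat list of alternating names and values, which makes it awkward to compare documents by eye. Here is a short Python sketch that pairs the entries up and looks for GPS-related names. The sample lists are abbreviated from the output in this message, and the helper names are mine.

```python
from collections import defaultdict


def pair_attr_meta(flat):
    """Group a flat [name, value, name, value, ...] list into {name: [values]}."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of entries (name/value pairs)")
    grouped = defaultdict(list)
    for name, value in zip(flat[0::2], flat[1::2]):
        grouped[name].append(value)  # names such as X-Parsed-By can repeat
    return dict(grouped)


def names_containing(meta, needle):
    """Return the metadata names that contain `needle`, ignoring case."""
    return sorted(name for name in meta if needle.lower() in name.lower())


# Abbreviated samples from the MP4 and JPEG output in this message.
mp4_meta = pair_attr_meta([
    "X-Parsed-By", "org.apache.tika.parser.DefaultParser",
    "X-Parsed-By", "org.apache.tika.parser.mp4.MP4Parser",
    "meta:creation-date", "2019-03-29T04:36:39Z",
    "Content-Type", "video/mp4",
])
jpeg_meta = pair_attr_meta(["GPS Latitude", "37° 47' 41.99\""])

print(names_containing(mp4_meta, "gps"))
print(names_containing(jpeg_meta, "gps"))
```

Since `attr_meta` holds every name Tika reported for the MP4 and none of them mention GPS, the fields appear to be missing at extraction time rather than lost in Solr's field mapping.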
Re: problem indexing GPS metadata for video upload
Thank you very much Tim. How do I make the Tika change apply to Solr? I saw the Tika core, parsers and XML jar files (tika-core.jar, tika-parsers.jar, tika-xml.jar) in the Solr contrib/extraction/lib folder. Do we just replace these files? Thanks!
Re: problem indexing GPS metadata for video upload
Sorry Tim! I missed your last message about this issue! Thank you very much for the information. Does the latest Tika (1.21) already incorporate the change? And how about Solr? Thanks!