I feel like you just experienced a wonderful lesson that we all periodically 
experience….  “Extracting data at scale”

I wonder, is there any way of coming up with heuristics to predict how long 
the process would take?  “Based on your settings, based on your doc types, 
based on sizes, based on historical records….  It will take 20 hours to run”…
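Something like this could be sketched very roughly, assuming you have per-type 
throughput numbers learned from historical runs (all names, rates, and the 
worker count below are hypothetical, just to illustrate the idea):

```python
# Hypothetical historical averages: seconds of processing per megabyte,
# keyed by document type, learned from previous runs.
SECONDS_PER_MB = {
    "pdf": 0.8,
    "pdf+images": 12.0,  # extracting images from PDFs is far slower
    "docx": 0.3,
    "html": 0.05,
}
DEFAULT_RATE = 0.5  # fallback for types we have no history on

def estimate_hours(batch, workers=8):
    """Estimate wall-clock hours to process `batch`, a list of
    (doc_type, size_mb) tuples, spread across `workers` parallel workers."""
    total_seconds = sum(
        size_mb * SECONDS_PER_MB.get(doc_type, DEFAULT_RATE)
        for doc_type, size_mb in batch
    )
    return total_seconds / workers / 3600.0

# Example: 500k one-megabyte PDFs with image extraction accidentally on
batch = [("pdf+images", 1.0)] * 500_000
print(f"Estimated: {estimate_hours(batch):.1f} hours")
```

Even a crude lookup table like this would have flagged the image-extraction 
setting up front, before the run was kicked off.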



> On Nov 22, 2019, at 8:25 AM, Tim Allison <[email protected]> wrote:
> 
> All,
>  I started the regression tests on a random set of 500k files.  I found
> this morning that it was _still_ going.  It turns out I had accidentally
> configured extract images for PDFs, which adds to the processing time and
> leads to more OOMs.
>  I restarted the regression tests this morning with that feature turned
> off.
> 
>       Best,
> 
>                   Tim

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
    
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.
