Re: Using Tesseract OCR to extract PDF files in EML file attachment

2017-04-04 Thread AJ Weber
You'll need to use something like JavaMail (javax.mail), or one of the jars built on top of it for higher-level access, to open the EML files and extract the attachments, then operate on the extracted attachments as you would any file. There are alternative, paid libraries to parse and e
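
The extraction step described in this reply can be sketched roughly as follows. This is not the javax mail approach the post recommends; it is the same idea expressed with Python's standard `email` package, chosen only because it needs no extra jars. The function name `extract_pdf_attachments` and the output-directory layout are assumptions made for illustration, and the OCR step itself is left out.

```python
import email
from email import policy
from pathlib import Path


def extract_pdf_attachments(eml_bytes, out_dir):
    """Parse raw EML bytes and write each PDF attachment into out_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    # policy.default yields EmailMessage objects, which provide iter_attachments().
    msg = email.message_from_bytes(eml_bytes, policy=policy.default)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for index, part in enumerate(msg.iter_attachments()):
        # Fall back to a generated name when the attachment has no filename.
        name = part.get_filename() or f"attachment-{index}.pdf"
        is_pdf = (part.get_content_type() == "application/pdf"
                  or name.lower().endswith(".pdf"))
        if not is_pdf:
            continue
        # Keep only the base name so a crafted filename cannot escape out_dir.
        target = out_dir / Path(name).name
        # decode=True undoes the base64 / quoted-printable transfer encoding.
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved
```

Each returned path is an ordinary file on disk, so it could then be handed to whatever OCR or indexing step is in use, in the same way as any other PDF.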

Re: The book: Solr 4.x Deep Dive - Early Access Release #1

2013-06-21 Thread AJ Weber
On 6/21/2013 9:22 AM, Alexandre Rafalovitch wrote: I might, however, be confused regarding your strategy. I thought you were going to do several different volumes, rather than one large one. Or is this all a 'first' volume discussion so far? Pricing: $7.99 feels better for a book this size. U

newbie questions about cache stats & query perf

2013-01-09 Thread AJ Weber
Sorry, I did search for an answer, but didn't find an applicable one. I'm currently stuck on 1.4.1 (running in Tomcat 6 on 64-bit Linux) for the time being... When I see stats like this: name: documentCache class: org.apache.solr.search.LRUCache version: 1.0 description: LRU