Re: Question about indexing PDFs

2016-08-26 Thread Betsey Benagh
is very easy to do without knowing it. > >> not actually having 'indexed="true"' set in your schema > >> not committing after inserting the doc > >Best, >Erick > >On Thu, Aug 25, 2016 at 11:19 AM, Betsey Benagh < >betsey.ben...@stresearch.com

Re: Question about indexing PDFs

2016-08-25 Thread Betsey Benagh
x? >Often >these are defined by dynamic fields and the like in the schema files. > >Take a look at the admin UI>>schema browser>>drop down and you'll see all >the actual fields in your index... > >Best, >Erick > >On Thu, Aug 25, 2016 at 8:39 AM, Betse

Re: Question about indexing PDFs

2016-08-25 Thread Betsey Benagh
It looks like the metadata of the PDFs was indexed, but not the content (which is what I was interested in). Searches on terms I know exist in the content come up empty. On 8/25/16, 2:16 PM, "Betsey Benagh" wrote: >Right, that's where I looked. No 'content'. Which is wha

Question about indexing PDFs

2016-08-25 Thread Betsey Benagh
Following the instructions in the quick start guide, I imported a bunch of PDF documents into my Solr 6.0 instance. As far as I can tell from the documentation, there should be a 'content' field indexing, well, the content, but I don't see it in the schema for that collection. Is there somethi

Oddity with importing documents...

2016-05-06 Thread Betsey Benagh
Since it appears that using a recent version of Tika with Solr is not really feasible, I'm trying to run Grobid on my files, and then import the corresponding XML into Solr. I don't see any errors on the post: bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml /Library/Java/Java

Re: Integrating grobid with Tika in solr

2016-05-04 Thread Betsey Benagh
What am I missing? On 5/4/16, 10:55 AM, "Shawn Heisey" wrote: >On 5/4/2016 8:38 AM, Betsey Benagh wrote: >> Thanks, I'm currently using 5.5, and will try upgrading to 6.0. >> >> >> On 5/4/16, 10:37 AM, "Allison, Timothy B." wrote: >>

Re: Integrating grobid with Tika in solr

2016-05-04 Thread Betsey Benagh
eisey" wrote: >On 5/4/2016 8:38 AM, Betsey Benagh wrote: >> Thanks, I'm currently using 5.5, and will try upgrading to 6.0. >> >> >> On 5/4/16, 10:37 AM, "Allison, Timothy B." wrote: >>> Y. Solr 6.0.0 is shipping with Tika 1.7. Grobid came in with Ti

Re: Integrating grobid with Tika in solr

2016-05-04 Thread Betsey Benagh
upgrades to Tika 1.13 (soon to be released...I think). SOLR-8981. > >-Original Message- >From: Betsey Benagh [mailto:betsey.ben...@stresearch.com] >Sent: Wednesday, May 4, 2016 10:07 AM >To: solr-user@lucene.apache.org >Subject: Re: Integrating grobid with Tika in solr >

Re: Integrating grobid with Tika in solr

2016-05-04 Thread Betsey Benagh
g.Class.forName(Class.java:348) at org.apache.tika.config.ServiceLoader.getServiceClass(ServiceLoader.java:189) at org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:338) ... 35 more 500 On 5/4/16, 10:00 AM, "Shawn Heisey" <apa...@elyograg.org> wrote:

Integrating grobid with Tika in solr

2016-05-04 Thread Betsey Benagh
(X-posted from Stack Overflow) This feels like a basic, dumb question, but my reading of the documentation has not led me to an answer. I'm using Solr to index journal articles. Using the out-of-the-box configuration, it indexed the text of the documents, but I'm looking to use Grobid to pull

Re: Growing memory?

2016-04-14 Thread Betsey Benagh
heap is based on number of documents or whatever? We're just playing around with it right now, but it sounds like we may need a different machine in order to load in all of the data we want to have available. Thanks, betsey On 4/14/16, 3:08 PM, "Shawn Heisey" wrote: >On 4/14/2016 12:

Re: Growing memory?

2016-04-14 Thread Betsey Benagh
to that level after forcing GCs, you'll be fine. > >Best, >Erick > >On Thu, Apr 14, 2016 at 11:45 AM, Betsey Benagh wrote: >> X-posted from Stack Overflow... >> >> I'm running Solr 6.0.0 in server mode. I have one core. I loaded about >>2000 doc

Growing memory?

2016-04-14 Thread Betsey Benagh
X-posted from Stack Overflow... I'm running Solr 6.0.0 in server mode. I have one core. I loaded about 2000 documents in, and it was using about 54 MB of memory. No problem. Nobody was issuing queries or doing anything else, but over the course of about 4 hours, the memory usage had tripled to