Ingesting/Querying Documents with Nested/Related Documents and extracting Full-text

2018-11-07 Thread Stephon Harris
Hi,

I want to ingest a collection of documents, along with full text extracted
from PDFs through Solr's 'update/extract' endpoint, and store that text in a
field called "fullText". I also want to relate some documents to others, so
that when I query the "fullText" field with user terms, Solr returns the
first matching document whose "contentType" field equals "overview", together
with several related documents that have different "contentType" values,
like this:


{
"id":"1",
"contentType":"overview",
"fullText":"Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Etiam consectetur ipsum libero, at egestas ante laoreet nec. Aliquam sem
elit, rhoncus efficitur laoreet sodales, hendrerit eget mi. Nulla facilisis
tincidunt tortor vel placerat. Phasellus blandit velit eget semper
tristique. Maecenas convallis orci purus, ac scelerisque erat pulvinar id.
Donec semper enim id justo cursus, vitae bibendum magna interdum. Maecenas
eu laoreet nibh. Quisque magna massa, semper et lorem sed, volutpat
pulvinar quam. Quisque a urna et risus feugiat fermentum nec et orci.
Pellentesque ac neque sed tortor convallis finibus sit amet id purus. Sed
blandit eget ante et semper. Vivamus.",
"product":"paper & goods"
},
{
"id":"2",
"contentType":"support",
"title":"The latest support boards",
"points":["Nulla facilisis tincidunt tortor vel placerat."," Phasellus
blandit velit eget semper tristique."],
"product":"paper & goods",
"parentID":"1"
},{
"id":"3",
"contentType":"boards",
"title":"",
"points":["Nulla facilisis tincidunt tortor vel placerat."," Phasellus
blandit velit eget semper tristique."],
"product":"paper & goods",
"parentID":"1"
}
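
For the extraction step described above, here is a minimal sketch of how the
request URL might be assembled. It assumes the standard Solr Cell parameters
`literal.<field>`, `fmap.<source>=<target>` and `commit`; the base URL, the
core name "docs" and the helper name `extract_url` are hypothetical. The PDF
itself would be sent as the POST body, which is not shown here.

```python
from urllib.parse import urlencode


def extract_url(base_url, core, doc_id, literals=None):
    """Build an update/extract URL that maps the extracted text to fullText.

    base_url and core are placeholders; adjust them to the actual install.
    """
    params = [
        ("literal.id", doc_id),         # unique key of the document being created
        ("fmap.content", "fullText"),   # move Tika's "content" field into fullText
        ("commit", "true"),
    ]
    # Any extra metadata is passed as literal.<field>=<value>.
    for field, value in (literals or {}).items():
        params.append((f"literal.{field}", value))
    return f"{base_url}/{core}/update/extract?{urlencode(params)}"


url = extract_url(
    "http://localhost:8983/solr",   # assumed host and port
    "docs",                         # hypothetical core name
    "1",
    {"contentType": "overview", "product": "paper & goods"},
)
print(url)
```

Note that `urlencode` escapes the ampersand in "paper & goods", so the value
is not split into two parameters.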


I'm looking for recommendations on ingesting and querying these documents.
Can I nest the related documents as children of the overview document and
still extract full text from a PDF? If so, how can I query for both the
parent and its child documents?
Alternatively, should I leave the documents un-nested and instead match the
overview's "id" field against a "parentID" field in each related document?
If so, how do I form a query that returns documents whose "parentID" value
matches another document's "id"?
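
To make the two options concrete, here is a rough sketch of what each might
look like. It only builds payloads and query parameters and does not talk to
Solr. It assumes the `_childDocuments_` key of Solr's JSON update format, the
`[child]` document transformer, and the `{!join}` query parser; the helper
names `nest_children`, `nested_query` and `join_query` are made up for
illustration.

```python
from collections import defaultdict


def nest_children(flat_docs, parent_key="parentID"):
    """Turn flat docs into parents that carry a _childDocuments_ list."""
    children = defaultdict(list)
    parents = []
    for doc in flat_docs:
        if parent_key in doc:
            # Drop parentID on the child: the nesting itself records the relation.
            child = {k: v for k, v in doc.items() if k != parent_key}
            children[doc[parent_key]].append(child)
        else:
            parents.append(dict(doc))
    for parent in parents:
        if parent["id"] in children:
            parent["_childDocuments_"] = children[parent["id"]]
    return parents


def nested_query(terms):
    """Option 1: overview docs matching the terms, with children attached."""
    return {
        "q": f"fullText:({terms})",
        "fq": "contentType:overview",
        "fl": "*,[child parentFilter=contentType:overview]",
    }


def join_query(terms):
    """Option 2: flat docs whose parentID equals the id of a matching overview."""
    return {
        "q": f"{{!join from=id to=parentID}}fullText:({terms}) AND contentType:overview"
    }


docs = [
    {"id": "1", "contentType": "overview", "fullText": "Lorem ipsum"},
    {"id": "2", "contentType": "support", "parentID": "1"},
    {"id": "3", "contentType": "boards", "parentID": "1"},
]
print(nest_children(docs))
print(nested_query("lorem"))
print(join_query("lorem"))
```

With the join form, the overview document itself is not in the result set, so
it would need a second query or an OR clause on "id"; the nested form returns
parent and children together.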

-- 
Stephon Harris

*Enterprise Knowledge, LLC*
*Web: *http://www.enterprise-knowledge.com/
<http://www.google.com/url?q=http%3A%2F%2Fwww.enterprise-knowledge.com%2F&sa=D&sntz=1&usg=AFQjCNFDktFDhseOl_Pha6Pz3fIFaWolNg>
*E-mail:* shar...@enterprise-knowledge.com/
<http://www.google.com/url?q=http%3A%2F%2Fwww.enterprise-knowledge.com%2F&sa=D&sntz=1&usg=AFQjCNFDktFDhseOl_Pha6Pz3fIFaWolNg>
*Cell:* 832-628-8352


Fwd: Ingesting/Querying Documents with Nested/Related Documents and extracting Full-text

2018-11-08 Thread Stephon Harris
Following up on this to see if anyone has thoughts.

-- Forwarded message -
From: Stephon Harris 
Date: Wed, Nov 7, 2018 at 12:21 PM
Subject: Ingesting/Querying Documents with Nested/Related Documents and
extracting Full-text
To: 




-- 
Stephon Harris



Nested Child document doesn't return in query

2018-12-17 Thread Stephon Harris
I ingested some nested documents into a Solr 7.4 core. When I search with
the following query, it does not return a child document that I expected:



```

{!child of=cont_type:overview}id:2

```



I can see that the document I'm looking for exists with the query:



```

q=id:2-1

```



Why does the document with id "2-1" no longer come back from the Block Join
Children Query Parser, when it previously did? Is there a way someone could
have un-nested a child document?
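
A few checks that might narrow this down, sketched as query parameters only.
This assumes the schema has the `_root_` field indexed (as the default
configsets do), so that every document in a block can be found by the
parent's id. The helper name `block_diagnostics` is hypothetical. Two
possibilities worth ruling out, offered as guesses rather than a diagnosis:
the child may have been re-indexed or updated on its own, which would leave
it outside the parent's block, or the `of` filter may not match every parent
document in the index.

```python
from urllib.parse import urlencode


def block_diagnostics(parent_id, child_id, parent_filter):
    """Return named query-parameter dicts for checking a parent/child block."""
    return {
        # Does the parent still exist and still match the parent filter?
        "parent_exists": {"q": f"id:{parent_id}", "fq": parent_filter},
        # Which documents does Solr consider part of the parent's block?
        "block_members": {"q": f"_root_:{parent_id}", "fl": "id"},
        # The original block join query, for comparison.
        "children_of_parent": {"q": f"{{!child of={parent_filter}}}id:{parent_id}"},
        # The child on its own, which is known to return a hit.
        "child_alone": {"q": f"id:{child_id}"},
    }


checks = block_diagnostics("2", "2-1", "cont_type:overview")
for name, params in checks.items():
    print(name, "select?" + urlencode(params))
```

If "block_members" does not list "2-1", the child is no longer stored in the
same block as document "2", and re-indexing the whole parent with its
children would be the likely fix.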

-- 
Stephon Harris



Setting Solr Home via installation script

2019-01-07 Thread Stephon Harris
I am trying to install Solr as a service so that, after a restart, the Solr
home directory is set to `example/schemaless/solr`, where the cores I created
while running the schemaless example live.

As instructed in Taking Solr to Production
<https://lucene.apache.org/solr/guide/7_4/taking-solr-to-production.html>,
I ran the command
`sudo bash ./install_solr_service.sh solr-7.4.0.tgz -i /opt/ -d example/schemaless/solr -u solr -s solr -p 8983`
and Solr started successfully. However, Solr home was set to /var/solr/data. I
expected the -d option to set Solr home to example/schemaless/solr. What
should I do to get Solr home set to example/schemaless/solr? Or is there
another way to get the cores I created under the schemaless directory into
the Solr home directory?
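
One possible way to handle the second question, sketched under assumptions:
Solr discovers cores by looking for `core.properties` files under Solr home,
so copying each core directory into the service's Solr home may be enough.
The function names and the paths in the commented call are hypothetical, and
this presumes Solr is stopped during the copy and that the copied files end
up owned by the service user.

```python
import shutil
from pathlib import Path


def find_core_dirs(solr_home):
    """Return the directories under solr_home that contain core.properties."""
    return sorted(p.parent for p in Path(solr_home).rglob("core.properties"))


def copy_cores(source_home, target_home):
    """Copy each discovered core directory into target_home.

    Cores that already exist in target_home are skipped, never overwritten.
    Returns the names of the cores that were copied.
    """
    copied = []
    for core_dir in find_core_dirs(source_home):
        dest = Path(target_home) / core_dir.name
        if dest.exists():
            continue
        shutil.copytree(core_dir, dest)
        copied.append(dest.name)
    return copied


# Hypothetical usage; both paths are assumptions about the local layout:
# copy_cores("/opt/solr/example/schemaless/solr", "/var/solr/data")
```

Another route would be to point the SOLR_HOME setting of the installed
service at the existing directory, but keeping the cores under the default
home avoids depending on the example directory.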

-- 
Stephon Harris



Fwd: Setting Solr Home via installation script

2019-01-08 Thread Stephon Harris
Checking again to see if anyone has thoughts on this.

-- Forwarded message -
From: Stephon Harris 
Date: Mon, Jan 7, 2019 at 10:05 AM
Subject: Setting Solr Home via installation script
To: 






-- 
Stephon Harris
