Re: parent/child rows in solr

John Smith Fri, 07 Sep 2018 18:45:02 -0700

Thanks, Shawn, for your comments. The reason I don't want to go with a flat
structure is all the wasted, duplicated data. If a department
has 100 employees, then it's very wasteful in terms of disk space to repeat
the header data 100 times. In this example there are
only a few doc types, but my real-life data is much larger, and the problem
is one of scale: with just a little data, duplicating header fields is no
problem, but with massive amounts of data it's a large
problem.


My understanding of both graph traversal and block joins is that the
header data would only be present once, which is why I'm gravitating
towards those solutions. I just can't seem to line up the "fq" and queries
correctly so that I can join 3+ document types together, filter on
them, and return my requested columns.
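
As a rough illustration of the block join approach mentioned above, here is
a minimal sketch of how the request parameters might be assembled. It
assumes departments are indexed as parent documents with employees and rooms
nested as children in the same block. The field names (doc_type, last_name,
room_number) and the helper block_join_params are hypothetical and not taken
from any real schema. The sketch uses Solr's {!parent} query parser to map
each child-level condition back to its parent, and the [child] document
transformer to return the nested children with each matching parent.

```python
from urllib.parse import urlencode


def block_join_params(parent_filter, child_queries, fields="*"):
    """Build Solr parameters that select parents by conditions on children.

    parent_filter must match ALL parent documents in the index (not a
    subset), as the {!parent} parser's `which` argument expects.
    child_queries is a non-empty list of queries against child documents.
    """
    # Each child-level condition is lifted to the parent level.
    to_parent = [
        f'{{!parent which="{parent_filter}"}}{cq}' for cq in child_queries
    ]
    # The first condition becomes the main query; the rest become filters,
    # so a parent must satisfy every condition to be returned.
    params = [("q", to_parent[0])]
    params += [("fq", p) for p in to_parent[1:]]
    # The [child] transformer attaches nested children to each parent hit.
    params.append(("fl", f"{fields},[child parentFilter={parent_filter}]"))
    return params


# Hypothetical example: departments that have an employee named "smith"
# AND a room numbered 12.
params = block_join_params(
    "doc_type:department",
    ["+doc_type:employee +last_name:smith",
     "+doc_type:room +room_number:12"],
)
print(urlencode(params))
```

Putting one child condition in "q" and the others in "fq" means each
condition is evaluated against its own child type, and only departments
satisfying all of them come back. Whether this fits a given index depends on
how the blocks were indexed, so treat it as a starting point rather than a
tested query.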

On Fri, Sep 7, 2018 at 9:32 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/7/2018 3:06 PM, John Smith wrote:
> > Hi, I have a document structure like this (this is a made up schema, my
> > data has nothing to do with departments and employees, but the structure
> > holds true to my real data):
> >
> > department 1
> >      employee 11
> >      employee 12
> >      employee 13
> >      room 11
> >      room 12
> >      room 13
> >
> > department 2
> >      employee 21
> >      employee 22
> >      room 21
> >
> > ... etc
> >
> > I'm trying to figure out the best way to index this, and perform queries.
> > Due to the sheer volume of data, I cannot do a simple "flat file" approach,
> > repeating the header data for each child entry.
>
> Why not?
>
> For the precise use case you have outlined, Solr will work better if you
> only have the child documents and simply have every document contain a
> "department" field which contains an identifier for the department.
> Since this precise structure is not what you are doing, you'll need to
> adapt what I'm saying to your actual data.
>
> The volume of data should be irrelevant to this decision. Solr will
> always work best with a flat document structure.
>
> I have never used the parent/child document feature in Solr, so I cannot
> offer any advice on it.  Somebody else will need to help you if you
> choose to use that feature.
>
> Thanks,
> Shawn
>
>
