Thanks for your comments, Shawn. The reason I don't want to go with a flat file structure is all the wasted, duplicated data. If a department has 100 employees, repeating the header data 100 times is very wasteful in terms of disk space. This example has only a few doc types, but my real-life data is much larger, and this is a scaling problem: with just a little data, duplicating header fields is no problem, but with massive amounts of data it is a large one.
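To make the duplication concern concrete, here is a rough sketch of the two layouts for one department with 100 employees. The field names (`dept_name`, `dept_location`, `emp_name`, `doc_type`) are invented for illustration and are not from my real schema; `_childDocuments_` is the key Solr's JSON update format uses for nested child documents.

```python
# Hypothetical header (department) fields; only the shape matters here.
department = {
    "id": "dept-1",
    "dept_name": "Department 1",
    "dept_location": "Building A",
}

# 100 hypothetical child records.
employees = [
    {"id": f"emp-{i}", "emp_name": f"Employee {i}"} for i in range(1, 101)
]

# Flat layout: every employee document carries its own copy of the header.
flat_docs = [
    {
        **emp,
        "dept_id": department["id"],
        "dept_name": department["dept_name"],
        "dept_location": department["dept_location"],
    }
    for emp in employees
]

# Nested layout: the header is stored once, with the children attached.
nested_doc = {
    **department,
    "doc_type": "department",
    "_childDocuments_": [{**emp, "doc_type": "employee"} for emp in employees],
}

# Count how many documents hold a copy of the header field in each layout.
flat_header_copies = sum(1 for doc in flat_docs if "dept_name" in doc)
nested_header_copies = sum(
    1
    for doc in [nested_doc, *nested_doc["_childDocuments_"]]
    if "dept_name" in doc
)

print(flat_header_copies, nested_header_copies)
```

With only two or three header fields the difference is small, but it grows with the number of header fields and the number of children per parent.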
My understanding of both graph traversal and block joins is that the header data would only be present once, which is why I'm gravitating towards those solutions. I just can't seem to line up the "fq" and queries correctly so that I can join 3+ document types together, filter on them, and return my requested columns.

On Fri, Sep 7, 2018 at 9:32 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/7/2018 3:06 PM, John Smith wrote:
> > Hi, I have a document structure like this (this is a made up schema, my
> > data has nothing to do with departments and employees, but the structure
> > holds true to my real data):
> >
> > department 1
> >     employee 11
> >     employee 12
> >     employee 13
> >     room 11
> >     room 12
> >     room 13
> >
> > department 2
> >     employee 21
> >     employee 22
> >     room 21
> >
> > ... etc
> >
> > I'm trying to figure out the best way to index this, and perform queries.
> > Due to the sheer volume of data, I cannot do a simple "flat file" approach,
> > repeating the header data for each child entry.
>
> Why not?
>
> For the precise use case you have outlined, Solr will work better if you
> only have the child documents and simply have every document contain a
> "department" field which contains an identifier for the department.
> Since this precise structure is not what you are doing, you'll need to
> adapt what I'm saying to your actual data.
>
> The volume of data should be irrelevant to this decision. Solr will
> always work best with a flat document structure.
>
> I have never used the parent/child document feature in Solr, so I cannot
> offer any advice on it. Somebody else will need to help you if you
> choose to use that feature.
>
> Thanks,
> Shawn
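Coming back to the block join idea in my reply at the top: a minimal sketch of the kind of request parameters involved, assuming the departments, employees, and rooms were indexed together as one block per department and that every document has a `doc_type` field. The field names and the helper `block_join_params` are hypothetical, and this has not been run against a real index. It uses Solr's `{!parent which=...}` block join parser for both `q` and `fq`, and the `[child]` document transformer in `fl` to return matching children.

```python
from urllib.parse import urlencode

# Assumed: this filter matches every parent document and no child document.
PARENT_FILTER = "doc_type:department"


def block_join_params(employee_query, room_query, fields="*"):
    """Build params that select departments by conditions on two child types."""
    return {
        # Departments that have at least one employee child matching the query.
        "q": f'{{!parent which="{PARENT_FILTER}"}}{employee_query}',
        # ...restricted to departments that also have a matching room child.
        "fq": f'{{!parent which="{PARENT_FILTER}"}}{room_query}',
        # Return parent fields plus the child documents of each parent.
        "fl": f"{fields},[child parentFilter={PARENT_FILTER}]",
    }


params = block_join_params(
    "+doc_type:employee +emp_name:smith",
    "+doc_type:room +room_number:12",
)
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to a collection's `/select` URL. Both the main query and the filter return department documents, so the two child conditions meet at the parent level rather than requiring a direct employee-to-room join.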