Hi Jack,
Thanks for your kind comment.

I am truly at the beginning of modeling my schema over an existing,
working DB.
I used the school-teachers-students DB as an example scenario.
(a. I wrote that as a disclaimer in my first post. b. I really do not
know anyone who has 300 hobbies either.)

In real life my DB is obviously very different.
I just used this as an example of the pitfalls that will occur if I
keep my old DB data modeling notions.
Obviously, the old relational modeling idioms do not apply here.

Now, my question referred to the fact that I would really like to
avoid a flat table/join/view, for the reasons I listed earlier.
So, my scenario is answering plain, user-generated text searches over an
MSSQL DB that contains a few 1:n relations (and a few 1:n:n relationships).

So, I came here for tips. Should I use one combined index (treating it as a
NoSQL source), separate indices, or something else? Are there any other ways
to define relational data?
Thanks.
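
To make the pitfall concrete, here is a minimal sketch in plain Python (no
Solr involved) that contrasts a mechanical flat join with one document per
school. Everything in it is an assumption made for illustration: the field
names (school_name, teacher_names, course_names, student_names,
hobby_names), the tiny sample school, and the idea that list values stand
in for multi-valued fields. It is not taken from my real DB.

```python
from itertools import product


def flat_rows(school):
    """Rows a mechanical flat join emits for one school (one row per combination)."""
    teacher_rows = [(t["name"], c) for t in school["teachers"] for c in t["courses"]]
    student_rows = [(s["name"], h) for s in school["students"] for h in s["hobbies"]]
    return [(school["name"], *tr, *sr) for tr, sr in product(teacher_rows, student_rows)]


def school_document(school):
    """One document per school; related names go into list (multi-valued) fields.

    All field names here are hypothetical.
    """
    return {
        "id": school["id"],
        "school_name": school["name"],
        "teacher_names": [t["name"] for t in school["teachers"]],
        "course_names": sorted({c for t in school["teachers"] for c in t["courses"]}),
        "student_names": [s["name"] for s in school["students"]],
        "hobby_names": sorted({h for s in school["students"] for h in s["hobbies"]}),
    }


# Made-up school: 2 teachers with 2 courses each, 2 students with 2 hobbies each.
school = {
    "id": "school-1",
    "name": "Example High",
    "teachers": [
        {"name": "T1", "courses": ["Math", "Physics"]},
        {"name": "T2", "courses": ["History", "Art"]},
    ],
    "students": [
        {"name": "S1", "hobbies": ["Chess", "Piano"]},
        {"name": "S2", "hobbies": ["Soccer", "Chess"]},
    ],
}

rows = flat_rows(school)
doc = school_document(school)

print(len(rows))   # (2 * 2) * (2 * 2) = 16 flat rows for one school
print(doc)         # a single document for the same school
print(300 ** 4)    # the row count from the 300/300/300/300 example
```

One caveat with this sketch: the list fields no longer record which teacher
teaches which course, or which student has which hobby. If the searches need
that pairing, a different layout (for example, one document per teacher or
per student) would probably be a better fit.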



On Tue, Jun 18, 2013 at 4:30 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> It sounds like you still have a lot of work to do on your data model. No
> matter how you slice it, 8 billion rows/fields/whatever is still way too
> much for any engine to search on a single server. If you have 8 billion of
> anything, a heavily sharded SolrCloud cluster is probably warranted. Don't
> plan ahead to put more than 100 million rows on a single node; plan on a
> proof of concept implementation to determine that number.
>
> When we in Solr land say "flattened" or "denormalized", we mean in an
> intelligent, "smart", thoughtful sense, not a mindless, mechanical
> flattening. It is an opportunity for you to reconsider your data models,
> both old and new.
>
> Maybe data modeling is beyond your skill set. If so, have a chat with your
> boss and ask for some assistance, training, whatever.
>
> Actually, I am suspicious of your 8 billion number - change each of those
> 300's to realistic, average numbers. Each teacher teaches 300 courses?
> Right. Each Student has 300 hobbies? If you say so, but...
>
> Don't worry about schema.xml until you get your data model under control.
>
> For an initial focus, try envisioning the use cases for user queries. That
> will guide you in thinking about how the data would need to be organized to
> satisfy those user queries.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Mysurf Mail
> Sent: Tuesday, June 18, 2013 2:20 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to define my data in schema.xml
>
>
> Thanks for your reply.
> I have tried the simplest approach and it works absolutely fantastic.
> Huge table - 0s to result.
>
> Two problems, as I described earlier, are what I am trying to solve:
> 1. I create a flat table just for Solr. This requires maintenance and
> development. Can I run Solr over my regular tables?
>    This is my simplest approach: working over my relational tables.
> 2. When you query a flat table by school name, as I described, if the
> school has 300 students and 300 teachers, with 300 teacherCourses and 300
> studentHobbies each,
>    you get 8.1 billion rows (300*300*300*300). I am sure this will work
> great on Solr, but searching for the school name will retrieve 8.1 B rows.
> 3. Let's say all my searches are user-generated free-text searches over
> the name and comments columns.
> Thanks.
>
>
> On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty <g...@mimirtech.com> wrote:
>
>  On 18 June 2013 01:10, Mysurf Mail <stammail...@gmail.com> wrote:
>> > Thanks for your quick reply. Here are some notes:
>> >
>> > 1. Consider that all tables in my example have two columns: Name &
>> > Description which I would like to index and search.
>> > 2. I have no reason to create a flat table other than for Solr, so I
>> > would like to see if I can avoid it.
>> > 3. If in my example I have a flat table, then obviously it will hold a
>> > lot of rows for a single school.
>> >     By searching the exact school name I will likely receive a lot of
>> > rows. (My flat table has its own PK.)
>>
>> Yes, all of this is definitely the case, but in practice
>> it does not matter. Solr can efficiently search through
>> millions of rows. To start with, just try the simplest
>> approach, and only complicate things as and when
>> needed.
>>
>> >     That is something I would like to avoid, and I thought I could avoid
>> > it by defining teachers and students as multi-valued fields or something
>> > like that, and then teacherCourses and studentHobbies as 1:n respectively.
>> >     This is quite similar to my real-life demand, so I came here to get
>> > some tips as a Solr noob.
>>
>> You have still not described the searches that
>> you want to do. Again, I would suggest starting
>> with the most straightforward approach.
>>
>> Regards,
>> Gora
>>
>>
>
