Hi Jack,

Thanks for your kind comment. I am truly at the beginning of modeling my schema over an existing, working DB. I used the school-teachers-students DB only as an example scenario. (a. I wrote that as a disclaimer in my first post. b. I really do not know anyone who has 300 hobbies either.)
In real life my DB is obviously very different; I just used this example to show the pitfalls I would hit if I kept my old relational data-modeling notions. Obviously, the old relational modeling idioms do not apply here. My question was about the fact that I would really like to avoid a flat table/join/view, for the reasons I gave earlier. My scenario is answering plain, user-generated text searches over an MSSQL DB that contains a few 1:n relationships (and a few 1:n:n relationships). So I came here for tips: should I use one combined index (treating it as a NoSQL source), separate indices, or something else? Are there any other ways to define relational data?

Thanks.

On Tue, Jun 18, 2013 at 4:30 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> It sounds like you still have a lot of work to do on your data model. No
> matter how you slice it, 8 billion rows/fields/whatever is still way too
> much for any engine to search on a single server. If you have 8 billion of
> anything, a heavily sharded SolrCloud cluster is probably warranted. Don't
> plan ahead to put more than 100 million rows on a single node; plan on a
> proof of concept implementation to determine that number.
>
> When we in Solr land say "flattened" or "denormalized", we mean in an
> intelligent, "smart", thoughtful sense, not a mindless, mechanical
> flattening. It is an opportunity for you to reconsider your data models,
> both old and new.
>
> Maybe data modeling is beyond your skill set. If so, have a chat with your
> boss and ask for some assistance, training, whatever.
>
> Actually, I am suspicious of your 8 billion number - change each of those
> 300's to realistic, average numbers. Each teacher teaches 300 courses?
> Right. Each student has 300 hobbies? If you say so, but...
>
> Don't worry about schema.xml until you get your data model under control.
>
> For an initial focus, try envisioning the use cases for user queries.
> That will guide you in thinking about how the data would need to be
> organized to satisfy those user queries.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Mysurf Mail
> Sent: Tuesday, June 18, 2013 2:20 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to define my data in schema.xml
>
> Thanks for your reply.
> I have tried the simplest approach and it works absolutely fantastically:
> huge table, 0s to result.
>
> There are two problems, as I described earlier, and that is what I am trying to solve:
> 1. I create a flat table just for Solr. This requires maintenance and
> development. Can I run Solr over my regular tables?
> That would be my simplest approach: working over my relational tables.
> 2. When you query a flat table by school name, as I described, if the
> school has 300 students, 300 teachers with 300 teacherCourses each, and 300
> studentHobbies per student, you get 8.1 billion rows (300*300*300*300). Even if
> Solr handles that well, searching for the school name will retrieve 8.1 B rows.
> 3. Let's say all my searches are user-generated free-text searches over the
> name and comments columns.
> Thanks.
>
> On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty <g...@mimirtech.com> wrote:
>
>> On 18 June 2013 01:10, Mysurf Mail <stammail...@gmail.com> wrote:
>> > Thanks for your quick reply. Here are some notes:
>> >
>> > 1. Consider that all tables in my example have two columns: Name &
>> > Description, which I would like to index and search.
>> > 2. I have no reason to create a flat table other than for Solr, so I
>> > would like to see if I can avoid it.
>> > 3. If in my example I have a flat table, then obviously it will hold a
>> > lot of rows for a single school.
>> > By searching the exact school name I will likely receive a lot of rows.
>> > (My flat table has its own PK.)
>>
>> Yes, all of this is definitely the case, but in practice
>> it does not matter. Solr can efficiently search through
>> millions of rows.
>> To start with, just try the simplest
>> approach, and only complicate things as and when
>> needed.
>>
>> > That is something I would like to avoid, and I thought I could avoid it
>> > by defining teachers and students as multivalued fields or something like
>> > that, and then teacherCourses and studentHobbies as 1:n respectively.
>> > This is quite similar to my real-life requirements, so I came here to get
>> > some tips as a Solr noob.
>>
>> You have still not described what searches
>> you want to do. Again, I would suggest starting
>> with the most straightforward approach.
>>
>> Regards,
>> Gora
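The 8.1 billion figure in this thread is the Cartesian product you get when two independent 1:n:n branches of one school (student -> hobby and teacher -> course) are joined into a single flat table. The short Python sketch below reproduces that arithmetic with the thread's hypothetical 300-per-relation counts, compares it with indexing each student, teacher, hobby and course as its own document, and applies Jack's planning ceiling of 100 million documents per node. The function names are illustrative only, and the per-node figure is a rule of thumb that a proof-of-concept measurement should replace.

```python
import math


def flat_join_rows(students, hobbies_per_student, teachers, courses_per_teacher):
    """Rows produced for ONE school when its two independent 1:n:n branches
    are joined into a single flat table (a Cartesian product of the branches)."""
    student_branch = students * hobbies_per_student  # school -> student -> hobby
    teacher_branch = teachers * courses_per_teacher  # school -> teacher -> course
    return student_branch * teacher_branch


def per_entity_docs(students, hobbies_per_student, teachers, courses_per_teacher):
    """Documents needed if each student, hobby, teacher and course is indexed on its own."""
    return (students + students * hobbies_per_student
            + teachers + teachers * courses_per_teacher)


def min_shards(total_docs, docs_per_node=100_000_000):
    """Lower bound on nodes/shards for a given per-node document budget.
    100 million is only a planning ceiling; measure the real number in a PoC."""
    return math.ceil(total_docs / docs_per_node)


flat = flat_join_rows(300, 300, 300, 300)
print(flat)                                 # 8100000000
print(min_shards(flat))                     # 81
print(per_entity_docs(300, 300, 300, 300))  # 180600
```

The exact shard count is not the point: a mechanical flat join multiplies the child counts, while per-entity (or per-parent) documents only add them.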
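The idea in Mysurf's first message (teachers and students as multivalued fields, with their courses and hobbies underneath) amounts to one document per school instead of one row per join combination. Here is a minimal Python sketch of that denormalization. It assumes hypothetical row shapes read from the relational tables and hypothetical Solr field names (school_name, teacher_name, teacher_course, student_name, student_hobby), each of which would need multiValued="true" in schema.xml. It is one possible layout, not a recommendation from the thread.

```python
import json


def build_school_docs(schools, teachers, courses, students, hobbies):
    """Fold 1:n and 1:n:n child rows into one Solr-style document per school.

    Every child value goes into a multivalued field, so the number of documents
    equals the number of schools rather than the product of the child counts.
    Row shapes and field names are hypothetical.
    """
    docs = {
        s["id"]: {
            "id": f"school-{s['id']}",
            "school_name": s["name"],
            "teacher_name": [],
            "teacher_course": [],
            "student_name": [],
            "student_hobby": [],
        }
        for s in schools
    }

    # school -> teacher (1:n); remember each teacher's school for the next level
    teacher_school = {}
    for t in teachers:
        teacher_school[t["id"]] = t["school_id"]
        docs[t["school_id"]]["teacher_name"].append(t["name"])

    # teacher -> course (the 1:n:n level), attached to the owning school's document
    for c in courses:
        docs[teacher_school[c["teacher_id"]]]["teacher_course"].append(c["name"])

    # school -> student (1:n)
    student_school = {}
    for st in students:
        student_school[st["id"]] = st["school_id"]
        docs[st["school_id"]]["student_name"].append(st["name"])

    # student -> hobby (1:n:n)
    for h in hobbies:
        docs[student_school[h["student_id"]]]["student_hobby"].append(h["name"])

    return list(docs.values())


# Hypothetical sample rows, as they might come out of the SQL tables.
docs = build_school_docs(
    schools=[{"id": 1, "name": "Hilltop High"}],
    teachers=[{"id": 10, "school_id": 1, "name": "Ms. Lee"}],
    courses=[{"teacher_id": 10, "name": "Algebra"},
             {"teacher_id": 10, "name": "Geometry"}],
    students=[{"id": 100, "school_id": 1, "name": "Dana"}],
    hobbies=[{"student_id": 100, "name": "Chess"}],
)
print(json.dumps(docs, indent=2))
```

The trade-off is that flat multivalued fields do not remember which course belongs to which teacher. A query such as teacher_name:Lee AND teacher_course:Algebra would also match a school where Algebra is taught by someone else. Whether that matters depends on the actual user queries, which is exactly what Jack and Gora asked about.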