It sounds like you still have a lot of work to do on your data model. No
matter how you slice it, 8 billion rows/fields/whatever is still way too
much for any engine to search on a single server. If you have 8 billion of
anything, a heavily sharded SolrCloud cluster is probably warranted. Don't
plan on putting more than 100 million rows on a single node; instead, plan on
a proof-of-concept implementation to determine the actual number.
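The sizing advice above boils down to a ceiling division: total rows divided by rows per node, rounded up. Here is a minimal Python sketch. It assumes the 8.1 billion figure from this thread and uses 100 million only as a placeholder per-node ceiling, which a proof of concept should replace with a measured number. The function name `nodes_needed` is hypothetical.

```python
def nodes_needed(total_rows: int, rows_per_node: int) -> int:
    """Smallest node count such that no node holds more than rows_per_node rows."""
    if rows_per_node <= 0:
        raise ValueError("rows_per_node must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_rows // rows_per_node)


# 8.1 billion rows from the thread; 100 million per node is only a planning
# ceiling (an assumption), to be replaced by a number measured in a proof of concept.
print(nodes_needed(8_100_000_000, 100_000_000))  # 81
```

Replicas are not counted here. Each replica is another full copy of its shard, so a replication factor of 2 roughly doubles the node count.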
When we in Solr land say "flattened" or "denormalized", we mean in an
intelligent, "smart", thoughtful sense, not a mindless, mechanical
flattening. It is an opportunity for you to reconsider your data models,
both old and new.
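To make the difference concrete, here is a minimal Python sketch contrasting a mechanical flattening with a more deliberate one. Mechanical flattening produces one row per (student, hobby, teacher, course) combination, which is a cross product. The alternative builds one document per school, where each list would map to a multivalued field (the option raised later in this thread). The sample data and field names are hypothetical, and no Solr API is called.

```python
from itertools import product

# Hypothetical nested data for one school (names are illustrative only).
school = {
    "name": "Springfield High",
    "students": [
        {"name": "Ann", "hobbies": ["chess", "tennis"]},
        {"name": "Bob", "hobbies": ["guitar"]},
    ],
    "teachers": [
        {"name": "Ms. Lee", "courses": ["Algebra", "Geometry"]},
    ],
}


def mechanical_flatten(s):
    """One row per (student, hobby, teacher, course) combination: a cross product."""
    student_pairs = [(st["name"], h) for st in s["students"] for h in st["hobbies"]]
    teacher_pairs = [(t["name"], c) for t in s["teachers"] for c in t["courses"]]
    return [
        {"school": s["name"], "student": sn, "hobby": h, "teacher": tn, "course": c}
        for (sn, h), (tn, c) in product(student_pairs, teacher_pairs)
    ]


def school_document(s):
    """One document per school; each list would map to a multivalued field."""
    return {
        "school": s["name"],
        "student": [st["name"] for st in s["students"]],
        "hobby": [h for st in s["students"] for h in st["hobbies"]],
        "teacher": [t["name"] for t in s["teachers"]],
        "course": [c for t in s["teachers"] for c in t["courses"]],
    }


rows = mechanical_flatten(school)
doc = school_document(school)
print(len(rows))  # 6
print(doc["hobby"])  # ['chess', 'tennis', 'guitar']
```

The trade-off is that independent multivalued fields do not record which hobby belongs to which student. A query for student "Ann" with hobby "guitar" would match this school document even though Bob is the guitarist. Whether that matters depends on the actual user queries.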
Maybe data modeling is beyond your skill set. If so, have a chat with your
boss and ask for some assistance, training, whatever.
Actually, I am suspicious of your 8 billion number. Change each of those
300s to realistic, average numbers. Each teacher teaches 300 courses?
Right. Each student has 300 hobbies? If you say so, but...
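The reason the averages matter is that the flattened row count per school is a product of the fan-outs, so every factor multiplies the total. Here is a minimal Python sketch of that arithmetic. The 300s come from the thread. The smaller averages (3 hobbies per student, 5 courses per teacher) are purely hypothetical and are not a claim about real data. The function name is also hypothetical.

```python
def flattened_rows_per_school(students: int, avg_hobbies: int,
                              teachers: int, avg_courses: int) -> int:
    """Cross-product size: every (student, hobby) pair times every (teacher, course) pair."""
    return (students * avg_hobbies) * (teachers * avg_courses)


# The figures from the thread: 300 of everything.
print(flattened_rows_per_school(300, 300, 300, 300))  # 8100000000

# Hypothetical averages, for illustration only:
# 3 hobbies per student, 5 courses per teacher.
print(flattened_rows_per_school(300, 3, 300, 5))  # 1350000
```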
Don't worry about schema.xml until you get your data model under control.
For an initial focus, try envisioning the use cases for user queries. That
will guide you in thinking about how the data would need to be organized to
satisfy those user queries.
-- Jack Krupansky
-----Original Message-----
From: Mysurf Mail
Sent: Tuesday, June 18, 2013 2:20 AM
To: solr-user@lucene.apache.org
Subject: Re: How to define my data in schema.xml
Thanks for your reply.
I have tried the simplest approach and it works fantastically.
Huge table, 0 s to get results.
There are two problems, as I described earlier, and that is what I am trying to solve:
1. I created a flat table just for Solr. This requires maintenance and
development. Can I run Solr over my regular tables?
Working directly over my relational tables would be my simplest approach.
2. When you query a flat table by school name, as I described, and the
school has 300 students and 300 teachers, with 300 teacherCourses per teacher
and 300 studentHobbies per student,
you get 8.1 billion rows (300*300*300*300). Although I am sure this will work
great on Solr, searching for the school name will retrieve 8.1 billion rows.
3. Let's say all my searches are user-generated free-text searches over
the name and comments columns.
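For the free-text use case in item 3, one common option is Solr's edismax query parser with its `qf` parameter listing the fields to search. Here is a minimal Python sketch that only builds the request URL and does not send it. It assumes a core named `schools` on `localhost:8983` and fields named `name` and `comments`. All of these are hypothetical, as is the helper name.

```python
from urllib.parse import urlencode


def build_free_text_query(base_url: str, user_text: str, rows: int = 10) -> str:
    """Build a Solr /select URL that searches user text across several fields."""
    params = {
        "q": user_text,          # raw user input, URL-encoded by urlencode
        "defType": "edismax",    # parser intended for free-form user queries
        "qf": "name comments",   # hypothetical field names to search across
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


url = build_free_text_query("http://localhost:8983/solr/schools", "chess club")
print(url)
```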
Thanks.
On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty <g...@mimirtech.com> wrote:
On 18 June 2013 01:10, Mysurf Mail <stammail...@gmail.com> wrote:
> Thanks for your quick reply. Here are some notes:
>
> 1. Assume that every table in my example has two columns, Name and
> Description, which I would like to index and search.
> 2. I have no reason to create a flat table other than for Solr, so I
> would like to see if I can avoid it.
> 3. If my example uses a flat table, then obviously it will hold a
> lot of rows for a single school.
> By searching the exact school name I will likely receive a lot of rows.
> (My flat table has its own primary key.)
Yes, all of this is definitely the case, but in practice
it does not matter. Solr can efficiently search through
millions of rows. To start with, just try the simplest
approach, and only complicate things as and when
needed.
> That is something I would like to avoid, and I thought I could avoid it
> by defining teachers and students as multivalued fields or something like that,
> and then teacherCourses and studentHobbies as 1:n respectively.
> This is quite similar to my real-life requirements, so I came here to get
> some tips as a Solr noob.
You still have not described the searches that
you want to do. Again, I would suggest starting
with the most straightforward approach.
Regards,
Gora