Re: Solr 5 options

Charlie Hull Wed, 15 Jul 2015 01:14:43 -0700

On 14/07/2015 17:04, Erick Erickson wrote:

Well, Shawn I for one am in your corner.


Schemaless is great for getting things running, but it's
not an AI. And it can get into trouble guessing. Say
it guesses a field should be an int because the first one
it sees is 123, but it's really a part number. Then when
a part number 123-456 comes through, the doc will fail
to index with "illegal number format".
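
To make the failure concrete, here is a toy sketch in plain Python. It is not Solr code and does not call any Solr API; the names guess_type, ToySchema and part_no are made up for illustration. It only mimics the idea of locking in a type from the first value seen and then rejecting a later value that doesn't fit.

```python
def guess_type(value):
    """Guess a field type from a single value (toy stand-in for schemaless guessing)."""
    try:
        int(value)
        return "int"
    except ValueError:
        return "string"


class ToySchema:
    """Hypothetical schema that fixes each field's type the first time it is seen."""

    def __init__(self):
        self.fields = {}

    def index(self, doc):
        for name, value in doc.items():
            # The first value seen decides the type; later docs must conform.
            field_type = self.fields.setdefault(name, guess_type(value))
            if field_type == "int":
                int(value)  # raises ValueError for a value like "123-456"


schema = ToySchema()
schema.index({"part_no": "123"})          # part_no is now locked in as "int"
try:
    schema.index({"part_no": "123-456"})  # a real part number, but not an int
except ValueError as err:
    print("doc rejected:", err)
```

Declaring part_no as a string up front would avoid the problem, which is the argument for defining the schema explicitly.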

This is my issue with 'schemaless' - it makes far too many assumptions about data types. Elasticsearch suffers from this as well: https://orchestrate.io/blog/2014/09/30/improved-elasticsearch-indexing/


Charlie


bq: Also, does the fact that I intend to use a data import handler to run feeds
from large numbers of oracle schemas have any impact on the above?

Yes. You have to map the DB schemas into Solr
somehow. Schemaless will try to guess, but as above it doesn't
have any real understanding of the data. Dynamic fields are certainly
a viable option, though you'll be assigning columns to fields for each
schema variant.
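
As a rough sketch of what "assigning columns to fields" could look like, the Python below maps a DB row to dynamic-field names using the _s / _i / _t suffixes from the example doc further down the thread. The column-type table and the function names are assumptions for illustration only; this is not DIH configuration, and a real mapping would need an entry for every column type in each Oracle schema.

```python
# Assumed mapping from DB column type to dynamic-field suffix (illustrative only).
SUFFIX_BY_COLUMN_TYPE = {
    "VARCHAR2": "_s",  # exact-match string
    "NUMBER": "_i",    # integer
    "CLOB": "_t",      # tokenized text
}


def to_dynamic_field(column_name, column_type):
    """Build a dynamic-field name such as com_id_i from a column name and type."""
    suffix = SUFFIX_BY_COLUMN_TYPE.get(column_type.upper())
    if suffix is None:
        # Fail loudly rather than guess, which is the whole point of an explicit mapping.
        raise KeyError(f"no suffix defined for column type {column_type!r}")
    return column_name.lower() + suffix


def row_to_doc(row, column_types):
    """Rename every column in a row to its dynamic-field name."""
    return {to_dynamic_field(col, column_types[col]): val for col, val in row.items()}


column_types = {"COM_ID": "NUMBER", "COM_NAME": "VARCHAR2"}
print(row_to_doc({"COM_ID": 201, "COM_NAME": "Berenices"}, column_types))
```

An unmapped column type raises an error instead of being guessed, so each schema variant has to be handled deliberately.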

Best,
Erick

On Tue, Jul 14, 2015 at 6:15 AM, Shawn Heisey <apa...@elyograg.org> wrote:

On 7/14/2015 4:44 AM, spleenboy wrote:

Many Thanks to those who helped me on my last post: I'm almost there.
So here is the doc I need to index:
{
   "doc":
   {
     "id":"2",
     "cus_name_s":"Paul Brown",
     "cus_email_t":["paul.br...@here.net"],
     "com_id_i":201,
     "com_name_s":"Berenices",
     "url_s":"domain.net/integration/"}}

I only need to be able to search on email.
My plan was to use classic, as I was going to run this on a single node.
I am happy to use dynamic fields to define the structure of the doc, so I
don't think I need a schema.xml: I think this is classic/schemaless (?)
I am still a little confused between schemaless and managed schema.
Do I implement this using the right combination of parameters in my bin/solr
create_core command?
Also, does the fact that I intend to use a data import handler to run feeds
from large numbers of oracle schemas have any impact on the above?


The "schemaless" mode isn't really schemaless ... it just means that
Solr will automatically guess what fieldType to use for a field that has
never been seen before, and then modify the schema to include that field
with the guessed fieldType.  It's sort of like the managed schema,
except it's managed automatically instead of by the admin.

I personally would not want Solr to guess on the schema, I would want to
explicitly define Solr's behavior ... but not everyone does things the
same way that I do.

Thanks,
Shawn



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
