On 4/15/2013 8:40 AM, Marko Asplund wrote:
I'm implementing a backend service that stores data in JSON format and I'd
like to provide a search operation in the service.
The data model is dynamic and will contain arbitrarily complex object
graphs.
How do I index object graphs with Solr?
Does the data need to be flattened before indexing?
Solr does have some *very* limited capability for doing joins between
indexes, but generally speaking, you need to flatten the data.
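As a rough illustration of what flattening on the service side could look like, here is a minimal Python sketch. The function name `flatten`, the `_` separator and the sample document are all made up for this example and do not come from Solr. Nested keys are joined into a single field name, and repeated values are collected into lists, which would map to multiValued fields.

```python
def flatten(obj, prefix="", sep="_"):
    """Flatten nested dicts/lists into {field_name: [values, ...]}.

    Hypothetical helper: every leaf value ends up in a list so that
    repeated paths can be sent to Solr as multiValued fields.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            for name, values in flatten(value, path, sep).items():
                flat.setdefault(name, []).extend(values)
    elif isinstance(obj, list):
        # List items share the parent's field name.
        for item in obj:
            for name, values in flatten(item, prefix, sep).items():
                flat.setdefault(name, []).extend(values)
    else:
        flat.setdefault(prefix, []).append(obj)
    return flat


doc = {
    "id": "1",
    "author": {"name": "Ann", "tags": ["a", "b"]},
    "items": [{"sku": "x"}, {"sku": "y"}],
}
flat_doc = flatten(doc)
```

Note that this kind of flattening loses the association between sibling values inside the same nested object, which is one of the usual trade-offs of denormalizing for search.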
Apparently the service needs to deliver new data and updates to Solr,
but which one should be responsible for converting the data model to adhere
to the Solr schema? The service or Solr?
Should the service deliver data to Solr in a form that adheres to the Solr
schema or should Solr be extended to digest data provided by the service?
Solr's ability to change your data after receiving it is fairly limited.
The schema has some ability in this regard for indexed values, but the
stored data is 100% verbatim as Solr receives it. If you will be using
the dataimport handler, it does have some transform capability before
sending to Solr. Most of the time, the rule of thumb is that changing
the data on the Solr side will require contrib/custom plugins, so it may
be easier to do it before Solr receives it.
How does Solr handle dynamic data models?
Solr seems to support dynamic data models with the "dynamic fields" feature
in schemas.
How are data types inferred when using dynamic fields?
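A wildcard field name is used, like "i_*" or "*_int", and that definition
includes the data type. Solr does not infer the type from the value; it
comes from whichever pattern the field name matches.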
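Since the type comes from the field name rather than from the value, the indexing client has to pick a name that matches the right pattern. The following Python sketch shows one way to do that. It assumes the schema declares dynamicField patterns such as `*_b`, `*_i`, `*_f`, `*_dt` and `*_s` (similar to those in the example schema shipped with Solr); the `SUFFIXES` table and `typed_field_name` are hypothetical names for this illustration.

```python
from datetime import datetime

# Assumed mapping from Python types to dynamicField suffixes; adjust to
# whatever patterns your own schema.xml actually defines.
# bool must come before int because bool is a subclass of int.
SUFFIXES = [
    (bool, "_b"),
    (int, "_i"),
    (float, "_f"),
    (datetime, "_dt"),
    (str, "_s"),
]


def typed_field_name(name, value):
    """Append the suffix whose dynamicField pattern carries the value's type."""
    for py_type, suffix in SUFFIXES:
        if isinstance(value, py_type):
            return name + suffix
    raise TypeError(f"no dynamic field suffix for {type(value).__name__}")


fields = {typed_field_name(k, v): v
          for k, v in {"title": "Solr", "count": 3, "price": 9.5}.items()}
```

Integers that do not fit in 32 bits would need a long-typed pattern instead, which this sketch does not handle.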
An alternative to using dynamic fields seems to be to change the schema
when the data model changes.
How easy is it to modify an existing schema?
Do I need to reindex all the data?
Can you do it online using an API?
Changing the schema is as simple as modifying schema.xml and reloading
the core or restarting Solr. An API for online schema changes is
coming, but I don't know whether it will be ready in time for 4.3 or
get pushed back to 4.4. No matter how you make the change, the
following applies:
If you add fields, reindexing is not necessary, but existing documents
will not have the new fields until they are reindexed. If you change
the query analyzer chain, no reindex is required. If you change the
index analyzer chain or options that affect indexing, reindexing IS
required.
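For the "reload the core" step, a core can be reloaded over HTTP through the CoreAdmin handler after schema.xml has been edited. The Python sketch below builds and calls that URL. The base URL `http://localhost:8983/solr` and the core name `collection1` are assumptions for a default single-node setup, and the helper names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def core_reload_url(base_url, core):
    """Build the CoreAdmin RELOAD URL for the given core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def reload_core(base_url, core, timeout=10):
    """Ask Solr to reload the core; returns the HTTP status code."""
    with urlopen(core_reload_url(base_url, core), timeout=timeout) as resp:
        return resp.status


# Assumed default location and core name; change these for your install.
url = core_reload_url("http://localhost:8983/solr", "collection1")
```

Calling `reload_core("http://localhost:8983/solr", "collection1")` only makes Solr pick up the new schema; it does not reindex anything, so the rules above about when a reindex is needed still apply.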
Thanks,
Shawn