Dynamic data model design questions
I'm implementing a backend service that stores data in JSON format, and I'd like to provide a search operation in the service. The data model is dynamic and will contain arbitrarily complex object graphs.

How do I index object graphs with Solr? Does the data need to be flattened before indexing?

The service will evidently need to deliver new data and updates to Solr, but which side should be responsible for converting the data model to fit the Solr schema: the service or Solr? Should the service deliver data in a form that already adheres to the Solr schema, or should Solr be extended to digest the data as the service provides it?

How does Solr handle dynamic data models? Solr seems to support them through the "dynamic fields" feature in schemas. How are data types inferred when using dynamic fields?

An alternative to dynamic fields seems to be changing the schema whenever the data model changes. How easy is it to modify an existing schema? Do I need to reindex all the data? Can it be done online through an API?

I'm planning on using Solr 4.2.

marko
Re: Dynamic data model design questions
Shawn Heisey wrote:
> Solr does have some *very* limited capability for doing joins between
> indexes, but generally speaking, you need to flatten the data.

Thanks! So, using a dynamic schema, I'd flatten the following JSON object graph

    {
      "id": "xyz123",
      "obj1": {
        "child1": {
          "prop1": ["val1", "val2", "val3"],
          "prop2": 123
        },
        "prop3": "val4"
      },
      "obj2": {
        "child2": {
          "prop3": true
        }
      }
    }

to a Solr document something like this?

    {
      "id": "xyz123",
      "obj1/child1/prop1_ss": ["val1", "val2", "val3"],
      "obj1/child1/prop2_i": 123,
      "obj1/prop3_s": "val4",
      "obj2/child2/prop3_b": true
    }

I'm using Java, so I'd probably push docs to Solr for indexing and do the searches using SolrJ, right?

> Solr's ability to change your data after receiving it is fairly limited.
> The schema has some ability in this regard for indexed values, but the
> stored data is 100% verbatim as Solr receives it. If you will be using the
> dataimport handler, it does have some transform capability before sending
> to Solr. Most of the time, the rule of thumb is that changing the data on
> the Solr side will require contrib/custom plugins, so it may be easier to
> do it before Solr receives it.

So the data import handler is a server-side Solr feature, not a client-side one? Does Solr or SolrJ have any support for doing transformations on the client side? The transformation above should be fairly straightforward, so it could also be done by code on the client side.

marko
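A minimal sketch of the client-side flattening described in the message above, in plain Java with no Solr dependency. The names `GraphFlattener`, `flatten`, and `suffixFor` are hypothetical, and the suffixes (`_s`, `_ss`, `_i`, `_b`, and so on) are only a naming convention: they assume the schema declares matching dynamicField patterns. The sketch handles nested maps, scalars, and lists of scalars; lists of objects are not covered.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
final class GraphFlattener {

    /** Flattens a nested map into one level, keyed by slash-separated paths. */
    static Map<String, Object> flatten(Map<String, Object> graph) {
        Map<String, Object> doc = new LinkedHashMap<>();
        flattenInto("", graph, doc);
        return doc;
    }

    private static void flattenInto(String prefix, Map<String, Object> node,
                                    Map<String, Object> doc) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            Object value = entry.getValue();
            if (value == null) {
                continue; // nothing to index for a null property
            }
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "/" + entry.getKey();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flattenInto(path, child, doc); // descend into the nested object
            } else if (path.equals("id")) {
                doc.put(path, value); // assumed to be a static field in the schema
            } else {
                doc.put(path + suffixFor(value), value);
            }
        }
    }

    /** Picks a dynamic-field suffix from the Java type (assumed convention). */
    static String suffixFor(Object value) {
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            // Multi-valued variant: _s becomes _ss, _i becomes _is, and so on.
            // An empty list falls back to the string suffix.
            Object first = list.isEmpty() ? "" : list.get(0);
            return suffixFor(first) + "s";
        }
        if (value instanceof Boolean) return "_b";
        if (value instanceof Integer) return "_i";
        if (value instanceof Long) return "_l";
        if (value instanceof Float) return "_f";
        if (value instanceof Double) return "_d";
        return "_s"; // treat everything else as a string
    }
}
```

With SolrJ, each entry of the returned map could then be copied into a `SolrInputDocument` with `addField(name, value)` before the document is sent for indexing.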
Re: Dynamic data model design questions
Jack Krupansky wrote:
> In general, Solr is much more friendly towards static data models. Yes, you
> can use dynamic fields, but use them in moderation. The more heavily you
> lean on them, the more likely that you will eventually become unhappy with
> Solr.

Can you give concrete examples of the kinds of issues I should expect to face when using a data model with only dynamic fields? We have requirements that quite explicitly direct us toward dynamic fields, and I'd like to understand what kinds of problems we might run into.

> How many fields are we talking about here?

The data model is designed to be dynamic, so the number is not fixed, but I expect there will be about 20-40 fields.

> The trick with Solr is not to brute-force flatten your data model (as you
> appear to be doing), but to REDESIGN your data model so that it is more
> amenable to a flat data model, and takes advantage of Solr's features. You
> can use multiple collections for different types of data. And you can
> simulate joins across tables by doing a sequence of queries (although it
> would be nice to have a SolrJ client-side method to do that in one API
> call.)

We're storing arbitrarily complex object graphs in a data store and want to use Solr for implementing property field search. It may be difficult to use a flatter data model, but I'll consider this option as well.

Thanks!

marko
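The "sequence of queries" approach to simulating a join, mentioned in the quoted text above, could look roughly like the following sketch (assuming Java 8 or later). It is not a SolrJ API: `TwoStepJoin`, `quote`, `anyOf`, and `join` are hypothetical names, and the two `Function` parameters stand in for whatever code actually runs a query string against a collection and returns the matching documents as maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of Solr or SolrJ.
final class TwoStepJoin {

    /** Wraps a value in double quotes, escaping backslashes and quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a query such as field:("a" OR "b") matching any given value. */
    static String anyOf(String field, Set<String> values) {
        List<String> quoted = new ArrayList<>();
        for (String value : values) {
            quoted.add(quote(value));
        }
        return field + ":(" + String.join(" OR ", quoted) + ")";
    }

    /**
     * Step 1: run parentQuery and collect the parentKey value of each hit.
     * Step 2: query the second collection for documents whose childField
     * holds any of those values.
     */
    static List<Map<String, Object>> join(
            Function<String, List<Map<String, Object>>> searchParents,
            Function<String, List<Map<String, Object>>> searchChildren,
            String parentQuery, String parentKey, String childField) {
        Set<String> keys = new LinkedHashSet<>();
        for (Map<String, Object> parent : searchParents.apply(parentQuery)) {
            Object key = parent.get(parentKey);
            if (key != null) {
                keys.add(key.toString());
            }
        }
        if (keys.isEmpty()) {
            return new ArrayList<>(); // no parents matched, so skip the second query
        }
        return searchChildren.apply(anyOf(childField, keys));
    }
}
```

The second query grows with the number of first-step hits, so a real implementation would likely need to cap or batch the collected keys.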