Thanks a lot. I appreciate the pointers about an indexing engine vs. a database. I ended up concatenating the fields to generate a key.
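For anyone finding this thread later, here is a minimal sketch of the concatenation approach, assuming the documents are built in Python before being sent to Solr. The helper name `with_composite_id` and the underscore separator are illustrative choices, and `db_id` follows Gus's suggestion of keeping the original database key; none of these is prescribed by Solr.

```python
def with_composite_id(record, separator="_"):
    """Return a copy of record whose "id" combines id_type and the original id.

    Hypothetical helper: the original id is kept in "db_id", and id_type stays
    as its own field so it can still be used for filtering.
    """
    doc = dict(record)            # copy, so the source record is not modified
    doc["db_id"] = record["id"]   # retain the original key as its own field
    doc["id"] = f'{record["id_type"]}{separator}{record["id"]}'
    return doc


# The two sample messages from the original question.
user = {"id": "12334", "id_type": "USER_RECORD", "field_1": None, "field_2": None}
owner = {"id": "31321", "id_type": "OWNER_RECORD", "field_1": None, "field_2": None}

docs = [with_composite_id(user), with_composite_id(owner)]
print(docs[0]["id"])  # USER_RECORD_12334
```

One thing to watch: if an id_type or id value can itself contain the separator, two different pairs could produce the same string, so it is worth choosing a separator that cannot occur in the data.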
On Wed, Apr 24, 2019 at 11:39 AM David Hastings <hastings.recurs...@gmail.com> wrote:

> another thing to consider doing is just merge the two fields into the id
> value:
> "id": "USER_RECORD_12334",
> since it's a string.
>
> On Wed, Apr 24, 2019 at 2:35 PM Gus Heck <gus.h...@gmail.com> wrote:
>
> > Hi Vivek
> >
> > Solr is not a database, nor should one try to use it as such. You'll
> > need to adjust your thinking some in order to make good use of Solr.
> > In Solr there is normally an id field and it should be unique across
> > EVERY document in the entire collection. Thus there's no concept of a
> > primary key, because there are no tables. In some situations
> > (streaming expressions for example) you might want to use collections
> > like tables, creating a collection per data type, but there's no way
> > to define uniqueness in terms of more than one field within a
> > collection. If your data comes from a database with complex keys,
> > concatenating the values to form the single unique ID is a
> > possibility. If you form keys that way, of course you also want to
> > retain the values as individual fields. This duplication might seem
> > odd from a database perspective where one often works hard to
> > normalize data, but for search, denormalization is very common. The
> > focus with search engines is usually speed of retrieval rather than
> > data correctness. Solr should serve as an index into some other
> > canonical source of truth for your data, and that source of truth
> > should be in charge of guaranteeing data correctness.
> >
> > Another alternative is to provide a field that denotes the type
> > (table) for the document (such as id_type in your example). In that
> > case, all queries looking for a specific object type as a result
> > should add a filter (fq parameter) to denote the "table" and you may
> > want to store a db_id field to correlate the data with a database if
> > that's where it came from. When using the field/filter strategy you
> > tend to inflate the number of fields in the index with some fields
> > being sparsely populated and this can have some performance
> > implications, and furthermore if one "table" gets updated frequently
> > you wind up interfering with the caching for all data due to frequent
> > opening of new searchers. On the plus side such a strategy makes it
> > easier to query across multiple types simultaneously, so these
> > considerations should be balanced against your usage patterns,
> > performance needs, ease of management and ease of programming.
> >
> > Best,
> > Gus
> >
> > On Fri, Apr 19, 2019 at 2:10 PM Vivekanand Sahay
> > <vsa...@walmartlabs.com.invalid> wrote:
> >
> > > Hello,
> > >
> > > I have a use case like below.
> > >
> > > USE CASE
> > > I have a document with fields like
> > >
> > > Id,
> > > Id_type,
> > > Field_1,
> > > Field_2
> > >
> > > 2 sample messages will look like
> > >
> > > {
> > >   "id": "12334",
> > >   "id_type": "USER_RECORD",
> > >   "field_1": null,
> > >   "field_2": null
> > > }
> > >
> > > {
> > >   "id": "31321",
> > >   "id_type": "OWNER_RECORD",
> > >   "field_1": null,
> > >   "field_2": null
> > > }
> > >
> > > QUESTIONS
> > >
> > > I’d like to define the unique key as a compound key from fields id
> > > and id_type
> > >
> > > 1. Could someone give me an example of how to do this? Or point to
> > >    the relevant section in the docs?
> > > 2. Is this the best way to define a compound primary key? Is there
> > >    a more efficient way?
> > >
> > > Regards,
> > > Vivek
> >
> > --
> > http://www.the111shift.com