Thanks Walter,
Existing media sets will rarely change, but new media sets will be added
relatively frequently. (There is a many-to-many relationship between media
sets and media sources.) Given the size of the data, a new media set that
includes only 1% of the collection would cover 6 million rows.
A join may seem clean, but it will be slow and (currently) doesn't work in a
cluster.
You find all the sentences in a media set by searching for that set id and
requesting only the sentence_id (yes, you need that). Then you reindex them.
With small documents like this, it is probably fairly fast.
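
A minimal sketch of the "search for the set id, request only the sentence_id" step described above. It only builds the query string for Solr's standard /select handler and reads IDs out of a JSON response; the field names media_set_id and sentence_id come from this thread, while the helper names, the page size, and the use of start/rows paging are assumptions for illustration. Sending the request and re-submitting the documents are left out.

```python
import json
from urllib.parse import urlencode


def select_params(media_set_id, start=0, rows=1000):
    """Build a /select query string that matches one media set and
    returns only the sentence_id field (hypothetical helper)."""
    return urlencode({
        "q": f"media_set_id:{media_set_id}",
        "fl": "sentence_id",   # fetch only the ID, not the sentence text
        "start": start,
        "rows": rows,
        "wt": "json",
    })


def ids_from_response(body):
    """Extract sentence IDs from a Solr JSON response body."""
    return [doc["sentence_id"] for doc in json.loads(body)["response"]["docs"]]


def batches(ids, size):
    """Split the ID list into chunks to reindex one batch at a time."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

Each batch of IDs would then be looked up in the system of record and re-sent to Solr with the updated media set values.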
We'd like to be able to easily update the media set to source mapping. I'm
concerned that if we store the media_sets_id in the sentence documents, it
will be very difficult to add further media set to source mappings. I
imagine that adding a new media set would either require reimporting all
600
Denormalize. Add media_set_id to each sentence document. Done.
wunder
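
A rough sketch of what denormalizing could look like at index time, assuming the media set to source mapping is available as a plain dictionary and that media_set_id is stored as a multi-valued field (since a source can belong to several sets). The function names and the in-memory mapping are hypothetical; stories_id, media_id and sentence are the fields from the original message.

```python
from collections import defaultdict


def sets_by_source(media_set_to_sources):
    """Invert {media_set_id: [media_id, ...]} into {media_id: [media_set_id, ...]}."""
    inverted = defaultdict(list)
    for set_id, source_ids in media_set_to_sources.items():
        for media_id in source_ids:
            inverted[media_id].append(set_id)
    return inverted


def denormalize(sentence_doc, inverted):
    """Return a copy of the sentence document with its media set IDs attached."""
    doc = dict(sentence_doc)
    # Assumption: media_set_id is multi-valued; a source in no set gets an empty list.
    doc["media_set_id"] = sorted(inverted.get(doc["media_id"], []))
    return doc


mapping = {1: [10, 11], 2: [11]}          # media set -> media sources
inverted = sets_by_source(mapping)
doc = denormalize(
    {"stories_id": 100, "media_id": 11, "sentence": "Example sentence."},
    inverted,
)
```

With this layout, finding all sentences in a media set is a single filter on media_set_id, at the cost of reindexing the affected sentences whenever a new set is added.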
On Jul 29, 2013, at 7:58 AM, David Larochelle wrote:
> I'm setting up SolrCloud with around 600 million documents. The basic
> structure of each document is:
>
> stories_id: integer, media_id: integer, sentence: text_en
>
>