Re: Solr Indexing Patterns

Judioo Mon, 06 Jun 2011 12:27:28 -0700

Thanks

On 6 June 2011 19:32, Erick Erickson <erickerick...@gmail.com> wrote:


> #Everybody# (including me) who has any RDBMS background
> doesn't want to flatten data, but that's usually the way to go in
> Solr.
>
> Part of whether it's a good idea or not depends on how big the index
> gets, and unfortunately the only way to figure that out is to test.
>
> But that's the first approach I'd try.
>
> Good luck!
> Erick
>
> On Mon, Jun 6, 2011 at 11:42 AM, Judioo <cont...@judioo.com> wrote:
> > On 5 June 2011 14:42, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> See: http://wiki.apache.org/solr/SchemaXml
> >>
> >> By adding multiValued="true" to the field definition in schema.xml, you can
> >> add the same field multiple times in a doc, something like
> >>
> >> <add>
> >> <doc>
> >>  <field name="mv">value1</field>
> >>  <field name="mv">value2</field>
> >> </doc>
> >> </add>
> >>
> > I can't see how that would work, as one would need to associate the right
> > start/end dates and price with each discount.
> > As I understand it, using multiValued and thus flattening the discounts
> > would result in:
> >
> > {
> >    "name":"The Book",
> >    "price":"$9.99",
> >    "price":"$3.00",
> >    "price":"$4.00",
> >    "synopsis":"thanksgiving special",
> >    "synopsis":"Canadian thanksgiving special",
> >    "starts":"11-24-2011",
> >    "starts":"10-10-2011",
> >    "ends":"11-25-2011",
> >    "ends":"10-11-2011"
> >  }
> >
> > How does one differentiate the different offers?
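To make the concern concrete, here is a small Python simulation (not Solr code) of how per-field matching treats flattened discounts. The field names `discount_price`, `discount_starts` and `discount_ends` are hypothetical. Each clause is satisfied if ANY value in its field matches, so the price of one offer can pair with the dates of another:

```python
from datetime import date

# Hypothetical flattened document: each discount attribute is its own
# multiValued field, so the link between a price and its dates is lost.
flat_doc = {
    "name": "The Book",
    "discount_price": [3.00, 4.00],
    "discount_starts": [date(2011, 11, 24), date(2011, 10, 10)],
    "discount_ends": [date(2011, 11, 25), date(2011, 10, 11)],
}

def flat_match(doc, max_price, day):
    """Per-field semantics: each clause passes if ANY value in the field matches."""
    return (
        any(p <= max_price for p in doc["discount_price"])
        and any(s <= day for s in doc["discount_starts"])
        and any(e >= day for e in doc["discount_ends"])
    )

def offer_match(offers, max_price, day):
    """Intended semantics: a single offer must satisfy every clause."""
    return any(p <= max_price and s <= day <= e for p, s, e in offers)

# Rebuild offers by position (assumes every field keeps the same order).
offers = list(zip(flat_doc["discount_price"],
                  flat_doc["discount_starts"],
                  flat_doc["discount_ends"]))

# "A discount of $3.00 or less active on 2011-10-10": only the $4.00
# offer is active that day, so the correct answer is False.
print(flat_match(flat_doc, 3.00, date(2011, 10, 10)))  # True (false positive)
print(offer_match(offers, 3.00, date(2011, 10, 10)))   # False
```

The positional `zip` can rebuild offers for display, but it does nothing for matching at query time, which is where the association is actually lost.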
> >
> >
> >
> >> But there's no real ability in Solr to store "sub-documents",
> >> so you'd have to get creative in how you encoded the discounts...
> >>
> >
> > This is what I'm asking :)
> > What are the best / recommended / known patterns for doing this?
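One possible "creative" encoding, sketched in Python under stated assumptions: pack each discount into a single string so its parts stay together, store the strings in a hypothetical multiValued stored field `discount_info`, and decode them on the client. The `|` delimiter, the field names and the `id` value are illustrative choices, not a Solr convention:

```python
DELIM = "|"  # assumed delimiter; must not appear in price or dates

def encode_discount(price, starts, ends, synopsis):
    """Pack one discount into one value of a multiValued stored field.

    The synopsis goes last so it may contain the delimiter: decoding
    splits at most three times.
    """
    for part in (price, starts, ends):
        if DELIM in part:
            raise ValueError(f"delimiter not allowed in {part!r}")
    return DELIM.join([price, starts, ends, synopsis])

def decode_discount(value):
    """Unpack a stored value back into its four parts."""
    price, starts, ends, synopsis = value.split(DELIM, 3)
    return {"price": price, "starts": starts, "ends": ends, "synopsis": synopsis}

# Hypothetical document: "discount_info" would be declared as a stored,
# multiValued string field in schema.xml.
book_doc = {
    "id": "book-1",
    "name": "The Book",
    "price": "9.99",
    "discount_info": [
        encode_discount("3.00", "2011-11-24T00:00:00Z",
                        "2011-11-25T00:00:00Z", "thanksgiving special"),
        encode_discount("4.00", "2011-10-10T00:00:00Z",
                        "2011-10-11T00:00:00Z", "Canadian thanksgiving special"),
    ],
}

for value in book_doc["discount_info"]:
    print(decode_discount(value))
```

This keeps each discount intact for display, but the dates inside the string cannot be range-queried, so filtering on "currently active" still needs separate date fields or one document per discount.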
> >
> >
> >
> >>
> >> But I suspect a better approach would be to store each discount as
> >> a separate document. If you're on the trunk version, you could then
> >> group results by, say, ISBN and get responses grouped together...
> >>
> >
> > This is an option, but it seems suboptimal. Say I store the discounts as
> > separate documents with the ISBN as an attribute, and also store the title
> > in its own document, again with the ISBN as an attribute.
> >
> > To get
> > "all books currently discounted"
> >
> > requires 2 requests:
> >
> > * get all discounts currently active
> > * get all books using the ISBNs retrieved by the above search
> >
> > Not that bad. However, what happens when I want
> > "all books that are currently discounted in the 'horror' genre containing
> > the word 'elm' in the title"?
> >
> > The only way I can see to cater for the above search is to duplicate all
> > searchable fields from my "book" document in my "discount" document. Coming
> > from an RDBMS background, this seems wrong.
> >
> > Is this the correct approach to take?
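A minimal sketch of the denormalized approach, assuming one Solr document per discount with the book's searchable fields copied in. All field names (`doc_type`, `isbn`, `genre`, `starts`, `ends`), the example data and the Solr URL are hypothetical; `title:elm` assumes `title` is a tokenized text field. The query uses filter queries plus field collapsing (`group`/`group.field`, the grouping Erick mentions for trunk) so that a book with several active discounts comes back as one group:

```python
from urllib.parse import urlencode

def discount_doc(book, discount, n):
    """Build one document per discount, copying the book's searchable
    fields so a single query can filter on both (denormalization)."""
    return {
        "id": f"{book['isbn']}-discount-{n}",  # hypothetical unique-key scheme
        "doc_type": "discount",                # hypothetical type marker
        "isbn": book["isbn"],
        "title": book["title"],
        "genre": book["genre"],
        "price": discount["price"],
        "synopsis": discount["synopsis"],
        "starts": discount["starts"],          # ISO 8601 UTC for Solr date fields
        "ends": discount["ends"],
    }

# Hypothetical example data
book = {"isbn": "0000000000", "title": "Elm Street Tales", "genre": "horror"}
discounts = [
    {"price": "3.00", "synopsis": "thanksgiving special",
     "starts": "2011-11-24T00:00:00Z", "ends": "2011-11-25T00:00:00Z"},
]
docs = [discount_doc(book, d, n) for n, d in enumerate(discounts)]

# "Currently discounted horror books with 'elm' in the title" in one request.
params = [
    ("q", "title:elm"),
    ("fq", "doc_type:discount"),
    ("fq", "genre:horror"),
    ("fq", "starts:[* TO NOW] AND ends:[NOW TO *]"),  # discount active now
    ("group", "true"),        # field collapsing: one group per book,
    ("group.field", "isbn"),  # even if it has several active discounts
    ("wt", "json"),
]
url = "http://localhost:8983/solr/select?" + urlencode(params)  # assumed URL
print(url)
```

The cost is duplicated book data in every discount document; the gain is that the whole search is a single request, which is the trade-off Erick describes as "flattening".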
> >
> >
> >
> >>
> >> Best
> >> Erick
> >>
> >> On Sat, Jun 4, 2011 at 1:42 AM, Judioo <cont...@judioo.com> wrote:
> >> > Hi,
> >> > Discounts can change daily. Also, there can be a lot of them (over time
> >> > and within a given time period).
> >> >
> >> > Could you give an example of what you mean by multi-valuing the field?
> >> >
> >> > Thanks
> >> >
> >> > On 3 June 2011 14:29, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >
> >> >> How often are the discounts changed? Because you can simply
> >> >> re-index the book information with a multiValued "discounts" field
> >> >> and get something similar to your example (&wt=json)....
> >> >>
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >> On Fri, Jun 3, 2011 at 8:38 AM, Judioo <cont...@judioo.com> wrote:
> >> >> > What is the "best practice" method to index the following in Solr:
> >> >> >
> >> >> > I'm attempting to use solr for a book store site.
> >> >> >
> >> >> > Each book will have a price, but on occasion this will be discounted.
> >> >> > The discounted price exists for a defined time period, but there may be
> >> >> > many discount periods. Each discount will have a brief synopsis, a start
> >> >> > time and an end time.
> >> >> >
> >> >> > A subset of the desired output would be as follows:
> >> >> >
> >> >> > .......
> >> >> > "response":{"numFound":1,"start":0,"docs":[
> >> >> >  {
> >> >> >    "name":"The Book",
> >> >> >    "price":"$9.99",
> >> >> >    "discounts":[
> >> >> >        {
> >> >> >         "price":"$3.00",
> >> >> >         "synopsis":"thanksgiving special",
> >> >> >         "starts":"11-24-2011",
> >> >> >         "ends":"11-25-2011"
> >> >> >        },
> >> >> >        {
> >> >> >         "price":"$4.00",
> >> >> >         "synopsis":"Canadian thanksgiving special",
> >> >> >         "starts":"10-10-2011",
> >> >> >         "ends":"10-11-2011"
> >> >> >        }
> >> >> >     ]
> >> >> >  },
> >> >> >  .........
> >> >> >
> >> >> > A requirement is to be able to search for just discounted publications.
> >> >> > I think I could use date faceting for this (return publications that are
> >> >> > within a discount window). When a discount search is performed, no
> >> >> > publications that are not currently discounted should be returned.
> >> >> >
> >> >> > My questions are:
> >> >> >
> >> >> >   - Does Solr support this type of sub-document?
> >> >> >
> >> >> > In the above example the discounts are the sub-documents. I know Solr is
> >> >> > not a relational DB, but I would like to store and index the above
> >> >> > representation in a single document if possible.
> >> >> >
> >> >> >   - What is the best way to approach the above?
> >> >> >
> >> >> > I can see that in many examples the authors tend to denormalize to solve
> >> >> > similar problems. This suggests that for each discount I am required to
> >> >> > duplicate the book data or form a document association
> >> >> > <http://stackoverflow.com/questions/2689399/solr-associations>.
> >> >> > Which method would you advise?
> >> >> >
> >> >> > It would be nice if Solr could return a response structured as above.
> >> >> >
> >> >> > Much Thanks
> >> >> >
> >> >>
> >> >
> >>
> >
>
