Re: Solr Indexing Patterns

Jonathan Rochkind Mon, 06 Jun 2011 12:56:43 -0700

This is a start, for many common best practices:

http://wiki.apache.org/solr/SolrRelevancyFAQ

Many of the questions in there have an answer that involves de-normalizing, as an example. It may be that even if your specific problem isn't in there, reading through it will still help; I myself found that it gave me a general sense of common patterns in Solr.

( It's certainly true that some things are hard to do in Solr. It turns out that an RDBMS is a remarkably flexible thing -- but when it doesn't do something you need well, and you turn to a specialized tool like Solr instead, you certainly give up some things.

One of the biggest areas of limitation involves hierarchical or relationship data, definitely. There are a variety of features, some more fully baked than others, some not yet in a Solr release, meant to provide tools to get at different aspects of this, including "pivot faceting", "join" (https://issues.apache.org/jira/browse/SOLR-2272), and field-collapsing. Each, IMO, is trying to deal with a different aspect of hierarchical or multi-class data, or data that is entities with relationships. )


On 6/6/2011 3:43 PM, Judioo wrote:

I do think that Solr would be better served if there was a *best practice
section* of the site.

Looking at the majority of emails to this list, they revolve around "how do I
do X?".

Seems like tutorials with real world examples would serve Solr no end of
good.

I still do not have an example of the best method to approach my problem,
although Erick has helped me understand the limitations of Solr.

Just thought I'd say.






On 6 June 2011 20:26, Judioo<cont...@judioo.com>  wrote:

Thanks


On 6 June 2011 19:32, Erick Erickson<erickerick...@gmail.com>  wrote:

#Everybody# (including me) who has any RDBMS background
doesn't want to flatten data, but that's usually the way to go in
Solr.

Part of whether it's a good idea or not depends on how big the index
gets, and unfortunately the only way to figure that out is to test.

But that's the first approach I'd try.

Good luck!
Erick

On Mon, Jun 6, 2011 at 11:42 AM, Judioo<cont...@judioo.com>  wrote:

On 5 June 2011 14:42, Erick Erickson<erickerick...@gmail.com>  wrote:

See: http://wiki.apache.org/solr/SchemaXml

By adding 'multiValued="true"' to the field definition, you can add
the same field multiple times in a doc, something like

<add>
<doc>
  <field name="mv">value1</field>
  <field name="mv">value2</field>
</doc>
</add>

I can't see how that would work as one would need to associate the right
start / end dates and price.
As I understand using multivalued and thus flattening the discounts would
result in:

{
    "name":"The Book",
    "price":"$9.99",
    "price":"$3.00",
    "price":"$4.00",
    "synopsis":"thanksgiving special",
    "starts":"11-24-2011",
    "starts":"10-10-2011",
    "ends":"11-25-2011",
    "ends":"10-11-2011",
    "synopsis":"Canadian thanksgiving special",
  },

How does one differentiate the different offers?

But there's no real ability  in Solr to store "sub documents",
so you'd have to get creative in how you encoded the discounts...

This is what I'm asking :)
What are the best / recommended / known patterns for doing this?

But I suspect a better approach would be to store each discount as
a separate document. If you're in the trunk version, you could then
group results by, say, ISBN and get responses grouped together...
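
A minimal sketch of the one-document-per-discount idea, in Python. The field names (isbn, list_price, discount_price), the id scheme and the placeholder ISBN are assumptions made for illustration. The dates are converted because Solr date fields expect ISO 8601 UTC values such as 2011-11-24T00:00:00Z rather than the MM-DD-YYYY strings used in the example.

```python
from datetime import datetime


def to_solr_date(mm_dd_yyyy):
    """Convert '11-24-2011' to the ISO 8601 UTC form Solr date fields use."""
    parsed = datetime.strptime(mm_dd_yyyy, "%m-%d-%Y")
    return parsed.strftime("%Y-%m-%dT00:00:00Z")


def discount_docs(book):
    """Build one flat document per discount, repeating the book fields."""
    docs = []
    for i, discount in enumerate(book["discounts"]):
        docs.append({
            "id": f'{book["isbn"]}-discount-{i}',  # hypothetical id scheme
            "isbn": book["isbn"],                  # shared key to group on
            "name": book["name"],
            "list_price": book["price"],
            "discount_price": discount["price"],
            "synopsis": discount["synopsis"],
            "starts": to_solr_date(discount["starts"]),
            "ends": to_solr_date(discount["ends"]),
        })
    return docs


book = {
    "isbn": "0000000000",  # placeholder value, not a real ISBN
    "name": "The Book",
    "price": "$9.99",
    "discounts": [
        {"price": "$3.00", "synopsis": "thanksgiving special",
         "starts": "11-24-2011", "ends": "11-25-2011"},
        {"price": "$4.00", "synopsis": "Canadian thanksgiving special",
         "starts": "10-10-2011", "ends": "10-11-2011"},
    ],
}

docs = discount_docs(book)
```

Each entry in docs keeps its own price and date window, so the offers stay distinguishable, at the cost of repeating the book fields.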

This is an option but seems suboptimal. So say I store the discounts in
multiple documents with ISBN as an attribute and also store the title again
with ISBN as an attribute.

To get
"all books currently discounted"

requires 2 requests:

* get all discounts currently active
* get all books using the ISBNs retrieved from the above search

Not that bad. However, what happens when I want
"all books that are currently on discount in the "horror" genre containing
the word 'elm' in the title"?

The only way I can see of catering for the above search is to duplicate all
searchable fields in my "book" document in my "discount" document. Coming
from an RDBMS background this seems wrong.

Is this the correct approach to take?
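
For what it's worth, if the searchable book fields are duplicated into every discount document, the "horror / elm / currently discounted" search becomes a single request. A rough sketch in Python of how the query string might be built: the field names (name, genre, starts, ends, isbn) are assumptions, starts and ends are assumed to be Solr date fields, and the group parameters assume a Solr version that includes result grouping.

```python
from urllib.parse import urlencode


def discounted_books_query(title_word, genre):
    """Build a Solr query string over hypothetical per-discount documents."""
    params = [
        ("q", f"name:{title_word}"),   # word in the title field
        ("fq", f"genre:{genre}"),      # restrict to one genre
        ("fq", "starts:[* TO NOW]"),   # discount has already started
        ("fq", "ends:[NOW TO *]"),     # discount has not yet ended
        ("group", "true"),             # collapse several offers per book
        ("group.field", "isbn"),
        ("wt", "json"),
    ]
    return urlencode(params)


query_string = discounted_books_query("elm", "horror")
```

The string would be appended to the select handler URL of whatever Solr instance holds the discount documents.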

Best
Erick

On Sat, Jun 4, 2011 at 1:42 AM, Judioo<cont...@judioo.com>  wrote:

Hi,
Discounts can change daily. Also there can be a lot of them (over time and
in a given time period).

Could you give an example of what you mean by multi-valuing the field?

Thanks

On 3 June 2011 14:29, Erick Erickson<erickerick...@gmail.com>  wrote:

How often are the discounts changed? Because you can simply
re-index the book information with a multiValued "discounts" field
and get something similar to your example (&wt=json)....


Best
Erick

On Fri, Jun 3, 2011 at 8:38 AM, Judioo<cont...@judioo.com>  wrote:

What is the "best practice" method to index the following in Solr:

I'm attempting to use solr for a book store site.

Each book will have a price but on occasion this will be discounted. The
discounted price exists for a defined time period but there may be many
discount periods. Each discount will have a brief synopsis, start and end
time.

A subset of the desired output would be as follows:

.......
"response":{"numFound":1,"start":0,"docs":[
  {
    "name":"The Book",
    "price":"$9.99",
    "discounts":[
        {
         "price":"$3.00",
         "synopsis":"thanksgiving special",
         "starts":"11-24-2011",
         "ends":"11-25-2011",
        },
        {
         "price":"$4.00",
         "synopsis":"Canadian thanksgiving special",
         "starts":"10-10-2011",
         "ends":"10-11-2011",
        },
     ]
  },
  .........

A requirement is to be able to search for just discounted publications. I
think I could use date faceting for this (return publications that are
within a discount window). When a discount search is performed no
publications that are not currently discounted will be returned.

My questions are:

   - Does solr support this type of sub document?

In the above example the discounts are the sub documents. I know solr is not
a relational DB but I would like to store and index the above representation
in a single document if possible.

   - What is the best method to approach the above?

I can see in many examples the authors tend to denormalize to solve similar
problems. This suggests that for each discount I am required to duplicate the
book data or form a document association
<http://stackoverflow.com/questions/2689399/solr-associations>.
Which method would you advise?

It would be nice if solr could return a response structured as above.

Much Thanks
