Re: Index relational database

Erick Erickson Thu, 31 Aug 2017 08:05:44 -0700

To pile on here: when you denormalize you also get some functionality
that you do not get with Solr joins. They've been called "pseudo
joins" in Solr for a reason.
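
As a rough sketch of the denormalized view Walter describes further down
the thread (one flattened row per Solr document), something like the
following might work. The tables, columns and sample rows are hypothetical,
and SQLite only stands in for whatever database you actually use:

```python
import sqlite3

# Hypothetical schema and data, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Lima');
    INSERT INTO books VALUES (10, 'Solr Basics', 1), (11, 'Joins', 1), (12, 'Views', 2);
""")

# One flattened row per book, carrying the author fields along with it.
rows = conn.execute("""
    SELECT b.id AS id, b.title AS title,
           a.name AS author_name, a.city AS author_city
    FROM books b JOIN authors a ON a.id = b.author_id
    ORDER BY b.id
""").fetchall()

# Each row becomes one flat document that could be sent to Solr for indexing.
docs = [dict(row) for row in rows]
for doc in docs:
    print(doc)
```

Every document carries both the book and the author fields, so a single
query can search on and return either.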


If you just use the simple approach of indexing the two tables and then
joining across them, you can't return fields from both tables in a
single document. To do that you need to use parent/child docs, which
have their own restrictions.
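
For illustration, here is a minimal sketch of the request parameters for a
query using Solr's join query parser. The {!join from=... to=...} syntax is
Solr's; the field names (author_id, id, title, name, city) and the helper
function are made up for the example:

```python
from urllib.parse import urlencode


def join_query_params(from_field, to_field, inner_query, fields):
    """Build request parameters for a Solr {!join} query (hypothetical helper)."""
    return {
        # The inner query selects the "from" side documents; the documents
        # returned are those whose to_field matches their from_field values.
        "q": f"{{!join from={from_field} to={to_field}}}{inner_query}",
        # Only fields stored on the returned ("to" side) documents are available.
        "fl": ",".join(fields),
    }


params = join_query_params("author_id", "id", "title:joins", ["id", "name", "city"])
print(params["q"])
print(urlencode(params))  # query string to append to the select URL
```

The title field can be used to select documents here, but it cannot be
listed in fl, because it lives on the "from" side of the join.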

So rather than worry excessively about which is faster, I'd recommend
you decide on the functionality you need as a starting point.

Best,
Erick

On Thu, Aug 31, 2017 at 7:34 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> There is no way to tell which is faster without trying it.
>
> Query speed depends on the size of the data (rows), the complexity of the 
> join, which database, what kind of disk, etc.
>
> Solr speed depends on the size of the documents, the complexity of your 
> analysis chains, what kind of disk, how much CPU is available, etc.
>
> We have one query that extracts 9 million documents from MySQL in about 20 
> minutes. We have another query on a different MySQL database that takes 90 
> minutes to get 7 million documents.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Aug 31, 2017, at 12:54 AM, Renuka Srishti <renuka.srisht...@gmail.com> 
>> wrote:
>>
>> Thanks Erick, Walter
>> But I think a join query will reduce performance. Denormalization will be
>> a better way than a join query, am I right?
>>
>>
>>
>> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>
>>> Think about making a denormalized view, with all the fields needed in one
>>> table. That view gets sent to Solr. Each row is a Solr document.
>>>
>>> It could be implemented as a view or as SQL, but that is a useful mental
>>> model for people starting from a relational background.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>> On Aug 30, 2017, at 9:14 AM, Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>
>>>> First, it's often best, by far, to denormalize the data in your Solr
>>>> index, that's what I'd explore first.
>>>>
>>>> If you can't do that, the join query parser might work for you.
>>>>
>>>> On Aug 30, 2017 4:49 AM, "Renuka Srishti" <renuka.srisht...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Susheel for your response.
>>>>> Here is the scenario about which I am talking:
>>>>>
>>>>>  - Let's suppose there are two documents, doc1 and doc2.
>>>>>  - I want to fetch the data from doc2 on the basis of doc1 fields which
>>>>>  are related to doc2.
>>>>>
>>>>> How can I achieve this efficiently?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Renuka Srishti
>>>>>
>>>>>
>>>>> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar <susheel2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Renuka,
>>>>>>
>>>>>> I would suggest starting with your use case(s). Maybe start with your
>>>>>> first use case and the questions below:
>>>>>>
>>>>>> a) What is it that you want to search (which fields, like name, desc,
>>>>>> city, etc.)?
>>>>>> b) What is it that you want to show as part of the search result (name,
>>>>>> city, etc.)?
>>>>>>
>>>>>> Based on the above two questions, you would know what data to pull in
>>>>>> from the relational database, then create the Solr schema and index the
>>>>>> data.
>>>>>>
>>>>>> You may first try to denormalize / flatten the structure so that you
>>>>>> deal with one collection/schema and query upon it.
>>>>>>
>>>>>> HTH.
>>>>>>
>>>>>> Thanks,
>>>>>> Susheel
>>>>>>
>>>>>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <renuka.srisht...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> What is the best way to index a relational database, and how does it
>>>>>>> impact performance?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Renuka Srishti
>>>>>>>
>>>>>>
>>>>>
>>>
>>>
>
