Re: schemaless slow indexing

Alexandre Rafalovitch Mon, 23 Mar 2015 10:56:46 -0700

I looked at SOLR-7290, but I think the discussion should stay on the
mailing list for at least one more iteration.


My understanding is that the reason copyField exists is so that search
actually works out of the box. Without knowing the field names, one
cannot say what to search. So copying everything to a general field
and searching that is a classic strategy, though usually not with a
*match all* wildcard. But for schemaless, *match all* is all we get, as
we don't even have prefix/suffix strategies to rely on.
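
For illustration, the two strategies might look roughly like this in a
schema. This is only a sketch: the "_text" field and the match-all rule
are as described in this thread for the stock schemaless config, while
the text_general type, the field attributes, and the "*_t" suffix rule
are assumptions made up for the example. The two copyField rules are
alternatives, not meant to be used together.

```xml
<!-- Catch-all target field; type and attributes are assumed for illustration -->
<field name="_text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Schemaless case: field names are unknown up front, so every field is copied -->
<copyField source="*" dest="_text"/>

<!-- Alternative with a known naming convention (hypothetical "_t" suffix):
     only matching fields are copied, so less is indexed twice -->
<copyField source="*_t" dest="_text"/>
```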

So, saying *remove* without offering an alternative way to achieve
easy search is not - to me - a terribly useful contribution for a
default setup.

Regards,
    Alex.
P.S. As to the field renaming, I have no opinion. It would be nice if
somebody checked the consistency now that a couple more special names
were introduced with smart JSON parsing.

----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 22 March 2015 at 20:32, Erick Erickson <erickerick...@gmail.com> wrote:
> I think you mean https://issues.apache.org/jira/browse/SOLR-7290?
>
> Erick
>
> On Sun, Mar 22, 2015 at 2:30 PM, Mike Murphy <mmurphy3...@gmail.com> wrote:
>> That's it!
>> I hand edited the file that says you are not supposed to edit it and
>> removed that copyField.
>> Indexing performance is now back to expected levels.
>>
>> I created an issue for this, https://issues.apache.org/jira/browse/SOLR-7284
>>
>> --Mike
>>
>> On Sun, Mar 22, 2015 at 3:29 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>>> I took a quick look at the stock schemaless configs... unfortunately
>>> they contain a performance trap.
>>> There's a copyField by default that copies *all* fields to a catch-all
>>> field called "_text".
>>>
>>> IMO, that's not a great default.  Double the index size (well, the
>>> "index" portion of it at least... not stored fields), and slower
>>> indexing performance.
>>>
>>> The other unfortunate thing is the name.  Nowhere else in Solr (that
>>> I know of) do we have a single underscore field name.  _text looks
>>> more like a dynamicField pattern.  Our other fields with underscores
>>> look like _version_ and _root_.  If we're going to start a new naming
>>> convention (or expand the naming conventions) we need to have some
>>> consistency and logic behind it.
>>>
>>> -Yonik
>>>
>>> On Sun, Mar 22, 2015 at 12:32 PM, Mike Murphy <mmurphy3...@gmail.com> wrote:
>>>> I start up solr schemaless and index a bunch of data, and it takes a
>>>> lot longer to finish indexing.
>>>> No configuration changes, just straight schemaless.
>>>>
>>>> --Mike
>>>>
>>>> On Sun, Mar 22, 2015 at 12:27 PM, Erick Erickson
>>>> <erickerick...@gmail.com> wrote:
>>>>> Please review: http://wiki.apache.org/solr/UsingMailingLists
>>>>>
>>>>> You haven't quantified the slowdown. Or given any details on how
>>>>> you're measuring the "slowdown". Or how you've configured your setups
>>>>> in 4.10 and 5.0. Or... As Hossman would say "details matter".
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>> On Sun, Mar 22, 2015 at 8:35 AM, Mike Murphy <mmurphy3...@gmail.com> wrote:
>>>>>> I'm trying out schemaless in solr 5.0, but the indexing seems quite a
>>>>>> bit slower than it did in the past on 4.10.  Any pointers?
>>>>>>
>>>>>> --Mike
