Re: Proposal on Custom Indexes - Google Summer of Code

Markus Holtermann Wed, 09 Mar 2016 22:17:33 -0800

Hi Akshesh,

thank you for your proposal! Sounds like a good plan.


On Thursday, March 10, 2016 at 8:16:10 AM UTC+11, akki wrote:

Once the index is added to a model, the model can inject itself into the 
>> index, or the schema editor can use the originating model to get data it 
>> needs. No need to pass the model into the index type I think. Regarding the 
>> name, I'm unsure if I prefer a name=Index() or an Index(name='') approach. 
>> It'd be good to see that choice in your proposal with a couple of pros and 
>> cons.
>>
>
> Well, since we are defining these indexes in a list (to be stored in 
> Meta.indexes) I would go with the Index(name='') approach.
>

I'd prefer the Index(name='') approach

class MyModel(models.Model):
    field1 = models.CharField()
    field2 = models.CharField()

    class Meta:
        indexes = [
            Index('field1', 'field2', name='some_index_across_two_columns'),
            'field2',  # Would be the same as Index('field2'), an index on 
this one column -- name derived from field name: field2_index
            Index(ToLower('field1')),  # A functional index on field1 -- 
name derived from field name and functions: field1_tolower_index
        ]

That way you should also be able to check for duplicate index names.
 

> 1) I don't think it's necessary to have an IndexTogether type. 
>> Index('field_a', 'field_b') should be sufficient I think. If it's not 
>> sufficient, call out why. I'm also in favour of translating any 
>> index_together definitions into appropriate Index types. Then SchemaEditor 
>> only needs to work with Index expressions and things remain simpler at the 
>> boundary between model definition and migrations.
>>
>
> My major concern was about the syntax. Would Index(['f1', 'f2']) mean two 
> indexes (for each field) or one multi-column index. Here we can do 
> something like Index(['f1', 'f2', ['f3', 'f4']]), each element of the list 
> corresponding to a new index.
>
Index(...) would create one index from my perspective. Thus 
Index('field_a', 'field_b') would do the same as 
index_together=(('field_a', 'field_b'),) these days.
Index(['f1', 'f2', ['f3', 'f4']]) looks way to confusing to me. That would 
probably be better Index('f1'), Index('f2'), Index('f3', 'f4')

Since we're talking about index_together, please keep unique_together in 
mind ;)

Cheers

/Markus
 

>
>
>> Of course, this is a great thing to have. But I wonder if we can do 
>> something to prevent pushing non-indexes to the indexes attribute of 
>> Meta. Having Meta.indexes = [CheckTypeConstraint(foo__in=['foo', 
>> 'bar'])] doesn't read correctly. 
>>
>
> Perhaps it would be best not to mix constraints and indexes. I would like 
> to know your thoughts on stopping to create indexes explicitly for unique 
> constraints because all database do that automatically because that only 
> gives a performance degradation as far as I know.
>
>
> On Wednesday, 9 March 2016 12:44:43 UTC+5:30, Anssi Kääriäinen wrote:
>>
>> If the CREATE INDEX part of the generated SQL isn't hard-coded 
>> somewhere outside the Index class, the exact same mechanism can be 
>> used to run other DDL commands, like ALTER TABLE <foobar> ADD 
>> CONSTRAINT check_type_choices CHECK (type IN ('foo', 'bar')); 
>>
>> Of course, this is a great thing to have. But I wonder if we can do 
>> something to prevent pushing non-indexes to the indexes attribute of 
>> Meta. Having Meta.indexes = [CheckTypeConstraint(foo__in=['foo', 
>> 'bar'])] doesn't read correctly. 
>>
>>  - Anssi 
>>
>> On Wed, Mar 9, 2016 at 4:20 AM, Josh Smeaton <josh.s...@gmail.com> 
>> wrote: 
>> > Hi Akshesh, 
>> > 
>> > The proposal looks really good so far. I haven't gone through it in 
>> depth 
>> > though, but I'll set aside some time to try and do so. A few comments: 
>> > 
>> > - Using the schema editor rather than sqlcompiler is the right choice. 
>> > 
>> > - The earlier part of your proposal seems to focus on allowing postgres 
>> > users to define more complex indexes that they previously haven't been 
>> able 
>> > to use. Oracle is then highlighted also for allowing functional 
>> indexes. I 
>> > think one of the biggest "why's" of this project is to allow users to 
>> define 
>> > custom indexes. It doesn't matter what kind of index. They get access 
>> to the 
>> > stuff that creates them. That should be highlighted before all else. 
>> > 
>> > - Should the type of index (BTree, Spatial, etc) be an index type in 
>> and of 
>> > itself, or should it be a property of a specific index? My knowledge 
>> here is 
>> > limited, so I'm not sure if there are certain constraints on which kind 
>> of 
>> > indexes can use spatial/btree/partial storage types. If certain index 
>> types 
>> > are very heavily restricted, then it probably makes sense to highlight 
>> them 
>> > (as you've done) as a top level class rather than an attribute of 
>> indexes in 
>> > general. If most indexes can be arranged with different storage types 
>> > though, it'd probably make more sense to expose that functionality as 
>> an 
>> > attribute of all indexes. 
>> > 
>> > - "Index() subclasses will also have a supports(backend) method which 
>> would 
>> > return True or False depending on whether the backend supports that 
>> index 
>> > type." Backends (operations.py) already supports 
>> > check_expression_support(self, expression). You can bake index support 
>> into 
>> > these methods from within each backend. Especially helpful for third 
>> party 
>> > backends, as they won't need to monkey patch the index classes. The 
>> only 
>> > issue I can see with reusing check_expression_support is the overhead 
>> of all 
>> > other expressions being checked. I don't think that's a big concern, 
>> but 
>> > it's worth being conscious of. 
>> > 
>> > - "Index classes will take 3 arguments fields, model and name - the 
>> later 
>> > two being optional.". Index classes should accept a list of 
>> expressions, not 
>> > just field names. This will allow trivial support of functional indexes 
>> > without a specific functional index class. You'll also benefit from 
>> existing 
>> > expression subclasses which process field names as F() expressions. F() 
>> > expressions may have to be changed (or a new concept introduced) which 
>> can 
>> > resolve the field, but doesn't try to do any joining. 
>> > 
>> > fullname_index=Index(ToLower('first_name'), ToLower('last_name')) -> 
>> create 
>> > index <table_name>_fullname_index on <table_name>(Lower(first_name), 
>> > Lower(last_name)); 
>> > 
>> > Once the index is added to a model, the model can inject itself into 
>> the 
>> > index, or the schema editor can use the originating model to get data 
>> it 
>> > needs. No need to pass the model into the index type I think. Regarding 
>> the 
>> > name, I'm unsure if I prefer a name=Index() or an Index(name='') 
>> approach. 
>> > It'd be good to see that choice in your proposal with a couple of pros 
>> and 
>> > cons. 
>> > 
>> > 
>> > Ok, onto the questions you actually asked. 
>> > 
>> > 1) I don't think it's necessary to have an IndexTogether type. 
>> > Index('field_a', 'field_b') should be sufficient I think. If it's not 
>> > sufficient, call out why. I'm also in favour of translating any 
>> > index_together definitions into appropriate Index types. Then 
>> SchemaEditor 
>> > only needs to work with Index expressions and things remain simpler at 
>> the 
>> > boundary between model definition and migrations. 
>> > 
>> > 2) Again I think it's a good idea to internally create index classes 
>> based 
>> > on older (unique=True) definitions. Unique may not have to have it's 
>> own 
>> > index type as it could (maybe) be a property of an Index. Again, I'll 
>> leave 
>> > that choice up to you, but it'd be good to see the pros/cons briefly 
>> > discussed. 
>> > 
>> > Again, really good proposal! 
>> > 
>> > On Wednesday, 9 March 2016 01:46:02 UTC+11, akki wrote: 
>> >> 
>> >> Hi 
>> >> 
>> >> My name is Akshesh Doshi (akki). I am a student at Indian Institute Of 
>> >> Technology, Roorkee (IITR). I have been contributing to Django for 
>> quite 
>> >> some time now and my experience has been really great till now. I 
>> found the 
>> >> community to be very welcoming and have learnt a lot in this period of 
>> time. 
>> >> 
>> >> With this spirit I would like to work on the idea of "Custom Indexes" 
>>  and 
>> >> extend it as my proposal for Google Summer of Code 2016. 
>> >> 
>> >> I have started preparing my proposal and here is the initial draft of 
>> it. 
>> >> I would like to hear your thoughts regarding this. Also I wanted to 
>> discuss 
>> >> some points mentioned below. The timeline still needs work as I am 
>> still 
>> >> digging into the code of expressions to see how we can use them in 
>> >> FunctionalIndex. Any pointers/thoughts on that would be appreciated. 
>> >> 
>> >> Key points: 
>> >>   - Introduction to classed based indexes. 
>> >>   - Support for custom classes to db_index. 
>> >>   - Introduction of Meta.indexes. 
>> >>   - Allowing fields to specify their own index type. 
>> >>   - Support for a indexes/constraints API. 
>> >>   - Extend expressions into indexes. 
>> >>   - Bring spatial indexes under the hood. 
>> >> 
>> >> Points I would like to discuss: 
>> >>  1) Would it be right to create a IndexTogether class (subclass of 
>> Index) 
>> >> and internally translate Meta.index_together to it ? 
>> >>       This would allow something like - 
>> >>           class Meta: 
>> >>               indexes = [IndexTogether(['field1', 'field2'])] 
>> >>     I think this would let us keep a better track of any indexes 
>> >> (including those by `index_together`) that are being created. 
>> >> `Meta.index_together` would be internally translated to Meta.indexes. 
>> This 
>> >> might also be followed by deprecation of `index_together` if we want. 
>> >> 
>> >>  2) Handling Unique constraints via indexes. 
>> >>       For handling of constraints, I was thinking of creating a 
>> >> UniqueIndex class which would handle any unique constraint on any 
>> column and 
>> >> allow other options like to make it deferrable. 
>> >>       Right now fields with ``unique=True`` apply the unique 
>> constraints 
>> >> by using both the UNIQUE constraint in the CREATE TABLE statement and 
>> by 
>> >> creating an index for it. For example, in this model, multiple 
>> (repeating) 
>> >> indexes are being generated for the `name` field (one by UNIQUE 
>> constraint, 
>> >> 2 others manually, on postgresql). This takes more space and is also 
>> not 
>> >> good performancewise. This situation can also be mitigated by keeping 
>> a 
>> >> track of all unique constraints at only one place. 
>> >>       So I was thinking of bringing the unique constraint totally 
>> under 
>> >> indexes. Maybe some `models.UniqueIndex()` could be used in 
>> meta.indexes to 
>> >> add constraints ? Any thoughts on it ? 
>> >> 
>> >> 
>> >> I had also prepared a DEP (based on an earlier DEP by Marc Tamlyn), 
>> which 
>> >> is now a subset of this proposal. 
>> >> 
>> >> Regards 
>> >> Akshesh Doshi 
>> >> (akki) 
>> > 
>> > -- 
>> > You received this message because you are subscribed to the Google 
>> Groups 
>> > "Django developers (Contributions to Django itself)" group. 
>> > To unsubscribe from this group and stop receiving emails from it, send 
>> an 
>> > email to django-develop...@googlegroups.com. 
>> > To post to this group, send email to django-d...@googlegroups.com. 
>> > Visit this group at https://groups.google.com/group/django-developers. 
>> > To view this discussion on the web visit 
>> > 
>> https://groups.google.com/d/msgid/django-developers/c2245bd3-b05d-435c-b8df-bbe652a5f63b%40googlegroups.com.
>>  
>>
>> > 
>> > For more options, visit https://groups.google.com/d/optout. 
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/d91913e3-ddc6-4539-9053-7b857c197c46%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Proposal on Custom Indexes - Google Summer of Code

Reply via email to