[google-appengine] Re: Expando and Index partitioning

Tim Hoffman Tue, 03 Nov 2009 18:22:23 -0800

Hi Eli

That's true, there are many cases where you don't need composite
indexes, as per the documentation I provided a link to.
However, the specific example you gave does. (Now, maybe you don't
actually plan to use the specific example you provided,
but I don't have anything else to go on.)


Now, I did actually try it myself before posting, and the specific
indexes I mentioned get created in index.yaml. And if you run the
dev_appserver with --require_indexes and the indexes in question are
not present, you get:

NeedIndexError: This query requires a composite index that is not
defined. You must update the index.yaml file in your application root.
This query needs this index:
- kind: meStats
  properties:
  - name: y2009
  - name: June
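The index in that error follows a fixed pattern: the equality-filtered property comes first, then the inequality-filtered one. A minimal sketch in plain Python 3 (no App Engine SDK; the helper name composite_index_yaml is hypothetical) that derives the same index.yaml entry from the query's shape:

```python
# Sketch: build the index.yaml entry for a query with equality filters
# plus one inequality filter. Equality-filtered properties are listed
# first and the inequality-filtered property last.

def composite_index_yaml(kind, equality_props, inequality_prop):
    """Return an index.yaml entry (as text) for the given query shape."""
    lines = [f"- kind: {kind}", "  properties:"]
    for name in list(equality_props) + [inequality_prop]:
        lines.append(f"  - name: {name}")
    return "\n".join(lines)

# SELECT * FROM meStats WHERE y2009 = 6 AND June > 15
print(composite_index_yaml("meStats", ["y2009"], "June"))
```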

Rgds

T



On Nov 4, 5:37 am, Eli <[email protected]> wrote:
> I suggest you watch the Google I/O talk where Brett Slatkin discusses
> merge joins and pre-computing ranges.
>
> http://www.youtube.com/watch?v=AgaL6NGpkB8
>
> Watch the last half (from about 34 minutes on), and maybe pay attention
> to the section just after that (around 41 minutes).
>
> This implies you do not need composite indexes (or to create any new
> indexes beyond the default ones) for all sorts of queries if you
> construct your data in the right way.
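The merge-join idea from the talk can be sketched in plain Python 3. For an equality-only query, each built-in single-property index returns matching entity keys in key order (rows are sorted by value, then key), so walking those sorted lists in step finds the keys present in all of them without any composite index. The list-of-keys layout, the function name merge_join, and the sample keys are illustrative assumptions, not the datastore's real storage format.

```python
# Sketch of a merge join over sorted key lists, one list per
# equality filter (e.g. y2009 == 6, June == 14).

def merge_join(*sorted_key_lists):
    """Return keys present in every sorted list, scanning each list once."""
    if not sorted_key_lists:
        return []
    positions = [0] * len(sorted_key_lists)
    result = []
    while all(p < len(lst) for p, lst in zip(positions, sorted_key_lists)):
        current = [lst[p] for p, lst in zip(positions, sorted_key_lists)]
        highest = max(current)
        if all(k == highest for k in current):
            # Every list agrees on this key: it satisfies all filters.
            result.append(highest)
            positions = [p + 1 for p in positions]
        else:
            # Advance each list that is behind the highest key seen.
            positions = [p + 1 if lst[p] < highest else p
                         for p, lst in zip(positions, sorted_key_lists)]
    return result

y2009_eq_6 = [1, 3, 4, 7, 9]   # hypothetical keys with y2009 == 6
june_eq_14 = [2, 3, 7, 8]      # hypothetical keys with June == 14
print(merge_join(y2009_eq_6, june_eq_14))  # [3, 7]
```

This only covers equality filters; combining an equality filter with a ">" filter still needs the composite index discussed in this thread.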
>
> I will test this out tonight to provide a proof of concept.
>
> On Nov 3, 10:12 am, Tim Hoffman <[email protected]> wrote:
>
> > Hi
>
> > On Nov 3, 10:26 pm, Eli Jones <[email protected]> wrote:
>
> > > I haven't done any testing on this yet since I'd have to fill up tens
> > > of gigs of information to see real live performance numbers.
>
> > > I'm hoping the implicit partitioning means one doesn't need
> > > manually created indexes (just the default ones).
>
> > > The example I showed would be a schema for storing a daily int statistic.
>
> > > The 'June' column entries would hold the day of that month, and the
> > > 'y2009' column would hold the value 6, since June is the 6th month of
> > > the year.
>
> > > If I wanted stats for June, my select would look like this:
>
> > > Select * From meStats Where y2009 = 6 AND June > 15
>
> > But the minute you use this ">", you will need an index that looks
> > like this:
>
> > - kind: meStats
> >   properties:
> >   - name: y2009
> >   - name: June
>
> > and so on for every year/month combination where you do a ">"
> > comparison.
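The number of entries Tim describes can be made concrete. Assuming one y<year> property per year and one property per month name (as in the example in this thread), and an inequality query on every year/month pair, this plain Python 3 sketch enumerates the index.yaml entries; MONTHS and index_entries are hypothetical names.

```python
# Sketch: one composite index per year/month pair, assuming queries of
# the form  y<year> = <month number> AND <MonthName> > <day>.

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

def index_entries(kind, years):
    """Return one index.yaml entry (as text) per year/month combination."""
    entries = []
    for year in years:
        for month in MONTHS:
            entries.append(
                f"- kind: {kind}\n"
                "  properties:\n"
                f"  - name: y{year}\n"
                f"  - name: {month}"
            )
    return entries

entries = index_entries("meStats", range(2009, 2012))
print(len(entries))  # 36: 3 years x 12 months
print(entries[5])    # the y2009 / June entry
```

Each additional year adds twelve more entries, and each must be defined in index.yaml before the matching query can run in production.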
>
> > I think you should have a read about how indexes are created and
> > accessed before you try optimising something that probably doesn't
> > need it.
>
> > Note the rules from the index definition doc:
> > http://code.google.com/appengine/docs/python/datastore/queriesandinde...
>
> > Other forms of queries require their indexes to be specified in
> > index.yaml, including:
>
> >     * queries with multiple sort orders
> >     * queries with a sort order on keys in descending order
> >     * queries with one or more inequality filters on a property and
> > one or more equality filters over other properties
> >     * queries with inequality filters and ancestor filters
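The four quoted rules can be encoded as a small check. This is a sketch in plain Python 3; Query and needs_index_yaml are hypothetical names, not part of the App Engine API. The docs introduce the list with "including", so it is not exhaustive (for example, an equality filter combined with a sort order on a different property also needs an entry), and a None result only means none of these four rules matched.

```python
# Sketch of the four quoted rules for when a query needs an index.yaml entry.
from dataclasses import dataclass, field

@dataclass
class Query:
    equality: list = field(default_factory=list)     # properties filtered with =
    inequality: list = field(default_factory=list)   # properties filtered with <, <=, >, >=
    sort_orders: list = field(default_factory=list)  # (property, "asc" or "desc")
    ancestor: bool = False                           # has an ancestor filter

def needs_index_yaml(q):
    """Return the first quoted rule the query matches, or None if none match."""
    if len(q.sort_orders) > 1:
        return "multiple sort orders"
    if ("__key__", "desc") in q.sort_orders:
        return "descending sort order on keys"
    if q.inequality and set(q.equality) - set(q.inequality):
        return "inequality filter plus equality filters on other properties"
    if q.inequality and q.ancestor:
        return "inequality filter plus ancestor filter"
    return None

# SELECT * FROM meStats WHERE y2009 = 6 AND June > 15  (the third rule)
print(needs_index_yaml(Query(equality=["y2009"], inequality=["June"])))
# SELECT * FROM meStats WHERE y2009 = 6 AND June = 14  (equality only)
print(needs_index_yaml(Query(equality=["y2009", "June"])))
```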
>
> > You fall into the third rule, which, as I said earlier, means you
> > need to manually specify a massive number of indexes in index.yaml.
>
> > Rgds
>
> > T
>
> > > This would/should implicitly hit the june rows for 2009 and get the
> > > stats for every day after the 15th.
>
> > > You could munge your column names and the values inserted to
> > > get different data-reporting behaviour.
>
> > > The main potential value is the implicit partitioning (where you
> > > don't need to manually define a bunch of schemas up front).
>
> > > On 11/3/09, Tim Hoffman <[email protected]> wrote:
>
> > > > Hi
>
> > > > Have you tried this?
>
> > > > For starters, you can't assign values to numbers,
>
> > > > i.e., no matter what you do, you can't write 2009 = 'abc'.
>
> > > > You would need to use some other identifier, as you mentioned, and then
> > > > specify something like
> > > > year_2009 = db.IntegerProperty(name='2009'), or something similar.
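The underlying constraint is Python's identifier rule: "2009" cannot be written as an attribute or keyword argument, so the Python-side name and the stored property name have to differ. A plain Python 3 sketch of that idea; STORAGE_NAMES, usable_as_attribute, and to_stored are hypothetical helpers that only mimic the purpose of a property's name= argument, not its implementation.

```python
# Sketch: attribute names must be identifiers; a separate storage name
# lets the stored property still be called "2009".
import keyword

def usable_as_attribute(name):
    """True if name can be written as an attribute or keyword argument."""
    return name.isidentifier() and not keyword.iskeyword(name)

STORAGE_NAMES = {"year_2009": "2009"}  # Python attribute -> stored property name

def to_stored(attrs):
    """Rename attributes to their storage names (illustrative only)."""
    return {STORAGE_NAMES.get(k, k): v for k, v in attrs.items()}

print(usable_as_attribute("2009"))       # False
print(usable_as_attribute("year_2009"))  # True
print(to_stored({"meNumber": 200, "year_2009": 6}))  # {'meNumber': 200, '2009': 6}
```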
>
> > > > I also see a problem with this strategy with regard to index
> > > > definitions. Whilst running the SDK, the indexes get added to
> > > > index.yaml as your queries run; however, once you are running in the
> > > > real Google environment, you will need to make sure you have already
> > > > defined all possible indexes that you plan to use before you create
> > > > any new data (or reindex everything). That means indexes for all years
> > > > you plan to hold data for and search, and months, and combinations of
> > > > the two.
>
> > > > I am not sure this is a particularly good approach, but then I am not
> > > > sure I get what you are actually doing.
>
> > > > Have you compared the performance of lookups between the two
> > > > strategies? Also remember that if you are actually interested in
> > > > year/month, then you are actually using composite indexes. I wonder
> > > > if you will ever use the month-only index (apart from comparing months
> > > > with months across all years, in no particular order).
>
> > > > Rgds
>
> > > > T
>
> > > > On Nov 3, 12:22 am, Eli <[email protected]> wrote:
> > > >> Here's something I've been wondering about Expando.
>
> > > >> Say you define an Expando model like so:
>
> > > >> from google.appengine.ext import db
> > > >>
> > > >> class meStats(db.Expando):
> > > >>     meNumber = db.IntegerProperty(required=True)
>
> > > >> And then you begin populating it like so:
>
> > > >> meEntity1 = meStats(meNumber=200,
> > > >>                     June=14,
> > > >>                     2009=6)
>
> > > >> meEntity1.put()
>
> > > >> meEntity2 = meStats(meNumber=381,
> > > >>                     July=21,
> > > >>                     2009=7)
>
> > > >> meEntity2.put()
>
> > > >> ..and so on.
>
> > > >> The "July" column only has indexes for entities that have "July"
> > > >> defined, correct? So, in effect, I am creating a partitioned index
> > > >> for a table that can grow indefinitely, and each time I get to a new
> > > >> year/month combo, I am inserting into new indexes? (Instead of
> > > >> inserting into an ever-increasing, monolithic "Month" column index.)
>
> > > >> Mainly, I'm packing the pertinent information into the column names
> > > >> and column values (instead of making the column name just some dummy
> > > >> value like "Month"). This allows me to implicitly create the
> > > >> partitioned table/index (I think of it as a partitioned index since it
> > > >> is, schematically [as far as I'm concerned], one table).
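Eli's reading of the built-in indexes can be modelled in a few lines of plain Python 3: each entity adds a row only to the single-property index of each property it actually has, so "July" rows exist only for entities with a July property. The (kind, property) -> sorted (value, key) layout, the function name, and the sample entities are illustrative assumptions, not the datastore's real storage format.

```python
# Sketch of built-in single-property indexes for Expando-style entities.
from collections import defaultdict

def build_property_indexes(kind, entities):
    """Map (kind, property) to a sorted list of (value, entity key) rows."""
    indexes = defaultdict(list)
    for key, props in entities.items():
        for name, value in props.items():
            indexes[(kind, name)].append((value, key))
    for rows in indexes.values():
        rows.sort()
    return dict(indexes)

entities = {  # hypothetical entity keys and property values
    1: {"meNumber": 200, "June": 14, "y2009": 6},
    2: {"meNumber": 381, "July": 21, "y2009": 7},
    3: {"meNumber": 150, "June": 20, "y2009": 6},
}
indexes = build_property_indexes("meStats", entities)
print(indexes[("meStats", "June")])   # [(14, 1), (20, 3)]
print(indexes[("meStats", "July")])   # [(21, 2)]
print(indexes[("meStats", "y2009")])  # [(6, 1), (6, 3), (7, 2)]
```

This model covers only the single-property indexes; it says nothing about the composite index that a query like y2009 = 6 AND June > 15 still requires.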
>
> > > >> You could give the columns better names, maybe "June_Day" and
> > > >> "2009_Month" if you wanted.
>
> > > >> Does this make sense?  Have I misunderstood how Expando handles
> > > >> indexes?
>
> > > >> Another way to word this question would be:
>
> > > >> Is there a difference between the indexes created for the June and
> > > >> July entries in the Expando model above and in the Model classes below?
>
> > > >> class meJune09Stats(db.Model):
> > > >>     meNumber = db.IntegerProperty(required=True)
> > > >>     June = db.IntegerProperty(required=True)
> > > >>     2009 = db.IntegerProperty(required=True)
>
> > > >> class meJuly09Stats(db.Model):
> > > >>     meNumber = db.IntegerProperty(required=True)
> > > >>     July = db.IntegerProperty(required=True)
> > > >>     2009 = db.IntegerProperty(required=True)
>
> > > >> Thanks for any information.
>