I think we can safely assume that the programmer was trying to speed things up a little by writing 12 thousand objects in a single operation.
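To make the difference in call counts concrete, here is a minimal sketch. It is not the real JDO or datastore API: `CountingStore` is a hypothetical in-memory stand-in that only counts simulated round trips, and the chunk size of 500 is just an illustrative value I picked. It says nothing about actual datastore latency; it only shows how batching reduces the number of calls compared with one call per object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a datastore; it only counts round trips. */
final class CountingStore {
    private final Map<String, String> rows = new HashMap<>();
    private int roundTrips = 0;

    /** One call, one simulated round trip, one entity. */
    void put(String key, String value) {
        roundTrips++;
        rows.put(key, value);
    }

    /** One call, one simulated round trip, many entities. */
    void putAll(Map<String, String> batch) {
        roundTrips++;
        rows.putAll(batch);
    }

    int roundTrips() { return roundTrips; }

    int size() { return rows.size(); }
}

final class BatchWriteSketch {
    /** Builds n distinct keys: "k0", "k1", ... */
    static List<String> makeKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add("k" + i);
        }
        return keys;
    }

    /**
     * Writes the keys in chunks of at most chunkSize entities,
     * one putAll call per chunk. Returns the number of calls made.
     */
    static int writeInChunks(CountingStore store, List<String> keys, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int calls = 0;
        for (int from = 0; from < keys.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, keys.size());
            Map<String, String> batch = new HashMap<>();
            for (String key : keys.subList(from, to)) {
                batch.put(key, "v"); // placeholder payload
            }
            store.putAll(batch);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> keys = makeKeys(12000);

        // One call per object.
        CountingStore oneByOne = new CountingStore();
        for (String key : keys) {
            oneByOne.put(key, "v");
        }

        // Batched: 12000 / 500 = 24 calls.
        CountingStore batched = new CountingStore();
        writeInChunks(batched, keys, 500);

        System.out.println("one-by-one round trips: " + oneByOne.roundTrips()); // 12000
        System.out.println("batched round trips: " + batched.roundTrips());     // 24
    }
}
```

With JDO the batched path would presumably be the `pm.makePersistentAll()` call Tim already mentions; whether each simulated round trip here maps to one real RPC is exactly the implementation detail in question.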
Whether that turns out faster or slower than writing each object separately depends on the internal implementation of the data store. I prefer to avoid hacks, but OTOH it is sometimes better to be clear about what you want (API-wise).

The point here is that the programmer wants to insert 12 thousand objects in a second, and you seem to imply that this is possible: "While it's an interesting thought exercise to see if BigTable can do it through App Engine's interface (hint: it can, globally, easily)". I rest my case ;-)

Do we need to do anything to test that? Is there anything we could do to help?

Cheers,
Guillermo.

On Feb 24, 18:06, "Ikai L (Google)" <[email protected]> wrote:
> Simple key-only writes can definitely do it, but there's a few places where you can introduce overhead:
>
> - serialization
> - network I/O
> - indexes
>
> My point wasn't necessarily that it wasn't possible. makePersistentAll does use a batch write, and there are definitely sites that can do 12,000+ writes a second (and well above that), but I don't know of any that will attempt to do that in a single request. While it's an interesting thought exercise to see if BigTable can do it through App Engine's interface (hint: it can, globally, easily), I can't think of a single use case for a site to need to do this all the time and with the sub-second requirement. I think it's reasonable to ask why this design exists and why the requirements exist and rethink one or the other.
>
> On Wed, Feb 24, 2010 at 12:35 PM, Guillermo Schwarz <[email protected]> wrote:
> > Ikai,
> >
> > Maybe you are right. Maybe not. I'm not an expert in datastore internals, but here is my point of view.
> >
> > This paper claims that Berkeley DB Java edition can insert about 15,000 records per second.
> >
> > http://www.oracle.com/database/docs/bdb-je-architecture-whitepaper.pdf
> >
> > The graphic is on page 22.
> > The main reason they claim to be able to do that is that they don't need to actually sync the write to disk, they can queue the write, update in-memory data and write a log file. Writing the log file is for transactional purposes and it is the only write really needed. That is pretty fast.
> >
> > Cheers,
> > Guillermo.
> >
> > On Feb 24, 16:51, "Ikai L (Google)" <[email protected]> wrote:
> > > I also remember hearing (and this is not verified so don't quote me on this or come after me if I'm wrong) from a friend of mine running KV stores in production that there were issues with certain distributed key/value stores that actually managed to slow down as a function of the number of objects in the store - and Tokyo Tyrant was on his list. A key property of scalable stores is that the opposite of this is true.
> > >
> > > 12,000 synchronous, serialized writes in a single sub-second request is pretty serious. I am not aware of a single website in the world that does this.
> > >
> > > On Wed, Feb 24, 2010 at 11:35 AM, Jeff Schnitzer <[email protected]> wrote:
> > > > I think this is actually an interesting question, and brings up a discussion worth having:
> > > >
> > > > Is datastore performance reasonable?
> > > >
> > > > I don't want to make this a discussion of reliability, which is a separate issue. It just seems to me that the datastore is actually kinda pokey, taking seconds to write a few hundred entities. When people benchmark Tokyo Tyrant, I hear numbers thrown around like 22,000 writes/second sustained across 1M records:
> > > >
> > > > http://blog.hunch.se/2009/02/28-tokyo-cabinet
> > > >
> > > > You might argue that the theoretical scalability of BigTable's distributed store is higher... but we're talking about two full orders of magnitude difference. Will I ever near the 100-google-server equivalent load? Could I pay for it if I did?
> > > > 100 CPUs (measured) running for 1 month is about $7,200. Actual CPU speed is at least twice the measured rate, so a single Tokyo Tyrant is theoretically equivalent to almost $15,000/month of appengine hosting. Ouch.
> > > >
> > > > Maybe this isn't an apples to apples comparison. Sure, there aren't extra indexes on those Tyrant entities... but to be honest, few of my entities have extra indexes. What other factors could change this analysis?
> > > >
> > > > Thoughts?
> > > >
> > > > BTW Tim, you may very well have quite a few indexes on your entities. In JDO, nearly all single fields are indexed by default. You must explicitly add an annotation to your fields to make them unindexed. With Objectify, you can declare your entity as @Indexed or @Unindexed and then use the same annotation on individual fields to override the default.
> > > >
> > > > Jeff
> > > >
> > > > On Wed, Feb 24, 2010 at 12:43 AM, Tim Cooper <[email protected]> wrote:
> > > > > I have been trying to write 12,000 objects in a single page request. These objects are all very small and the total amount of memory is not large. There is no index on these objects - the only GQL queries I make on them are based on the primary key.
> > > > >
> > > > > Ikai has said: "That is - if you have to delete or create 150 persistent, indexed objects, you may want to rethink what problems you are trying to solve."
> > > > >
> > > > > So I have been thinking about the problems I'm trying to solve, including looking at the BuddyPoke blog and reading the GAE documentation. I'm trying to populate the database with entries relating to high school timetables.
> > > > >
> > > > > * I could do the writes asynchronously, but that looks like a lot of additional effort.
> > > > > On my C++ app, writing the same information to my laptop drive, this happens in under a second, because the amount of data is actually quite small, but it times out on GAE.
> > > > > * I am using pm.makePersistentAll(), but this doesn't help.
> > > > > * There is no index on the objects - I access them only through the primary key. (I'm pretty sure there's no index - but how can I confirm this via the development server dashboard?)
> > > > > * The objects constitute 12,000 entity groups. I could merge them into fewer entity groups, but there's no natural groupings I could use, so it could get quite complex to introduce a contrived grouping, and also this would complicate the multi-user updating of the objects. The AppEngine team seem to generally recommend using more entity groups, but it's difficult to integrate that advice with the contrary advice to use fewer entity groups for acceptable performance.
> > > > > * I'd be happy if the GAE database was < 10 times slower than a non-cloud RDBMS, but the way I'm using it, it's currently not.
> > > > >
> > > > > Does anyone have any advice?
> > > > >
> > > > > --
> > > > > You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group.
> > > > > To post to this group, send email to [email protected].
> > > > > To unsubscribe from this group, send email to [email protected].
> > > > > For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
> > > --
> > > Ikai Lan
> > > Developer Programs Engineer, Google App Engine
> > > http://googleappengine.blogspot.com | http://twitter.com/app_engine
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine
> http://googleappengine.blogspot.com | http://twitter.com/app_engine
