Day One at Google I/O

May 29th, 2008  |  3 Comments

I’m at the Google I/O conference in San Francisco. It’s late at night…I’m still decompressing and trying to make sense of my notes from the day’s sessions.

I’ve put myself on a self-imposed “App Engine Only” track, so I haven’t really followed the news and announcements regarding Android, or OpenSocial, or Google Gears, or any other Google initiative.

Though they did show a way cool Android mobile phone demo in the Keynote. The phone had a built-in compass, so you could run the Google Maps application and look at the Street View, and then as you moved the phone, the Street View would sync to the new compass direction. In other words, as you moved the phone from North to South, the Street View photos would also pan from the North view to the South view. Spontaneous cheering from the crowd.

App Engine Talks

In any event, here are my takeaways from today:

  • Use Django. Don’t use the version that’s bundled with App Engine (v. 0.96). Instead, check out the latest development version of Django from Subversion (currently 0.97). What’s interesting is that even though Google App Engine ships with the webapp framework and with Django 0.96, Guido van Rossum chose to devote his talk to how to install and use Django 0.97 on Google App Engine. I was curious what that meant for webapp, so I asked if there was ever a reason to use webapp instead of Django. Guido’s answer was that webapp was simple and enabled you to get started very quickly, but Django was more powerful.

  • If you use the Datastore correctly, scaling comes for free. But it takes a lot of work to use the Datastore correctly. Further complicating the issue, nobody knows exactly what “correctly” means yet.

  • Pay special attention to Datastore writes. “Reads are cheap! Writes are expensive!” Every write is a serialized transaction that hits the disk. Rule of thumb says you can’t do more than ~100 seeks/sec, so that’s the upper limit for write speed.

  • For high-contention write situations (like a counter), don’t write to a single, global counter entity. Instead, use “sharded writes” where you spread the writes across several entities, and then sum the values from all the shards to get the total count.

  • Entity Groups are for transactions. I did not understand the point of parent-child hierarchies for entities before today, but now I see that their only purpose is to group entities that need to be changed together as a part of a transaction.

  • The Datastore will not help you maintain data integrity. It’s entirely up to you, especially when you are updating or deleting entities with ReferenceProperties. Seeing the code examples today that showed how to handle deletes properly makes me appreciate how nice it is to have a relational DB handle this automatically.

  • Don’t use count(). Ever. This was stressed in several presentations. Not only can it not count past 1000, but it also requires a scan of every entity, thereby using way too much processing power.

  • You can’t use JOINs in queries, but you can use the foreign-key-like ReferenceProperty to associate entities and thereby do JOIN-like queries on them. In Rafe Kaplan’s talk about “Working with Google App Engine Models”, he walked through one way to model one-to-many and many-to-many relationships. I found this particularly interesting, in that it demonstrates how to have a relatively normalized data model, directly contradicting the emerging idea that the best way to take advantage of the Datastore is aggressive de-normalization.

  • The index.yaml file defines the composite indexes needed for complex queries. All queries rely on the indexes, which are separate BigTable tables. If a property value isn’t indexed, it can’t be found by a query. Now, that’s not a problem for querying by kind or single property values, since indexes are created for these automatically, but it is relevant for complex queries, since composite indexes are not created automatically. (The dev server automatically updates the index.yaml file as complex queries are run in the dev environment, so as long as you test every query on dev, you’ll be fine.)
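
To make the “sharded writes” tip from the list above a bit more concrete, here is a minimal sketch in plain Python. It is not Datastore code and not what was shown in the talks: a dict stands in for the shard entities, and the names (ShardedCounter, NUM_SHARDS, the "-shard-N" key scheme) are all made up for illustration. The idea it shows is just this: each increment lands on one randomly chosen shard, and a read sums every shard.

```python
import random

NUM_SHARDS = 20  # assumption: pick a number that fits your write contention


class ShardedCounter:
    """In-memory stand-in for a counter spread across several entities."""

    def __init__(self, name, num_shards=NUM_SHARDS):
        self.name = name
        self.num_shards = num_shards
        # Each dict key plays the role of one shard entity (hypothetical scheme).
        self.shards = {self._shard_key(i): 0 for i in range(num_shards)}

    def _shard_key(self, index):
        return "%s-shard-%d" % (self.name, index)

    def increment(self, amount=1):
        # Choose a shard at random so that concurrent writers
        # rarely contend for the same entity.
        index = random.randrange(self.num_shards)
        self.shards[self._shard_key(index)] += amount

    def total(self):
        # Reads are cheap: sum every shard to get the overall count.
        return sum(self.shards.values())


if __name__ == "__main__":
    counter = ShardedCounter("page-views", num_shards=5)
    for _ in range(1000):
        counter.increment()
    print(counter.total())
```

In a real app, each shard would presumably be its own entity and each increment its own small transaction, so the write load is spread across shards instead of piling up on one entity.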

Several themes came up over and over in the Q&As that followed the talks:

  • Maintaining Data Integrity, Especially over Time. How do I migrate data models? How do I delete columns, or rename columns? How do I rename classes? How can I make changes and not break data integrity? Nobody’s figured out good answers to these questions yet.

  • Bulk Data Operations. How can I import and export large data sets? How can I bulk delete? The App Engine team stressed that they were well aware of this limitation.

  • Full-Text Search. How can I do full-text search over my data? Shockingly, developers uniformly expect Google to be really good at full-text search…imagine that! Sadly, it’s not there yet, though Google engineers mentioned that there’s “currently a hack in place that kind of works”, and that they would provide this feature sometime in the future.

In his presentation on how the Datastore works under the hood, Ryan Barrett offhandedly mentioned the three big priorities for the Google App Engine Team right now:

  1. Data Import/Export
  2. Additional languages
  3. Billing

All in all, the Google App Engine talks were fascinating. Using the Datastore is so new, and it’s such a departure from using a relational DB, that it’s exciting to see everyone try to figure out how best to use it. One thing that struck me, though, was how tentative some of the suggestions from the Google engineers were. I had expected that, since they have several years of experience building applications on top of BigTable, they would have created more definitive best practices for how to handle common data models and for maintaining data integrity. Perhaps it’s so new that even Google is still figuring out how best to work with it.

Responses

  1. Benjamin Burke says:

    May 30th, 2008 at 5:52 pm

    I was at a couple of the App Engine sessions. BigTable is going to be the hardest transition for me. It’s just a total paradigm shift from relational databases and the way that I write code to utilize those databases. I missed the session where they discussed setting up Django 0.97 and I’m finding that task a little cumbersome at the moment.

  2. Tom Offermann says:

    May 30th, 2008 at 6:26 pm

    Be sure to use the Google App Engine Django Helper. That was the recommended way to get the latest development version of Django up and running on App Engine.

  3. rob hawkins says:

    May 31st, 2008 at 8:35 am

    Nice post. Very informative.

    Just wanted to say thank you.