Python: Google App Engine

Here are some things I found interesting, deep, or surprising while reading the documentation for Google App Engine:

Being able to easily roll back to a previous version of your application is a really nice feature.

webapp uses "self.request" instead of a magic global named "request" the way Pylons does. (Note that in Pylons, magic globals act as proxies for the actual request object, which is stored in thread-local storage.)
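For comparison, here is a minimal sketch of how a Pylons-style magic global can work. A module-level proxy forwards attribute access to whatever request object is registered for the current thread. The names RequestProxy, push_request, and FakeRequest are hypothetical, and Pylons' real machinery differs in detail.

```python
import threading

# Per-thread storage: each thread sees its own "current" request.
_local = threading.local()


class RequestProxy(object):
    """Forward attribute access to the request registered for this thread."""

    def __getattr__(self, name):
        current = getattr(_local, "request", None)
        if current is None:
            raise RuntimeError("no request registered for this thread")
        return getattr(current, name)


# The "magic global" that handler code imports and uses directly.
request = RequestProxy()


def push_request(req):
    """Register req as the current request for the calling thread."""
    _local.request = req


class FakeRequest(object):
    """Hypothetical stand-in for a real request object."""

    def __init__(self, path):
        self.path = path


def handle(path, results):
    push_request(FakeRequest(path))
    # Handler code reads the global, but it resolves per thread.
    results[path] = request.path


results = {}
threads = [threading.Thread(target=handle, args=(p, results))
           for p in ["/a", "/b", "/c"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.items()))  # [('/a', '/a'), ('/b', '/b'), ('/c', '/c')]
```

With webapp, a handler just reads self.request, so this per-thread lookup isn't needed.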

I'm a little confused by:
 run_wsgi_app() is similar to the WSGI-to-CGI adaptor provided by the wsgiref module in the Python standard library, but includes a few additional features.
Does that mean GoogleAppEngineLauncher's default template should be updated since it doesn't use run_wsgi_app()?

GQL is meant to be familiar to SQL users, but it also knows what you mean when you leave off the "SELECT * FROM Greeting" part, since the model class already implies the kind:
 greetings = Greeting.gql("ORDER BY date DESC LIMIT 10")
Bret told me:
A good rule of thumb for whether you need transactions is whether you have a set that is based on a get, and the two essentially have to be atomic.
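Here is a minimal sketch of Bret's rule of thumb. It uses an in-process threading.Lock as a stand-in for a datastore transaction; this is not the App Engine transaction API. The read and the write of a counter happen as one atomic unit. Without the lock, two threads could read the same value, and one increment would be lost. On App Engine itself, the get and the put would go inside a function passed to db.run_in_transaction().

```python
import threading

counter = {"hits": 0}
# Stand-in for a transaction: only one thread may run the get+set at a time.
txn_lock = threading.Lock()


def increment_atomically():
    with txn_lock:
        value = counter["hits"]       # the "get"
        counter["hits"] = value + 1   # the "set" that depends on the get


def run(worker, n_threads=8, per_thread=1000):
    def loop():
        for _ in range(per_thread):
            worker()
    threads = [threading.Thread(target=loop) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run(increment_atomically)
print(counter["hits"])  # 8000: no increments are lost
```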
Concerning keys:
Each entity also has a key that uniquely identifies the entity. The simplest key has a kind and a unique numeric ID provided by the datastore. The ID can also be a string provided by the application.
Use "dev_appserver.py --clear_datastore appname" to wipe the development server's datastore (appname being your application's directory).

Note that if you elect to use the free appspot.com domain name, the full URL for the application will be "http://application-id.appspot.com/". I think that means other apps could set cookies for the appspot.com domain, and those cookies would be sent to my app too. If so, that's a security problem.
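To make the cookie worry concrete, here is a simplified version of the domain-matching rule from RFC 6265 (section 5.1.3). A cookie whose Domain is appspot.com is sent to every host ending in .appspot.com. This is only a sketch. It ignores IP-address hosts and any public-suffix restrictions a browser may enforce, and those restrictions could stop other apps from setting such cookies in the first place.

```python
def domain_matches(host, cookie_domain):
    """Simplified RFC 6265 domain-match: True if a cookie set for
    cookie_domain would be sent to host."""
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    return host == domain or host.endswith("." + domain)


# A cookie set by some other app with Domain=.appspot.com ...
other_apps_cookie_domain = ".appspot.com"

# ... also matches my app's host name.
print(domain_matches("jjinuxgaehelloworld.appspot.com", other_apps_cookie_domain))  # True
print(domain_matches("example.com", other_apps_cookie_domain))                      # False
```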

This is interesting:
 type = db.StringProperty(required=True, choices=set(["cat", "dog", "bird"]))
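Here is a toy descriptor showing roughly what required=True and choices=... buy you: the value is checked every time it is assigned. ChoiceProperty and Pet are hypothetical names, and this is not how db.StringProperty is actually implemented.

```python
class ChoiceProperty(object):
    """Toy property: checks required-ness and allowed choices on assignment."""

    def __init__(self, required=False, choices=None):
        self.required = required
        self.choices = choices
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if value is None and self.required:
            raise ValueError("%s is required" % self.name)
        if value is not None and self.choices and value not in self.choices:
            raise ValueError("%s must be one of %s" % (self.name, sorted(self.choices)))
        instance.__dict__[self.name] = value


class Pet(object):
    type = ChoiceProperty(required=True, choices=set(["cat", "dog", "bird"]))

    def __init__(self, type):
        self.type = type


print(Pet("cat").type)  # cat
try:
    Pet("fish")
except ValueError as e:
    print(e)  # type must be one of ['bird', 'cat', 'dog']
```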
I almost didn't notice that I never had to load the schema (i.e., the model) ahead of time. I just created an object, and that was enough.

Concerning schemas:
A model defined using the Model class establishes a fixed set of properties that every instance of the class must have (perhaps with default values). This is a useful way to model data objects, but the datastore does not require that every entity of a given kind have the same set of properties...An expando model can have both fixed and dynamic properties.
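This toy class sketches the difference between fixed and dynamic properties. Fixed properties are declared on the class and type-checked. Any other attribute is accepted as a dynamic property without validation. The two Song entities also hold different types for the same dynamic property. ToyExpando and Song are hypothetical, and this is not the real db.Expando.

```python
class ToyExpando(object):
    """Toy model: fixed properties are type-checked, dynamic ones are not."""

    fixed = {}  # subclasses map property name -> required Python type

    def __setattr__(self, name, value):
        expected = self.fixed.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError("%s must be %s" % (name, expected.__name__))
        # Anything not listed in `fixed` is a dynamic property, stored as-is.
        object.__setattr__(self, name, value)

    def dynamic_properties(self):
        return sorted(k for k in vars(self) if k not in self.fixed)


class Song(ToyExpando):
    fixed = {"title": str}


a = Song()
a.title = "First"
a.year = 1971          # dynamic property holding an int
b = Song()
b.title = "Second"
b.year = "unknown"     # same dynamic property, different type: allowed
print(a.dynamic_properties(), b.dynamic_properties())  # ['year'] ['year']
```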
Concerning lists:
A property can have multiple values, represented in the datastore API as a Python list...A single list property may even have values of different types.
 hobbies = db.StringListProperty()
More about schemas:
Two entities of the same kind can have different types of values for the same dynamic property.
Concerning references:
The ReferenceProperty class models a key value, and enforces that all values refer to entities of a given kind.
Here's how to set a reference:
 obj2.reference = obj1.key()  # Or:
 obj2.reference = obj1
Concerning orphans:
When an entity whose key is the value of a reference property is deleted, the reference property does not change. A reference property value can be a key that is no longer valid. If an application expects that a reference could be invalid, it can test for the existence of the object using an if statement.
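A toy in-memory store shows the orphan behavior. Deleting an entity doesn't touch other entities that still hold its key, so the code has to check that the referenced entity still exists. The put/get helpers and the string keys are hypothetical stand-ins, not the datastore API.

```python
# Toy in-memory "datastore": key -> entity dict.
store = {}


def put(key, **props):
    store[key] = dict(props)
    return key


def get(key):
    """Return the entity for key, or None if it no longer exists."""
    return store.get(key)


author_key = put("Author:1", name="Ann")
post_key = put("Post:1", title="Hi", author=author_key)

del store[author_key]                 # delete the referenced entity

post = get(post_key)
print(post["author"])                 # Author:1 (the reference is unchanged)
author = get(post["author"])
if author:                            # test for existence before using it
    print(author["name"])
else:
    print("dangling reference")       # this branch runs
```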
Concerning back-references:
ReferenceProperty has another handy feature: back-references. When a model has a ReferenceProperty to another model, each referenced entity gets a property whose value is a Query that returns all of the entities of the first model that refer to it.
In my opinion, this is a bit cleaner than the Rails approach, since you don't have to duplicate the knowledge about the relationship in both models (belongs_to on one side and has_many on the other).
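Here is a toy sketch of how one-sided declaration can work. Only Book declares the relationship, and the reference descriptor installs the reverse side on Author automatically. Reference, Author, Book, and the instances list are hypothetical. In the real db module, the back-reference is a Query named by ReferenceProperty's collection_name argument (by default, the referring model's name in lowercase plus "_set").

```python
class Reference(object):
    """Toy reference property that also installs a back-reference."""

    def __init__(self, target_cls, collection_name):
        self.target_cls = target_cls
        self.collection_name = collection_name

    def __set_name__(self, owner, name):
        self.name = name
        ref = self

        def back_reference(target):
            # A "query" over every owner instance whose reference points here.
            return [obj for obj in owner.instances
                    if getattr(obj, ref.name) is target]

        # Install the reverse side on the referenced class automatically.
        setattr(self.target_cls, self.collection_name, property(back_reference))

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Author(object):
    def __init__(self, name):
        self.name = name


class Book(object):
    instances = []
    author = Reference(Author, collection_name="books")

    def __init__(self, title, author):
        self.title = title
        self.author = author
        Book.instances.append(self)


ann = Author("Ann")
Book("First", ann)
Book("Second", ann)
print([b.title for b in ann.books])  # ['First', 'Second']
```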

Dynamic properties of Expando classes don't do as much as fixed properties. For instance, they aren't validated at all, and there are other limitations as well.

Concerning query laziness:
Query and GqlQuery objects do not execute the query until the application tries to access the results.
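A toy query object illustrates the laziness. Calls to filter() only record predicates, and the work happens when results are iterated. LazyQuery and its executions counter are hypothetical and only mimic the behavior the docs describe.

```python
class LazyQuery(object):
    """Toy query: records filters, runs only when results are accessed."""

    def __init__(self, rows):
        self._rows = rows
        self._filters = []
        self.executions = 0  # how many times the query actually ran

    def filter(self, predicate):
        self._filters.append(predicate)   # no work done yet
        return self

    def __iter__(self):
        self.executions += 1              # the query runs here
        for row in self._rows:
            if all(f(row) for f in self._filters):
                yield row


q = LazyQuery(range(10)).filter(lambda n: n % 2 == 0).filter(lambda n: n > 2)
print(q.executions)   # 0: building the query did not run it
print(list(q))        # [4, 6, 8]
print(q.executions)   # 1
```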
Concerning query limits:
The datastore returns a maximum of 1000 results in response to a query, regardless of the limit and offset used to fetch the results. The 1000 results includes any that are skipped using an offset, so a query with more than 1000 results using an offset of 100 will return 900 results.
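The arithmetic of the documented rule, as a small function. It models the docs as quoted, not measured behavior, and a commenter on this post reports seeing different results with offsets. results_returned is a hypothetical helper.

```python
MAX_RESULTS = 1000  # documented cap, which counts results skipped by the offset


def results_returned(total_matches, limit, offset):
    """Number of results returned under the documented rule."""
    reachable = min(total_matches, MAX_RESULTS)
    return max(0, min(limit, reachable - offset))


print(results_returned(total_matches=5000, limit=1000, offset=100))  # 900
print(results_returned(total_matches=50, limit=10, offset=45))       # 5
```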
Concerning keys:
The string encoding of a Key is opaque, but not encrypted. If your application needs keys to not be guessable, you should further encrypt the string-encoded Key before sending it to the user.
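This sketch shows why "opaque" isn't the same as "secret": a base64-style encoding decodes trivially. It then swaps in HMAC signing rather than the encryption the docs suggest. Signing makes valid tokens unguessable and tamper-evident, but the key is still visible to the user; actual encryption would need a crypto library. "Greeting:42" and SECRET are made-up stand-ins for a real encoded Key and a real server-side secret.

```python
import base64
import hashlib
import hmac

SECRET = b"hypothetical-server-side-secret"  # assumption: never sent to clients


def encode_key(raw):
    """Opaque-looking but reversible encoding (like an encoded Key)."""
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def sign(token):
    """Append an HMAC so clients cannot forge or guess valid tokens."""
    mac = hmac.new(SECRET, token.encode("ascii"), hashlib.sha256).hexdigest()
    return token + "." + mac


def verify(signed):
    """Return the token if its HMAC checks out, else None."""
    token, _, mac = signed.rpartition(".")
    expected = hmac.new(SECRET, token.encode("ascii"), hashlib.sha256).hexdigest()
    return token if hmac.compare_digest(mac, expected) else None


encoded = encode_key("Greeting:42")
print(base64.urlsafe_b64decode(encoded).decode("utf-8"))  # Greeting:42 (anyone can decode it)
signed = sign(encoded)
print(verify(signed) == encoded)   # True
# Reusing a valid MAC on a different key does not verify.
forged = encode_key("Greeting:43") + "." + signed.rpartition(".")[2]
print(verify(forged))              # None
```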
Concerning deleting ancestors:
Deleting an entity that is an ancestor for other entities does not affect the other entities. As long as the application does not depend on the existence of the ancestor to build keys for the descendant entities, the application can still access the descendants.
Concerning entity groups:
Only use entity groups when they are needed for transactions. For other relationships between entities, use ReferenceProperty properties and Key values, which can be used in queries...A good rule of thumb for entity groups is that they should be about the size of a single user's worth of data or smaller.
Concerning numeric IDs:
An application should not rely on numeric IDs being assigned in increasing order with the order of entity creation. This is generally the case, but not guaranteed.
Concerning indexes:
An App Engine application defines its indexes in a configuration file named index.yaml. The development web server automatically adds suggestions to this file as it encounters queries that do not yet have indexes configured...The App Engine datastore maintains an index for every query an application intends to make. As the application makes changes to datastore entities, the datastore updates the indexes with the correct results.
The query engine does support IN filters.

More on indexes:
An index only contains entities that have every property referred to by the index. If an entity does not have a property referred to by an index, the entity will not appear in the index, and will never be the result for the query that uses the index...If you want every entity of a kind to be a potential result for a query, you can use a data model that assigns a default value (such as None) to the properties used by filters in the query.
Concerning properties with mismatched types:
When two entities have properties of the same name but of different value types, an index of the property sorts the entities first by value type, then by an order appropriate to the type...A property with the integer value 38 is sorted before a property with the floating point value 37.5, because all integers are sorted before floats.
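A sort key that ranks by value type first and by value second reproduces the example. Only int and float are modeled here, and TYPE_RANK is a hypothetical table (the real datastore orders more types than this).

```python
# Toy ordering: rank by value type first, then by value within the type.
TYPE_RANK = {int: 0, float: 1}


def index_sort_key(value):
    return (TYPE_RANK[type(value)], value)


values = [37.5, 38, 2.0, 100]
print(sorted(values, key=index_sort_key))  # [38, 100, 2.0, 37.5]
print(sorted(values))                      # [2.0, 37.5, 38, 100] (plain Python order)
```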
This one hurts my brain:
If a query has both a filter with an inequality comparison and one or more sort orders, the query must include a sort order for the property used in the inequality, and the sort order must appear before sort orders on other properties.
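One way to keep the rule straight is as a check on the list of sort orders. The reason is presumably that the index must be ordered by the inequality property first, so the matching entities form one contiguous range that can be scanned. check_sort_orders is a hypothetical helper that ignores sort direction.

```python
def check_sort_orders(inequality_property, sort_orders):
    """Toy check of the documented rule: with an inequality filter, the first
    sort order (if there are any) must be on the inequality property."""
    if inequality_property is None or not sort_orders:
        return True
    return sort_orders[0] == inequality_property


# WHERE age > 21 ORDER BY age, name  -> allowed
print(check_sort_orders("age", ["age", "name"]))   # True
# WHERE age > 21 ORDER BY name       -> rejected
print(check_sort_orders("age", ["name"]))          # False
# WHERE age > 21 ORDER BY name, age  -> rejected (age must come first)
print(check_sort_orders("age", ["name", "age"]))   # False
```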
This one is fun:
This sort order has the unusual consequence that [1, 9] comes before [4, 5, 6, 7] in both ascending and descending order.
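The example makes sense once you know that a list property sorts by its smallest value in ascending order and by its largest value in descending order. [1, 9] wins both times: 1 < 4 going up, and 9 > 7 going down. The two key functions are hypothetical helpers.

```python
def ascending_key(values):
    # Ascending order compares lists by their smallest element.
    return min(values)


def descending_key(values):
    # Descending order compares lists by their largest element.
    return max(values)


lists = [[4, 5, 6, 7], [1, 9]]
print(sorted(lists, key=ascending_key))                 # [[1, 9], [4, 5, 6, 7]]
print(sorted(lists, key=descending_key, reverse=True))  # [[1, 9], [4, 5, 6, 7]]
```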
Concerning exploding indexes:
You can avoid exploding indexes by avoiding queries that would require a custom index using a list property. As described above, this includes queries with descending sort orders, multiple sort orders, a mix of equality and inequality filters, and ancestor filters.
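The "explosion" is multiplication. A composite index over list-valued properties needs one entry per combination of values, so the entry counts multiply. The entity below is hypothetical, and this only counts entries; it doesn't model the datastore's real index format.

```python
from itertools import product

# A hypothetical entity with two list properties.
entity = {"tags": ["a", "b", "c"], "colors": ["red", "green", "blue", "gray"]}

# A composite index on (tags, colors) needs one row per combination of values.
index_rows = list(product(entity["tags"], entity["colors"]))
print(len(index_rows))  # 12 == 3 * 4

# Using the same list property twice (e.g. tags = 'a' AND tags = 'b') squares it.
print(len(list(product(entity["tags"], repeat=2))))  # 9 == 3 ** 2
```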
Concerning transactions:
A transaction cannot perform queries using Query or GqlQuery.
Concerning create-or-update:
Create-or-update is so useful that there is a built-in method for it: Model.get_or_insert() takes a key name, an optional parent, and arguments to pass to the model constructor if an entity of that name and path does not exist.
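Here is a toy version of the semantics described. It looks the entity up by key name and constructs it from the keyword arguments only if it's missing, all in one atomic step. The dict store and the lock are stand-ins, and the optional parent argument is omitted. The real Model.get_or_insert() does this inside a datastore transaction.

```python
import threading

_entities = {}               # toy store: key name -> entity dict
_txn = threading.Lock()      # stand-in for a datastore transaction


def get_or_insert(key_name, **kwargs):
    """Toy helper: return the existing entity for key_name,
    or create it from kwargs, atomically."""
    with _txn:
        entity = _entities.get(key_name)
        if entity is None:
            entity = dict(kwargs)
            _entities[key_name] = entity
        return entity


first = get_or_insert("config", theme="dark")
second = get_or_insert("config", theme="light")  # kwargs ignored: it exists
print(second["theme"], first is second)          # dark True
```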
There is a tool for bulk loading data.

When I went looking for benchmarks several months ago, I couldn't find any, so I used ab (ApacheBench) to benchmark the "hello world" app developed in the tutorial. The app uses only webapp and makes only one query. Note that I "primed the pump" first to give GAE a chance to warm up:
ab -c 20 -n 1000 'http://jjinuxgaehelloworld.appspot.com/'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking jjinuxgaehelloworld.appspot.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software: Google
Server Hostname: jjinuxgaehelloworld.appspot.com
Server Port: 80

Document Path: /
Document Length: 759 bytes

Concurrency Level: 20
Time taken for tests: 9.510 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 924924 bytes
HTML transferred: 759759 bytes
Requests per second: 105.16 [#/sec] (mean)
Time per request: 190.192 [ms] (mean)
Time per request: 9.510 [ms] (mean, across all concurrent requests)
Transfer rate: 94.98 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       15   32  12.0     31      91
Processing:   106  157  49.9    146     846
Waiting:      105  157  49.9    146     846
Total:        130  189  50.8    178     872

Percentage of the requests served within a certain time (ms)
50% 178
66% 190
75% 200
80% 209
90% 229
95% 245
98% 290
99% 357
100% 872 (longest request)
Latency is the thing I care about most (I assume GAE scales well). Those numbers look pretty decent ;)
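The ab summary figures are tied together by simple arithmetic, which is a handy sanity check when reading its output. Requests per second is completed requests divided by total time. The "across all concurrent requests" time is total time divided by requests. The mean time per request is that value multiplied by the concurrency level. Using the numbers above (small differences come from ab rounding the displayed total time):

```python
concurrency = 20
complete_requests = 1000
time_taken_s = 9.510

requests_per_second = complete_requests / time_taken_s
ms_across_all = time_taken_s * 1000 / complete_requests
ms_per_request = ms_across_all * concurrency

print(round(requests_per_second, 2))  # about 105.15 (ab reports 105.16)
print(round(ms_across_all, 3))        # about 9.51
print(round(ms_per_request, 1))       # about 190.2 (ab reports 190.192)
```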

Comments

kumar said…
I was perpetually confused by offset not allowing more than 1000 records (I think the docs are wrong or else just poorly worded). I ran a test and you can actually retrieve more than 1000 records by using offset. Here is the test: http://aintjustsoul.appspot.com/scratch/view

which prints:

Count contains one column, number, and is filled with lots of numbers

Count.all().order('number').fetch(1000, offset=5)[-1] : 1005

In other words, offset can be used as a paging device, as you'd expect, but contrary to the docs.
> I was perpetually confused by offset not allowing more than 1000 records

Interesting. Thanks for the comment. Maybe you should submit that as a doc bug.