DjangoCon 2008

Sat 09.06.08 – I am at DjangoCon at the Googleplex in Mountain View, Calif. As usual, I will be live blogging the event; my transcript and notes follow.


Sat Sept 6, 2008 – DjangoCon 2008
Guido van Rossum
Google App Engine and Django
Keynote
Talk Overview:
Google App Engine
Using Django w/ App Engine
The “Google App Engine Helper for Django”
Google App Engine:
Does one thing well: running web apps (not meant for other computing)
Simple app configuration: it is cool and easy to get started. It requires Python.
Scalable
Secure
App Engine Does One Thing Well
* App Engine handles HTTP requests, nothing else
Think RPC: request in, processing, response out
Works well for the web and AJAX; also for other services (doesn’t have to be a web browser, can be other applications)
*App configuration is dead simple
No performance tuning needed
*Everything is built to scale
“infinite” number of apps, request/sec, storage capacity – will be able to buy more storage when it is out of beta.
APIs are simple, stupid
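The "request in, processing, response out" model can be sketched as a plain WSGI callable. This is only an illustration using the standard library, not App Engine's own bootstrap code; the names `application` and `call_once` and the response text are my assumptions.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Handle one HTTP request: read the input, compute, return a response."""
    path = environ.get("PATH_INFO", "/")
    body = ("You asked for %s\n" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_once(path):
    """Drive the app with a fake request, the way a server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

Nothing is kept between calls, which is what makes this kind of handler easy to replicate across hosts.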
(Guido shows a diagram of the system. We have been asked not to take indoor photos, so look online for the App Engine architecture diagram. He is now walking through each part of the diagram; I am not going to write all of it out. Watch the video on Google Video.)
(He is now having fun talking about memcache.)
Scaling
The scaling problems are enormous due to the different needs of low-usage vs. high-usage apps. Some of the infrastructure is built on existing Google infrastructure and some is customized for App Engine.
* Low-usage apps: many apps per physical host
* High-usage apps: multiple physical hosts per app
(Guido likes details)
* Stateless APIs are trivial to replicate
* Memcache is trivial to shard
* Datastore built on top of Bigtable; designed to scale well (The datastore is the hardest part to scale, but building it on Bigtable gives almost infinite scalability [paraphrased])
Abstraction on top of Bigtable
API influenced by scalability
No joins
Recommendations: denormalize schema; precompute joins
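To make "denormalize schema; precompute joins" concrete, here is a small sketch with plain dicts and lists standing in for the datastore. The `users` and `shouts` data and the function names are hypothetical; the point is only that the join work happens at write time.

```python
users = {1: {"name": "guido"}, 2: {"name": "adrian"}}


def add_shout(store, users, user_id, text):
    """Write path: copy the author's name onto the shout (the precomputed join)."""
    store.append({
        "user_id": user_id,
        "user_name": users[user_id]["name"],  # denormalized copy
        "text": text,
    })


def render(store):
    """Read path: no lookup in `users` is needed any more."""
    return ["%s: %s" % (s["user_name"], s["text"]) for s in store]


shouts = []
add_shout(shouts, users, 1, "hello")
add_shout(shouts, users, 2, "hi there")
```

The trade-off is that a renamed user means updating every copy, so this suits data that is read far more often than it changes.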
Security
* Main goal: Prevent the bad guys from breaking (into) your app (by making it hard to break out of App Engine)
* Constrain direct OS functionality
no processes, threads, dynamic library loading
no sockets (use urlfetch API)
can’t write files (use datastore)
disallow unsafe Python extensions (eg ctypes)
* Limit resource usage
Limit 1000 files per app, each at most 1MB
Hard time limit of 10 seconds per request (10 seconds is quite a lot of time to make a call)
Most requests must use less than 300 msec CPU time
Hard limit of 1MB on request/response size, API call size, etc.
Quota system for number of requests, API calls, emails sent, etc.
Why not LAMP?
* Linux, Apache, MySQL/PostgreSQL, Python/Perl/PHP/Ruby
* LAMP is industry standard
* But management is a hassle:
Configuration, tuning
Backup and recovery, disk space management
Hardware failures, system crashes
Software updates, security patches
Log rotation, cron jobs, and much more
Redesign needed once your database exceeds one box
* Guido’s slogan “We carry pagers so you don’t have to”
“The idea is that App Engine takes all of that (hassle) out of your hands.”
The Future – What is next? What language? What…?
* Big things we’re working on:
Large file uploads and downloads
Pay-as-you-go billing (for usage over free quota) – hopefully before end of year
More capacity
More languages
Batch processing
* No timeline – agile development process
When will the next language be rolled out? I can’t tell you. Every new language needs to go through the Google security team (paraphrase).
Django on App Engine
App Engine has a long history with Django.
* App Engine’s own mini-framework, webapp, uses Django
Alas, it’s stuck on Django 0.96; only templates supported. (Can’t really update, as it will break extensions. Upgrading really isn’t an option.)
Can’t really tell it is running Django, but the templates give it away (paraphrase).
* App Engine can also run a subset of Django 1.0
User must add a copy of Django to their app
Supported:
URL dispatch, request/response, views, error handling
templates
form generation and validation
Not supported:
models, ORM, db backends (but App Engine models are similar) [ Unfortunately, you can’t use Django models with App Engine]
admin interface; management script, tests
i18n (?)
Cutting Django 1.0 Down to Shape
* App Engine limits each app to 1000 files, max 1MB each
* Alas, Django 1.0 has over 1000 files!
* And all zipped up, it’s over 1MB…
* Solution: Lots of those 1000+ files aren’t used
* Create a zipfile with only the essentials
381 files, 1717459 bytes, 505754 bytes compressed
speeds up the deployment tremendously, too
but, some functionality…
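The "zipfile with only the essentials" step might look roughly like this. I did not catch which files Guido actually kept, so the `SKIP_DIRS` set and the "only .py files" rule are assumptions for illustration.

```python
import os
import zipfile

SKIP_DIRS = {"tests", "locale"}  # assumed examples of non-essential directories


def zip_essentials(src_dir, zip_path):
    """Zip only the .py files under src_dir, skipping SKIP_DIRS.

    Returns the number of files written to the archive.
    """
    base = os.path.dirname(os.path.abspath(src_dir))
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Prune unwanted directories in place so os.walk never enters them.
            dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store paths relative to the parent, e.g. "pkg/a.py".
                    zf.write(full, os.path.relpath(full, base))
                    count += 1
    return count
```

Python can import directly from such an archive once it is placed on `sys.path`, which is what makes the trick workable.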
Supporting More of Django
* Google App Engine Helper for Django
Separate open source project by Matt Brown & Andy Smith
Monkeypatches bits of Django
Must inherit models from BaseModel instead of db.Model
Supports manage.py script and unit testing
* Going the other way (Guido tells a story about how to port to another environment)
Andi Albrecht has managed to port a major App Engine Django app (Rietveld code review) to a pure Django environment
Guido now gives a demo
Demo App Structure
* app.yaml App Engine config & top-level URL dispatch
* main.py bootstrap; 100% reusable boilerplate (contains some nasty hack)
* settings.py Django config; 90% boilerplate (Guido- I like small settings files, I started with the standard and only kept what was necessary)
* urls.py Django top-level URL dispatch
* shouts/ my demo app code goes here
__init__.py empty
urls.py the URL dispatch
views.py views
models.py database models
* templates templates go here
500.html
… Guido moves to demo before I am finished typing.
Guido demos…
***********
High Performance Django
David Cramer from Curse (formerly), now of iBegin
Curse – a gaming website
Peak daily traffic of approx. 15 million pages, 150 million hits
Average month: 120 million in traffic, 6 million uniques
Python, MySQL, Squid, memcached, mod_python, lighty.
Most devs came strictly from PHP (myself included)
12 web servers, 4 database servers, 2 squid caches
Now David is at iBegin
massive amounts of data, 100 million+ rows
Python, MySQL, mod_wsgi
Small team of developers
Complex database partitioning/sync tasks
Areas of Concern in Scaling
Database (ORM)
Webserver
Caching
Template Rendering
Profiling
Tools of the Trade
Webserver (Apache, Nginx, Lighttpd)
Object Cache (memcached)
Database (MySQL, PostgreSQL)
Page Cache (Squid, Nginx, Varnish)
Load Balancing (Nginx, Perlbal)
How We Did it
Primary web servers serving Django using mod_python
Media servers using Django on Lighttpd
Static served using additional instances of lighttpd
Load balancers passing requests to multiple Squids
Squids passing requests to multiple Squids
Lessons Learned
Don’t be afraid to experiment. You’re not limited to a single server. (Apache is heavy and is intensive on resources)
mod_wsgi is a huge step forward from mod_python (don’t have to restart Apache to update code)
Serving static files using different software can help
Send proper HTTP headers where they are needed (saves lots of requests)
Use services like S3, Akamai, Limelight, etc. – Don’t reinvent the wheel.
David runs through the Webserver Software that scales well. See his slides.
Database (ORM) – A big thing when you are scaling.
Won’t make your queries efficient. Make your own indexes.
select_related() can be good, as well as bad.
Inherited ordering (Meta: ordering) will get you
Hundreds of queries on a page is never a good thing
Know when to not use the ORM
“Use the power of your database, don’t use Django. Django is meant to be generic.”
Handling JOINs – shows a slide with lots of code.
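I did not capture David's JOIN code, so here is my own sketch of the "hundreds of queries on a page" problem using sqlite3 from the standard library. The table names and data are made up; the comparison is one query per row versus a single JOIN, plus a hand-made index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX post_author_idx ON post (author_id);
    INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
""")


def titles_n_plus_one(conn):
    """One query for posts, then one more per post: N+1 queries in total."""
    queries = 1
    rows = []
    posts = conn.execute(
        "SELECT author_id, title FROM post ORDER BY id").fetchall()
    for author_id, title in posts:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()[0]
        queries += 1
        rows.append((name, title))
    return rows, queries


def titles_joined(conn):
    """The same result from a single JOIN."""
    rows = conn.execute(
        "SELECT a.name, p.title FROM post p "
        "JOIN author a ON a.id = p.author_id ORDER BY p.id").fetchall()
    return rows, 1
```

With three posts the first version issues four queries and the second issues one; the gap grows with every row on the page.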
Template Rendering
Django’s template engine is sandboxed, which is safe.
Sandboxed engines are typically slower by nature
Keep logic in views and template tags
Be aware of performance in loops, and group by (regroup)
Loaded templates can be cached to avoid disk reads
Switching template engines is easy, but may not give you any worthwhile performance gain.
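"Loaded templates can be cached to avoid disk reads" can be illustrated without Django at all. This sketch uses `string.Template` and `functools.lru_cache` as stand-ins; it is not how Django's loaders are implemented, and `load_template` and `render` are hypothetical names.

```python
import functools
import string


@functools.lru_cache(maxsize=128)
def load_template(path):
    """Read and parse a template file once; later calls hit the in-memory cache."""
    with open(path, encoding="utf-8") as f:
        return string.Template(f.read())


def render(path, **context):
    """Render the (possibly cached) template with the given variables."""
    return load_template(path).substitute(**context)
```

A cache like this never notices edits on disk, so it belongs in production settings rather than development.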
Template Engine Chart
Genshi
Django
Mako
Jinja
Caching
Two flavors of caching: object cache and browser cache
Django provides built-in support for both
Invalidation is a headache without a well thought out plan
Caching isn’t a solution for slow loading pages or improper indexes
Use a reverse proxy in between the browser and your web servers: Squid, Varnish, Nginx, etc.
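A very small version of an object cache with an invalidation plan, as I understood the idea. A dict stands in for memcached here, and all names (`ObjectCache`, `get_title`, `save_title`) are mine, not from the slides.

```python
class ObjectCache:
    """A dict standing in for memcached, with get/set/delete operations."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


cache = ObjectCache()
database = {1: "old title"}  # pretend this is the slow database
db_reads = []                # records each real database hit


def get_title(pk):
    key = "title:%d" % pk
    value = cache.get(key)
    if value is None:  # cache miss: go to the database and remember the answer
        db_reads.append(pk)
        value = database[pk]
        cache.set(key, value)
    return value


def save_title(pk, title):
    database[pk] = title
    cache.delete("title:%d" % pk)  # the invalidation plan: delete on write
```

Deleting on write is the simplest plan; the headache David mentions starts when one change affects many cached keys.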
Caching with a Plan
Build your pages to use proper cache headers
Create a plan for object cache expiration and invalidation
For typical web apps you can serve the same cached page for both anonymous and authenticated users – use JavaScript and headers, which makes it a lot smoother
Contain commonly used query sets in managers for transparent caching and invalidation.
Cache Commonly Used Items
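For "proper cache headers", something along these lines is what I have in mind. It is a framework-free sketch: the 300-second `max_age` default and the function names are assumptions, and a real app would set these headers on its response object.

```python
import hashlib


def cache_headers(body, max_age=300):
    """Headers that let browsers and reverse proxies reuse a response."""
    etag = '"%s"' % hashlib.sha256(body).hexdigest()
    return {
        "Cache-Control": "public, max-age=%d" % max_age,
        "ETag": etag,
    }


def respond(body, request_headers):
    """Return (status, headers, body); 304 when the client's copy is current."""
    headers = cache_headers(body)
    if request_headers.get("If-None-Match") == headers["ETag"]:
        return 304, headers, b""
    return 200, headers, body
```

The 304 path sends no body at all, which is where the saved bandwidth comes from.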
Profiling Code
Finding the bottleneck can be time consuming (95% of the time can be spent finding the problem)
Tools exist to help identify common problematic bits – cProfile (see docs on python.org)
Profile Database Queries
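Since cProfile came up, here is a minimal way to use it from code. The wasteful `slow_sum` function is invented so the profiler has something to find; `profile_call` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """A deliberately wasteful function to give the profiler something to find."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
    return result, out.getvalue()
```

Sorting by cumulative time tends to surface the view or helper that is responsible, rather than the built-in it happens to call.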
Summary
Database efficiency is the typical problem in web apps
Develop and deploy a caching plan early on
Use profiling tools to find your problematic areas. Don’t pre-optimize unless there is good reason
Find someone who knows more than me to configure your server software. (David has a smiley face here)

http://www.davidcramer.net/djangocon/
(slides and code will be here)
**********
Lunch and Jacob Kaplan Moss / Adrian Holovaty Keynote
Adrian trots down Memory Lane by doing a Show & Tell with Django from the very beginning to 0.96.
Jacob talks about Django from 0.96 to 1.0. He is now talking about Django 1.0. I am falling down on the job and not transcribing his every word. ;o)
**********
Satchmo
Chris Moffitt
Satchmo – The Beginning
April 2006 – proposed on the django mailing list
What is Satchmo?
* Django based framework for developing unique and highly customized ecommerce sites
* Designed for developers that have a unique store need
* Supported by over 570 people in Satchmo-users Google Group
* Deployed to 9 known production sites & more in development
* Translated into multiple languages
* Prepping for its 1.0 release
Components
Core apps and independent non-core apps, plus middleware
Unique Features – Custom Shipping
Satchmo supports many shipping modules but the real power is in the flexibility to create your own
[Started blogging about Google’s heated toilet seats rather than paying attention, sorry.]
Unique Features – Payment Modules
Satchmo’s payment module design supports a broad range of payment processors:
URL posting & response – Authorize.net
Full XML message creation and response – CyberSource
Complex site redirection
Signal Usage in Satchmo
We currently have 13 signals in the base Satchmo
“satchmo_order_success” and “satchmo_cart_changed” are the two most heavily used.
The use of signals allowed us to remove circular dependencies, clear up responsibilities, and make the whole thing much more understandable.
We’ll be adding more.
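How signals remove circular dependencies is easier to see in code. This is a toy dispatcher, not Django's `django.dispatch` and not Satchmo's code; `order_success`, `send_receipt`, and `complete_order` are hypothetical names loosely modeled on the signal mentioned above.

```python
class Signal:
    """A tiny stand-in for a signal: receivers register, senders notify."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        return [receiver(sender=sender, **kwargs) for receiver in self._receivers]


order_success = Signal()  # hypothetical signal
emails_sent = []


def send_receipt(sender, order_id, **kwargs):
    """Could live in a separate app; the checkout code never imports it."""
    emails_sent.append("receipt for order %d" % order_id)


order_success.connect(send_receipt)


def complete_order(order_id):
    """Checkout code only fires the signal and knows nothing about receivers."""
    order_success.send(sender="checkout", order_id=order_id)
```

The checkout side depends only on the signal object, so the receipt code can import from checkout without checkout importing it back.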
Documentation:
We strive to emulate Django’s documentation example (Django has great docs).
ReStructured text for all the documents
Sphinx for presenting them nicely
Use built in admin docs functionality
The biggest challenge is providing the right level of documentation to help newbies as well as experienced devs
The Future
Stabilizing on Django 1.0 – YEAH!
Improving the docs around extensibility
Improved product model
More customization of the admin interface
Product Model
Admin – Order Manager, Product Manager, etc.
External Apps – XMLRPC interface, JSON interface
Q&A time
***********
James Bennett
Writing Reusable Django Apps
The extended remix
The Fourfold path:
Do one thing and do it well
Don’t be afraid of multiple apps
Write for flexibility
Build to distribute
1) Do one thing and do it well (the Unix philosophy)
Take a collection of little apps that do one thing and collect them together
Application == encapsulation => Keep a tight focus
Ask yourself: “What does this app do?”
The answer should be 1 or 2 short sentences
Bad focus is like a run-on sentence
Warning signs
A lot of very good Django apps are very small, just a few files
If your app is getting big enough that it needs to be split into lots of modules, it may be time to split it into more apps
Even a lot of simple Django sites commonly have a dozen or more applications in INSTALLED_APPS
If you’ve got a complex/features packed site and a short app list, it may be time to think hard about how to tightly focus.
Good app: Django Snippets -> User registration
Some “simple” things aren’t so simple.
Should I add this feature?
What does the app do?
Does this feature have anything to do with my app?
The monolith mindset
The app is the whole site
Re-use is often an afterthought
Tend to dev…
(James is moving fast through his slides)
The Django mindset
Application == some bit of functionality
Django is a list of installed apps
apps live on the Python path, not inside any specific apps or plugins directory
abstractions like the Site model make you think…
Should this be its own app?
Is it completely unrelated to the app’s focus?
Is it orthogonal to whatever else I am doing?
I’ve learned the hard way:
Djangosnippets.org -> Is one app, includes bookmarking, tagging, rating, etc.
It should be four different apps, not one.
Orthogonality:
Means you can change one thing without affecting other features
Almost always indicates the need for a separate app
Reuse:
Lots of cool features actually aren’t specific to one site
See: bookmarking, tagging, rating
Example: Contact form
Site-specific needs => Write for flexibility
Common Sense
Sane defaults
Easy Overrides
Don’t set anything in stone
Form processing
Supply a form class
But let people specify their own
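"Supply a form class, but let people specify their own" boils down to a keyword argument with a sane default. The sketch below uses plain classes rather than real Django forms so it runs anywhere; `ContactForm`, `StrictContactForm`, and `contact_view` are illustrative names.

```python
class ContactForm:
    """Default form: requires a non-empty message."""

    def __init__(self, data):
        self.data = data

    def is_valid(self):
        return bool(self.data.get("message"))


def contact_view(data, form_class=ContactForm):
    """Sane default, easy override: callers may pass their own form class."""
    form = form_class(data)
    return "sent" if form.is_valid() else "errors"


class StrictContactForm(ContactForm):
    """A site-specific variant that also requires an email address."""

    def is_valid(self):
        return super().is_valid() and "@" in self.data.get("email", "")
```

A site that needs the stricter rules passes `form_class=StrictContactForm` and never has to fork the app.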
URL best practices:
Provide a URLConf in the app
Use the named URL patterns
Use reverse lookups: reverse(), permalink, {% url %}
Working with models
Whenever possible, avoid hard coding a model class
Use get_model() and take an app label/model name string instead
Don’t rely on objects…
Working with Models
Learn to Love Managers
Managers are easy to reuse
Managers are easy to subclass and customize
Managers let you encapsulate patterns of behavior behind a nice API
Advanced techniques
Encourage subclassing and use of subclasses
Provide a standard interface people can implement in place of your default implementation
Use registry (like the admin)
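The registry idea, in miniature. This is my own toy, written in the spirit of how the admin lets you register things; it is not the admin's actual code, and `Registry`, `renderers`, and `render` are hypothetical.

```python
class Registry:
    """Maps a key to an implementation that users can supply."""

    def __init__(self):
        self._entries = {}

    def register(self, key, handler):
        if key in self._entries:
            raise ValueError("%r is already registered" % (key,))
        self._entries[key] = handler

    def get(self, key, default=None):
        return self._entries.get(key, default)


def default_renderer(obj):
    """The app's built-in behavior when nothing is registered."""
    return "<%s>" % obj


renderers = Registry()
renderers.register(int, lambda obj: "#%d" % obj)  # a user-supplied override


def render(obj):
    """Use the registered handler for the object's type, else the default."""
    return renderers.get(type(obj), default_renderer)(obj)
```

The app ships the default and the registry; everything site-specific arrives through `register()`.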
The API your application exposes is just as important as the design of the sites you’ll use it in. In fact, it’s more important.
Good API design
“Pass in a value for this argument to change the behavior”
“Change the value of this setting”
“Subclass this and override these methods to customize”
“Implement something with this interface”
Bad API design
“API? Let me see if we have one of those…”
“It’s open source, fork it, do whatever you want”
No, really. Your gateway interface is not your API.
Build to distribute
So you did the tutorial, a lot of code like this (shows a project name hard coded into app)
Why (some) Projects suck
You have to replicate that directory structure every time you re-use
Good Use of a Project
Settings module
a root URLConf module
And Nothing Else
settings.ljworld
urls.ljworld
And lots of apps
What reusable apps look like
single module directly on the Python path
Related modules under a package

General Best Practices:
Be up front about dependencies – What am I depending on?
Write for Python 2.3 when possible
Pick a release or pick trunk, and document that
But if you pick trunk, update frequently
(not as important with Django 1.0, will be easier)
Templates are hard
Providing templates is a big out-of-the-box win
But templates are hard to make portable
I usually don’t do default templates
Either way
Doc your template names and contexts
Be obsessive about documentation
It’s Python: give stuff docstrings
If you do, Django will generate docs for you
And users will love you forever
Documentation-driven development
Write the docstring before you write the code
Advantages – You’ll never be lacking docs
You’ll be up to date
Django will help you
Good examples:
django-tagging
django-atompub
django-hotclub
Jannis Leidel’s django-packing
Django Pluggables
********
Justin Bronn
Geo-Django
Utilities
Admin
Geo Feeds & Site Maps
Geo What?
GIS = Geographic Information Systems
Standards, data, coordinate systems are foreign to most devs
Maps are flat, the world is not
Coordinate Systems
Way of representing location
Analogous to Unicode
Geodetic : Coords are in angles, depending on the datum, and ellipsoid model
Projected : Way to represent the curved surface of Earth on flat surface. Coords may be in linear units. Distortion.
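To show the geodetic vs. projected difference with numbers, here is the spherical Mercator projection (the one web maps commonly use) in plain Python. It was not shown in the talk; I picked it as a familiar example, and the function name is mine.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by spherical Mercator


def to_web_mercator(lon_deg, lat_deg):
    """Project geodetic degrees onto a flat plane, returning meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y
```

One degree of latitude covers more projected meters at 60°N than at the equator, which is the distortion the slide refers to.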
Standards
Open Geospatial Consortium, Inc. – standards org
SFS – Standard for spatial databases
WKT – Geometries
KML, GML, WMS, WFS are other popular OGC standards
SFS Geometries
Point, LineString, LinearRing, Polygon
Multiple geometries may be in a collection: Multipoint, MultiPolygon, MultiLineString, GeometryCollection
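For a feel of what WKT looks like, these helpers build the text form of a few of the geometries above. They are hand-rolled for illustration and are not part of GeoDjango or GEOS; the function names are hypothetical.

```python
def point_wkt(x, y):
    """WKT for a single point."""
    return "POINT (%s %s)" % (x, y)


def linestring_wkt(coords):
    """WKT for a line through the given (x, y) pairs."""
    return "LINESTRING (%s)" % ", ".join("%s %s" % tuple(c) for c in coords)


def polygon_wkt(ring):
    """WKT for a polygon with one outer ring.

    The ring must be closed (last point repeats the first), so close it
    here if the caller did not.
    """
    ring = [tuple(c) for c in ring]
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return "POLYGON ((%s))" % ", ".join("%s %s" % c for c in ring)
```

The doubled parentheses in POLYGON are there because a polygon may carry additional rings for holes.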
GeoDjango : Interfaces from popular spatial databases (PostGIS, Oracle, MySQL) to GDAL/GEOS/PROJ.4 (geo libraries) : OpenLayers, Google, etc.
Crash Course
How to geo enable your models
Import from django.contrib.gis.db
Models have usual bits plus the GeoManager and the Geometries
Eye Candy
Houstoncrimemaps
backyardpost.com – using it for real estate listings
watch.tampabay.com
bmanearth.burningman.org/com
Requirements
Python 2.4+ with ctypes
Spatial Database

Installation
Require relatively new versions of geospatial libraries — limited binary package availability
Compilation from source
Geospatial Libraries
Why?
Powerful open source libraries; temperamental SWIG interfaces (ditched SWIG for ctypes)
ctypes enables all-Python interfaces (no compilation necessary)
Use of C APIs allows for high…
GEOS – Geometry Engine…
GDAL – Geospatial Data Abstraction Library (the Swiss Army Knife of Geo Libs)
Models and Databases
Show me the SQL: (Gives code examples)
**************
** Sorry folks, this live blogger has reached caffeine overload and can’t focus to type **
This is where we say take me to the puppies
*************