DjangoCon 2011 – Deployment, Daemons and Datacenters

Following a great talk on Security in Django we now have “Deployment, Daemons and Datacenters” by Andrew Godwin. This talk will go into the deployment strategies at ep.io.

A tour through the systems that power ep.io, the Python hosting platform, from the array of daemons powering the system, to how redundancy is set up, and also covering general best practices for hosting Django sites yourself.

Updates Below:

17.58

End of the talk. I’ll try to post the slides once they’re available.

17.58

“Have you dealt with the fact that different countries have different laws with regard to backups?”

In the EU there are restrictions on exporting personal data outside the EU, so you need to be aware of the different laws. It’s often a good idea to keep backups in the same country, but in different cities.

17.56

“Why did you move from AWS to your own hardware?”

A combination of performance and cost. When they looked at performance per dollar, running their own hardware was cheaper, and disk IO is a lot faster on their own hardware than on virtualized hosts.

17.55

“How do you handle file uploads with multiple application servers?”

They use GlusterFS (http://www.gluster.org/) for shared file storage.

17.54

Andrew’s done talking. Question time.

17.51

Standard tools aren’t always the best.

  • The ep.io load balancer was initially HAProxy (it doesn’t like having 3,000 backends reloaded every 10 seconds)
  • A custom Eventlet-based load balancer was simpler and slightly faster
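The advantage of a custom balancer here is that it can keep the backend list in memory and swap it in place, so a configuration change never needs a process reload. A minimal sketch of that idea (the `BackendPool` class and the 10.0.0.x addresses are hypothetical, not ep.io’s code), using a lock and round-robin selection:

```python
import threading


class BackendPool:
    """Round-robin backend selection with an in-place, lock-protected reload."""

    def __init__(self, backends):
        self._lock = threading.Lock()
        self._backends = list(backends)
        self._next = 0

    def reload(self, backends):
        # Replace the whole list at once; a concurrent pick() sees either the
        # old list or the new one, never a half-updated mix.
        with self._lock:
            self._backends = list(backends)
            self._next = 0

    def pick(self):
        # Return the next backend in round-robin order.
        with self._lock:
            if not self._backends:
                raise LookupError("no backends configured")
            backend = self._backends[self._next % len(self._backends)]
            self._next = (self._next + 1) % len(self._backends)
            return backend


pool = BackendPool(["10.0.0.1:8000", "10.0.0.2:8000"])
print(pool.pick(), pool.pick(), pool.pick())
pool.reload(["10.0.0.3:8000"])  # no restart, no dropped listener
print(pool.pick())
```

In a real proxy each accepted connection would call `pick()` and then shuttle bytes to that backend; the point is that `reload()` is cheap enough to run every few seconds.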

17.49

Automation

  • Use Puppet or Chef along with Fabric
  • If you do something more than three times, automate it
  • Every time you manually SSH in, a kitten gets extremely worried

17.49

Loose Coupling

  • Simple, loosely-connected components
  • Easier to test and easier to debug
  • Enforces some rough interface definitions

17.48

Plan for multiple machines
  • That means no SQLite in production (doesn’t work for multiple machines)
  • Make good use of database transactions
  • How are you going to store uploaded files?

17.47

Sensible Architecture

Ship long-running tasks off:

  • Use Celery, or your own worker solution
  • Even more critical if you have synchronous worker threads in your web apps
  • Email sending can be very slow
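If you roll your own worker solution instead of Celery, the core idea is a queue between the request handler and a background worker. A minimal in-process sketch using only the standard library (`send_email` and `signup_view` are hypothetical stand-ins); a real deployment would run workers in separate processes or on other machines with a durable queue:

```python
import queue
import threading
import time
import traceback

jobs = queue.Queue()
sent = []


def send_email(to, subject):
    # Hypothetical stand-in for a slow SMTP call.
    time.sleep(0.1)
    sent.append((to, subject))


def worker():
    # Pull jobs forever; each job is a (function, args) pair.
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        except Exception:
            # Keep the worker alive; a real system would log and retry.
            traceback.print_exc()
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def signup_view(email):
    # The request handler only enqueues the slow work and returns at once.
    jobs.put((send_email, (email, "Welcome!")))
    return "200 OK"


print(signup_view("user@example.com"))
jobs.join()  # wait for the worker, for demonstration only
print(sent)
```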

17.46

An easy start…

  • Dump your database nightly to a SQL file
  • Use rdiff-backup (or similar) to sync your DB dump, codebase and uploads to a backup directory
  • Also sync offsite – get a VPS with a different provider than your main one
  • Make your backup server pull the backups, don’t push them to it.
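A rough sketch of the nightly-dump and pull steps: the helpers below only build the command lines (host names and paths are hypothetical), which you would pass to `subprocess.run(cmd, check=True)` on the right machine. `pg_dump --file` writes a plain SQL dump; running rdiff-backup on the backup server with a `host::/path` source (its classic argument syntax) makes the backup server pull over SSH, so the app server never needs write access to the backups:

```python
import datetime


def dump_command(dbname, outdir, day):
    # Run nightly on the database host (e.g. from cron): a plain-SQL dump
    # into a date-stamped file.
    path = f"{outdir}/{dbname}-{day:%Y-%m-%d}.sql"
    return ["pg_dump", "--format=plain", f"--file={path}", dbname]


def pull_command(source_host, remote_dir, local_dir):
    # Run on the *backup* server: rdiff-backup connects to source_host over
    # SSH and copies remote_dir into local_dir (pull, not push).
    return ["rdiff-backup", f"{source_host}::{remote_dir}", local_dir]


# Example invocations with hypothetical hosts and paths.
print(" ".join(dump_command("mysite", "/var/backups/db", datetime.date.today())))
print(" ".join(pull_command("web1.example.com", "/var/backups", "/srv/backups/web1")))
```

For the offsite copy, the VPS at the other provider runs the same pull against the primary backup server.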

17.44

Replication is Hard

  • PostgreSQL and Redis replication both require your code to be modified a bit
  • Django offers some help with database routers
  • It’s also not always necessary, and it can cause bugs for your users (for small sites it may not be the answer)
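As an illustration of the database-router help, here is a sketch of a primary/replica router written against the current Django router API (2011-era Django used `allow_syncdb` where this uses `allow_migrate`). A router is a plain class, enabled by listing its dotted path in the `DATABASE_ROUTERS` setting; the aliases `default`, `replica1` and `replica2` are assumptions that must match your `DATABASES` keys:

```python
import random


class PrimaryReplicaRouter:
    """Send reads to a replica and writes to the primary (aliases assumed)."""

    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread reads across replicas. Replication lag means a user may not
        # see a write they just made: the kind of bug mentioned above.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so cross-alias relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only migrate the primary; replicas receive its schema via replication.
        return db == "default"
```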

17.42

Check your backups restore

  • Just seeing if they’re there isn’t good enough
  • Try restoring your entire site onto a fresh box

17.42

Take backups before any major change to the database or code.

“It’s tedious, but the one time you need it, it’ll help you”

17.41

Never back up to the same provider. They can cancel your account…

17.40

Backups and Redundancy

Archives != High Availability

  • Your PostgreSQL slave is not a backup
  • You should back up using multiple formats to diverse locations

17.39

Development and Staging

  • No need to run gunicorn/nginx locally (runserver still works)
  • PostgreSQL 9 still slightly annoying to install
  • Redis is very easy to set up
  • Staging should be EXACTLY the same as live

17.36

How to handle higher loads:

  • Varnish for site caching
  • HAProxy or Nginx for load-balancing
  • Give PostgreSQL more resources

17.36

ep.io Stack

Three years ago

  • Apache and mod_wsgi
  • PostgreSQL 8.x
  • Memcached
Today
  • Nginx (static files/gzipping)
  • Gunicorn (dynamic pages, unix socket best)
  • PostgreSQL 9
  • Redis
  • virtualenv
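Gunicorn’s config file is itself Python, so the Nginx-plus-Gunicorn-on-a-unix-socket setup can be sketched as a `gunicorn.conf.py` loaded with `gunicorn -c gunicorn.conf.py myproject.wsgi` (the socket path, project name and timeout are assumptions). Nginx would then point an `upstream` at the same `unix:` path and serve static files itself:

```python
# gunicorn.conf.py: a sketch; socket path and values are assumptions.
import multiprocessing

# Listen on a unix socket instead of a TCP port; Nginx proxies
# dynamic requests to it.
bind = "unix:/run/gunicorn/mysite.sock"

# Common starting point from the Gunicorn docs: (2 x CPU cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1

# Workers silent for longer than this many seconds are killed and restarted.
timeout = 30
```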

17.34

Security

  • ep.io treats their internal network as public (any traffic has to be signed/encrypted)
  • Firewalling of unnecessary ports
  • Separate machines for higher-risk processes
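One way to make every internal message signed is an HMAC over the payload with a shared key. A minimal standard-library sketch; the key handling and the `sign_message`/`verify_message` helpers are assumptions (the name echoes the `sign_message` call in the ZeroMQ snippet at 17.25, but this is not ep.io’s implementation). Signing gives integrity and authenticity, not secrecy, so sensitive payloads still need encryption:

```python
import hashlib
import hmac
import json

# Hypothetical shared key; in practice, load it from a secrets store.
SECRET_KEY = b"change-me"


def sign_message(payload: str) -> bytes:
    # Prefix the payload with a hex HMAC-SHA256 tag.
    body = payload.encode()
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest().encode()
    return tag + b":" + body


def verify_message(message: bytes) -> str:
    # Recompute the tag and compare in constant time; reject on mismatch.
    tag, _, body = message.partition(b":")
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    return body.decode()


signed = sign_message(json.dumps({"type": "status", "extra": {}}))
print(verify_message(signed))
```

This alone does not stop a captured message from being replayed; including a timestamp or nonce in the signed payload would.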

17.33

The Joy of Networks

  • Any network has a significant slowdown compared to local access
  • Locking and concurrent access also an issue
  • Internal latency on EC2 can peak higher than 10s
  • Routing blips can cause very short outages
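Because latency spikes and routing blips are normal, calls between machines need timeouts and a retry policy. A minimal retry-with-exponential-backoff sketch; the `retry` helper, its defaults and the `flaky` demo function are assumptions, and it should only wrap idempotent operations:

```python
import random
import time


def retry(func, attempts=4, base_delay=0.1, exceptions=(ConnectionError, TimeoutError)):
    """Call func(), retrying on transient network errors with backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff (base, 2x base, 4x base, ...) plus jitter so
            # many clients don't all retry in lockstep after the same blip.
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))


calls = {"n": 0}


def flaky():
    # Fails twice, then succeeds: a stand-in for a brief routing blip.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network blip")
    return "ok"


print(retry(flaky, base_delay=0.01))
```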

17.31

More ep.io statistics

15 requirements, some from git, some from PyPI

  • Traditional: 300 seconds
  • Parallelised, no cache: 30 seconds
  • Parallelised, cache: 2 seconds

17.30

They run a parallel version of pip (with caching). It’s not 100% compatible with complex dependencies.
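The speedups in those numbers come from two separate ideas: fetching requirements concurrently rather than one after another, and skipping the fetch entirely on a cache hit. A sketch of both with `concurrent.futures`; `fetch` is a hypothetical stand-in for downloading and building one package, and ep.io’s actual tool isn’t described beyond the notes above. Like any naive parallel installer, it ignores ordering between packages that depend on each other, which is one way such tools struggle with complex dependencies:

```python
import time
from concurrent.futures import ThreadPoolExecutor

cache = {}  # requirement -> built artifact (a dict stands in for a disk cache)


def fetch(requirement):
    # Hypothetical stand-in for downloading and building one package.
    time.sleep(0.2)
    return f"{requirement}.whl"


def fetch_cached(requirement):
    # Skip the slow fetch entirely on a cache hit.
    if requirement not in cache:
        cache[requirement] = fetch(requirement)
    return cache[requirement]


def install_all(requirements, workers=8):
    # Fetch all requirements concurrently; results keep the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_cached, requirements))


reqs = [f"pkg{i}" for i in range(15)]
for label in ("cold cache", "warm cache"):
    start = time.monotonic()
    install_all(reqs)
    print(f"{label}: {time.monotonic() - start:.2f}s")
```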

17.29

Some ep.io information

  • Every time an app is uploaded to ep.io it gets a fresh app image to deploy into
  • Each app image has its own virtualenv
  • The typical ep.io app has around 3 or 4 dependencies
  • Some have more than 40
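A fresh image with its own virtualenv per upload can be approximated with the standard library’s `venv` module: create a brand-new environment in a release-specific directory instead of mutating the previous one. A sketch; the directory layout and the `new_release_env` helper are assumptions:

```python
import tempfile
import time
import venv
from pathlib import Path


def new_release_env(app_root: Path) -> Path:
    # Each deploy gets its own timestamped directory and a new virtualenv,
    # so nothing leaks over from the previous release.
    release_dir = app_root / "releases" / time.strftime("%Y%m%d%H%M%S")
    env_dir = release_dir / "venv"
    # with_pip=False keeps this demo fast; a real deploy would use
    # with_pip=True and then install the app's requirements.
    venv.create(env_dir, with_pip=False, clear=True)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env = new_release_env(Path(tmp))
    print((env / "pyvenv.cfg").exists())
```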

17.28

ZeroMQ and Eventlet work well together.

17.26

Eventlet is…

  • Coroutine-based asynchronous concurrency
  • Basically, lightweight threads with explicit context switching
  • Reads quite like procedural code
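Eventlet itself is built on greenlets, which aren’t in the standard library, but the idea of lightweight threads with explicit switch points can be illustrated with plain generators: each task runs until it voluntarily yields, and a tiny round-robin loop resumes the next one. This is a stand-in to show the concept, not Eventlet’s API:

```python
from collections import deque


def task(name, steps, log):
    # A "lightweight thread" that reads like ordinary procedural code.
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # explicit switch point, like a green thread waiting on I/O


def run(tasks):
    # Round-robin scheduler: resume each task in turn until all finish.
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(current)


log = []
run([task("a", 3, log), task("b", 2, log)])
print(log)  # → ['a0', 'b0', 'a1', 'b1', 'a2']
```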

17.26

Redundancy’s not easy.

Several things can only run once (e.g. cron jobs).
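A common way to let a cron job fire on every redundant machine but actually run on only one is a short-lived lock (a lease) in a shared store: whichever machine acquires it does the work. In Redis that is `SET key value NX EX seconds`. The sketch below uses an in-memory store with the same set-if-absent-with-expiry semantics so it runs standalone; `LeaseStore`, `run_once` and the host names are hypothetical:

```python
import threading
import time


class LeaseStore:
    """In-memory stand-in for a shared store with set-if-absent + expiry."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (owner, expires_at)

    def acquire(self, key, owner, ttl):
        # Atomically take the lease if it is free or has expired.
        now = time.monotonic()
        with self._lock:
            current = self._data.get(key)
            if current is not None and current[1] > now:
                return False
            self._data[key] = (owner, now + ttl)
            return True


def run_once(store, job_name, host, job, ttl=300):
    # Every host calls this from cron; only the lease holder runs the job.
    if store.acquire(f"cron:{job_name}", host, ttl):
        job()
        return True
    return False


store = LeaseStore()
runs = []
for host in ("app1", "app2", "app3"):
    run_once(store, "nightly-report", host, lambda h=host: runs.append(h))
print(runs)  # → ['app1']
```

The TTL should be longer than the job takes but shorter than the interval between scheduled runs, so the next run isn’t blocked by a stale lease.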

17.25

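import json
from eventlet import Timeout
from eventlet.green import zmq  # cooperative (green) ZeroMQ bindings

ctx = zmq.Context()

def query(self, type, extra):  # method name assumed; the slide showed only the body
    # One REQ socket connected to every endpoint; ZeroMQ spreads requests across them
    sock = ctx.socket(zmq.REQ)
    for endpoint in self.config.query_addresses():
        sock.connect(endpoint)

    payload = json.dumps({'type': type, 'extra': extra})

    with Timeout(30):  # raises eventlet.Timeout if no reply within 30 seconds
        sock.send(self.sign_message(payload))
        return self.decode_message(sock.recv())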

17.23

ZeroMQ & Redundancy

  • Not a message queue
  • Advanced sockets, with multiple endpoints
  • Has both deliver-to-single-consumer and deliver-to-all-consumers
  • Uses TCP for transport

17.22

What’s ep.io?

  • Hosts Python sites/daemons
  • Technically language-independent
  • Supports multiple kinds of databases

17.21

Andrew is taking the stage.

Andrew is…

  • Core Developer
  • South author
  • Cofounder of ep.io

17.17

Everyone just getting settled. Stay tuned.