After a slightly dry but informative talk from Russell Keith-Magee on the current state of the Django Software Foundation (which I forgot to live-blog) I’m back with the next talk of the night by Frank Wiles
PostgreSQL is effectively the default RDBMS for Django. Learn the dark arts of optimizing this powerful database to be blazingly fast on your own hardware or in the cloud.
You can view the slides at http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf
Frank has finished talking now. Extremely well thought out talk. Good combination of technical tips and general insight.
Break large tables into smaller tables by date or another partitioning scheme that fits your data.
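A sketch of how date partitioning was typically done around PostgreSQL 9.0 (table inheritance plus CHECK constraints; newer PostgreSQL versions have declarative `PARTITION BY RANGE`). Table and column names here are hypothetical:

```sql
-- Parent table; children hold the actual rows, one per month
CREATE TABLE logs (
    id      serial,
    logged  timestamptz NOT NULL,
    message text
);

CREATE TABLE logs_2011_09 (
    CHECK (logged >= '2011-09-01' AND logged < '2011-10-01')
) INHERITS (logs);

-- With constraint_exclusion enabled, the planner skips child tables
-- whose CHECK constraint rules them out of the date range:
SELECT * FROM logs
WHERE logged >= '2011-09-05' AND logged < '2011-09-06';
```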
Triggers in the database can be quicker than Django signals. Consider them if you do a lot of repetitive work.
default_statistics_target = 10
ALTER TABLE <table> ALTER COLUMN <column> SET STATISTICS <integer>;
CREATE INDEX open_tickets_idx ON tickets (open) WHERE open = 't';  -- a partial index: only open tickets are indexed
Multicolumn Indexes can help if you’re constantly doing searches on multiple columns.
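A minimal sketch of the multicolumn-index point (table and column names are hypothetical; note that the leading column order matters):

```sql
-- One composite index can serve queries that always filter on
-- both columns together:
CREATE INDEX tickets_status_created_idx ON tickets (status, created_at);

-- Served by the index above:
SELECT *
FROM tickets
WHERE status = 'open'
  AND created_at > now() - interval '7 days';
```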
Instead of using count() you can maintain a simple key-value table holding precomputed counts. Update it via triggers in PG or save signals in Django.
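A sketch of the trigger-maintained counts table (table and function names are hypothetical; `EXECUTE PROCEDURE` is the syntax current in 2011-era PostgreSQL):

```sql
-- Key-value counts table: SELECT n FROM row_counts replaces a slow count(*)
CREATE TABLE row_counts (table_name text PRIMARY KEY, n bigint NOT NULL);
INSERT INTO row_counts VALUES ('tickets', 0);

CREATE OR REPLACE FUNCTION bump_ticket_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n = n + 1 WHERE table_name = 'tickets';
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE row_counts SET n = n - 1 WHERE table_name = 'tickets';
    END IF;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tickets_count_trg
    AFTER INSERT OR DELETE ON tickets
    FOR EACH ROW EXECUTE PROCEDURE bump_ticket_count();
```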
Common Django Specific problems:
- Use select_related()
- Queries in loops (STAY AWAY!)
- Proper indexing (Index the proper things, don’t index columns you select infrequently)
- count() is slow. PG keeps statistics on its tables which you can use instead (e.g. reltuples in pg_class gives an estimated row count).
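The select_related() point above, sketched with hypothetical Ticket/owner models (the classic N+1 query pattern and its fix):

```python
# Bad: one query for the tickets, plus one extra query per ticket
# each time the loop touches the related owner row.
for ticket in Ticket.objects.all():
    print(ticket.owner.username)   # each access hits the database

# Good: select_related() pulls the related rows in with a JOIN,
# so the loop below issues a single query in total.
for ticket in Ticket.objects.select_related("owner").all():
    print(ticket.owner.username)   # no extra queries
```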
Mount your filesystem with noatime
“Your choice of filesystem matters. Don’t use a journaled filesystem for your WAL. Use XFS for maximum performance”
When working “In the Cloud”
- Remember: You’re sharing disks and you don’t even know how!
- You can get the best of both worlds with Rackspace and their Cloud Connect product
- Using EBS volumes and software RAID is best (but somewhat scary) option on AWS
“If you can afford SSDs, you can ignore the rest of this talk”
- Tablespaces allow you to specify where specific tables and indexes live (different disks/volumes)
- Consider putting archive or legacy data onto slower drives/volumes
- Separate your data and indexes onto different volumes (if possible)
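The bullets above describe PostgreSQL tablespaces; a sketch with hypothetical paths (the directories must already exist and be owned by the postgres user):

```sql
CREATE TABLESPACE fastdisk LOCATION '/mnt/ssd/pgdata';
CREATE TABLESPACE slowdisk LOCATION '/mnt/archive/pgdata';

-- Keep hot indexes on the fast volume...
CREATE INDEX open_tickets_idx ON tickets (open)
    TABLESPACE fastdisk
    WHERE open = 't';

-- ...and move legacy data to the slow one.
ALTER TABLE tickets_2009 SET TABLESPACE slowdisk;
```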
Use pgbouncer to pool connections
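A minimal pgbouncer.ini sketch; database name, limits, and ports are illustrative, not from the talk:

```ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20
```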
- Have as much RAM as possible (best bang for your buck)
- Disks: have lots of disks (faster disks make a huge difference; RAID 1+0 good, RAID 5 bad; separate the WAL onto its own disk)
- After the previous two, worry about CPU speed. You're unlikely to be CPU bound.
wal_buffers – set to 16MB and forget it
checkpoint_segments – Increase to at least 10
maintenance_work_mem – 50MB for every GB of RAM (used by VACUUM and index builds)
synchronous_commit – Turn it off if you can accept some risk of data loss (a crash can lose the last few hundred milliseconds of commits; with it off, PG groups WAL flushes roughly every 600ms)
The “Big 3” Tuning Parameters
shared_buffers – Set to 25% of available RAM and move up/down 5% to find the sweet spot
effective_cache_size – Planning hint that tells PG how much RAM it can expect for OS disk cache
work_mem – Per-process (really per sort/hash operation) amount of ORDER BY space
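Putting the talk's numbers together as a postgresql.conf fragment for a hypothetical 16GB machine (starting points to measure against, not gospel):

```ini
shared_buffers       = 4GB     # ~25% of RAM; nudge 5% up/down to find the sweet spot
effective_cache_size = 12GB    # hint: how much RAM the OS will likely use as disk cache
work_mem             = 16MB    # per sort/hash, per process -- be conservative
wal_buffers          = 16MB    # set and forget
checkpoint_segments  = 10      # at least
maintenance_work_mem = 800MB   # ~50MB per GB of RAM
```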
“Do Smart Things”
- Cache and Cache some more
- Watch your query counts (Django 1.3 has assertNumQueries, use it!)
“Don’t do dumb things”:
- Assuming PostgreSQL is like MySQL
- Letting the database server do double duty
- Ignoring disk contention
- Retrieving more data than you need
“Measure Everything Before and After you try it”
And we’re off. Frank Wiles takes the stage.