Scaling Python on the Web

First session of the day was on Scaling Python on the Web; rough notes which I may clean up later:

How fast is fast enough?
- Don’t prematurely optimize
- Know where the bottlenecks are, and optimize those specifically
Orders of magnitude: static (httpd), dynamic (python), db-queried
Even 40 req/s is about 3.4M pages/day
Hundreds to low thousands of dynamic page views per second is usually good enough
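
The arithmetic behind the 40 req/s figure, as a quick sanity check. This assumes the load is spread evenly over a full 24-hour day, which real traffic rarely is; the function name is just illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pages_per_day(requests_per_second):
    """Pages served in one day at a constant request rate (an idealization)."""
    return requests_per_second * SECONDS_PER_DAY


print(pages_per_day(40))  # 3456000, i.e. roughly 3.4M pages/day
```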
Scaling isn’t about the language, it’s about:
- DRY: cache!
- share nothing
built a sample photo-app, FlickrKillr, for demonstration purposes
- preloaded with hundreds of thousands of users, 10-20 photos each
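
A minimal sketch of the "DRY: cache!" idea above: compute an expensive result once and reuse it until it expires. This is not the cache from the talk's demo; `TTLCache`, `get_or_compute`, and `render_user_page` are hypothetical names used only for illustration.

```python
import time


class TTLCache:
    """Tiny in-process cache: recompute a value only after it has expired."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]   # still fresh: skip the expensive work
        value = compute()   # missing or expired: do the work once
        self._store[key] = (now + self.ttl, value)
        return value


def render_user_page(user_id):
    """Hypothetical stand-in for an expensive, db-backed page render."""
    return f"<html>photos for user {user_id}</html>"


page_cache = TTLCache(ttl_seconds=30)
html = page_cache.get_or_compute(("user", 42), lambda: render_user_page(42))
```

A cache like this lives inside one server process, so each process keeps its own copy; that fits "share nothing" but means entries are recomputed once per process.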

first iteration: CGI

roughly 23 requests/second
problems:
- loading Python interpreter for each request
- all resources initialized for each request (inc. db connection)

possible remedies:
- run a Python web server (long-running process)
- make one db connection per thread instead of request

other remedies:
- fastcgi
- snakelets, twisted.web, RhubarbTart
- mod_python
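
The "one db connection per thread instead of request" remedy could look roughly like this in a long-running process. It is a sketch under assumptions: `sqlite3` stands in for whatever database the demo actually used, and `get_connection` and `handle_request` are made-up names.

```python
import sqlite3
import threading

# Each thread sees its own attributes on this object.
_local = threading.local()


def get_connection(db_path=":memory:"):
    """Return this thread's connection, opening it on first use only."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(db_path)  # paid once per thread, not per request
        _local.conn = conn
    return conn


def handle_request():
    """Simulated request handler that reuses the thread's connection."""
    conn = get_connection()
    return conn.execute("SELECT 1").fetchone()[0]
```

Within a single thread, repeated calls to `get_connection()` return the same object, so the connection setup cost that CGI pays on every request is paid only once per worker thread.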

second iteration: python app server (CherryPy used for this demo)
- roughly 139 requests per second
- problems
  - global interpreter lock — can only utilize one core on a dual core machine
  - sessions in the database — prefer an in-memory session store

third iteration: load balancing with nginx
- 217 requests/sec
- outstanding problems
  - static files read from disk every time
  - and they’re being read/written from python

fourth iteration: caching
- 616 req/sec (benchmarking w/ homegrown tool)
- 1750 req/sec (benchmarking w/ ab)

other notes:
- don’t forget to index
- without an index, the fourth iteration falls down to 28 requests/sec
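
Why the index matters so much can be seen by asking the database how it plans to run a query. The sketch below uses an in-memory SQLite database and a made-up `photos` schema (not the demo's actual schema or database); the exact wording of the plan text varies by SQLite version, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, filename TEXT)"
)
conn.executemany(
    "INSERT INTO photos (user_id, filename) VALUES (?, ?)",
    [(n % 1000, f"photo_{n}.jpg") for n in range(10_000)],
)

QUERY = "SELECT id, filename FROM photos WHERE user_id = ?"


def query_plan(connection):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


plan_without_index = query_plan(conn)  # a scan over the whole photos table
conn.execute("CREATE INDEX idx_photos_user ON photos (user_id)")
plan_with_index = query_plan(conn)     # a search that names idx_photos_user

print(plan_without_index)
print(plan_with_index)
```

Without the index every lookup walks all rows; with it, the database jumps straight to the matching `user_id` entries, which is the difference the 28 vs. 616 requests/sec numbers reflect.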

http://www.polimetrix.com/pycon/demo.tar.gz