Scaling Python on the Web

First session of the day was on Scaling Python on the Web; rough notes which I may clean up later:

  • How fast is fast enough?
    • Don’t prematurely optimize
    • Know where the bottlenecks are, and optimize those specifically
  • Orders of magnitude: static (httpd), dynamic (python), db-queried
  • Even 40 req/s is 3.4m pages/day
  • Hundreds to low thousands of dynamic page views per second is usually good enough
  • Scaling isn’t about the language, it’s about:
    • DRY: cache!
    • share nothing
  • built a sample photo app, FlickrKillr, for demonstration purposes
    • preloaded with hundreds of thousands of users, 10-20 photos each
  • first iteration: CGI
    • roughly 23 requests/second
    • problems:
      • loading the Python interpreter for each request
      • all resources initialized for each request (incl. db connection)
    • possible remedies:
      • run a Python web server (long-running process)
      • make one db connection per thread instead of per request
    • other remedies:
      • FastCGI
      • snakelets, twisted.web, RhubarbTart
      • mod_python
  • second iteration: Python app server (CherryPy used for this demo)
    • roughly 139 requests per second
    • problems:
      • global interpreter lock: can only utilize one core on a dual-core machine
      • sessions in the database: prefer an in-memory session store
    • remedies:
      • run multiple instances of CherryPy (to get around the GIL)
      • but then we need to balance load across them with something like nginx
    • other options:
      • CherryPy in mod_python
  • version 3: load balancing with nginx
    • 217 requests/sec
    • outstanding problems:
      • static files read from disk every time
      • and they’re being read/written from Python
    • solutions:
      • memcached
      • combine memcached with nginx
  • version 4: caching
    • 616 req/sec (benchmarking w/ homegrown tool)
    • 1750 req/sec (benchmarking w/ ab)
    • other notes:
      • don’t forget to index
      • without an index, the fourth iteration falls to 28 requests/sec

    http://www.polimetrix.com/pycon/demo.tar.gz
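The "40 req/s" rule of thumb in the notes is just multiplication by the 86,400 seconds in a day: 40 × 86,400 = 3,456,000, which rounds down to the 3.4m quoted. The sketch below applies the same conversion to the rates measured in the talk. It assumes a perfectly constant request rate, which real traffic never has, so read each result as the most a server sustaining that rate could serve in a day, not as a capacity target.

```python
# Convert a sustained request rate into page views per day.
# Assumes a constant rate around the clock, which real traffic never has.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_day(requests_per_second: float) -> float:
    """Most page views a server sustaining this rate could serve in one day."""
    return requests_per_second * SECONDS_PER_DAY


# Rates quoted in the notes (rule of thumb, the iterations, the no-index case).
rates = {
    "40 req/s rule of thumb": 40,
    "CGI": 23,
    "CherryPy": 139,
    "nginx + CherryPy": 217,
    "caching (ab)": 1750,
    "caching, no index": 28,
}

for label, rate in rates.items():
    millions = pages_per_day(rate) / 1_000_000
    print(f"{label:>24}: {rate:>5} req/s ~ {millions:6.1f}m pages/day")
```

Even the slowest CGI version works out to roughly two million pages a day at full, constant load, which is the point the talk makes about "fast enough".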
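One remedy listed for the CGI iteration is a long-running process that opens one database connection per thread rather than one per request. A minimal sketch of that idea with the standard library's `threading.local`, assuming sqlite3 as a stand-in for whatever database the demo actually used and a hypothetical `handle_request` function in place of real request handling: each worker thread opens its connection on first use and reuses it for every later request it serves.

```python
import sqlite3
import threading

# Placeholder: a real app would point this at its shared database.
# With ":memory:" every connection gets its own private, empty database.
DB_PATH = ":memory:"

_local = threading.local()      # per-thread attribute storage
_count_lock = threading.Lock()
connections_opened = 0          # how many connections were actually created


def get_connection() -> sqlite3.Connection:
    """Return this thread's connection, opening it on the thread's first call."""
    global connections_opened
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(DB_PATH)
        _local.conn = conn
        with _count_lock:
            connections_opened += 1
    return conn


def handle_request(n: int) -> int:
    """Hypothetical request handler that needs the database."""
    return get_connection().execute("SELECT ? * 2", (n,)).fetchone()[0]


def worker(requests: int) -> None:
    for i in range(requests):
        handle_request(i)


# Four worker threads serve 100 requests between them.
threads = [threading.Thread(target=worker, args=(25,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"100 requests served using {connections_opened} connections")
```

Under CGI, each of those 100 requests would start its own interpreter and open its own connection; here the connection count tracks the number of threads instead.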
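The second iteration moves sessions out of the database and into memory. A minimal in-process store could look like the sketch below; `MemorySessionStore` and its idle-timeout policy are hypothetical illustrations, not CherryPy's own session implementation.

```python
from __future__ import annotations

import secrets
import threading
import time


class MemorySessionStore:
    """Hypothetical in-memory session store with an idle timeout.

    Sessions live in this process only: they vanish on restart and are
    not visible to other server processes. Expired sessions are removed
    lazily when accessed; a real store would also sweep them periodically.
    """

    def __init__(self, timeout: float = 1800.0) -> None:
        self.timeout = timeout  # seconds of inactivity before a session expires
        self._sessions: dict[str, tuple[float, dict]] = {}
        self._lock = threading.Lock()  # handlers may run in several threads

    def create(self) -> str:
        """Start an empty session and return its unguessable id."""
        sid = secrets.token_hex(16)
        with self._lock:
            self._sessions[sid] = (time.monotonic(), {})
        return sid

    def get(self, sid: str) -> dict | None:
        """Return the session's data dict, or None if unknown or expired."""
        now = time.monotonic()
        with self._lock:
            entry = self._sessions.get(sid)
            if entry is None:
                return None
            last_seen, data = entry
            if now - last_seen > self.timeout:
                del self._sessions[sid]  # expire lazily on access
                return None
            self._sessions[sid] = (now, data)  # refresh the idle timer
            return data


store = MemorySessionStore(timeout=0.5)
sid = store.create()
store.get(sid)["user"] = "alice"
print(store.get(sid))
```

Because the dictionary lives inside one process, running several CherryPy instances behind a load balancer means a returning visitor must reach the same instance, or the sessions must move to a store that all instances share.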
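Running several CherryPy processes, for instance one per core, sidesteps the GIL but needs something in front to spread requests across them. In the talk that was nginx, whose default upstream strategy is round-robin. The toy sketch below models only that selection policy, not the proxying itself; the backend addresses are hypothetical.

```python
import itertools
import threading
from collections import Counter
from typing import Sequence


class RoundRobinBalancer:
    """Hand out backends in strict rotation (selection policy only, no proxying)."""

    def __init__(self, backends: Sequence[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)
        self._lock = threading.Lock()  # guard the shared iterator across threads

    def pick(self) -> str:
        """Return the next backend in the rotation."""
        with self._lock:
            return next(self._cycle)


# Hypothetical: two CherryPy instances on one dual-core machine.
balancer = RoundRobinBalancer(["127.0.0.1:8080", "127.0.0.1:8081"])
hits = Counter(balancer.pick() for _ in range(1000))
print(hits)
```

Plain rotation assumes requests cost roughly the same; nginx can also route by client address (`ip_hash`), which keeps a given visitor on one instance.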
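Version 4 puts memcached in front of the expensive work. A common pattern for this is cache-aside: look in the cache, and on a miss compute the value, store it with an expiry, and return it. The sketch uses a small in-process `LocalCache` class as a hypothetical stand-in for a memcached client so it runs without a server; it only mirrors the get/set/delete shape that Python memcached clients typically expose (argument names differ between client libraries), and `render_photo_page` is likewise hypothetical. Deleting the key on every write is one simple answer to the invalidation problem ingo raises in the comments.

```python
import time
from typing import Any, Callable, Optional


class LocalCache:
    """Hypothetical in-process stand-in for a memcached client.

    Like memcached clients, get() returns None on a miss, so None itself
    cannot be cached meaningfully.
    """

    def __init__(self) -> None:
        self._items: dict = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() >= expires_at:
            del self._items[key]  # treat expired entries as misses
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._items[key] = (time.monotonic() + ttl, value)

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


def cache_aside(cache: LocalCache, key: str, ttl: float,
                compute: Callable[[], Any]) -> Any:
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value


db_queries = 0


def render_photo_page(user_id: int) -> str:
    """Hypothetical expensive page render that hits the database."""
    global db_queries
    db_queries += 1
    return f"<html>photos for user {user_id}</html>"


cache = LocalCache()
for _ in range(100):
    cache_aside(cache, "photos:42", 60, lambda: render_photo_page(42))
print(f"100 page views, {db_queries} database render(s)")

# After the user edits a photo, drop the stale entry so the next view re-renders.
cache.delete("photos:42")
cache_aside(cache, "photos:42", 60, lambda: render_photo_page(42))
print(f"after invalidation: {db_queries} database renders")
```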
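The closing note, that the cached fourth iteration falls to 28 requests/sec without an index, is easy to check on any database that can explain its query plans. The sketch uses sqlite3 with a hypothetical `photos` table (not the demo's actual schema) and SQLite's `EXPLAIN QUERY PLAN`: before the index, the lookup by `user_id` scans every row; afterwards it searches the index. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
# Hypothetical data: 20,000 photos spread evenly across 1,000 users.
conn.executemany(
    "INSERT INTO photos (user_id, title) VALUES (?, ?)",
    ((i % 1000, f"photo {i}") for i in range(20_000)),
)

QUERY = "SELECT id, title FROM photos WHERE user_id = ?"


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan description(s) for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


before = query_plan(QUERY, (42,))
conn.execute("CREATE INDEX idx_photos_user_id ON photos (user_id)")
after = query_plan(QUERY, (42,))

print("without index:", before)  # a full scan of photos
print("with index:   ", after)   # a search using idx_photos_user_id

rows = conn.execute(QUERY, (42,)).fetchall()
print(len(rows), "photos for user 42")
```

The full scan touches all 20,000 rows to find 20 matches, and that cost is paid on every cache miss, which is why caching alone could not hide the missing index.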

3 Comments

  1. ingo
    Posted Saturday, March 3, 2007 at 1:29 pm | Permalink

    This is somewhat of a pet peeve, so excuse my rambling, but I miss an “outstanding problems” section for caching. Caching is /hard/, especially when dealing with dynamic web-sites such as CMS-driven sites. I couldn’t care less that 95% get a fast response through caching if my 5% of editing interactions are slow.
    Caching is not a panacea and while the whole stuff about “optimize only what you need” is absolutely true and I dearly value the development speed that Python gives me, there are faster interpreters out there.

  2. BLA
    Posted Tuesday, March 6, 2007 at 6:19 am | Permalink

    Each tool has its pros & cons. Each problem has its appropriate tools. Wrong mixes of problem & tools = bad results. Caching is hard, it’s not a panacea, Python is easy but slow (compared – for example – to Java).

    Do you want fast development & fast code? Use Python, optimize it with your own C extensions. Yes, “optimize only what you need”. Optimize only that, only with C. And only if you can’t make your algorithms better, and can’t design your database more carefully. And don’t use CGI, if possible.

    SHORTLY: Python, no CGI, optimized database & algorithms = fast development & good codebase. ONLY AFTER THAT tune your code & setup with C, caching, load balancing = hard part of development.

  3. Posted Tuesday, March 6, 2007 at 8:35 am | Permalink

    “easy but slow… compared to Java”?

    I’ll buy the easy part, but I wouldn’t have pointed to Java as my exemplar of speed. I don’t think Java is slow, but then I don’t think Python is slow, either. I think both are “slower than C”, but that’s just a comparative statement, not absolute.

Creative Commons Attribution-ShareAlike 3.0 United States