Scaling Python on the Web

First session of the day was on Scaling Python on the Web; rough notes which I may clean up later:

  • How fast is fast enough?
    • Don’t prematurely optimize
    • Know where the bottlenecks are, and optimize those specifically
  • Orders of magnitude: static (httpd), dynamic (python), db-queried
  • Even 40 req/s is about 3.4m pages/day
  • Hundreds to low thousands of dynamic page views is usually good enough
  • Scaling isn’t about the language, it’s about:
    • DRY: cache!
    • share nothing
  • built a sample photo-app, FlickrKillr, for demonstration purposes
    • preloaded with 100k’s of users, 10-20 photos each
  • first iteration: CGI
    • roughly 23 requests/second
    • problems:
      • loading Python interpreter for each request
      • all resources initialized for each request (inc. db connection)
    • possible remedies:
      • run a Python web server (long-running process)
      • make one db connection per thread instead of per request
    • other remedies:
      • fastcgi
      • snakelets, twisted.web, RhubarbTart
      • mod_python
  • second iteration: Python app server (CherryPy used for this demo)
    • roughly 139 requests per second
    • problems:
      • global interpreter lock: can only utilize one core on a dual-core machine
      • sessions in the database; an in-memory session store is preferable
    • remedies:
      • run multiple instances of CherryPy (to overcome the GIL)
      • but then we need to balance across them with something like nginx
    • other options:
      • CherryPy in mod_python
  • version 3: load balancing with nginx
    • 217 requests/sec
    • outstanding problems:
      • static files read from disk every time
      • and they’re being read/written from Python
    • solutions:
      • memcached
      • combine memcached with nginx
  • version 4: caching
    • 616 req/sec (benchmarking w/ homegrown tool)
    • 1750 req/sec (benchmarking w/ ab)
  • other notes:
    • don’t forget to index
    • without an index, the fourth iteration falls down to 28 requests/sec
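
To make the "don’t forget to index" note concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `photos` table, its columns, and the index name are assumptions for illustration only; the notes do not say which database or schema the demo app used. It asks SQLite for its query plan before and after adding an index on the lookup column.

```python
import sqlite3

# Hypothetical schema: the real demo's tables are not described in the notes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO photos (user_id, title) VALUES (?, ?)",
    [(i % 1000, f"photo {i}") for i in range(10_000)],
)


def query_plan(connection):
    """Return SQLite's plan for looking up one user's photos."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT title FROM photos WHERE user_id = ?", (42,)
    ).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # full table scan
conn.execute("CREATE INDEX idx_photos_user_id ON photos (user_id)")
plan_after = query_plan(conn)   # lookup through the new index

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan should describe a scan of the whole table and the second should name the index.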

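The "one db connection per thread" remedy from the first iteration could look roughly like the sketch below. It uses `threading.local` with sqlite3 as a stand-in database; `get_connection`, `handle_request`, and the counter are hypothetical names, not code from the demo.

```python
import sqlite3
import threading

_local = threading.local()      # each thread sees its own attributes here
_count_lock = threading.Lock()
connections_opened = 0          # bookkeeping for illustration only


def get_connection():
    """Return this thread's connection, opening it on first use."""
    global connections_opened
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(":memory:")
        _local.conn = conn
        with _count_lock:
            connections_opened += 1
    return conn


def handle_request(n):
    """Hypothetical request handler: reuses the per-thread connection."""
    conn = get_connection()
    return conn.execute("SELECT ? + 1", (n,)).fetchone()[0]


def worker(results, requests):
    for n in requests:
        results.append(handle_request(n))


results = []
threads = [
    threading.Thread(target=worker, args=(results, range(5))) for _ in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(connections_opened, len(results))
```

Three worker threads serve fifteen requests between them while opening only three connections, instead of one per request as under CGI.
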
http://www.polimetrix.com/pycon/demo.tar.gz