Accessing pagure's performance via mod_wsgi-express
Graham Dumpleton has written a series of blog posts on monitoring WSGI performance:
- Returning a string as the iterable from a WSGI application
- Monitoring the response content from a WSGI application
- Performance monitoring of real WSGI application traffic
After discussing it with Ralph Bean, I wanted to test Graham's scripts a little and see if we could perhaps re-use them ourselves.
So here are the results of my tests, performed against my local instance of pagure.
Create a virtualenv
This step is only needed if you do not already use a virtualenv for pagure. In my case I ended up with:
mkvirtualenv --system-site-packages mod_wsgi-express
Install `mod_wsgi`
In the virtualenv we need to install mod_wsgi, since we need `mod_wsgi-express` to test our application (and I was too lazy to deploy pagure locally using a full-blown WSGI setup).
pip install mod_wsgi
Note: this step might require you to install httpd-devel on your machine if you do not already have it.
Run mod_wsgi-express
So the first thing I looked for afterwards was how to run mod_wsgi-express; it turns out it's in the documentation.
OK, I'll be nice, here is what I used:
mod_wsgi-express start-server wsgi.py --debug-mode --log-to-terminal
The last two options are not necessary, but I found them pretty handy: when running into a 500 error, you get the traceback directly in the terminal.
So all that is left is writing this `wsgi.py` file.
The `wsgi.py` file
So here is the file I have been using:
```python
import os

from timeit import default_timer

from wrapt import decorator, ObjectProxy


class WSGIApplicationIterable3(ObjectProxy):

    def __init__(self, wrapped, name, start):
        super(WSGIApplicationIterable3, self).__init__(wrapped)
        self._self_name = name
        self._self_start = start
        self._self_time = 0.0
        self._self_count = 0
        self._self_bytes = 0

    def __iter__(self):
        time = 0.0
        start = 0.0
        count = 0
        bytes = 0

        try:
            for data in self.__wrapped__:
                start = default_timer()
                yield data
                finish = default_timer()
                if finish > start:
                    time += (finish - start)
                start = 0.0
                count += 1
                bytes += len(data)
        finally:
            if start:
                finish = default_timer()
                if finish > start:
                    time += (finish - start)
            self._self_time = time
            self._self_count = count
            self._self_bytes = bytes

    def close(self):
        if hasattr(self.__wrapped__, 'close'):
            self.__wrapped__.close()
        duration = default_timer() - self._self_start
        print('write %s blocks %s bytes %.3fms' % (
            self._self_count, self._self_bytes, self._self_time * 1000.0))
        print('finish %s %.3fms' % (self._self_name, duration * 1000.0))


@decorator
def timed_wsgi_application(wrapped, instance, args, kwargs):
    start = default_timer()
    try:
        # args[0] is the WSGI environ, from which we pull the requested path
        return WSGIApplicationIterable3(
            wrapped(*args, **kwargs),
            args[0].get('PATH_INFO', 'unknown path'),
            start)
    except:
        duration = default_timer() - start
        print('**', duration)
        raise


# Specify the configuration file
os.environ['PAGURE_CONFIG'] = '../config'

import wsgiref.validate

from pagure import APP

APP.debug = True
application = timed_wsgi_application(APP.wsgi_app)
application = wsgiref.validate.validator(application)
```
The tricky part was defining the `application` function so that it calls the Flask application correctly while still allowing the use of the decorator written by Graham.
I also adjusted it a little to include the requested path in the data returned, so that you know which path took a while to answer.
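The core idea is easier to see without the `wrapt` machinery. Here is a simplified sketch of the same timing middleware as a plain closure (not Graham's actual code: it buffers the response iterable instead of proxying it, which loses streaming but keeps the example short):

```python
from timeit import default_timer


def timed(app):
    """Wrap a WSGI callable and print how long each request took."""
    def wrapper(environ, start_response):
        start = default_timer()
        # Buffering the whole iterable keeps the sketch simple; the
        # ObjectProxy version above measures per-block writes instead.
        body = list(app(environ, start_response))
        duration = default_timer() - start
        print('finish %s %.3fms' % (
            environ.get('PATH_INFO', 'unknown path'), duration * 1000.0))
        return body
    return wrapper


# Usage with a trivial stand-in WSGI app:
def hello(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


application = timed(hello)
```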
Here is a sample of the output:
```
$ mod_wsgi-express start-server wsgi.py --debug-mode --log-to-terminal
Server URL         : http://localhost:8000/
Server Root        : /tmp/mod_wsgi-localhost:8000:1000
Server Conf        : /tmp/mod_wsgi-localhost:8000:1000/httpd.conf
Error Log File     : /dev/stderr (warn)
Request Capacity   : 5 (1 process * 5 threads)
Request Timeout    : 60 (seconds)
Queue Backlog      : 100 (connections)
Queue Timeout      : 45 (seconds)
Server Capacity    : 20 (event/worker), 20 (prefork)
Server Backlog     : 500 (connections)
Locale Setting     : en_US.UTF-8
[Wed Mar 02 16:55:27.702283 2016] [core:notice] [pid 21903:tid 139757036423360] SELinux policy enabled; httpd running as context unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[Wed Mar 02 16:55:27.705735 2016] [core:warn] [pid 21903:tid 139757036423360] AH00098: pid file /tmp/mod_wsgi-localhost:8000:1000/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
write 1 blocks 15575 bytes 0.077ms
finish /test 956.048ms
write 1 blocks 3502 bytes 0.090ms
finish /static/fontawesome/font-awesome.min.css 14.127ms
write 1 blocks 30055 bytes 0.107ms
finish /test/issues 199.554ms
write 1 blocks 11571 bytes 0.075ms
finish /test/pull-requests 150.349ms
write 1 blocks 49989 bytes 0.098ms
finish /test/pull-request/50 420.889ms
write 16 blocks 123708 bytes 0.493ms
finish /static/hack_fonts/fonts/woff2/hack-bold-webfont.woff2 13.823ms
write 1 blocks 30055 bytes 0.088ms
finish /test/issues 152.456ms
write 1 blocks 18403 bytes 0.065ms
finish /test/issue/10 119.909ms
write 1 blocks 3976 bytes 0.064ms
finish /static/upload.js 4.424ms
write 2 blocks 9314 bytes 0.115ms
finish /static/issue_ev.js 7.071ms
```
From there we can see that Flask is doing the right thing by minimizing the number of blocks returned, and that most of the response time is not spent writing said response.
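That single-block behaviour is easy to check directly, assuming Flask and Werkzeug are installed (they are, since pagure is a Flask application). This small sketch calls a Flask app's `wsgi_app` by hand and counts the blocks in its response iterable:

```python
import flask
from werkzeug.test import EnvironBuilder

app = flask.Flask(__name__)


@app.route('/')
def index():
    # A multi-kilobyte body, still returned to the WSGI server as one block
    return 'x' * 20000


# Build a fake WSGI environ and call the application directly
environ = EnvironBuilder('/').get_environ()
blocks = list(app.wsgi_app(environ, lambda status, headers: None))
print(len(blocks))  # one block for the whole body
```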
This means that if we were to write and deploy our own WSGI middleware to monitor performance, it would basically mean adding two lines to our existing wsgi files: one to import our wrapper, and one to redefine `application`.
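Concretely, the change would look something like this (`pagure_monitoring` is a hypothetical module name standing in for wherever the wrapper would end up living, not an existing package):

```python
# wsgi.py -- existing deployment file, with the two added lines marked
import os
os.environ['PAGURE_CONFIG'] = '/etc/pagure/pagure.cfg'

from pagure import APP

# added line 1: import the wrapper (hypothetical module name)
from pagure_monitoring import timed_wsgi_application

# added line 2: redefine `application` with the wrapped app
application = timed_wsgi_application(APP.wsgi_app)
```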