[Catalyst] Serving "static" content from DB

Mihai Bazon mihai at bazon.net
Sun Jun 7 12:01:58 GMT 2009


Look, I think I didn't properly explain what I need.  Or, maybe you
didn't read all my email and just noticed that I intend to keep file
content in a BLOB. :-)

I do want to keep files in the DB, BUT serve them as static files.  The
backend<->db traffic is unimportant, as it will happen only when the
file is updated.

I wrote some tests, since you ask me about numbers.  You can download it
here to run it yourself:

    http://mihai.bazon.net/Static-VS-DBI.tar.bz2

There are 3 tests: (1) serving a static file, (2) fetching the content
from DB at each request (Dropme::handle_dynamic) and (3) redirect Apache
to a file on disk which is updated with the content from DB
(Dropme::handle_dynamic_cached). The Dropme package is defined in
modperl.pl

Here's how to run it (assumes a mod_perl2-enabled Apache2):

    cd ~
    tar jxf Static-VS-DBI.tar.bz2
    chmod 777 Static-VS-DBI/cache       # Apache needs to write here
    ln -s ~/Static-VS-DBI /tmp/testblob
    cd /etc/apache2/sites-enabled       # or wherever you keep vhosts
    ln -s /tmp/testblob/dropme.conf
    sudo /etc/init.d/apache2 restart    # or whatever for your distro
    mysqladmin -u root -p create dropme
    mysql -u root -p dropme

# run the following in MySQL console:

    grant all privileges on dropme.* to dropme at localhost identified by 'dropme';
    ^D (exit MySQL shell)

    cd ~/Static-VS-DBI
    ./createdb.pl

The createdb.pl script will put each files/* in a record in the "Files"
table in the DB.  Then, you can use the following URL-s to access them:

 1. http://localhost:54321/image.jpg (served statically by Apache)
 2. http://localhost:54321/mp1/image.jpg (through handle_dynamic)
 3. http://localhost:54321/mp2/image.jpg (through handle_dynamic_cached)

To stress-test, use "ab", e.g.:

 ab -c 2 -n 100 http://localhost:54321/mp2/image.jpg

My conclusions (numbers in requests per second):

  file                    |   static   |   mp1   |   mp2
 ---------------------------------------------------------
  lgpl.txt (25K)          |   65.05    |  58.87  |  53.06
  image-small.jpg (110K)  |   61.49    |  57.21  |  52.45
  image.jpg       (1.8M)  |   53.76    |  35.43  |  49.33
  10MB.bin        (10M)   |   38.77    |  11.01  |  31.73

As expected, static wins in all cases.  For small files, mp1 is better
than mp2 but not by a long shot.  For large files, mp1 is a lot slower,
while static and mp2 are comparable (even when you increase -c
(connections per second)).  "mp2" can probably be optimized, I wrote
some ugly code to check if the cached file is out of date; it also can
be installed as a MapToStorageHandler, rather than ResponseHandler,
since what it does is mapping an URL to a file:

 1. is the file cached?
    - If not, retrieve from DB then save it on disk.
    - If yes, is its mtime older than what's in DB?
         - if yes, retrieve from DB then save it on disk.
 2. $r->filename($cached_file) and return DECLINED

So the file is actually served by Apache itself, and the BLOB is hit
only once.  For most requests, the Perl handler steps in only to check
that the cached file is up-to-date.  Moreover, the cache is outside the
document_root, which is many times convenient.

Cheers,
-Mihai

Cosimo Streppone <cosimo at streppone.it> wrote:
> Mihai Bazon <mihai at bazon.net> wrote:
> 
> > John Romkey <romkey at apocalypse.org> wrote:
> >> On Jun 6, 2009, at 7:41 AM, Ian Docherty wrote:
> >> > Mihai Bazon wrote:
> >> >> Hi folks,
> 
> > I am aware of the performance quirks
> 
> Good.
> 
> > MySQL blobs are pretty fast, btw.
> 
> Ah :)
> 
> I'm interested about this. Do you have numbers?
> 
> > Performance should be (almost) the same as for static files if I
> > implement a handler that (1) updates the static file from DB when it's
> > out of date and (2) DECLINE-s the request so that Apache^W the web
> > server itself can further serve the file.
> 
> Performance can be almost the same. Scalability won't.
> But of course you are the only person that can evaluate that
> depending on your needs, requirements, etc...
> 
> However:
> 
> > When the need comes, I'll figure out something faster
> > -- but for now all I can think of is "early
> > optimization is the root of all evil". ;-)
> 
> I know a team that thought exactly the same, did exactly
> the same as you are planning to do (even pictures in the db),
> served from the db and scaled on the fly by CGI processes.
> 
> (and don't underestimate backend<->db network traffic)
> 
> That became my team some time ago, and we spent
> _months_ to destroy that monster and serve those
> content as completely static by lightweight httpd servers.
> 
> I'm not saying you should change anything, but think
> about the poor souls who shall maintain the system
> in the hypotetical future where you get, as we do,
> millions of hits/day.
> 
> -- 
> Cosimo
> 
> _______________________________________________
> List: Catalyst at lists.scsys.co.uk
> Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
> Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
> Dev site: http://dev.catalyst.perl.org/


More information about the Catalyst mailing list