[MLB-WIRELESS] Mirroring Websites

Tyson Clugg tyson at wireless.org.au
Wed Oct 8 14:02:34 EST 2003


Sneeze asked:
> I'm wanting to setup some mirrors of Internet websites on a PC.  Can anyone
> point me towards some good resources on how to do this and what programs I
> might need.  I've had a quick look at wget and noticed some mirroring
> options but I'm not sure what to do.

There are two basic approaches to this:
1. Take a snapshot of the site and serve the snapshot content from your mirror.
2. Use proxy software to fetch and present the other website dynamically.

The mirroring options in 'wget' take the first approach: you end up with a static snapshot of the website.
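
As a rough illustration of the snapshot approach, the sketch below assembles a typical wget mirroring command from Python.  The helper name build_wget_mirror_command, the example URL and the destination directory are all made up for this example; the wget options themselves (--mirror, --convert-links, --page-requisites, --no-parent, --directory-prefix) are standard GNU wget flags, but check your local 'man wget' before relying on them.

```python
import shlex


def build_wget_mirror_command(url, dest_dir="mirror"):
    """Return the argument list for a wget snapshot of `url`.

    Hypothetical helper for illustration; it only builds the command,
    it does not run it.
    """
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the local copy is browsable
        "--page-requisites",   # also fetch images, CSS etc. needed to render pages
        "--no-parent",         # never climb above the starting directory
        "--directory-prefix=" + dest_dir,  # where the snapshot is written
        url,
    ]


if __name__ == "__main__":
    # Example URL is a placeholder, not a site from the original thread.
    cmd = build_wget_mirror_command("http://www.example.com/docs/")
    # Print the command instead of running it; to actually fetch, pass
    # `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Re-running the same command later refreshes the snapshot, since --mirror turns on timestamp checking.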

> Also, is it at all possible to mirror a site that's completely scripted or
> PHP?  Would the mirroring process create static HTML pages in place of the
> original pages?

Proxy software can store static copies of pages fetched with 'GET /somefile.php HTTP/1.x' and redistribute them, but only when the cache control mechanisms allow it.  So for the majority of PHP scripted pages, the answer is NO.  Unless the PHP coder has been thoughtful enough to send proxy-friendly cache control headers for pages that are not really dynamic, the proxy software will not be able to serve cached copies of those pages.  A wget snapshot, by contrast, simply saves whatever HTML the scripts produced at the time of the fetch.
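
To make the cache control point concrete, here is a much simplified sketch of the kind of decision a shared proxy cache makes before storing a response.  The function name is_proxy_cacheable and the exact rule set are assumptions made for this example; real proxies follow the full HTTP caching rules, which have many more cases than this.

```python
def is_proxy_cacheable(method, headers):
    """Rough check of whether a shared proxy cache may store a response.

    Simplified assumption: only GET responses that carry explicit
    freshness information or a Last-Modified validator are stored.
    """
    if method.upper() != "GET":
        return False

    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    # Split "public, max-age=3600" into the set {"public", "max-age"}.
    directives = {
        part.strip().split("=")[0].lower()
        for part in h.get("cache-control", "").split(",")
        if part.strip()
    }

    # Directives that forbid (or effectively forbid) shared caching.
    if directives & {"no-store", "private", "no-cache"}:
        return False

    # Explicit permission or freshness lifetime from the origin server.
    if directives & {"public", "max-age", "s-maxage"}:
        return True

    # Otherwise fall back to Expires or Last-Modified if present.
    return "expires" in h or "last-modified" in h


if __name__ == "__main__":
    # A page whose author sent proxy-friendly headers.
    print(is_proxy_cacheable("GET", {"Cache-Control": "public, max-age=3600"}))
    # Headers of the kind many dynamic PHP pages send.
    print(is_proxy_cacheable("GET", {"Cache-Control": "no-store, no-cache, must-revalidate"}))
```

Under these assumptions a response with no caching headers at all is treated as not cacheable, which is the situation most script-generated pages end up in.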

It may be possible to configure the proxy software to refuse access to non-cacheable content, which would give you an interesting mix of the two approaches: a mirror of the static content that is built dynamically as pages are requested.  See the following documentation for the Apache 2.0 web server (which can act as a proxy):
http://httpd.apache.org/docs-2.0/mod/mod_proxy.html
http://httpd.apache.org/docs-2.0/mod/mod_disk_cache.html
http://httpd.apache.org/docs-2.0/mod/mod_mem_cache.html
http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html#Solutions

