No subject
Tue Jan 17 15:36:28 EST 2012
Dynamic Mirror
Description:
Assume there are nice webpages on remote hosts we want to bring into our namespace. For FTP servers we would use the mirror program, which maintains an explicit up-to-date copy of the remote data on the local machine. For a webserver we could use the program webcopy, which acts similarly via HTTP. But both techniques have one major drawback: the local copy is only as up to date as our last run of the program. It would be much better if the mirror were not a static one that we have to establish explicitly. Instead we want a dynamic mirror whose data gets updated automatically when needed (that is, when the data on the remote host changes).
Solution:
To provide this feature we map the remote webpage, or even the complete remote webarea, into our namespace by using the Proxy Throughput feature (flag [P]):
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^hotsheet/(.*)$ http://www.tstimpreso.com/hotsheet/$1 [P]
RewriteEngine on
RewriteBase /~quux/
RewriteRule ^usa-news\.html$ http://www.quux-corp.com/news/index.html [P]
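To make the mapping concrete, here is a small Python sketch of which remote URL each request would be proxied to. It only emulates the pattern matching; it is not how mod_rewrite is implemented, and the handling of RewriteBase is simplified to stripping a prefix. The names RULES, BASE and proxy_target are made up for this illustration, and the two separate rulesets above are combined into one list.

```python
import re
from typing import Optional

# The two example rules from above, as (pattern, substitution) pairs.
# mod_rewrite's $1 becomes \1 in Python's template syntax.
RULES = [
    (re.compile(r"^hotsheet/(.*)$"), r"http://www.tstimpreso.com/hotsheet/\1"),
    (re.compile(r"^usa-news\.html$"), "http://www.quux-corp.com/news/index.html"),
]

# Simplified stand-in for "RewriteBase /~quux/": the rules see the path
# with this prefix removed.
BASE = "/~quux/"


def proxy_target(url_path: str) -> Optional[str]:
    """Return the remote URL a request would be proxied to, or None."""
    if not url_path.startswith(BASE):
        return None
    local = url_path[len(BASE):]
    for pattern, substitution in RULES:
        match = pattern.match(local)
        if match:
            # Fill the captured group into the substitution.
            return match.expand(substitution)
    return None
```

With these rules, a request for /~quux/hotsheet/today.html would be fetched from http://www.tstimpreso.com/hotsheet/today.html each time it is requested, so no local copy has to be maintained.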
Reverse Dynamic Mirror
Description:
...
Solution:
RewriteEngine on
RewriteCond /mirror/of/remotesite/$1 -U
RewriteRule ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
Confused yet? While I've got you all in a spin, I'll casually slip in some extra information that isn't useful for this problem but I'd like to mention it to ^yob anyway... :-P
On-the-fly Content-Regeneration
Description:
Here comes a really esoteric feature: dynamically generated but statically served pages. That is, pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated dynamically by the webserver if they are missing. This way you can have CGI-generated pages which are statically served unless someone (or a cronjob) removes the static contents. Then the contents get refreshed.
Solution:
This is done via the following ruleset:
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L]
Here a request to page.html leads to an internal run of the corresponding page.cgi if page.html is still missing or has zero size. The trick is that page.cgi is an ordinary CGI script which, in addition to writing to STDOUT, writes its output to the file page.html. Once it has run, the server sends out the data of page.html. When the webmaster wants to force a refresh of the contents, he just removes page.html (usually done by a cronjob).
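For anyone wondering what such a page.cgi might look like, here is a minimal Python sketch. The guide does not show the script, so everything in it is an assumption: the function names render_page and regenerate are invented, the page body is a placeholder, and it relies on the CGI script being run with its own directory as the working directory and having write permission there.

```python
#!/usr/bin/env python3
import os
import sys
import tempfile


def render_page() -> str:
    """Build the page body; a placeholder for the real dynamic content."""
    return "<html><body><p>Generated content</p></body></html>\n"


def regenerate(target_path: str, out=sys.stdout) -> str:
    """Save the page to target_path and also send it as the CGI response."""
    body = render_page()
    directory = os.path.dirname(os.path.abspath(target_path))
    # Write to a temporary file first and rename it into place, so the
    # server never sees a half-written page.html.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(body)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file owner-only
    os.replace(tmp_path, target_path)
    # CGI response: header, blank line, then the same body.
    out.write("Content-Type: text/html\n\n")
    out.write(body)
    return body


# GATEWAY_INTERFACE is set by the server for CGI requests, so this only
# runs when the script is invoked as a CGI.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    regenerate("page.html")
```

The first request after page.html disappears pays the cost of running the script; requests after that are served straight from the file until it is removed again.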
Cheers,
Tyson
Sneeze asked:
> I'm wanting to setup some mirrors of Internet websites on a PC. Can anyone
> point me towards some good resources on how to do this and what programs I
> might need. I've had a quick look at wget and noticed some mirroring
> options but I'm not sure what to do.
First thing, there are 2 basic approaches to this:
1. Take a snapshot of a site and present the snapshot content on your mirror.
2. Use proxy software to dynamically present the other website.
Using 'wget' you will end up with a snapshot of the website.
> Also, is it at all possible to mirror a site that's completely scripted or
> PHP? Would the mirroring process create static HTML pages in place of the
> original pages?
Proxy software will be able to store static versions of pages served using 'GET /somefile.php HTTP/1.x' for re-distribution when the cache control mechanisms allow it. This means for the majority of PHP scripted pages, the answer is NO. Unless the PHP coder has been thoughtful enough to specify proxy-friendly cache control parameters for pages that are non-dynamic, the proxy software will not be able to serve cached pages from the site.
It may be possible to tell proxy software to disallow access to non-cacheable content, and hence you would have an interesting mix of the two: dynamic mirror building of static content. See the following for details relating to the Apache 2.0 web server (which can do proxy serving):
http://httpd.apache.org/docs-2.0/mod/mod_proxy.html
http://httpd.apache.org/docs-2.0/mod/mod_disk_cache.html
http://httpd.apache.org/docs-2.0/mod/mod_mem_cache.html
http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html#Solutions
The mod_rewrite recipes quoted above are from http://httpd.apache.org/docs-2.0/misc/rewriteguide.html
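As a rough illustration of the "proxy-friendly cache control parameters" mentioned in the reply: the reply talks about PHP, but the same response headers can be sent from any script, so Python is used here. The function name cacheable_headers and the one-hour default are assumptions for the sketch, not a recommendation for any particular site.

```python
from email.utils import formatdate


def cacheable_headers(last_modified: float, max_age: int = 3600) -> list:
    """Return response header lines that allow a shared proxy cache to
    store and re-serve the page for max_age seconds."""
    return [
        "Content-Type: text/html",
        # "public" permits shared caches; max-age sets the freshness lifetime.
        f"Cache-Control: public, max-age={max_age}",
        # HTTP dates are always expressed in GMT.
        "Last-Modified: " + formatdate(last_modified, usegmt=True),
    ]
```

A script for a page that rarely changes could print these lines, then a blank line, then the page body. Pages that really are different on every request should not send them.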