Apache Proxypass

Many solutions today are built using highly available configurations that can easily scale. Setting up a solution to scale is easy, but getting your web application to work correctly with a multi-server configuration can be difficult as not everyone has access to a quality shared storage solution that is fast and reliable.

In many web applications such as WordPress, you typically want all your wp-admin traffic to go to the master server. There are probably a dozen ways to go about this, many of which get very overcomplicated with wacky Varnish configurations handling the redirection, or even with Nginx.

This is where ProxyPass can offer a cleaner alternative. ProxyPass allows you to take a request for a specific URL, and forward it to another server, which would be known as your backend server, or your master web server.

This guide will assume that you are performing this on all web servers in the solution, unless otherwise specified. The specific examples are for a WordPress-based solution, but they can be easily adapted for other CMSs.

To get started, first ensure that mod_proxy is installed:

# CentOS 6
[root@web01 ~]# yum install mod_proxy_html
[root@web01 ~]# service httpd restart
[root@web01 ~]# httpd -M |grep proxy
 proxy_module (shared)
 proxy_balancer_module (shared)
 proxy_ftp_module (shared)
 proxy_http_module (shared)
 proxy_connect_module (shared)
 proxy_ajp_module (shared)

# Ubuntu 12.04 and 14.04
[root@web01 ~]# apt-get update
[root@web01 ~]# apt-get install libapache2-mod-proxy-html
[root@web01 ~]# a2enmod proxy proxy_http
[root@web01 ~]# service apache2 restart

There are several ways you can proceed from here. I’ll lay them out as ‘options’ below. Each one basically accomplishes the same thing, but one may work better for your environment than another.

So no matter which of the 3 options you go with, always be sure to rigorously test it before implementing it in production!

Option 1: Easy – Define master server based off the URI in each Apache Vhost

This example is simple. In each Apache Vhost, add the following lines on each slave web server to point wp-admin and wp-login.php to your master server, which in this case is 192.168.2.1:

# CentOS 6
[root@web02 ~]# vim /etc/httpd/vhost.d/example.com.conf

# Ubuntu 12.04 and 14.04
[root@web02 ~]# vim /etc/apache2/sites-enabled/example.com.conf
...
        ProxyPreserveHost On
        ProxyRequests Off
        ProxyPassMatch ".*/wp-admin.*" "http://192.168.2.1"
        ProxyPassMatch ".*/wp-login.php" "http://192.168.2.1"
...
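To get a feel for which request paths those two ProxyPassMatch patterns would catch, here is a minimal Python sketch. It only imitates the matching step (an unanchored regex search against the URL path), not the proxying itself. The function name routes_to_master and the sample paths are made up for illustration.

```python
import re

# The two patterns from the ProxyPassMatch lines above.
MASTER_PATTERNS = [r".*/wp-admin.*", r".*/wp-login.php"]

def routes_to_master(path):
    """Return True if any pattern matches somewhere in the URL path."""
    return any(re.search(pattern, path) for pattern in MASTER_PATTERNS)

for sample in ["/wp-admin/post.php", "/wp-login.php", "/blog/hello-world/"]:
    print(sample, "->", "master" if routes_to_master(sample) else "local")
```

Because the patterns start with .*, a WordPress install living in a subdirectory (for example /sub/wp-admin/) is matched as well.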

Option 2: Advanced – Define master server based off URI using location blocks in each Apache Vhost

This example is slightly more advanced. In each Apache Vhost, add the following location blocks to point wp-admin and wp-login.php to your master server, which in this case is 192.168.2.1. We’re also manually defining the host header within these location blocks, which gives you the option to start excluding specific items if needed:

# CentOS 6
[root@web02 ~]# vim /etc/httpd/vhost.d/example.com.conf

# Ubuntu 12.04 and 14.04
[root@web02 ~]# vim /etc/apache2/sites-enabled/example.com.conf
...
  ProxyRequests Off
  ProxyPreserveHost Off
  ProxyVia Off
  <Location "/wp-login.php">
    Header set "Host" "www.example.com"
    ProxyPass http://192.168.2.1/wp-login.php
    ProxyPassReverse http://192.168.2.1/wp-login.php
  </Location>
  <Location "/wp-admin">
    Header set "Host" "www.example.com"
    ProxyPass http://192.168.2.1/wp-admin
    ProxyPassReverse http://192.168.2.1/wp-admin
  </Location>

Option 3: Complex – Define master server in global Apache configuration, and only send over POST requests for wp-admin

This example is more complex. You are defining the master server (192.168.2.1) in your global Apache configuration, then configuring each Apache Vhost to only send over POST requests for wp-admin to the master server.

Set up ProxyPass so it knows which server is the master web server. Be sure to update the IP so it’s the IP address of your master web server:

# CentOS 6
[root@web01 ~]# vim /etc/sysconfig/httpd
...
OPTIONS="-DSLAVE"
export MASTER_SERVER="192.168.2.1"
...

# Ubuntu 12.04 and 14.04
[root@web01 ~]# vim /etc/apache2/envvars
...
export APACHE_ARGUMENTS="-DSLAVE"
export MASTER_SERVER="192.168.2.1"
...

Now on your slave web servers, we need to update the site’s vhost configuration to proxy the requests for /wp-admin so they will route to the master web server:

# CentOS 6
[root@web02 ~]# vim /etc/httpd/vhost.d/example.com.conf

# Ubuntu 12.04 and 14.04
[root@web02 ~]# vim /etc/apache2/sites-enabled/example.com.conf
...
<IfDefine SLAVE>
     RewriteEngine On
     ProxyPreserveHost On
     ProxyPass /wp-admin/ http://${MASTER_SERVER}/wp-admin/
     ProxyPassReverse /wp-admin/ http://${MASTER_SERVER}/wp-admin/
     RewriteCond %{REQUEST_METHOD} =POST
     RewriteRule . http://${MASTER_SERVER}%{REQUEST_URI} [P]
</IfDefine>
...

# CentOS 6
[root@web02 ~]# service httpd restart

# Ubuntu 12.04 and 14.04
[root@web02 ~]# service apache2 restart

The slave server(s) should now start proxying the /wp-admin requests and sending them over to the master web server. Please be sure to test this out and check your logs to ensure /wp-admin POST requests are now routing to the master web server.
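As a way to reason about what the vhost block will send to the master before you test it, here is a rough Python model of the two rules. It assumes the block behaves as written: ProxyPass matches any path beginning with /wp-admin/, and the RewriteRule proxies every POST regardless of path. The function name and the sample requests are hypothetical.

```python
def goes_to_master(method, path):
    """Rough model of the <IfDefine SLAVE> block on a slave server."""
    # ProxyPass /wp-admin/ ... is a plain prefix match on the URL path.
    if path.startswith("/wp-admin/"):
        return True
    # RewriteCond %{REQUEST_METHOD} =POST with RewriteRule . ... [P]
    # proxies any POST, whatever the path is.
    return method == "POST"

for method, path in [("GET", "/wp-admin/index.php"),
                     ("POST", "/wp-comments-post.php"),
                     ("GET", "/2015/01/some-post/")]:
    target = "master" if goes_to_master(method, path) else "local"
    print(method, path, "->", target)
```

Note that in this model every POST heads to the master, not only the ones under /wp-admin, so it is worth confirming in your logs that this is what you want.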

Apache mod_status module

The Apache mod_status module is one that I don’t hear about much anymore, but it is something that can be very useful when troubleshooting high CPU or memory usage with Apache.

Taken directly from the Apache documentation, mod_status provides you with details such as:

- The number of workers serving requests.
- The number of idle workers.
- The status of each worker, the number of requests that worker has performed and the total number of bytes served by the worker.
- A total number of accesses and byte count served.
- The time the server was started/restarted and the time it has been running for.
- Averages giving the number of requests per second, the number of bytes served per second and the average number of bytes per request.
- The current percentage CPU used by each worker and in total by all workers combined.
- The current hosts and requests being processed.

Setting it up is simple. It only gets a bit complicated explaining it in a single blog post for multiple operating systems, as the configuration gets stored in different places depending on which distro you’re using.

I’ll outline the configuration location that it needs to be placed in below:

# CentOS 6 / CentOS 7
[root@web01 ~]# vim /etc/httpd/conf.d/status.conf

# Ubuntu 12.04
[root@web01 ~]# vim /etc/apache2/conf.d/status.conf

# Ubuntu 14.04
[root@web01 ~]# vim /etc/apache2/conf-available/status.conf

Using the correct location for your distro as shown above, use the following configuration to enable mod_status. Please be sure to update the AuthUserFile line accordingly for your distro:

<IfModule mod_status.c>
#
# ExtendedStatus controls whether Apache will generate "full" status
# information (ExtendedStatus On) or just basic information (ExtendedStatus
# Off) when the "server-status" handler is called. The default is Off.
#
ExtendedStatus On

# Allow server status reports generated by mod_status,
# with the URL of http://servername/server-status
# Uncomment and change the ".example.com" to allow
# access from other hosts.
#
<Location /server-status>
     SetHandler server-status
     Order deny,allow
     Deny from all
     Allow from localhost ip6-localhost
     <IfModule mod_rewrite.c>
          RewriteEngine off
     </IfModule>
     Allow from 127.0.0.1

# On CentOS / RedHat systems, uncomment the following line
     AuthUserFile /etc/httpd/status-htpasswd

# On Debian / Ubuntu systems, uncomment the following line
#     AuthUserFile /etc/apache2/status-htpasswd

     AuthName "Password protected"
     AuthType Basic
     Require valid-user

     # Allow password-less access for allowed IPs
     Satisfy any
</Location>

</IfModule>

Once you have the configuration in place, you can secure it with a username and password, and then enable it by:

# CentOS 6 / CentOS 7
[root@web01 ~]# htpasswd -c /etc/httpd/status-htpasswd serverinfo
[root@web01 ~]# service httpd restart

# Ubuntu 12.04
[root@web01 ~]# htpasswd -c /etc/apache2/status-htpasswd serverinfo
[root@web01 ~]# service apache2 restart

# Ubuntu 14.04
[root@web01 ~]# htpasswd -c /etc/apache2/status-htpasswd serverinfo
[root@web01 ~]# a2enconf status.conf
[root@web01 ~]# service apache2 restart

Now that mod_status is enabled and working when going to http://serverip/server-status, how can it help with troubleshooting?

Let’s say you look at top, and you consistently see an Apache process maxing out a CPU, or using up a ton of memory. You can cross-reference the PID of that Apache child process against the same PID that you find within the server-status page. The requests are constantly changing, so you may need to refresh the /server-status page a couple of times to catch it.

To aid in the troubleshooting as you are trying to match up PIDs against what is shown in top, you can have the /server-status page refresh automatically by using the following in the URL:

http://serverip/server-status?refresh=2

Once you do locate it, it may give you some idea of what client, or what types of requests, are causing the resource contention issues. Usually it is a specific web application misbehaving, or a specific client is attacking a site.
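If you would rather pull the summary numbers from a script than from a browser, mod_status also offers a machine-readable view at /server-status?auto, which returns plain "Key: value" lines. The Python sketch below parses that kind of text. The sample text is invented for illustration rather than captured from a real server, and the exact set of keys varies by Apache version.

```python
def parse_status(text):
    """Turn 'Key: value' lines from /server-status?auto into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not in "Key: value" form
            stats[key.strip()] = value.strip()
    return stats

# Made-up sample in the ?auto layout, not output from a real server.
SAMPLE = """Total Accesses: 1200
Uptime: 3600
BusyWorkers: 3
IdleWorkers: 7
Scoreboard: _W__K_W___..."""

stats = parse_status(SAMPLE)
busy = int(stats["BusyWorkers"])
idle = int(stats["IdleWorkers"])
print("busy workers: %d of %d" % (busy, busy + idle))
```

To feed it real data, fetch http://serverip/server-status?auto from an allowed IP and pass the response body to parse_status. The per-request details needed for matching PIDs are still easiest to read on the regular /server-status page.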

SSL terminated load balancer causing redirect loops

You have an environment that is terminating SSL on the load balancer for one reason or another. Your application, such as WordPress or Magento, is configured to force SSL. But when you go to test out the site or the admin portal, you get a redirect loop. What happened?

This is a very common issue. In most cases when you are terminating SSL at the load balancer, the load balancer will send the traffic over to your web server using HTTP. This can confuse the web application since it was expecting it to be over HTTPS, and the application will not be able to tell that the client’s browser was in fact using HTTPS, which will result in a redirect loop.

The solution to this is actually very simple. You need your load balancer to send the X-Forwarded-Proto header. Then add a SetEnvIf directive to your .htaccess (assuming Apache here), which detects that header and sets the HTTPS environment variable your application was expecting.

To account for this, at the top of your site’s .htaccess file, add the following:

[root@web01 ~]# vim /var/www/vhosts/www.domain.com/.htaccess
...
# Detect the LB header and set the HTTPS variable accordingly for the application
SetEnvIf X-Forwarded-Proto https HTTPS=on
...

So in summary, this will prevent your application from getting confused about whether the connection originated over HTTP or HTTPS, since the load balancer is handling the SSL termination, not the server.
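To make the loop and the fix concrete, here is a small Python model. It assumes an application that, like many PHP apps, decides whether the request was secure by checking an HTTPS variable. The helpers set_env_if and app_response are hypothetical names for illustration, not part of Apache, WordPress or Magento.

```python
import re

def set_env_if(headers, env):
    """Imitate: SetEnvIf X-Forwarded-Proto https HTTPS=on"""
    if re.search("https", headers.get("X-Forwarded-Proto", "")):
        env["HTTPS"] = "on"
    return env

def app_response(env, host, path):
    """An app that forces SSL: redirect unless it believes it is on HTTPS."""
    if env.get("HTTPS") != "on":
        return "301 https://%s%s" % (host, path)
    return "200 OK"

# The browser used HTTPS, but the load balancer talks HTTP to the web server.
headers = {"X-Forwarded-Proto": "https"}

# Without the directive the app keeps redirecting to the URL already in use.
print(app_response({}, "www.domain.com", "/wp-admin/"))
# With the directive the HTTPS variable is set and the page is served.
print(app_response(set_env_if(headers, {}), "www.domain.com", "/wp-admin/"))
```

The first call is the redirect loop in miniature: the browser is already on https://www.domain.com/wp-admin/, gets told to go there again, and repeats.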

Apache Mod_Rewrite tutorial

Apache Mod_Rewrite provides the ability to rewrite incoming requests to different destinations. It can be a bit complicated to wrap your head around the syntax, but once you get used to it, it becomes very powerful.

Rewrite rules can be set up in two places: the Apache VirtualHost configuration, or the website’s .htaccess file. Modifying the .htaccess file allows you to change the rewrite rules on the fly without restarting Apache. However, it comes at a steep cost. When using an .htaccess file, the file must be processed each time someone visits your page. So for large rulesets, this could add a substantial CPU load on your web server.

Taken directly from Apache’s website:

You should avoid using .htaccess files completely if you have access to httpd main server config file. Using .htaccess files slows down your Apache http server. Any directive that you can include in a .htaccess file is better set in a Directory block, as it will have the same effect with better performance.

To avoid the performance penalty, I prefer to keep the rules within the Apache VirtualHost configuration. When Apache is restarted, the rules are read into memory, which allows for faster processing.

Rewrite rule example – Step by step

I prefer to learn by example. So let’s take an example that is, on purpose, slightly more complicated than it needs to be to illustrate the syntax.

All requests to http://www.domain.com should be redirected to http://192.168.1.100. Likewise, requests to https://www.domain.com should be redirected to https://192.168.1.100.

A possible solution to put into the Apache VirtualHost configuration would be:

RewriteEngine On
RewriteCond %{SERVER_PORT} ^80$
RewriteCond %{HTTP_HOST} ^domain.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www.domain.com$ [NC]
RewriteRule ^/(.*) http://192.168.1.100/$1 [R,L]

RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{HTTP_HOST} ^domain.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www.domain.com$ [NC]
RewriteRule ^/(.*) https://192.168.1.100/$1 [R,L]

To explain some of the syntax used:

1.  ^ --> String starts with
2.  !^ --> String does not start with
3.  %{...} --> Server variable
4.  [NC,OR] --> 'NC' means not case sensitive. 'OR' means it can match either this condition or the one on the line below. If you want both conditions to match, just use 'NC', since conditions already default to 'AND' (there is no 'AND' flag, so adding one would break the rule).
5.  [R,L] --> 'R' means redirect (a 302 unless you specify a status such as R=301). 'L' means last, so stop processing any further rules.
6.  ^/(.*) --> Matches any request path starting with /, and captures everything after the slash so it can be reused as $1 (used in this example in the RewriteRule).
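Most of these pieces behave the same way in any regex engine, so they can be tried out quickly outside Apache. Here is a minimal Python sketch, where re.IGNORECASE stands in for the NC flag. The helper name host_matches and the hostnames are just test inputs I picked for illustration.

```python
import re

def host_matches(pattern, host, nocase=False):
    """Test a RewriteCond-style pattern against a Host header value."""
    flags = re.IGNORECASE if nocase else 0
    return re.search(pattern, host, flags) is not None

# ^ and $ anchor the match to the start and end of the string.
print(host_matches(r"^domain.com$", "domain.com"))        # True
print(host_matches(r"^domain.com$", "www.domain.com"))    # False

# Without NC the match is case sensitive; with NC it is not.
print(host_matches(r"^domain.com$", "DOMAIN.COM"))               # False
print(host_matches(r"^domain.com$", "DOMAIN.COM", nocase=True))  # True

# An unescaped dot matches any character; \. matches only a literal dot.
print(host_matches(r"^domain.com$", "domain-com"))    # True
print(host_matches(r"^domain\.com$", "domain-com"))   # False
```

The last pair is the reason stricter rules usually write the hostname as domain\.com, although the unescaped form rarely causes trouble in practice.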

Easy, right? Yeah, I know it is not. But it makes more sense the more you have to use it. So let me explain the solution above line by line:

First, we need to enable the rewrite engine:

RewriteEngine On

Now put in the first test condition. For the purposes of this article, we are going to simplify this request by testing for anything coming in over port 80:

RewriteCond %{SERVER_PORT} ^80$

Then we need to test a condition to actually match the URL we are looking to do the rewrite on. In this case, it is domain.com. Now, you should be sure to set the ‘NC’ (no case) flag so we don’t break the rule if someone types in the request in capital letters. You want to test for both domain.com and www.domain.com. This is done by:

RewriteCond %{HTTP_HOST} ^domain.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www.domain.com$ [NC]

Next we write the actual rewrite statement. This is saying, take any request that passed the conditions above and point it to what we specify here. So to talk this out, the pattern ^/(.*) matches the request path and captures everything after the leading slash, and $1 places that captured text onto the end of the new address. In [R,L], the ‘R’ sends the client a redirect to the new URL, and the ‘L’ ends processing as it is the last rule.

RewriteRule ^/(.*) http://192.168.1.100/$1 [R,L]

Finally, don’t forget to apply all this to https on port 443! The same explanations as above apply to that second block.
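The whole walkthrough can be sketched as a few lines of Python, which is sometimes easier to poke at than a live vhost. This is only a model of the rule logic under my reading of it: the port test is ANDed with the pair of host tests, which are ORed together. The function name redirect_for is hypothetical, and the model ignores query strings.

```python
import re

HOSTS = [r"^domain.com$", r"^www.domain.com$"]

def redirect_for(port, host, path):
    """Return the redirect target for a request, or None if no rule applies."""
    # RewriteCond lines are ANDed by default; [OR] joins the two host tests,
    # so the logic is: port matches AND (host1 OR host2).
    if not any(re.search(p, host, re.IGNORECASE) for p in HOSTS):
        return None
    if port == 80:
        scheme = "http"
    elif port == 443:
        scheme = "https"
    else:
        return None
    # RewriteRule ^/(.*) ... /$1 : capture everything after the leading slash.
    match = re.match(r"^/(.*)", path)
    if match is None:
        return None
    return "%s://192.168.1.100/%s" % (scheme, match.group(1))

print(redirect_for(80, "WWW.Domain.com", "/about/team.html"))
print(redirect_for(443, "domain.com", "/"))
print(redirect_for(80, "other.com", "/"))
```

The first two calls produce redirects to 192.168.1.100 over http and https respectively, while the third returns None because the host matches neither condition.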

Common rewrite examples

Enough of the obscene examples and explanations. Below are a bunch of common (and not so common) examples for quick reference:

Rewrite domain.com to www.domain.com:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}$1 [R=301,L]

Rewrite www.domain.com to domain.com:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.domain.com [NC]
RewriteRule ^(.*)$ http://domain.com$1 [L,R=301]

Force SSL on your domain, meaning take any http requests and redirect them to https:

RewriteEngine On
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ https://%{HTTP_HOST}$1 [R=301,L]

Force SSL on your domain, meaning take any http requests and redirect them to https when using an SSL terminated load balancer:

RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule ^(.*)$ https://www.domain.com$1 [R=301,L]

Rewrite one domain to another:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?domain1.com [NC]
RewriteRule ^(.*)$ http://domain2.com$1 [R=301,L]

Redirect a path to a new domain:

Redirect 301 /path http://new.domain.com
Redirect 301 /otherpath/somepage.php http://other.domain.com

Rewrite a page with a query string to a different page and include the query string. Please note, this must be in the Apache VirtualHost config; it will not work in the .htaccess.

# This will redirect www.example.com/products?sku=xxx over to www.example.com/products/sku/xxx
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/products [NC]
RewriteCond %{QUERY_STRING} ^sku=(.*)
RewriteRule (.*) https://www.example.com/products/sku/%1? [R=301,L]
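The part of this rule that tends to trip people up is %1, which refers to the capture group in the last matched RewriteCond rather than in the RewriteRule. A minimal Python sketch of the same logic, with a hypothetical function name rewrite_sku:

```python
import re

def rewrite_sku(path, query):
    """Return the redirect URL for /products?sku=xxx, or None if no match."""
    # RewriteCond %{REQUEST_URI} ^/products [NC]
    if not re.search(r"^/products", path, re.IGNORECASE):
        return None
    # RewriteCond %{QUERY_STRING} ^sku=(.*)  -> the capture becomes %1
    cond = re.search(r"^sku=(.*)", query)
    if cond is None:
        return None
    # The trailing "?" in the rule drops the original query string.
    return "https://www.example.com/products/sku/" + cond.group(1)

print(rewrite_sku("/products", "sku=123"))
print(rewrite_sku("/products", "color=red"))
```

A request that has already been redirected, such as /products/sku/123 with an empty query string, fails the second condition, so the rule does not loop on its own output.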

Rewrite page with query string, and strip query string:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteCond %{REQUEST_URI} ^/pages/pages\.php$
RewriteCond %{QUERY_STRING} ^page=[0-9]*$
RewriteRule ^.*$ http://www.domain.com/path/? [R=301,L]

Force all URLs to be lowercase. Please note, this must be in the Apache VirtualHost config; it will not work in the .htaccess.

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]

Exclude phpmyadmin from being rewritten by existing rules (Place this above the problem rule):

RewriteCond %{REQUEST_URI} !^/phpmyadmin

Disable TRACE and TRACK methods:

RewriteEngine On
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
RewriteRule .* - [F]

Rewrite images to Cloud Files. Please note, this must be in the Apache VirtualHost config; it will not work in the .htaccess.
* Note: The RewriteRule and URL belong on the same line.

<Directory /var/www/vhosts/domain.com/content/images>
RewriteEngine On
RewriteRule ^(.*)$ http://c0000.cdn00.cloudfiles.rackspacecloud.com/$1 [R=302,L]
</Directory>

Rewrite all pages to a maintenance page:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} !/maintenance.html$
RewriteRule .* /maintenance.html [L]

RHCSA Study Guide – Objective 8 : Web Services

############################
Everything below is from the raw notes that I took while attending an unofficial RHCSA training session.  I am posting them here in hopes they will assist others who may be preparing to take this exam.

My notes are my own interpretation of the lectures, and are certainly not a replacement for classroom training either through your company, or by taking the official RHCSA classes offered through Red Hat.  If you are new to the Red Hat world, I strongly suggest looking into their training courses over at Red Hat.
############################

Apache

Apache is the default web server on RHEL6. The default configuration file exists at:

/etc/httpd/conf/httpd.conf

In Apache2, the /etc/httpd/conf.d directory stores configuration files that are specific to a particular Apache module. All files in this directory ending in .conf will be parsed as configuration files.

Basic apache vhost default:

<VirtualHost blah.com>
        ServerName blah.com
        ServerAlias www.blah.com
        DocumentRoot /var/www/vhosts/www.blah.com
        CustomLog /var/log/httpd/blah.com.access combined
        ErrorLog /var/log/httpd/blah.com.error
</VirtualHost>

Apache supports 3 types of virtual hosting:

- IP based hosting : (All sites have different IPs)
- Port based virtual hosting : (Can use the port to tell the server where to go, i.e. google.com:33333)
- Name based virtual hosting : (most popular, as Apache looks at the host header and directs the request to the matching vhost container)
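Name based hosting is easier to picture with a small model. The Python sketch below assumes the usual behavior for vhosts sharing one IP and port: the Host header is compared against each ServerName and ServerAlias, and when nothing matches, the first vhost listed is used as the default. The function name pick_vhost and the second site are made up, and wildcard aliases are left out to keep it short.

```python
def pick_vhost(host_header, vhosts):
    """Pick a name-based vhost; vhosts is an ordered list of dicts."""
    # Drop an optional :port and ignore case, as hostnames are case insensitive.
    name = host_header.split(":")[0].lower()
    for vhost in vhosts:
        if name == vhost["ServerName"] or name in vhost.get("ServerAlias", []):
            return vhost
    # No match: the first vhost listed acts as the default site.
    return vhosts[0]

VHOSTS = [
    {"ServerName": "blah.com", "ServerAlias": ["www.blah.com"]},
    {"ServerName": "other.example.org"},  # hypothetical second site
]

print(pick_vhost("www.blah.com", VHOSTS)["ServerName"])
print(pick_vhost("other.example.org:80", VHOSTS)["ServerName"])
print(pick_vhost("unknown.example.net", VHOSTS)["ServerName"])
```

The last call falls through to blah.com, which is why the order of the vhost containers decides which site is the default.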

Additional docs:

[root@web01 ~]# yum install httpd-manual
[root@web01 ~]# service httpd restart
[root@web01 ~]# firefox http://localhost/manual

Lab

1.  Configure two websites on your server.  "X" represents your station #. 

2.  wwwX.example.com should be served from the /var/www/html and should also respond to requests for the short hostname wwwX.

3.  vhostX.example.com should be served from /home/linus/html and should also respond to requests for the short hostname vhostX.

4.  Both should be listening on your primary ip address, but wwwX.example.com should be the default site.

** Too much to post answers here... but it’s really straightforward.  Just watch SELinux, and perms on /home/linus

Securing Apache

2 directives for setting up access controls

-  allow from (host|network|ALL)
-  deny from (host|network|ALL)

These are applied in the given order:

1.  order allow,deny : Allows explicitly allowed clients and denies everyone else.  Anyone matching both deny and allow is denied.

2.  order deny,allow : Denies explicitly denied clients and allows everyone else.  Anyone matching both deny and allow is allowed.
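The two orderings are easy to mix up, so here is a short Python model of how I understand the evaluation. The function name is_allowed is hypothetical, networks are given in CIDR form, and 0.0.0.0/0 stands in for "all". It covers IPv4 addresses only and leaves out hostname matching.

```python
import ipaddress

def is_allowed(order, client, allow=(), deny=()):
    """Model Apache 2.2 style 'Order' evaluation for one client IP.

    allow and deny are lists of networks such as "192.168.2.0/24";
    use "0.0.0.0/0" to stand in for "all".
    """
    ip = ipaddress.ip_address(client)
    in_allow = any(ip in ipaddress.ip_network(net) for net in allow)
    in_deny = any(ip in ipaddress.ip_network(net) for net in deny)
    if order == "allow,deny":
        # Default is deny; a Deny match overrides an Allow match.
        return in_allow and not in_deny
    if order == "deny,allow":
        # Default is allow; an Allow match overrides a Deny match.
        return in_allow or not in_deny
    raise ValueError("unknown order: %r" % order)

# "Order deny,allow / Deny from all / Allow from 127.0.0.1"
print(is_allowed("deny,allow", "127.0.0.1", allow=["127.0.0.1"], deny=["0.0.0.0/0"]))
print(is_allowed("deny,allow", "203.0.113.9", allow=["127.0.0.1"], deny=["0.0.0.0/0"]))
```

With that lock-down pattern, localhost gets in and every other address is refused, because only the Allow match can override the blanket Deny.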

These directives are placed inside one of the following tags:

<Directory>
<Files>

In theory, it’s best to keep these as global settings in the httpd.conf. You have to remember that you are protecting your data, your files and directories, so it’s best to keep these secured across all vhosts… so you set them globally. In other words, set it OUTSIDE the vhost tag.