As mentioned in the first post, this follows my presentation from a year ago on transparent reverse proxies. I will begin this presentation by covering the basics again, but several ambiguous and confusing terms are involved, so this post attempts to clarify the terminology and to provide a written version of the background information.
The first set of confusing terms is file handle vs file descriptor vs file description. All three refer to an open file, which can be a logical file on a filesystem, a socket, or a device; everything is a file.
File Handles are a userspace structure for referencing a file (open or not). The key here is userspace. As such, they are process-specific.
File Descriptors are still process-specific, but they are the kernelspace counterpart to the File Handle. These are presented to userspace as just an integer. File Descriptors can be copied, including between processes. The duplicate File Descriptor will be assigned a new integer. Multiple File Descriptors can point to the same File Description.
File Descriptions are the internal kernelspace data structure for an open file. Each time a file is opened (for example, by each successful open() call), exactly one File Description is created. File Descriptions hold metadata about the opened file (such as read/write mode, pointer offset, underlying location, and so on).
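The difference between a File Descriptor and a File Description can be observed from userspace. The following is a minimal sketch, assuming a POSIX-like system; the function name offsets_after_read and the temporary file are illustrative only. It duplicates a descriptor, opens the same file a second time, and compares the offsets each descriptor sees.

```python
import os
import tempfile


def offsets_after_read():
    """Read via one descriptor, then report the offsets seen by two others.

    Returns (offset via duplicated descriptor,
             offset via independently opened descriptor,
             whether all three descriptor integers differ).
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name
    try:
        fd_a = os.open(path, os.O_RDONLY)  # creates a new File Description
        fd_b = os.dup(fd_a)                # new integer, same File Description
        fd_c = os.open(path, os.O_RDONLY)  # second, independent File Description

        os.read(fd_a, 4)  # advances the offset stored in the shared description

        shared = os.lseek(fd_b, 0, os.SEEK_CUR)       # follows fd_a
        independent = os.lseek(fd_c, 0, os.SEEK_CUR)  # unaffected by fd_a
        distinct = len({fd_a, fd_b, fd_c}) == 3

        for fd in (fd_a, fd_b, fd_c):
            os.close(fd)
        return shared, independent, distinct
    finally:
        os.unlink(path)
```

The duplicated descriptor reports the offset left by the read, because the offset lives in the File Description rather than in the descriptor. The separately opened descriptor has its own File Description and still sits at the start of the file.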
Load-balanced or Virtual Hosted services are another area of confusing terminology. There is less standardization of terms here, but for the purposes of this presentation, I will use the following.
A load balancer is a service that splits incoming connections between multiple service providers. This can be at the DNS level, at the application level, or at the edge server. Application and DNS level load balancing split the connections at Layer-4 (TCP/UDP), so the main connections (HTTP GET requests, et cetera) are routed to different logical service providers. When load balancing at the edge server level, all the traffic hits the edge server first.
Edge Server
The logical server that, once the domain name has been resolved, accepts incoming connections. When multiple logical servers are involved, often only the edge server listens on the public internet. Note that the Edge Server can run on the same logical machine (physical or virtual) as other logical servers.
Upstream Server
A logical server which handles a subset of incoming connections or requests. These are hidden from the “outside world” behind the edge server. A Reverse Proxy can intelligently select between upstream servers.
Reverse Proxy
A service running on the Edge Server which examines incoming connections and relays or routes them to different upstream servers based on the name of the server the client requested.
Transparent Reverse Proxy
A Reverse Proxy which uses the TLS Server Name Indication field or packet inspection to determine the name of the requested Upstream Server without affecting the state of the connection. The connection is then passed (usually by duplicating the File Descriptor for the socket) to the Upstream Server. At the application level, this process is transparent.
Server Name Indication (SNI)
A name field carried as an extension in the TLS ClientHello (the first handshake message the client sends), which indicates in plain text the desired logical server name.
Named Virtual Host
A Named Virtual Host is a logical website (or other HTTP service) identified by a unique hostname. The hostname is provided by the SNI field of the TLS handshake, or by the Host: line of the HTTP headers of a request.
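To make the SNI lookup concrete, here is a rough sketch of pulling the server name out of the first bytes a TLS client sends. It assumes the whole ClientHello arrives in a single TLS record and does only minimal validation; extract_sni is a hypothetical helper name, not part of any library.

```python
import struct
from typing import Optional


def extract_sni(data: bytes) -> Optional[str]:
    """Return the SNI host name from a raw TLS ClientHello, or None.

    Assumption: the ClientHello fits in one TLS record held in `data`.
    """
    try:
        # Record type 0x16 = handshake; handshake type 0x01 = ClientHello.
        if data[0] != 0x16 or data[5] != 0x01:
            return None
        pos = 5 + 4            # record header (5) + handshake header (4)
        pos += 2 + 32          # client_version + random
        pos += 1 + data[pos]   # session_id (1-byte length prefix)
        (suites_len,) = struct.unpack_from("!H", data, pos)
        pos += 2 + suites_len  # cipher_suites (2-byte length prefix)
        pos += 1 + data[pos]   # compression_methods (1-byte length prefix)
        (ext_total,) = struct.unpack_from("!H", data, pos)
        pos += 2
        end = min(pos + ext_total, len(data))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                if data[pos + 2] != 0:  # 0 = host_name
                    return None
                (name_len,) = struct.unpack_from("!H", data, pos + 3)
                name = data[pos + 5:pos + 5 + name_len]
                if len(name) != name_len:
                    return None  # truncated input
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None
```

Anything that is not a ClientHello, or that is cut short, yields None, which a proxy could treat as "fall back to a default Upstream Server".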
In the description of Topology, HTTP is mentioned only in passing. Reverse Proxies are used for a wide variety of application-level protocols, and one of the major advantages of Transparent Reverse Proxies is they are application-protocol agnostic (provided the initial handshake has some way for the Reverse Proxy server to determine which Upstream Server to use).
Transparent Reverse Proxies are not always superior to HTTP Reverse Proxies. HTTP Reverse Proxies can cache results, are often supported out of the box, and often produce equivalent results. This is not a dig at them, or at those who write or use them.
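For comparison, an HTTP Reverse Proxy typically chooses the Upstream Server from the Host: line of the request. The sketch below shows that lookup in its simplest form. The UPSTREAMS table, its host names, and its ports are invented for illustration, and the parsing ignores details such as IPv6 literals.

```python
from typing import Optional, Tuple

# Hypothetical routing table: Named Virtual Host -> upstream (address, port).
UPSTREAMS = {
    "blog.example.com": ("127.0.0.1", 8081),
    "wiki.example.com": ("127.0.0.1", 8082),
}


def host_from_request(head: bytes) -> Optional[str]:
    """Return the lower-cased host name from an HTTP request head, or None."""
    lines = head.split(b"\r\n")
    for line in lines[1:]:  # skip the request line
        if not line:
            break  # a blank line ends the headers
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace").lower()
            # Drop an optional ":port" suffix (IPv6 literals are not handled).
            return host.split(":", 1)[0]
    return None


def pick_upstream(name: Optional[str]) -> Optional[Tuple[str, int]]:
    """Look up the upstream for a host name; None means no match."""
    if name is None:
        return None
    return UPSTREAMS.get(name)
```

Unlike the transparent approach, this requires the proxy to speak HTTP and to read the request itself before anything can be routed.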
Load balancing (but not Named Virtual Hosting) can also be done by the kernel (starting in Linux 4.something). I’m going to ignore that for expediency.
I am assuming all logical services are running on the same logical computer on the same physical computer. It is possible to use a Transparent Reverse Proxy when that is not true, but demonstrating it is more difficult, the setup is often trivialized, and some of the performance benefits of a Transparent Reverse Proxy are lost.
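The same-machine assumption matters because handing over a File Descriptor only works between processes on one host. As a rough illustration, the sketch below peeks at pending bytes without consuming them and then passes the socket across a Unix-domain channel. To stay self-contained it runs everything in a single process, with socket pairs standing in for the client connection and for the proxy-to-upstream channel. It assumes Python 3.9 or later on a Unix-like system, and peek_then_hand_off is a made-up name.

```python
import socket


def peek_then_hand_off(payload: bytes):
    """Peek at a connection's first bytes, then hand the socket to a receiver.

    Returns (bytes seen by the proxy's peek, bytes read by the receiver).
    """
    # Stand-ins: `accepted` plays the socket the Edge Server accepted.
    client, accepted = socket.socketpair()
    # Unix-domain channel between the proxy and the Upstream Server.
    proxy_end, upstream_end = socket.socketpair()
    try:
        client.sendall(payload)

        # MSG_PEEK returns the data but leaves it queued on the socket,
        # so the connection state is unchanged for whoever reads next.
        peeked = accepted.recv(4096, socket.MSG_PEEK)

        # Pass the descriptor as ancillary data (SCM_RIGHTS under the hood).
        socket.send_fds(proxy_end, [b"x"], [accepted.fileno()])
        _msg, fds, _flags, _addr = socket.recv_fds(upstream_end, 16, 1)

        # The receiver gets a new descriptor integer for the same connection.
        handed = socket.socket(fileno=fds[0])
        received = handed.recv(4096)
        handed.close()
        return peeked, received
    finally:
        for sock in (client, accepted, proxy_end, upstream_end):
            sock.close()
```

In a real deployment the receiving side would be a separate Upstream Server process listening on a Unix-domain socket, and the peeked bytes would be inspected (for the SNI name, say) to decide which upstream receives the descriptor.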