Terminology
As mentioned in the first post, this follows my presentation from a year ago on transparent reverse proxies. While I will begin this presentation by covering the basics again, there are several ambiguous and confusing terms involved. So this post will attempt to provide some clarity of terminology, and a written version of the background information.
Open Files
The first set of confusing terms is file handle vs file descriptor vs file description. All three refer to an open file, which can be a logical file on a filesystem, a socket, or a device: everything is a file.
File Handle
File Handles are a userspace structure for referencing a file (open or not). The key here is userspace. As such, they are process-specific.
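As a rough illustration, assume Python's file object stands in for a userspace File Handle (C's `FILE *` plays a similar role). The handle wraps an integer that the kernel understands, and data written through the handle sits in a userspace buffer until it is flushed:

```python
import os
import tempfile

# A Python file object is a userspace structure: a "handle" in this post's terms.
with tempfile.TemporaryFile() as handle:
    fd = handle.fileno()          # the kernel-facing integer behind the handle
    handle.write(b"hello")        # lands in the handle's userspace buffer first
    handle.flush()                # now passed to the kernel through the descriptor
    size = os.fstat(fd).st_size   # the kernel is asked about the integer, not the handle
```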
File Descriptor
File Descriptors are still process-specific, but they are the kernelspace counterpart to the File Handle. These are presented to userspace as just an integer. File Descriptors can be copied, including between processes. The duplicate File Descriptor will be assigned a new integer. Multiple File Descriptors can point to the same File Description.
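A minimal sketch of copying a File Descriptor, assuming a Unix-like system and Python 3.9 or later (for `socket.send_fds` and `socket.recv_fds`). The pipe and the socket pair are stand-ins chosen only so the example runs by itself; in practice the two ends of the Unix socket would live in different processes.

```python
import os
import socket

# Any open file will do; a pipe keeps the example self-contained.
read_fd, write_fd = os.pipe()

# Copy within the process: the duplicate is assigned a new integer.
dup_fd = os.dup(write_fd)

# Copy "between processes": descriptors travel over a Unix socket as
# ancillary data. Both ends are in this one process purely for illustration.
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
socket.send_fds(sender, [b"fd"], [write_fd])
_msg, received_fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
passed_fd = received_fds[0]

# Three different integers, yet each one writes into the same pipe.
os.write(write_fd, b"a")
os.write(dup_fd, b"b")
os.write(passed_fd, b"c")
data = os.read(read_fd, 3)

for fd in (read_fd, write_fd, dup_fd, passed_fd):
    os.close(fd)
sender.close()
receiver.close()
```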
File Description
File Descriptions are the internal kernelspace datastructure for an open file. Each call that opens a file creates exactly one File Description, so opening the same path twice yields two. File Descriptions hold metadata about the opened file (such as read/write mode, pointer offset, underlying location, and so on).
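The pointer offset is an easy way to see which File Descriptors share a File Description. The sketch below assumes a Unix-like system and uses a scratch file created just for the demonstration: a duplicated descriptor sees the offset move, while a descriptor from a second `open` of the same path does not.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write(b"0123456789")

fd_a = os.open(path, os.O_RDONLY)   # creates a File Description
fd_b = os.dup(fd_a)                 # second descriptor, same File Description
fd_c = os.open(path, os.O_RDONLY)   # same file, but a separate File Description

os.read(fd_a, 4)                    # moves the offset held by the shared description

offset_b = os.lseek(fd_b, 0, os.SEEK_CUR)   # follows fd_a
offset_c = os.lseek(fd_c, 0, os.SEEK_CUR)   # unaffected

for fd in (fd_a, fd_b, fd_c):
    os.close(fd)
os.remove(path)
```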
Topology
Load-balanced or Virtual Hosted services are another area of confusing terminology. There is less standardization of terms here, but for the purposes of this presentation, I will use the following.
Load Balancer
A load balancer is a service that splits incoming connections between multiple service providers. This can be at the DNS level, at the application level, or at the edge server. Application and DNS level load balancing split the connections at Layer-4 (TCP/UDP), so the main connections (HTTP GET requests, et cetera) are routed to different logical service providers. When load balancing at the edge server level, all the traffic hits the edge server.
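The splitting policy itself can be very simple. As a sketch only, assuming round-robin selection (one common policy, not one this post prescribes) and made-up provider addresses:

```python
from itertools import cycle

# Hypothetical service providers, for illustration only.
providers = [("127.0.0.1", 8001), ("127.0.0.1", 8002), ("127.0.0.1", 8003)]
_rotation = cycle(providers)

def pick_provider():
    """Return the provider that should receive the next incoming connection."""
    return next(_rotation)

# Four connections in a row: the fourth wraps around to the first provider.
assignments = [pick_provider() for _ in range(4)]
```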
Edge Server
The logical server which accepts incoming connections once the domain name has been resolved. When multiple logical servers are involved, often only the edge server listens on the public internet. Note that the Edge Server can run on the same logical machine (physical or virtual) as other logical servers.
Upstream Server
A logical server which handles a subset of incoming connections or requests. These are hidden from the “outside world” behind the edge server. A Reverse Proxy can intelligently select between upstream servers.
Reverse Proxy
A service running on the Edge Server which examines incoming connections and relays or routes them to different upstream servers based on the name of the server the client requested.
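The routing decision can be sketched as a lookup from the requested name to an upstream address. The table, the default entry, and the function name below are all hypothetical. The normalization is deliberately minimal and does not handle IPv6 literals.

```python
# Hypothetical routing table: requested server name -> upstream address.
UPSTREAMS = {
    "blog.example.test": ("127.0.0.1", 8001),
    "wiki.example.test": ("127.0.0.1", 8002),
}
DEFAULT_UPSTREAM = ("127.0.0.1", 8000)

def select_upstream(requested_name):
    """Pick an upstream for the name the client asked for."""
    # Names are case-insensitive and may carry a ":port" suffix or trailing dot.
    name = requested_name.strip().lower().partition(":")[0].rstrip(".")
    return UPSTREAMS.get(name, DEFAULT_UPSTREAM)
```

For example, `select_upstream("Blog.Example.Test:443")` resolves to the `blog.example.test` entry, and an unknown name falls back to `DEFAULT_UPSTREAM`.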
Transparent Reverse Proxy
A Reverse Proxy which uses the TLS Server Name Indication field or packet inspection to determine the name of the requested Upstream Server without affecting the state of the connection. The connection is then passed (usually by duplicating the File Descriptor for the socket) to the Upstream Server. At the application level, this process is transparent.
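One way to inspect a connection without affecting its state is to peek at the queued bytes rather than read them. The sketch below assumes a Unix-like system and, for brevity, a plain-HTTP request carrying a `Host:` line. The helper name `peek_host` is made up, and a socket pair stands in for an accepted client connection.

```python
import socket

def peek_host(conn, max_bytes=4096):
    """Return the Host header value without consuming bytes from the socket."""
    data = conn.recv(max_bytes, socket.MSG_PEEK)   # bytes stay queued in the kernel
    for line in data.split(b"\r\n")[1:]:           # skip the request line
        if not line:
            break                                  # blank line: end of headers
        key, _, value = line.partition(b":")
        if key.strip().lower() == b"host":
            return value.strip().decode("ascii", "replace")
    return None

# Stand-in for a client connection: a connected socket pair in one process.
client, accepted = socket.socketpair()
request = b"GET / HTTP/1.1\r\nHost: wiki.example.test\r\n\r\n"
client.sendall(request)

name = peek_host(accepted)

# After the peek, a normal read still sees the whole request. In a real
# setup the Upstream Server would do this read on the passed descriptor.
unread = accepted.recv(4096)

client.close()
accepted.close()
```

Because the peek leaves the request in the socket's receive queue, whoever ends up holding the descriptor sees the connection as if nobody had looked at it.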
Other Terms
Server Name Indication (SNI)
A field in the TLS ClientHello (the first handshake message the client sends) which indicates, in plain text, the desired logical server name.
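To make the plain-text claim concrete, here is a rough sketch of pulling the name out of a raw ClientHello. It assumes the whole ClientHello arrives in a single TLS record and does only minimal validation. The function name `extract_sni` is made up. The sample ClientHello is generated locally with Python's `ssl` module, so no network access is involved.

```python
import ssl
import struct

def extract_sni(hello):
    """Return the server name from a raw TLS ClientHello record, or None."""
    try:
        # Record header: type (0x16 = handshake), version (2), length (2).
        # Handshake header: type (0x01 = ClientHello), length (3).
        if hello[0] != 0x16 or hello[5] != 0x01:
            return None
        pos = 5 + 4
        pos += 2 + 32                      # client version + random
        pos += 1 + hello[pos]              # session id
        (suites_len,) = struct.unpack_from("!H", hello, pos)
        pos += 2 + suites_len              # cipher suites
        pos += 1 + hello[pos]              # compression methods
        (ext_total,) = struct.unpack_from("!H", hello, pos)
        pos += 2
        end = min(pos + ext_total, len(hello))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", hello, pos)
            pos += 4
            if ext_type == 0:              # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = hello[pos + 2]
                (name_len,) = struct.unpack_from("!H", hello, pos + 3)
                if name_type != 0:         # 0 = host_name
                    return None
                return hello[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                        # truncated or malformed input
    return None

# Build a real ClientHello in memory: start a handshake with no peer attached.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="wiki.example.test")
try:
    tls.do_handshake()
except ssl.SSLWantReadError:
    pass  # expected: nobody answered, but the ClientHello is now in `outgoing`

client_hello = outgoing.read()
sni = extract_sni(client_hello)
```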
Named Virtual Host
A Named Virtual Host is a logical website (or other HTTP service) identified by a unique hostname. The hostname is provided by the SNI field of the TLS handshake, or by the Host: line of the HTTP request headers.
Notes
- In the description of Topology, HTTP is mentioned only in passing. Reverse Proxies are used for a wide variety of application-level protocols, and one of the major advantages of Transparent Reverse Proxies is they are application-protocol agnostic (provided the initial handshake has some way for the Reverse Proxy server to determine which Upstream Server to use).
- Transparent Reverse Proxies are not always superior to HTTP Reverse Proxies. HTTP Reverse Proxies can cache results, are often supported out of the box, and often produce equivalent results. This is not a dig at them, or at those who write or use them.
- Load balancing (but not Named Virtual Hosting) can also be done by the kernel (starting in Linux 4.something). I’m going to ignore that for expediency.
- I am assuming all logical services are running on the same logical computer on the same physical computer. It is possible to use a Transparent Reverse Proxy when that is not true, but demonstrating it is more difficult, the setup is often trivialized, and some of the performance benefits of a Transparent Reverse Proxy are lost.