So you want to run ProFTPD on an AWS EC2 instance? Due to FTP's nature as a multi-connection protocol, it is not as straightforward to use FTP within AWS EC2, but it can be done. Read on to find out how. Note that the following documentation assumes that you know how to install and configure ProFTPD already. If you are only running individual FTP servers, then the sections on AWS security groups and addresses are relevant. If you want to provide a "scalable" pool/cluster of FTP servers, then the AWS Elastic Load Balancing and AWS Route53 sections will also be of interest.
Security Groups
Every EC2 instance belongs to one or more AWS Security Groups (often abbreviated as simply "SGs"). As the AWS documentation states, a "security group" is effectively a set of firewall rules controlling network access to your EC2 instance. I tend to think of SGs more like NAT rules, since the "firewall" is the EC2 network perimeter managed by Amazon, and an SG dictates what holes to allow from the outside world into the EC2 internal networks.
Clients wishing to make a connection to the proftpd running on your EC2 instance, be it FTP, FTPS, SFTP, or SCP, will thus need to be allowed to connect by one (or more) of your SGs. Assuming your proftpd listens on the standard FTP control port (21), you would configure one of your SGs to allow access to that port, from any IP address, using the AWS CLI like so:
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-XXXX \
    --protocol tcp \
    --port 21 \
    --cidr 0.0.0.0/0
If you are allowing SFTP/SCP connections, e.g. to your proftpd, running the mod_sftp module on the standard SSH port (22):
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-YYYY \
    --protocol tcp \
    --port 22 \
    --cidr 0.0.0.0/0
If you are only allowing SFTP/SCP access, that should suffice for the security group configuration for your instance. Allowing FTP/FTPS connections requires more security group tweaks.
FTP uses multiple TCP connections: one for the control connection, and separate connections for data transfers (directory listings and file uploads/downloads). The ports used for these data connections are dynamically negotiated over the control connection; it is this dynamic nature of the data connections which causes complexity with network access rules. This site does a great job of describing these issues in more detail:
http://slacksite.com/other/ftp.html
We want to configure ProFTPD to use a known range of ports for its passive data transfers, and then we want to configure our FTP SG to allow access to that known port range. Thus we would use something like this in the proftpd.conf:
PassivePorts 60000 65535
and then allow that same range in the SG:

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-XXXX \
    --protocol tcp \
    --port 60000-65535 \
    --cidr 0.0.0.0/0
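Since the SG rule must cover exactly the range given to PassivePorts, one way to keep the two in sync is to derive the `--port` value from the configuration file itself. The following is only a sketch: the `passive_port_range` helper and the `/etc/proftpd.conf` path are assumptions made for illustration, not part of ProFTPD or the AWS CLI.

```shell
#!/bin/sh
# Hypothetical helper: print the PassivePorts range from a proftpd.conf
# as "MIN-MAX", the form taken by the --port option shown above.
passive_port_range() {
  # $1: path to the proftpd.conf to read.
  # The first uncommented PassivePorts line with two port values wins.
  awk 'tolower($1) == "passiveports" && NF >= 3 { print $2 "-" $3; exit }' "$1"
}

# Example use (sg-XXXX is the placeholder group ID from the text; the
# config path is an assumption):
#   range=$(passive_port_range /etc/proftpd.conf)
#   aws ec2 authorize-security-group-ingress \
#     --group-id sg-XXXX --protocol tcp --port "$range" --cidr 0.0.0.0/0
```

Commented-out PassivePorts lines are skipped, because their first field is then `#` (or `#PassivePorts`) rather than the directive name.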
Public vs Private Instance Addresses
Every EC2 instance will have its own local/private IP address and DNS name, automatically assigned by AWS. Instances may also be automatically assigned public IP addresses/DNS names as well, depending on various factors. The AWS docs on instance addressing discuss those factors in greater detail.
If your EC2 instance will be supporting FTP/FTPS sessions, then you will need to determine whether your instance has a public address. If so, that address needs to be configured using the MasqueradeAddress directive. Why? When an FTP client negotiates a passive data transfer, ProFTPD tells that FTP client an address, and a port, to which to connect to transfer the data. For EC2 instances with a public address, that public address is what ProFTPD needs to convey to the FTP client, and the MasqueradeAddress is the directive that does so.
So how can you tell what the public address of your EC2 instance is, if it even has one? You can use the EC2 instance metadata, via curl, like so:
$ curl http://169.254.169.254/latest/meta-data/public-hostname
If the instance has no public hostname, the same request returns a 404 response instead:

$ curl http://169.254.169.254/latest/meta-data/public-hostname
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>
Here's one solution for handling this situation: obtain the public hostname for your instance, store it in an environment variable, and then use that environment variable in your proftpd.conf:
$ export EC2_PUBLIC_HOSTNAME=`curl -f -s http://169.254.169.254/latest/meta-data/public-hostname`
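The -f option makes curl fail without printing the error page on HTTP errors such as the 404 above, and -s suppresses its progress output; the variable is thus left empty when the instance has no public hostname. The proftpd.conf can then reference the variable: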
MasqueradeAddress %{env:EC2_PUBLIC_HOSTNAME}
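That directive is only useful when EC2_PUBLIC_HOSTNAME is not empty. To handle instances without a public hostname, the script that starts proftpd can define a property only when the variable is set: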
PROFTPD_ARGS=""

# If we have a public hostname, then the string will not be
# zero length, and we define a property for ProFTPD's use.
if [ ! -z "$EC2_PUBLIC_HOSTNAME" ]; then
  PROFTPD_ARGS="$PROFTPD_ARGS -DUSE_MASQ_ADDR"
fi
and the proftpd.conf applies the directive only when that property is defined:

<IfDefined USE_MASQ_ADDR>
  MasqueradeAddress %{env:EC2_PUBLIC_HOSTNAME}
</IfDefined>
Fortunately the EC2 instance addressing does not require any additional changes/tweaks to the AWS Security Groups.
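The metadata lookup and the conditional property can be combined in a small start-up wrapper. This is a minimal sketch under stated assumptions: the function names `ec2_public_hostname` and `masq_args` and the `/usr/sbin/proftpd` path are hypothetical, and instances that enforce IMDSv2 would additionally need a session token, which this sketch omits.

```shell
#!/bin/sh
# Hypothetical helper: look up the public hostname via instance metadata.
# Prints nothing when the instance has none (curl -f fails on the 404).
ec2_public_hostname() {
  curl -f -s --max-time 2 \
    http://169.254.169.254/latest/meta-data/public-hostname 2>/dev/null || true
}

# Hypothetical helper: print the extra proftpd argument for a hostname ($1).
# An empty hostname yields no output, so no property gets defined.
masq_args() {
  if [ -n "$1" ]; then
    printf '%s\n' "-DUSE_MASQ_ADDR"
  fi
}

# Example use in a start script (the proftpd path is an assumption):
#   EC2_PUBLIC_HOSTNAME=$(ec2_public_hostname)
#   export EC2_PUBLIC_HOSTNAME
#   /usr/sbin/proftpd $(masq_args "$EC2_PUBLIC_HOSTNAME")
```

Keeping the lookup and the argument building in separate functions means the decision logic can be checked without access to the metadata service.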
Elastic Load Balancing
Now that you have ProFTPD up and running on your EC2 instance, and you can connect using FTP/FTPS and SFTP/SCP, and browse directories and upload and download files, you are probably thinking about how to have more than one instance for your FTP service. After all, you want redundancy for your FTP servers just like you have for your HTTP servers, right? And for HTTP servers, you would use an AWS Elastic Load Balancer (often called an "ELB"). Why not use the same technique for FTP? Can you configure an ELB for FTP?
Yes, ELBs can be used for FTP. Like SGs, though, it's complicated by FTP's use of multiple TCP connections; for SFTP/SCP, ELBs are simpler to configure.
The first thing to keep in mind is that ELBs only distribute (i.e. "balance") connections in a round-robin fashion among the backend TCP servers; they do not distribute connections based on the load of those backend servers. (The balancing algorithm is slightly different for HTTP servers, but that does not apply to ProFTPD.) This means that any user might connect to any of your ProFTPD instances; this, in turn, means that users must be able to log in on all instances, and that the files for all users should be available on all instances. These requirements lead to the requirements for centralized/shared authentication data, and for shared filesystems. The centralized/shared authentication data can be handled by using e.g. SQL databases, LDAP directories, or even synchronized password files. For shared filesystems, popular approaches include s3fs, NFS, Samba, and gluster.
The next thing to keep in mind is whether you have an EC2 Classic account, or whether you are using AWS VPC. Chances are that you are using a VPC. ELBs for an EC2 Classic account can only be configured to listen on a restricted list of ports, i.e. 25, 80, 443, 465, 587, and 1024-65535; ELBs in a VPC can listen on any port.
Let's assume that you are using a VPC, and thus you configure a TCP listener on your ELB for port 21, which uses the instance port 21. And for SFTP/SCP, it would be a TCP listener for port 22, using instance port 22. Obviously you would not use HTTP or HTTPS listeners, but what about an SSL listener, for FTPS? No. An SSL listener performs the SSL/TLS handshake first, then forwards the plaintext messages to the backend instance. But FTPS is a "STARTTLS" protocol, which means the connection is first unencrypted, and then feature negotiation happens on that connection, and then the SSL/TLS handshake happens. ELBs do not support STARTTLS protocols, thus you cannot use them for terminating SSL/TLS sessions for FTP servers.
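As an illustration of the TCP listeners described above, the Classic ELB commands of the AWS CLI take each listener as a comma-separated spec. The sketch below builds that spec; the `tcp_listener` helper, the ELB name `my-ftp-elb`, and the subnet ID are placeholders chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: print a Classic ELB listener spec that passes TCP
# straight through on the same port ($1) for the ELB and the instance.
tcp_listener() {
  printf 'Protocol=TCP,LoadBalancerPort=%s,InstanceProtocol=TCP,InstancePort=%s\n' "$1" "$1"
}

# Example use; the ELB name and subnet ID are placeholders:
#   aws elb create-load-balancer \
#     --load-balancer-name my-ftp-elb \
#     --listeners "$(tcp_listener 21)" "$(tcp_listener 22)" \
#     --subnets subnet-XXXX
```

Both the listener protocol and the instance protocol are TCP here, so the ELB does not attempt any SSL/TLS handling of its own and FTPS negotiation is left to ProFTPD.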
Your ProFTPD configuration might use multiple different ports, for different <VirtualHost>s. Your ELB would need a different TCP listener for each of those separate ports. However, now that ProFTPD supports the FTP HOST command (which allows for proper name-based virtual hosts in FTP, just like HTTP 1.1 has via its Host header), you should only need one TCP listener now.
An ELB wants to perform health checks on its backend instances, to know that each instance is up, running, and available to handle connections. ELBs can perform HTTP requests as health checks, or make TCP connections. ProFTPD is not an HTTP server, so using TCP health checks is necessary. You would configure the ELB to make TCP connections to the ProFTPD port, e.g. port 21 for FTP/FTPS, and/or port 22 for SFTP/SCP.
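A TCP health check for a Classic ELB might be configured along these lines. The interval, timeout, and threshold values below are illustrative assumptions rather than recommendations, and `health_check_spec` and `my-ftp-elb` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print a TCP health check spec for the given
# instance port ($1). The timing values are example choices only.
health_check_spec() {
  printf 'Target=TCP:%s,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2\n' "$1"
}

# Example use; the ELB name is a placeholder:
#   aws elb configure-health-check \
#     --load-balancer-name my-ftp-elb \
#     --health-check "$(health_check_spec 21)"
```

A Classic ELB has a single health check, so an ELB serving both FTP and SFTP would have to pick one of the two ports as its target.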
What about the range of ports defined via PassivePorts, that you had to allow in your SG? Does your ELB need TCP listeners for all of those ports, too? No. To understand why, we need to examine in detail just how passive data transfers work in FTP. An FTP client connects to your FTP server, through the ELB, like this, for its control connection:
client --- ctrl ---> ELB:21 --- ctrl ---> instance:21
Forcing the passive data connection through the ELB as well would look like this:

client --- data ---> ELB:65000 --- data ---> instance:65000
If your ELB will only ever have just one backend instance, then the above configuration would work. Your EC2 instance might be in a VPC, with no public address, and thus perhaps the only way to make your FTP server there reachable is using an ELB. Where forcing passive data connections through an ELB starts to fail is when there are multiple backend instances. Consider the case where your ELB might have 3 instances:
         +--> instance1:21
ELB:21 --|--> instance2:21
         +--> instance3:21
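Suppose the ELB routes the client's control connection to instance2:

client --- ctrl ---> ELB:21 --- ctrl ---> instance2:21

The ELB balances each TCP connection independently, so the data connection for that session might be routed to a different instance, which knows nothing about the transfer:

client --- data ---> ELB:65000 --- data ---> instance3:65000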
In order to properly support multiple backend instances (which is one of the goals/benefits of using an ELB in the first place) for FTP, then, the trick is to not force data connections through the ELB. Instead, the MasqueradeAddress directive points to each backend instance's respective public hostname. With this configuration, the FTP client connects to the ELB for its control connection, like usual:
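client --- ctrl ---> ELB:21 --- ctrl ---> instance2:21

but for its data connection, the client connects directly to the backend instance:

client -------------- data -------------> instance2:65000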
Now you have an ELB with multiple backend FTP servers. Success, right? Maybe. There are some caveats. FTP clients might notice that they connect to one name (the ELB DNS name), but for data transfers, they are being told (by the FTP server) to connect to a different name; some FTP clients might warn/complain about this mismatch. ProFTPD would definitely complain about this mismatch, for it would see the control connection as originating from the ELB, but the data connection originating from a different address, and would refuse the data transfer. To allow data transfers to work, then, you would need to add the following to your proftpd.conf:
# Allow "site-to-site" transfers, since that is what FTP traffic with
# an ELB looks like.
AllowForeignAddress on
Next, there is the ELB idle timeout setting to adjust. The default is 60 seconds. During a data transfer, most FTP clients will be handling the data connection, and the control connection is idle. Thus if the data transfer lasts longer than 60 seconds, the ELB might terminate the idle control connection, and the FTP session is lost. Unfortunately the maximum allowed idle timeout for ELBs is 1 hour (3600 seconds); for large (or slow) data transfers, even that timeout could be a problem. There are ways of keeping the control connection from being idle for too long, using keepalives. Note that this idle timeout is not really an issue for SFTP/SCP sessions, as all data transfers for them use the same single TCP connection.
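Raising the idle timeout on a Classic ELB might look like the following sketch. The `idle_timeout_attrs` helper and the ELB name are hypothetical; the helper caps the requested value at the 3600 second maximum mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the load balancer attributes JSON for an
# idle timeout of $1 seconds, capped at the 3600 second maximum.
idle_timeout_attrs() {
  secs=$1
  if [ "$secs" -gt 3600 ]; then
    secs=3600
  fi
  printf '{"ConnectionSettings":{"IdleTimeout":%s}}\n' "$secs"
}

# Example use; the ELB name is a placeholder:
#   aws elb modify-load-balancer-attributes \
#     --load-balancer-name my-ftp-elb \
#     --load-balancer-attributes "$(idle_timeout_attrs 3600)"
```

Even at the maximum, transfers that keep the control connection idle for more than an hour would still need keepalives.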
Last, using an ELB only for FTP control connections, and using direct connections for the FTP data transfers only works if your backend EC2 instances have public hostnames; for instances in a VPC, that may not be true. So how can we use an ELB for multiple backend instances that only have private addresses? Sadly, the answer is: you can't. For load balancing FTP sessions among multiple backend EC2 instances with private addresses, you need an FTP-aware proxy, such as ProFTPD with the mod_proxy module. This means running your own instance for doing that load balancing, rather than having AWS manage it. Of course, if the clients using your ELB for FTP services are also within your VPC, then the lack of public hostnames for your EC2 instances is not an issue, and using an ELB as described above will work.
DNS and AWS Route53
Using an ELB for balancing connections across your pool of FTP servers is rather complex. Are there alternatives? Yes: "DNS load balancing".
Instead of using an AWS ELB for balancing/distributing connections across your pool of ProFTPD-running instances, you can use DNS tricks to implement the same functionality. Note, however, these DNS tricks still assume that your EC2 instances are publicly reachable, i.e. have public hostnames.
With DNS load balancing, the client resolves a DNS name to an IP address, and connects to that IP address:
client1 ----------------- ctrl ----------------> instance2:21
client1 ----------------- data ----------------> instance2:65000
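But the DNS server might be configured with several IP addresses for the same DNS name; the client then chooses one IP address from the given list (usually the first address), and connects to that. Some DNS servers will shuffle the list of returned addresses for a name, so that clients will choose different addresses, and thus distribute/balance their connections across all of the addresses:

client1 ----------------- ctrl ----------------> instance2:21
client1 ----------------- data ----------------> instance2:65000
client2 ----------------- ctrl ----------------> instance1:21
client2 ----------------- data ----------------> instance1:65000
client3 ----------------- ctrl ----------------> instance3:21
client3 ----------------- data ----------------> instance3:65000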
Within AWS, the Route53 service can be used as the DNS service for your domain names. AWS Route53 calls this round robin of addresses a weighted routing policy, as each address associated with a name can be given a "weight", affecting the probability that that address will be returned, by Route53, when the DNS name is resolved to an IP address. Other routing policies are supported, e.g. latency-based routing (so that the instance with the fastest response time is chosen), and geolocation-based routing (the instance address chosen is based on the location of the resolving client).
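To make the weighted routing policy concrete, here is a sketch that builds the change batch for one weighted A record; one such record would be created per instance, all sharing the same name. The `weighted_a_record` helper, the name `ftp.example.com`, the hosted zone ID, the address, and the 60 second TTL are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: print a Route53 change batch that upserts one
# weighted A record.
#   $1: DNS name   $2: set identifier (unique per instance)
#   $3: instance public IP address   $4: weight
weighted_a_record() {
  cat <<EOF
{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{
  "Name":"$1","Type":"A","SetIdentifier":"$2","Weight":$4,
  "TTL":60,"ResourceRecords":[{"Value":"$3"}]}}]}
EOF
}

# Example use; the hosted zone ID and address are placeholders:
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id ZXXXX \
#     --change-batch "$(weighted_a_record ftp.example.com instance1 203.0.113.10 10)"
```

Giving every instance the same weight approximates plain round robin; a short TTL lets weight changes take effect sooner, at the cost of more DNS queries.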
If you are using AWS Route53, then you will need to configure health checks, just as you would for an ELB. Route53 supports TCP health checks, which you would point at your FTP/FTPS port (21) or SFTP/SCP port (22) on your instances.
Since any/all clients could connect to any/all of the EC2 instances associated with your DNS name, all of the users would need to be able to login on any instance, and have their files/data available. Thus using a shared filesystem for the files (such as s3fs, NFS, Samba, gluster, etc) and a centralized/shared authentication mechanism (e.g. SQL database, LDAP directory, etc) would be needed.
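Future Work
In order to automate many of the above manual steps, work is progressing on a mod_aws module for ProFTPD, which will eventually automatically set PassivePorts for FTP/FTPS vhosts if needed, automatically set MasqueradeAddress if needed, and automatically adjust Security Group rules for FTP/FTPS and SFTP/SCP, in addition to other interactions with AWS services.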
Frequently Asked Questions
Question: I need to send particular users only to a particular instance/set of instances. How do I configure AWS to do this?
Answer: Short answer: you cannot. But it can be done!
The AWS services like ELBs and Route53 understand TCP connections, and the HTTP protocol, but they do not understand FTP. And understanding of the protocol is necessary, so that you know how/when to expect the user name, and how to redirect/proxy the backend connection. This is why you cannot use AWS to do per-user balancing. However, you can use the mod_proxy module for ProFTPD, which is protocol-aware, and thus can balance FTP/FTPS connections in multiple ways, including per-user.
Question: I am using ELBs for my pool of ProFTPD servers. I would like my logs to show the IP address of the connecting clients, but all I get is the IP address of the ELB. Is there a way to get the original IP address, an equivalent to the X-Forwarded-For HTTP header?
Answer: Yes, there is an equivalent mechanism that is supported by ELBs for TCP listeners: the PROXY protocol.
To enable use of the PROXY protocol by your ELB, see the AWS ELB documentation on enabling PROXY protocol support. You will also need to tell ProFTPD to expect the PROXY protocol, which means using the mod_proxy_protocol module.
The PROXY protocol, and the mod_proxy_protocol module, work equally well for FTP/FTPS and SFTP/SCP sessions.
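To show what the PROXY protocol carries, here is a sketch that extracts the client address from a version 1 header, the single text line sent ahead of the client's own data. It only illustrates the header format; in ProFTPD the parsing is done by mod_proxy_protocol, and the `proxy_v1_client_addr` helper and the sample addresses are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the client (source) address from a PROXY
# protocol v1 header line ($1). The line has six fields:
#   PROXY <TCP4|TCP6> <src addr> <dst addr> <src port> <dst port>
# Anything else (e.g. "PROXY UNKNOWN") produces no output.
proxy_v1_client_addr() {
  printf '%s\n' "$1" | tr -d '\r' |
    awk '$1 == "PROXY" && NF == 6 { print $3 }'
}

# Example use with a made-up header:
#   proxy_v1_client_addr 'PROXY TCP4 198.51.100.7 10.0.0.5 51234 21'
```

The third field is the address that would otherwise be hidden behind the ELB's own address in the logs.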
Question: Should I run a firewall on my instance as well?
Answer: It is considered a good network security practice to do so, as it provides defense in depth. However, care must be taken with those firewall rules; they need to allow the same ports/addresses as your SGs. (Also note that local/instance firewall rules CANNOT be applied to the connecting client's IP address when connecting through an ELB.)

© Copyright 2017 The ProFTPD Project All Rights Reserved