security - I am under DDoS. What can I do?

This is a Canonical Question about DoS and DDoS mitigation.

I found a massive traffic spike on a website that I host today; I am getting thousands of connections a second and I see I'm using all 100Mbps of my available bandwidth. Nobody can access my site because all the requests time out, and I can't even log into the server because SSH times out too! This has happened a couple times before, and each time it's lasted a couple hours and gone away on its own.

Occasionally, my website has another distinct but related problem: my server's load average (which is usually around .25) rockets up to 20 or more and nobody can access my site just the same as the other case. It also goes away after a few hours.

Restarting my server doesn't help; what can I do to make my site accessible again, and what is happening?

Relatedly, I found once that for a day or two, every time I started my service, it got a connection from a particular IP address and then crashed. As soon as I started it up again, this happened again and it crashed again. How is that similar, and what can I do about it?

Answer

You are experiencing a denial of service attack. If you see traffic coming from multiple networks (different IPs on different subnets), you have a distributed denial of service (DDoS); if it's all coming from the same place, you have a plain old DoS. It is helpful to find out which one it is, if you can still get onto the server; netstat will show you the remote addresses of the current connections. This might be hard to do while the attack is in progress, though.
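To make the DoS versus DDoS check concrete, here is a minimal Python sketch that groups remote addresses by network and counts connections per network. It assumes you have already extracted the remote IPv4 addresses (for example from netstat output) into a list; the function name, the /24 grouping, and the sample addresses are illustrative assumptions, not part of any particular tool.

```python
import ipaddress
from collections import Counter


def count_source_networks(addresses, prefix_len=24):
    """Count connections per IPv4 network of the given prefix length."""
    counts = Counter()
    for addr in addresses:
        # strict=False derives the enclosing network from a host address.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[net] += 1
    return counts


# Hypothetical sample of remote addresses (documentation ranges only).
sample = [
    "192.0.2.10", "192.0.2.77",
    "198.51.100.5",
    "203.0.113.9", "203.0.113.200",
]

for net, n in count_source_networks(sample).most_common():
    print(net, n)
```

If nearly all connections fall into one or two networks, you are probably looking at a plain DoS that can be blocked upstream; a long tail of unrelated networks suggests a DDoS.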

Denial of service usually falls into a couple categories: traffic-based, and load-based. The last item (with the crashing service) is exploit-based DoS and is quite different.

If you're trying to pin down what type of attack is happening, capture some traffic if you possibly can (using Wireshark, tcpdump, or another libpcap-based tool). Be aware that you will probably capture quite a lot of traffic.

As often as not, these attacks come from botnets (networks of compromised hosts under the central control of some attacker, whose bidding they will do). This is a good way for the attacker to (very cheaply) acquire the upstream bandwidth of lots of different hosts on different networks to attack you with, while covering their tracks. The Low Orbit Ion Cannon is one example of a botnet (despite being voluntary instead of malware-derived); Zeus is a more typical one.

Traffic-based

If you're under a traffic-based DoS, you're finding that there is just so much traffic coming to your server that its connection to the Internet is completely saturated. There is a high packet loss rate when pinging your server from elsewhere, and (depending on routing methods in use) sometimes you're also seeing really high latency (the ping is high). This kind of attack is usually a DDoS.

While this is a really "loud" attack, and it's obvious what is going on, it's hard for a server administrator to mitigate (and basically impossible for a user of shared hosting to mitigate). You're going to need help from your ISP; let them know you're under a DDoS and they might be able to help.

However, most ISPs and transit providers will notice what is going on by themselves and publish a blackhole route for your server. What this means is that they publish a route to your server with as little cost as possible, via 0.0.0.0: they make traffic to your server no longer routable on the Internet. These routes are typically /32s, and they are eventually removed. This doesn't help you at all; the purpose is to protect the ISP's network from the deluge. For the duration, your server will effectively lose Internet access.

The only way your ISP (or you, if you have your own AS) is going to be able to help is if they are using intelligent traffic shapers that can detect and rate-limit probable DDoS traffic. Not everyone has this technology. However, if the traffic is coming from one or two networks, or one host, they might also be able to block the traffic ahead of you.

In short, there is very little you can do about this problem. The best long-term solution is to host your services in many different locations on the Internet which would have to be DDoSed individually and simultaneously, making the DDoS much more expensive. Strategies for this depend on the service you need to protect; DNS can be protected with multiple authoritative nameservers, SMTP with backup MX records and mail exchangers, and HTTP with round-robin DNS or multihoming (but some degradation might be noticeable for the duration anyway).

Load balancers are rarely an effective solution to this problem, because the load balancer itself is subject to the same problem and merely creates a bottleneck. iptables or other firewall rules will not help, because the problem is that your pipe is saturated. Once the connections are seen by your firewall, it is already too late; the bandwidth into your site has been consumed. It doesn't matter what you do with the connections; the attack is mitigated or finished only when the amount of incoming traffic goes back down to normal.

If you are able to do so, consider using a content distribution network (CDN) like Akamai, Limelight and CDN77, or use a DDoS scrubbing service like CloudFlare or Prolexic. These services take active measures to mitigate these types of attacks, and also have so much available bandwidth in so many different places that flooding them is not generally feasible.

If you decide to use CloudFlare (or any other CDN/proxy), remember to hide your server's IP. If an attacker finds out the IP, they can DDoS your server directly again, bypassing CloudFlare. To hide the IP, your server should never communicate directly with other servers or users unless they are trusted. For example, your server should not send emails directly to users. This doesn't apply if you host all your content on the CDN and don't have a server of your own.

Also, some VPS and hosting providers are better at mitigating these attacks than others. In general, the larger they are, the better they will be at this; a provider which is very well-peered and has lots of bandwidth will be naturally more resilient, and one with an active and fully staffed network operations team will be able to react more quickly.

Load-based

When you are experiencing a load-based DoS, you notice that the load average is abnormally high (or CPU, RAM, or disk usage, depending on your platform and the specifics). Although the server doesn't appear to be doing anything useful, it is very busy. Often, there will be a large number of entries in the logs indicating unusual conditions. More often than not this is coming from a lot of different places and is a DDoS, but that isn't necessarily the case. There don't even have to be a lot of different hosts.

This attack is based on making your service do a lot of expensive work. This could be something like opening a gargantuan number of TCP connections and forcing you to maintain state for them, uploading excessively large or numerous files to your service, running really expensive searches, or doing anything else that is expensive to handle. The traffic is within the limits of what you planned for and can take on, but the types of requests being made are too expensive to handle in that volume.

First, the fact that this type of attack is possible at all often indicates a configuration issue or bug in your service. For instance, you may have overly verbose logging turned on, and may be storing logs on something that's very slow to write to. If someone realizes this and does a lot of something which causes you to write copious amounts of logs to disk, your server will slow to a crawl. Your software might also be doing something extremely inefficient for certain input cases; the causes are as numerous as there are programs, but two examples would be a situation that causes your service to not close a session that is otherwise finished, and a situation that causes it to spawn a child process and leave it. If you end up with tens of thousands of open connections with state to keep track of, or tens of thousands of child processes, you'll run into trouble.

The first thing you might be able to do is use a firewall to drop the traffic. This isn't always possible, but if there is a characteristic you can find in the incoming traffic (tcpdump can be nice for this if the traffic is light), you can drop it at the firewall and it will no longer cause trouble. The other thing to do is to fix the bug in your service (get in touch with the vendor and be prepared for a long support experience).
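As a rough illustration of hunting for a shared characteristic, the Python sketch below tallies one field across access-log lines and reports the most common values. It swaps packet capture for log analysis and assumes an HTTP service writing the common "combined" log format; the regular expression, the function name, and the sample lines are hypothetical and would need adapting to your own logs.

```python
import re
from collections import Counter

# Assumed "combined" access-log layout; adjust for your server's format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def top_values(lines, field, n=5):
    """Return the n most common values of one named field (ip, method, path, agent)."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match:  # silently skip lines that do not fit the assumed format
            counts[match.group(field)] += 1
    return counts.most_common(n)


# Hypothetical log lines for demonstration.
sample_lines = [
    '203.0.113.9 - - [10/Oct/2023:13:55:36 +0000] "GET /search?q=a HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '198.51.100.5 - - [10/Oct/2023:13:55:37 +0000] "GET /search?q=b HTTP/1.1" 200 498 "-" "BadBot/1.0"',
    '192.0.2.10 - - [10/Oct/2023:13:55:38 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

print(top_values(sample_lines, "agent"))
print(top_values(sample_lines, "ip"))
```

A value that dominates one field (a user agent, a path, a small set of addresses) is a candidate for a firewall or proxy rule; check that legitimate users don't share it before you drop it.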

However, if it's a configuration issue, start there. Turn down logging on production systems to a reasonable level (depending on the program this is usually the default, and will usually involve making sure "debug" and "verbose" levels of logging are off; if everything a user does is logged in exact and fine detail, your logging is too verbose). Additionally, check the child process and request limits, and where applicable throttle incoming requests, limit connections per IP, and cap the number of allowed child processes.
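To show what per-IP throttling means in practice, here is a minimal token-bucket sketch in Python. It is not how any particular server implements its limits; the class name, the parameters, and the injectable clock are assumptions made for illustration. In production you would normally use the rate-limiting features of your web server, proxy, or firewall instead.

```python
import time


class PerIPRateLimiter:
    """Token bucket per client IP: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock   # injectable so the limiter can be tested without sleeping
        self.buckets = {}    # ip -> (tokens, time of last update)

    def allow(self, ip):
        now = self.clock()
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False


# Hypothetical usage: 5 requests per second per IP, bursts of up to 10.
limiter = PerIPRateLimiter(rate=5, burst=10)
if not limiter.allow("192.0.2.10"):
    print("throttled")
```

Note that this sketch never evicts idle entries, so an attacker using many source addresses could grow the table without bound; a real limiter needs an eviction policy or a fixed-size table.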

It goes without saying that the better configured and better provisioned your server is, the harder this type of attack will be to mount. Avoid being stingy with RAM and CPU in particular. Ensure your connections to things like backend databases and disk storage are fast and reliable.

Exploit-based

If your service mysteriously crashes extremely quickly after being brought up, particularly if you can establish a pattern of requests that precede the crash and the request is atypical or doesn't match expected use patterns, you might be experiencing an exploit-based DoS. This can come from as few as just one host (with pretty much any type of internet connection), or many hosts.

This is similar to a load-based DoS in many respects, and has basically the same causes and mitigations. The difference is merely that in this case, the bug doesn't cause your server to be wasteful, but to die. The attacker is usually exploiting a remote crash vulnerability, such as garbled input that causes a null pointer dereference or a similar fault in your service.

Handle this similarly to an unauthorized remote access attack. Firewall against the originating hosts and type of traffic if they can be pinned down. Use validating reverse proxies if applicable. Gather forensic evidence (try and capture some of the traffic), file a bug ticket with the vendor, and consider filing an abuse complaint (or legal complaint) against the origin too.
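As a sketch of the kind of check a validating reverse proxy applies before a request ever reaches the fragile service, the Python function below rejects requests that fall outside a narrow expected shape. The allowed methods, the size limits, and the function name are all assumptions for illustration; a real deployment would derive them from what the protected service legitimately receives.

```python
# All limits below are illustrative assumptions, not recommendations.
ALLOWED_METHODS = {"GET", "HEAD", "POST"}
MAX_PATH_LEN = 2048
MAX_BODY_BYTES = 1_000_000


def is_acceptable(method, path, content_length):
    """Return True only for requests that match the expected shape."""
    if method not in ALLOWED_METHODS:
        return False
    if not path.startswith("/") or len(path) > MAX_PATH_LEN:
        return False
    # Reject control characters and non-ASCII bytes, a common form of garbled input.
    if any(ord(ch) < 0x20 or ord(ch) > 0x7E for ch in path):
        return False
    if content_length < 0 or content_length > MAX_BODY_BYTES:
        return False
    return True


print(is_acceptable("GET", "/index.html", 0))
print(is_acceptable("GET", "/\x00\x00", 0))
```

Requests that fail the check are dropped at the proxy, so a malformed request that would crash the backend never reaches it. This only helps if the crashing input is distinguishable from legitimate traffic.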

These attacks are fairly cheap to mount, if an exploit can be found, and they can be very potent, but also relatively easy to track down and stop. However, techniques that are useful against traffic-based DDoS are generally useless against exploit-based DoS.
