
domain name system - Is Round-Robin DNS "good enough" for load balancing static content?



We have a set of shared, static content that we serve up between our websites at http://sstatic.net. Unfortunately, this content is not currently load balanced at all -- it's served from a single server. If that server has problems, all the sites that rely on it are effectively down, because the shared resources are essential JavaScript libraries and images.



We are looking at ways to load balance the static content on this server, to avoid the single server dependency.




I realize that round-robin DNS is, at best, a low-end (some might even say ghetto) solution, but I can't help wondering -- is round-robin DNS a "good enough" solution for basic load balancing of static content?



There is some discussion of this in the [dns] [load-balancing] tags, and I've read through some great posts on the topic.



I am aware of the common downsides of DNS load balancing through multiple round-robin A records (a zone-file sketch follows the list):




  • there is typically no heartbeat or failure detection with DNS records, so if a given server in the rotation goes down, its A record must be manually removed from the DNS entries

  • the time to live (TTL) must necessarily be set quite low for this to work at all, since DNS entries are cached aggressively throughout the internet


  • the client computers are responsible for seeing that there are multiple A records and picking the correct one
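
For concreteness, here is roughly what such a rotation looks like in a BIND-style zone file. The addresses and the 30-second TTL are hypothetical; the real values would depend on your servers and on how much staleness you can tolerate:

    ; Round-robin via multiple A records: resolvers hand back all three
    ; addresses and each client picks one. The low TTL bounds how long a
    ; dead server lingers in caches, at the cost of far more DNS traffic.
    sstatic.net.    30    IN    A    192.0.2.10
    sstatic.net.    30    IN    A    192.0.2.11
    sstatic.net.    30    IN    A    192.0.2.12

Note that even with a low TTL, some resolvers and browsers can clamp or ignore TTLs, so a removed record may keep receiving traffic for longer than the TTL suggests.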



But is round-robin DNS good enough as a starter, better-than-nothing, "while we research and implement better alternatives" form of load balancing for our static content? Or is DNS round robin pretty much worthless under any circumstances?


Answer



Jeff, I disagree. Load balancing does not imply redundancy; in fact, it's quite the opposite. The more servers you have, the more likely you are to have a failure at a given instant. That's why redundancy IS mandatory when doing load balancing, but unfortunately there are a lot of solutions which only provide load balancing without performing any health checks, resulting in a less reliable service.



DNS round robin is excellent for increasing capacity, by distributing the load across multiple points (potentially geographically distributed). But it does not provide fail-over. You must first describe what type of failure you are trying to cover. A server failure must be covered locally, using a standard IP address takeover mechanism (VRRP, CARP, ...). A switch failure is covered by resilient links from the server to two switches. A WAN link failure can be covered by a multi-link setup between you and your provider, using either a routing protocol or a layer-2 solution (e.g. multi-link PPP). A site failure should be covered by BGP: your IP addresses are replicated over multiple sites, and you announce them to the net only where they are available.



From your question, it seems that you only need to provide a server fail-over solution, which is the easiest one since it involves neither extra hardware nor a contract with any ISP. You just have to set up the appropriate software on your servers, and it's by far the cheapest and most reliable solution.
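
As a minimal sketch of what that software setup could look like, assuming two static-content servers sharing a virtual IP via keepalived (the interface name, router ID, and addresses below are hypothetical):

    # /etc/keepalived/keepalived.conf on the primary server.
    # The second server uses "state BACKUP" and a lower priority;
    # if the master stops advertising, the backup takes over the VIP.
    vrrp_instance STATIC_VIP {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 100
        advert_int 1
        virtual_ipaddress {
            192.0.2.100    # the address your single A record points at
        }
    }

With this in place, one A record points at the virtual IP, and a server failure is handled in seconds by the VRRP takeover rather than by waiting out DNS caches.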




You asked, "what if a haproxy machine fails?" The answer is the same: all the people I know who use haproxy for load balancing and high availability have two machines and run either ucarp, keepalived, or heartbeat on them to ensure that one of them is always available.
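
To illustrate that pattern, each of the two haproxy boxes would carry a configuration fragment along these lines, with ucarp or keepalived floating the public VIP between them (the backend addresses and the health-check URL are made up for the example):

    # haproxy.cfg fragment: health-checked balancing for static content
    frontend static_in
        bind 192.0.2.100:80    # the VIP held by ucarp/keepalived
        default_backend static_servers

    backend static_servers
        balance roundrobin
        option httpchk GET /ping.gif    # hypothetical health-check URL
        server web1 10.0.0.11:80 check
        server web2 10.0.0.12:80 check

Unlike bare DNS round robin, the "check" keyword gives you actual health checking: a dead backend is pulled out of rotation automatically instead of waiting for someone to edit DNS.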



Hoping this helps!

