
domain name system - Why does DNS work the way it does?





This is a Canonical Question about DNS (Domain Name System).




If my understanding of the DNS system is correct, the .com registry holds a table that maps domains (www.example.com) to DNS servers.




  1. What is the advantage? Why not map directly to an IP address?


  2. If the only record that needs to change when I am configuring a DNS server to point to a different IP address is located at the DNS server, why isn't the process instant?


  3. If the only reason for the delay is DNS caching, is it possible to bypass the caches so I can see what is happening in real time?




Answer



Actually, it's more complicated than that - rather than one "central registry (that) holds a table that maps domains (www.example.com) to DNS servers", there are several layers of hierarchy.



There's a central registry (the Root Servers) which contains only a small set of entries: the NS (nameserver) records for all the top-level domains - .com, .net, .org, .uk, .us, .au, and so on.



Those servers just contain NS records for the next level down. To pick one example, the nameservers for the .uk domain just have entries for .co.uk, .ac.uk, and the other second-level zones in use in the UK.



Those servers just contain NS records for the next level down - to continue the example, they tell you where to find the NS records for google.co.uk. It's on those servers that you'll finally find a mapping between a hostname like www.google.co.uk and an IP address.
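If you want to watch this delegation happen step by step, dig's +trace option walks the chain itself, starting at the root servers, instead of relying on your local resolver. A quick sketch - the exact referrals you see will depend on your dig version and the current delegations:

$ dig +trace www.google.co.uk A

# +trace queries a root server first, then follows each NS referral in
# turn: the NS records for .uk (and .co.uk), then google.co.uk, and
# finally the A record for www.google.co.uk from Google's own nameservers.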



As an extra wrinkle, each layer will also serve up 'glue' records. Each NS record maps a domain to a hostname - for instance, the NS records for .uk list nsa.nic.uk as one of the servers. To get to the next level, we need to find out what the NS records for nic.uk are, and they turn out to include nsa.nic.uk as well. So now we need to know the IP of nsa.nic.uk, but to find that out we need to make a query to nsa.nic.uk, but we can't make that query until we know the IP for nsa.nic.uk...




To resolve this quandary, the servers for .uk add the A record for nsa.nic.uk into the ADDITIONAL SECTION of the response (response below trimmed for brevity):



jamezpolley@li101-70:~$ dig nic.uk ns

; <<>> DiG 9.7.0-P1 <<>> nic.uk ns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21768
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 14


;; QUESTION SECTION:
;nic.uk. IN NS

;; ANSWER SECTION:
nic.uk. 172800 IN NS nsb.nic.uk.
nic.uk. 172800 IN NS nsa.nic.uk.

;; ADDITIONAL SECTION:
nsa.nic.uk. 172800 IN A 156.154.100.3

nsb.nic.uk. 172800 IN A 156.154.101.3


Without these extra glue records, we'd never be able to find the nameservers for nic.uk, and so we'd never be able to look up any domains hosted there.
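You can see the same thing straight from the source by asking one of the .uk servers directly with recursion turned off - a sketch, assuming nsa.nic.uk is still one of the published .uk nameservers:

$ dig @nsa.nic.uk nic.uk NS +norecurse

# Depending on whether this particular server is also authoritative for
# nic.uk, you'll get either an authoritative answer or a referral; either
# way, because nsa.nic.uk and nsb.nic.uk live inside the nic.uk zone
# itself, their A records should show up in the ADDITIONAL section as glue.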



To get back to your questions...




a) What is the advantage? Why not map directly to an IP address?





For one thing, it allows edits to each individual zone to be distributed. If you want to update the entry for www.mydomain.co.uk, you just need to edit the information on the nameserver for mydomain.co.uk. There's no need to notify the central .co.uk servers, or the .uk servers, or the root nameservers. If there were a single central registry mapping every level of the hierarchy, which had to be notified about every single change to a DNS entry anywhere in the chain, it would be absolutely swamped with traffic.



Before 1982, this was actually how name resolution happened. One central registry was notified about all updates, and they distributed a file called hosts.txt which contained the hostname and IP address of every machine on the internet. A new version of this file was published every few weeks, and every machine on the internet would have to download a new copy. Well before 1982, this was starting to become problematic, and so DNS was invented to provide a more distributed system.



For another thing, this would be a Single Point of Failure - if the single central registry went down, the entire internet would be offline. Having a distributed system means that failures only affect small sections of the internet, not the whole thing.



(To provide extra redundancy, there are actually 13 separate clusters of servers that serve the root zone. Any changes to the top-level domain records have to be pushed to all 13; imagine having to coordinate updating all 13 of them for every single change to any hostname anywhere in the world...)
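You don't have to take that number on faith - any resolver will list the root servers for you:

$ dig . NS +short

# Returns the thirteen names a.root-servers.net. through m.root-servers.net.
# Each name is really a large anycast cluster of machines, not a single box.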





b) If the only record that needs to change when I am configuring a DNS
server to point to a different IP address is located at the DNS
server, why isn't the process instant?




Because DNS utilises a lot of caching to both speed things up and decrease the load on the NSes. Without caching, every single time you visited google.co.uk your computer would have to go out to the network to look up the servers for .uk, then .co.uk, then google.co.uk, then www.google.co.uk. Those answers don't actually change much, so looking them up every time is a waste of time and network traffic. Instead, when the NS returns records to your computer, it will include a TTL value that tells your computer to cache the results for a number of seconds.
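You can watch this caching in action by repeating the same query against your resolver - the TTL in the answer counts down, and only once it hits zero does the resolver go back to the authoritative servers. A quick sketch (any record will do):

$ dig google.co.uk NS +noall +answer
$ sleep 30
$ dig google.co.uk NS +noall +answer

# The second answer should come back with a TTL roughly 30 seconds lower
# than the first, because it's being served out of the resolver's cache.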



For example, the NS records for .uk have a TTL of 172800 seconds - 2 days. Google are even more conservative - the NS records for google.co.uk have a TTL of 4 days. Services which rely on being able to update quickly can choose a much lower TTL - for instance, telegraph.co.uk has a TTL of just 600 seconds on their NS records.



If you want updates to your zone to be near-instant, you can choose to lower your TTL as far down as you like. The lower you set it, the more traffic your servers will see, as clients refresh their records more often. Every query that a client has to send to your servers adds some lag compared with looking up the answer in its local cache, so you'll also want to consider the trade-off between fast updates and a fast service.
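Before a planned change it's worth checking what TTL your zone is currently advertising by asking your authoritative server directly - a sketch with placeholder names (ns1.example-provider.net and www.mydomain.co.uk are stand-ins for your own provider and hostname):

$ dig @ns1.example-provider.net www.mydomain.co.uk A +noall +answer

# The second field of the answer is the TTL. Lower it at least one full
# old-TTL period before the change, make the change, then raise it again
# once everything has settled.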





c) If the only reason for the delay is DNS caching, is it possible to
bypass the caches so I can see what is happening in real time?




Yes, this is easy if you're testing manually with dig or similar tools - just tell it which server to contact.



Here's an example of a cached response:




jamezpolley@host:~$ dig telegraph.co.uk NS

; <<>> DiG 9.7.0-P1 <<>> telegraph.co.uk NS
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36675
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;telegraph.co.uk. IN NS


;; ANSWER SECTION:
telegraph.co.uk. 319 IN NS ns1-63.akam.net.
telegraph.co.uk. 319 IN NS eur3.akam.net.
telegraph.co.uk. 319 IN NS use2.akam.net.
telegraph.co.uk. 319 IN NS usw2.akam.net.
telegraph.co.uk. 319 IN NS use4.akam.net.
telegraph.co.uk. 319 IN NS use1.akam.net.
telegraph.co.uk. 319 IN NS usc4.akam.net.
telegraph.co.uk. 319 IN NS ns1-224.akam.net.


;; Query time: 0 msec
;; SERVER: 97.107.133.4#53(97.107.133.4)
;; WHEN: Thu Feb 2 05:46:02 2012
;; MSG SIZE rcvd: 198


The flags section here doesn't contain the aa flag, so we can see that this result came from a cache rather than directly from an authoritative source. In fact, we can see that it came from 97.107.133.4, which happens to be one of Linode's local DNS resolvers. The fact that the answer was served out of a cache very close to me means that it took 0msec for me to get an answer; but as we'll see in a moment, the price I pay for that speed is that the answer is almost 5 minutes out of date.



To bypass Linode's resolver and go straight to the source, just pick one of those NSes and tell dig to contact it directly:




jamezpolley@li101-70:~$ dig @ns1-224.akam.net telegraph.co.uk NS

; <<>> DiG 9.7.0-P1 <<>> @ns1-224.akam.net telegraph.co.uk NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23013
;; flags: qr aa rd; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available


;; QUESTION SECTION:
;telegraph.co.uk. IN NS

;; ANSWER SECTION:
telegraph.co.uk. 600 IN NS use2.akam.net.
telegraph.co.uk. 600 IN NS eur3.akam.net.
telegraph.co.uk. 600 IN NS use1.akam.net.
telegraph.co.uk. 600 IN NS ns1-63.akam.net.
telegraph.co.uk. 600 IN NS usc4.akam.net.

telegraph.co.uk. 600 IN NS ns1-224.akam.net.
telegraph.co.uk. 600 IN NS usw2.akam.net.
telegraph.co.uk. 600 IN NS use4.akam.net.

;; Query time: 9 msec
;; SERVER: 193.108.91.224#53(193.108.91.224)
;; WHEN: Thu Feb 2 05:48:47 2012
;; MSG SIZE rcvd: 198



You can see that this time, the results were served directly from the source - note the aa flag, which indicates that the results came from an authoritative source. In my earlier example, the results came from my local cache, so they lack the aa flag. I can see that the authoritative source for this domain sets a TTL of 600 seconds. The results I got earlier from a local cache had a TTL of just 319 seconds, which tells me that they'd been sitting in the cache for (600 - 319) = 281 seconds - almost 5 minutes - before I saw them.



Although the TTL here is only 600 seconds, some ISPs will attempt to reduce their traffic even further by forcing their DNS resolvers to cache the results for longer - in some cases, for 24 hours or more. It's traditional (in a we-don't-know-if-this-is-really-necessary-but-let's-be-safe kind of way) to assume that any DNS change you make won't be visible everywhere on the internet for 24-48 hours.

