Background of the Partial Failure of the Name Service for .de Domains
Course of Events and Effects
Starting about 13:30 (CEST) on Wednesday, 12 May 2010, the DENIC name service, depending on the service location and the domain queried, sometimes returned the incorrect reply “domain does not exist”. In such cases, users could not reach the .de domain concerned via its address, and e-mails to or from that address were rejected or not sent.
The cause was an incomplete copying process during the regular name service data update, which is performed at 2-hour intervals. As a result, an incomplete set of name service data (a so-called zone file) was rolled out to 12 of the 16 service locations.
The incident response team, which was activated immediately, analyzed the error and, starting at 14:20, successively switched off the locations that were giving faulty responses. Since it was not immediately clear whether the zone defect was caused by a faulty database or by a fault in the zone generation process, the registration systems were also temporarily put on hold. A further reason for this decision was the high load on the registration systems, which stemmed from an unusually high number of attempts to register supposedly available domains.
Starting at 14:30, the switched-off locations were successively provided with a complete zone file and re-integrated into the name server network. Due to the data volume and the worldwide distribution of the locations, the entire distribution process and the subsequent restart of all the service locations concerned took until about 15:45. At that point, the service had been fully restored on DENIC's side and was running at full performance.
However, due to caching by ISPs, Internet users may have experienced disruptions for up to 2 hours after DENIC had resolved the problem.
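The caching window mentioned above can be worked out with simple arithmetic. The following is a minimal sketch, assuming that resolvers at ISPs kept a cached “domain does not exist” answer for at most 2 hours (the figure given in this report) and that the last faulty location was corrected at about 15:45. The names NEGATIVE_CACHE_LIFETIME and latest_stale_answer are illustrative only and are not part of any DENIC system.

```python
from datetime import datetime, timedelta

# Assumption: a cached negative answer lives for at most 2 hours,
# matching the "up to 2 hours" stated in the report.
NEGATIVE_CACHE_LIFETIME = timedelta(hours=2)


def latest_stale_answer(restored_at, cache_lifetime=NEGATIVE_CACHE_LIFETIME):
    """Return the latest time a resolver could still serve a cached wrong answer.

    Worst case: the resolver cached the wrong answer just before the
    service was restored, so the entry expires one cache lifetime later.
    """
    return restored_at + cache_lifetime


# Service fully restored at about 15:45 (CEST) on 12 May 2010.
restored = datetime(2010, 5, 12, 15, 45)
print(latest_stale_answer(restored).strftime("%H:%M"))  # prints 17:45
```

Under these assumptions, individual users could have seen incorrect replies until about 17:45 (CEST), even though the authoritative service was already correct.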
After completion of the ongoing detailed analyses, additional actions might have to be considered.
Technical Details
The zone file generated from the registration database is checked for completeness and plausibility several times before it is released for use by the globally distributed locations. These checks were also executed successfully for the zone file in question. This is why four locations were not provided with a faulty zone file and why the Frankfurt IPv6 DNS location and the DNSSEC testbed were not affected.
However, as part of a project to create a new name service infrastructure, the concept for zone file distribution was also redefined. The zone file in question passed all the quality checks mentioned above, but under the new concept the zone was copied once more before it was distributed to the locations. This copying process was interrupted, resulting in a file that held just one third of the domain data.
Provisions were in place to ensure that the copying was performed correctly. Unfortunately, since this safeguard itself was not working properly, the copying error was not detected and processing was not stopped.
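To illustrate the kind of safeguard described above, the following minimal sketch copies a file and then compares the size and SHA-256 checksum of the copy with the original, stopping processing if they differ. It is not DENIC's actual mechanism, which is not described in this report. The names ZoneCopyError, verify_copy and copy_and_verify are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
import os
import shutil


class ZoneCopyError(Exception):
    """Raised when a copied file does not match its source (hypothetical name)."""


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source, target):
    """Raise ZoneCopyError if target is not a byte-identical copy of source."""
    source_size = os.path.getsize(source)
    target_size = os.path.getsize(target)
    if source_size != target_size:
        raise ZoneCopyError(
            f"size mismatch: source {source_size} bytes, copy {target_size} bytes"
        )
    if sha256_of(source) != sha256_of(target):
        raise ZoneCopyError("checksum mismatch between source and copy")


def copy_and_verify(source, target):
    """Copy source to target and verify the result before it is used further."""
    shutil.copyfile(source, target)
    verify_copy(source, target)  # an exception here stops further processing
    return target
```

In this sketch, a copy that held only one third of the data would fail the size comparison, and the exception would prevent the file from being passed on for distribution.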
The incident is therefore not related to the move of data centre operations from Amsterdam to Frankfurt, which took place on the preceding Tuesday. Nor is there any connection to the DNSSEC testbed, and neither the services provided for cooperating partners nor the secondaries operated by DENIC for other TLDs were affected. The incident would not have had any impact on the anycast services we are planning to provide for our TLD customers either.