BIND Best Practices - Authoritative Server (KB derived)


  • It is strongly recommended that you run BIND on a server dedicated to DNS only. Reasons include:

    • Minimized risk of impact to DNS services as a result of other applications consuming server resources (perhaps due to an attack on those services, or due to application error).

    • Conversely, minimized risk to other applications as a result of BIND consuming all system or network resources.

    • Reduced likelihood of unauthorized access to the DNS server (e.g. via a code defect and root access exploit made possible via another application).

    • Improved ability to monitor DNS server performance (since the server is dedicated to one service).

    • Improved ability to troubleshoot problems.

  • Run BIND as an unprivileged user.

    To open low-numbered UDP and TCP ports, BIND must be launched as root, but an alternate uid can be specified using the -u command-line argument; after opening the resources it needs, named will change its runtime uid to an unprivileged account. (Please see the end of this document for note (1) concerning use of this feature under Linux.)

  • If you follow the preceding advice (running BIND as an unprivileged user on a dedicated server), chrooting is “de-emphasized.” Our operations experts feel that chrooting does not substantially improve security under those conditions; they do not affirmatively recommend it, but they do not explicitly discourage it either.

  • Make use of BIND access control mechanisms such as address match lists to restrict recursive query service to known and authorized clients. Ideally your Internet-facing authoritative servers should not perform recursion for any clients at all.

  • Consider DNSSEC-signing your public authoritative zones. (Recursive servers will then be able to use DNSSEC validation to authenticate your records.)

  • Consider deploying Response Rate Limiting (RRL). For information on Response Rate Limiting, see “A Quick Introduction to Response Rate Limiting.”

  • Ensure (and confirm through testing) that your infrastructure supports EDNS0 and large UDP packet sizes.

  • Consider the length of the TTLs on the delegation records that you manage within your zones, as well as those that are provided by the parent zones that delegate authority to your nameservers. Longer TTLs protect the visibility of a zone, but shorter ones allow for a faster change of nameservers. Long TTLs can also help protect the visibility of a zone when the parent zone’s nameservers are under attack. See https://www.dns-oarc.net/wiki/mitigating-dns-denial-of-service-attacks for more information.

  • Do not combine authoritative and recursive nameserver functions; have each function performed by a separate set of servers. This advice primarily concerns separating public-facing authoritative services from internal client-facing recursive services. Administrators may, for convenience, choose to serve some internal-only zones authoritatively from their recursive servers, having determined that the benefit outweighs any risks associated with this policy.

    If recursive and authoritative functions share one server, a problem that affects only authoritative service (for example, one that causes all of your authoritative servers to fail) will break your recursive service too.

  • Run multiple, distributed authoritative servers, avoiding single points of failure in critical resource paths. A variety of strategies are available (including anycast and load-balancing) to ensure robust geographic and network diversity in your deployment.

  • Provision sufficient capacity to handle burst traffic of up to 20x the normal level (see also the point above on load-balanced configurations; adequate overprovisioning will help to avoid some of the pitfalls).

    Remember that excess capacity must take into account not only server CPU and memory resources but also send and receive capacity along the entire network path.
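    The headroom check described above can be sketched as a small calculation: the usable burst capacity is set by the weakest component on the path, not by the server alone. The component names and capacity figures below are hypothetical and serve only to illustrate the arithmetic.

```python
# Sketch: find the component that limits burst headroom along the path.
# All component names and figures are hypothetical, for illustration only.

BURST_FACTOR = 20  # target from the guidance: handle up to 20x normal load


def headroom(normal_qps, capacities_qps):
    """Return (bottleneck_name, multiple_of_normal) for the weakest component."""
    name = min(capacities_qps, key=capacities_qps.get)
    return name, capacities_qps[name] / normal_qps


def meets_burst_target(normal_qps, capacities_qps, factor=BURST_FACTOR):
    """True if every component can carry `factor` times the normal load."""
    _, multiple = headroom(normal_qps, capacities_qps)
    return multiple >= factor


if __name__ == "__main__":
    capacities = {
        "server_cpu": 400_000,     # queries/second the CPU can answer
        "server_nic": 900_000,     # queries/second the interface can carry
        "upstream_link": 150_000,  # queries/second the upstream link can carry
    }
    name, multiple = headroom(10_000, capacities)
    print(f"bottleneck: {name}, headroom: {multiple:.1f}x")
    print("meets 20x target:", meets_burst_target(10_000, capacities))
```

    In this made-up example the server itself has 40x headroom, but the upstream link allows only 15x, so the deployment as a whole falls short of the target.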

  • In most instances we would not recommend the use of inbound packet filtering for authoritative nameservers; Response Rate Limiting is the recommended solution. However, there are some circumstances where filtering at very high inbound packet rates can be helpful. Please contact ISC if you think you might benefit from our operational experience in this area.

  • Ensure that system outbound network buffers are large enough to handle your rate of outbound response traffic. Some OS implementations (particularly some versions of Linux) assume low rates of outbound network traffic by default, but an authoritative server will often respond with packets significantly larger than the queries it received, particularly for signed zones.

  • Put monitoring scripts in place to continually check the health of your servers and to alert you if conditions change substantially.

    Conditions to monitor include:

    • process presence
    • CPU utilization
    • memory usage
    • network throughput and buffering (inbound/outbound)
    • filesystem utilization (on the log filesystem and also the filesystem containing the named working directory)
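
    A few of the checks listed above can be sketched with the Python standard library alone. This is a minimal illustration, not a complete monitor: the threshold values are hypothetical, the process check works on POSIX systems only, and a real deployment would read the pid of named from its pid file and point the filesystem check at the log and working directories.

```python
import os
import shutil


def process_alive(pid):
    """POSIX only: signal 0 tests for a process without disturbing it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def filesystem_used_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo-filesystems can report a zero size
    return usage.used / usage.total


def find_alerts(readings, limits):
    """Return the sorted names of readings that exceed their configured limit."""
    return sorted(name for name, value in readings.items()
                  if name in limits and value > limits[name])


if __name__ == "__main__":
    # Hypothetical limits; the current process and directory stand in for
    # named and its log filesystem.
    readings = {
        "log_fs_used": filesystem_used_fraction("."),
        "load_1min": os.getloadavg()[0],
    }
    limits = {"log_fs_used": 0.80, "load_1min": 8.0}
    print("process present:", process_alive(os.getpid()))
    print("alerts:", find_alerts(readings, limits))
```

    Keeping the threshold comparison separate from the data collection makes it straightforward to add readings for memory usage or network throughput from whatever source your platform provides.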

  • By design, and for security purposes, the most common mode of failure for BIND is intentional process termination when it encounters an inconsistent state. An automated minder process capable of restarting BIND intelligently is recommended if you do not have 24-hour operations support (and possibly even if you do). It is especially helpful if such a script can checkpoint and archive the logs when this happens.

  • Logs should be examined periodically for error and warning messages which may provide a tip-off for incipient problems before they become critical.

  • Review the logging configuration to ensure it meets your requirements. BIND’s logging defaults are generally sane (passing most of the work to syslog), but may not line up with organizational policy and/or desired data collection/retention standards.

  • When using size-limited files for logging, plan the size of the files and number to retain so that an increased level of logging due to a problem is unlikely to cause the logs from the start of the problem to become unavailable. The exact settings will depend on how quickly problems can be detected and the details of the baseline retention policy.
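
    The sizing question above reduces to simple arithmetic: estimate how much log data a problem could generate before someone notices it, then divide by the per-file size limit. The sketch below assumes a hypothetical baseline logging rate, surge multiplier and detection time; measure your own before relying on the result. The output corresponds roughly to the values you would give the versions and size options of a BIND file logging channel.

```python
import math


def versions_needed(baseline_bytes_per_hour, surge_factor,
                    hours_to_detect, file_size_bytes):
    """Number of size-limited log files needed so that entries from the
    start of a problem are still on disk when the problem is detected."""
    surge_bytes = baseline_bytes_per_hour * surge_factor * hours_to_detect
    return math.ceil(surge_bytes / file_size_bytes)


if __name__ == "__main__":
    MB = 1024 * 1024
    # Hypothetical figures: 20 MB/hour normally, logging increases 10x
    # during a problem, up to 6 hours to detect, 100 MB per log file.
    count = versions_needed(20 * MB, 10, 6, 100 * MB)
    print("log files to retain:", count)
```

    With these made-up inputs the surge produces 1200 MB of logs before detection, so twelve 100 MB files are needed; any additional baseline retention required by policy comes on top of that.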

  • Query logging adds substantial overhead (on the order of 10x) and so should not be turned on without careful consideration.

  • Prior to any trouble, ensure that a strategy is in place for collecting post-mortem information if a server does encounter a problem. This includes:

    • Building named with debug symbols enabled.
    • Enabling the BIND XML statistics channel for easy data collection.
    • Designing an appropriate logging strategy and reserving sufficient space on the log filesystem for information to be collected for a significant context period before an event (at least several hours; 24 hours or more is preferred).
    • Ensuring that the uid under which named runs has sufficient permission to write a core image to its working directory if it crashes with a segmentation fault, and to write named.dump or named.run files if requested by the operator.

    See “What to do with a misbehaving BIND server” and “What to do if your BIND or DHCP server has crashed” for guidance on troubleshooting problems and on the type of information that is useful to collect in those circumstances.

  • Run a multi-threaded BIND build and launch named with an appropriate number of task threads tuned for the hardware and CPU architecture.

  • Observe query loads periodically to establish baseline expectations. This will enable you to monitor for anything unusual - as defined by the range of ‘normal’ for your specific operational environment.
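
    One simple way to turn periodic observations into a definition of ‘normal’ is to compute a range around the mean of past samples and flag anything outside it. The sketch below uses the mean plus or minus three standard deviations; that rule and the sample values are assumptions chosen for illustration, and real query loads with strong daily or weekly cycles usually need a baseline per time of day.

```python
import statistics


def baseline_range(samples, k=3.0):
    """Return (low, high): the mean of past samples plus or minus k
    sample standard deviations."""
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples)
    return mean - k * spread, mean + k * spread


def is_unusual(qps, samples, k=3.0):
    """True if `qps` falls outside the baseline range of `samples`."""
    low, high = baseline_range(samples, k)
    return qps < low or qps > high


if __name__ == "__main__":
    # Hypothetical queries-per-second readings taken at the same hour
    # on previous days.
    history = [980, 1010, 1000, 995, 1015]
    for reading in (1005, 5000):
        print(reading, "unusual" if is_unusual(reading, history) else "normal")
```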

  • Run currently-supported version(s) of BIND in your environment.

  • You should have a strategy that includes both a planned upgrade path, so that you can take advantage of improved features and functionality, and a plan for how you will respond if a security advisory is released that could affect your servers and services. See “Which version of BIND do I want to download and install?” for more information.

  • Our general advice on security practices is included in the list above. However, many large production environments with mission-critical DNS needs may opt to run servers on multiple hardware and/or OS platforms to increase the “eco-diversity” of their DNS infrastructure. This can also include running different versions of BIND for resilience against defects that may not affect all currently supported versions.