Debugging a Kerberos authentication problem

Kerberos is an authentication protocol notorious for its complexity. I probably wouldn't be exaggerating if I said it's the first answer you would get if you googled the most common authentication protocol for distributed systems. Recently I've been fixing a connection to a Kafka cluster secured by Kerberos, and along the way I used a couple of techniques that I hope you will find useful.

Let's review the architecture first:

Here we have the App, the KDC server, and the Kafka server, which I want to connect to in order to find out how many topics there are.

Due to its complexity, it may be hard to understand how Kerberos actually works, so I will leave that task to Mike Pound in this Computerphile video and instead outline the three main points essential to my case:

  1. To connect to Kafka (K), I first need the KDC to grant me a time-limited token (AK, the App-to-Kafka token)
  2. If the App is known to the KDC, the token will be granted
  3. The App then uses the token AK to connect to Kafka and get the list of available topics.
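To make the three points concrete, here is a toy Python model of the flow. It is only an illustration under heavy simplifying assumptions: there is no encryption, no ticket-granting ticket and no real Kerberos message exchange. The class names, the app@SOME.REALM principal and the topic names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Toy model only: no encryption, no TGT/TGS exchange, no real Kerberos messages.


@dataclass(frozen=True)
class Token:
    client: str        # principal the token was issued to (the App)
    service: str       # service the token is valid for (Kafka)
    expires_at: float  # time-based: the token is rejected from this moment on


class ToyKDC:
    def __init__(self, known_principals: Iterable[str], lifetime_s: float = 60.0):
        self.known = set(known_principals)
        self.lifetime_s = lifetime_s

    def grant(self, client: str, service: str, now: Optional[float] = None) -> Token:
        # Point 2: only principals known to the KDC are granted a token.
        if client not in self.known:
            raise PermissionError(f"{client} is not known to the KDC")
        now = time.time() if now is None else now
        return Token(client, service, now + self.lifetime_s)


class ToyKafka:
    def __init__(self, name: str, topics: Iterable[str]):
        self.name = name
        self.topics = list(topics)

    def list_topics(self, token: Token, now: Optional[float] = None) -> List[str]:
        # Point 3: the service only accepts a token issued for it that has not expired.
        now = time.time() if now is None else now
        if token.service != self.name or now >= token.expires_at:
            raise PermissionError("token is for another service or has expired")
        return list(self.topics)


# Point 1: the App asks the KDC for an App-to-Kafka token (AK) before talking to Kafka.
kdc = ToyKDC({"app@SOME.REALM"})
kafka = ToyKafka("kafka/kafka.server@SOME.REALM", ["orders", "payments"])
ak = kdc.grant("app@SOME.REALM", kafka.name)
print(kafka.list_topics(ak))
```

The explicit now parameter exists only so the expiry rule can be exercised without waiting; in the real protocol, clock skew between the client, the KDC and the service is one more thing that can break authentication.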

The problem

The connection to the Kafka broker was never established and eventually timed out.

The first thing I wanted to confirm was that the App could reach the Kafka broker. The easiest way to do this is to open a TCP connection to Kafka's default port:

telnet kafka-server.org 9092

If this command prints something like Connected to ..., Kafka is reachable from the App, which was the case. This suggested that the broker was accepting connections and the problem lay somewhere in the authentication process.
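Where telnet is not installed, the same check can be scripted. A minimal Python sketch follows; the function name can_connect is hypothetical, and like telnet it only proves that the TCP handshake succeeds, not that Kafka will accept the client afterwards.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like `telnet host port`."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False
```

In the situation described here, can_connect("kafka-server.org", 9092) would be expected to return True.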

The next most obvious suspects were an incorrect keytab and/or krb5.conf file. The krb5.conf file was correct, but I was not sure about the keytab. Just reading the contents of the keytab is not enough, so I had to check that it was actually valid.

This is easy to do with the kinit command:

KRB5_CONFIG=/etc/krb5.conf KRB5_TRACE=/dev/stdout kinit -kVt kafka.key kafka/kafka.server@SOME.REALM

Here I set the KRB5_TRACE environment variable, together with the -V (verbose) flag, to print the Kerberos trace logs:

2021-04-02T13:47:06 Plugin AppSSOLocatePlugin_macOS is signed by Apple
2021-04-02T13:47:06 Plugin AppSSOConfigPlugin_macOS is signed by Apple
2021-04-02T13:47:06 Plugin heimdalodpac is signed by Apple
2021-04-02T13:47:06 Plugin Reachability is signed by Apple
2021-04-02T13:47:06 Plugin SCKerberosConfig is signed by Apple
2021-04-02T13:47:07 set-error: -1765328242: Reached end of credential caches
2021-04-02T13:47:07 set-error: -1765328243: Principal kafka/kafka.server@SOME.REALM not found in any credential cache
2021-04-02T13:47:07 Adding PA mech: ENCRYPTED_CHALLENGE
2021-04-02T13:47:07 Adding PA mech: ENCRYPTED_TIMESTAMP
2021-04-02T13:47:07 krb5_get_init_creds: loop 1
2021-04-02T13:47:07 KDC sent 0 patypes
2021-04-02T13:47:07 fast disabled, not doing any fast wrapping
2021-04-02T13:47:07 Trying to find service kdc for realm SOME.REALM flags 0
2021-04-02T13:47:07 configuration file for realm SOME.REALM found
2021-04-02T13:47:07 submissing new requests to new host
2021-04-02T13:47:07 connecting to host: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:07 writing packet: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:08 Configuration exists for realm SOME.REALM, wont go to DNS
2021-04-02T13:47:08 out of hosts, waiting for replies
2021-04-02T13:47:18 retrying sending to: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:18 writing packet: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:29 retrying sending to: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:29 writing packet: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:40 host timed out: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:40 no more hosts to send/recv packets to/from trying to pulling more hosts
2021-04-02T13:47:40 set-error: -1765328228: unable to reach any KDC in realm SOME.REALM, tried 1 KDC
2021-04-02T13:47:40 krb5_sendto_context SOME.REALM done: -1765328228 hosts 1 packets 3 wc: 33.118119 nr: 0.024906 kh: 0.001958 tid: 00000001
kinit: krb5_get_init_creds: unable to reach any KDC in realm SOME.REALM, tried 1 KDC

The logs made it evident that the initial request to the KDC, specifically on port 88/udp, was never answered: the packet was sent three times before the host timed out.
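When this check has to be repeated, the kinit call and the log reading can be scripted. The sketch below is a hypothetical helper, not part of any Kerberos tooling: it runs the same kinit command with the same environment variables and pulls out the set-error and host timed out lines. The regular expressions are written against the Heimdal-style trace shown above (macOS); other Kerberos implementations format their traces differently, so the patterns would need adjusting.

```python
import os
import re
import subprocess


def run_kinit_with_trace(keytab, principal, krb5_conf="/etc/krb5.conf", timeout=120):
    """Run `kinit -kVt <keytab> <principal>` with tracing on; return all output."""
    env = {**os.environ, "KRB5_CONFIG": krb5_conf, "KRB5_TRACE": "/dev/stdout"}
    result = subprocess.run(
        ["kinit", "-kVt", keytab, principal],
        env=env, capture_output=True, text=True, timeout=timeout,
    )
    # The trace goes to stdout, the final kinit error message to stderr.
    return result.stdout + result.stderr


# Patterns for lines such as
#   "set-error: -1765328228: unable to reach any KDC in realm SOME.REALM, tried 1 KDC"
#   "host timed out: udp 10.20.207.14:88 (kdc.com) tid: 00000001"
SET_ERROR = re.compile(r"set-error: (-?\d+): (.+)$")
TIMED_OUT = re.compile(r"host timed out: (\w+) (\S+)")


def summarize_trace(trace):
    """Collect error codes/messages and the KDC endpoints that never answered."""
    errors, timed_out = [], []
    for line in trace.splitlines():
        if m := SET_ERROR.search(line):
            errors.append((int(m.group(1)), m.group(2).strip()))
        if m := TIMED_OUT.search(line):
            timed_out.append(f"{m.group(2)}/{m.group(1)}")  # e.g. "10.20.207.14:88/udp"
    return {"errors": errors, "timed_out": timed_out}
```

For the trace above, summarize_trace(run_kinit_with_trace("kafka.key", "kafka/kafka.server@SOME.REALM")) would point straight at 10.20.207.14:88/udp as the endpoint that never replied.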

telnet is useless for checking a UDP port, so I turned to a more advanced tool, nmap (-sT for a TCP connect scan, -sU for a UDP scan). To rule out every possible issue, I ran it against all of the ports exposed by the KDC:

     ftp           21/tcp           # Kerberos ftp and telnet use the
     telnet        23/tcp           # default ports
     kerberos      88/udp    kdc    # Kerberos V5 KDC
     kerberos      88/tcp    kdc    # Kerberos V5 KDC
     klogin        543/tcp          # Kerberos authenticated rlogin
     kshell        544/tcp   cmd    # and remote shell
     kerberos-adm  749/tcp          # Kerberos 5 admin/changepw
     kerberos-adm  749/udp          # Kerberos 5 admin/changepw
     krb5_prop     754/tcp          # Kerberos slave propagation
     
     eklogin       2105/tcp         # Kerberos auth. & encrypted rlogin
     krb524        4444/tcp         # Kerberos 5 to 4 ticket translator
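For the TCP entries in this list, a connect scan is simple enough to sketch in Python. This is a rough approximation of what nmap -sT reports, not a replacement for it: a completed handshake counts as open, a reset as closed, and silence as filtered. The function name is hypothetical, and the UDP entries need a different kind of probe.

```python
import socket

# TCP ports from the list above; the UDP entries need a different probe.
KERBEROS_TCP_PORTS = [21, 23, 88, 543, 544, 749, 754, 2105, 4444]


def tcp_connect_scan(host, ports, timeout=2.0):
    """Classify each port roughly the way a TCP connect scan (nmap -sT) does."""
    states = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                states[port] = "open"        # three-way handshake completed
        except ConnectionRefusedError:
            states[port] = "closed"          # the host answered with a reset
        except socket.timeout:
            states[port] = "filtered"        # no answer at all, probably dropped
        except OSError as exc:
            states[port] = f"error: {exc}"   # DNS failure, no route to host, ...
    return states
```

Calling tcp_connect_scan("kdc.com", KERBEROS_TCP_PORTS) returns one state per port, which makes it easy to spot the odd one out.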

If a port is open, the output should look like this:

> nmap -p 88 -sU kdc.com

PORT STATE SERVICE
88/udp open kerberos-sec
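With many ports to check, reading the state column by eye gets tedious. The following Python sketch (hypothetical helper names) parses nmap's normal, human-readable output and lists every port that is not plainly open. It assumes the default output layout shown above; for scripting, nmap's XML (-oX) or grepable (-oG) output formats are more robust.

```python
import re

# A port line in nmap's normal output: "<port>/<proto>  <state>  <service>".
PORT_LINE = re.compile(r"^(\d+)/(tcp|udp)\s+(\S+)")


def parse_nmap_ports(output):
    """Map (port, protocol) to the state column of nmap's normal output."""
    ports = {}
    for line in output.splitlines():
        match = PORT_LINE.match(line.strip())
        if match:
            ports[(int(match.group(1)), match.group(2))] = match.group(3)
    return ports


def not_plainly_open(ports):
    """Ports worth a closer look: anything other than a clean 'open'."""
    return sorted(key for key, state in ports.items() if state != "open")
```

A state such as open|filtered is kept verbatim, so it shows up in not_plainly_open rather than being mistaken for open.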

In my case, all ports were open and reachable except 88/udp, whose state was open|filtered. nmap reports this state when it gets no response at all, so it cannot tell whether the port is open or the probe is being dropped along the way. I suspected this was due to the complexity of the company network, but that is just an assumption. Unfortunately, I was not able to put my finger on why exactly the connection to 88/udp was failing, so I decided to try a different approach.
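Why UDP is ambiguous in the first place can be shown with a small Python probe. This is a simplified sketch (hypothetical function name): it sends a single bare datagram, whereas nmap sends protocol-specific payloads to well-known UDP ports, and a real KDC may simply ignore a datagram that is not a valid Kerberos request.

```python
import socket


def udp_probe(host, port, payload=b"\x00", timeout=2.0):
    """Send one datagram and classify the port from the reply, as a UDP scan would."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on a UDP socket only fixes the peer address, so that an
        # ICMP "port unreachable" can be reported back as ConnectionRefusedError.
        sock.connect((host, port))
        sock.send(payload)
        try:
            sock.recv(4096)
            return "open"                # the service answered
        except ConnectionRefusedError:
            return "closed"              # ICMP port unreachable came back
        except socket.timeout:
            # No reply: either the service ignored the probe or something on
            # the path dropped it. Nothing on the wire tells these apart.
            return "open|filtered"
```

A socket that is bound to a port but never replies produces exactly the same open|filtered result as a firewall silently dropping the packet, which is why this state on its own could not explain the failure.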

This is where a krb5.conf property called udp_preference_limit came to my rescue. It sets the message size, in bytes, above which the client tries TCP before UDP when talking to the KDC. Since it is set to 1 here, every request goes to the KDC over TCP first instead of UDP.

[libdefaults]
    default_realm = TEST.CONFLUENT.IO
    ticket_lifetime = 90000
    renew_lifetime = 432000
    renewable = true
    noaddresses = TRUE
    allow_weak_crypto = TRUE
    udp_preference_limit = 1
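The rule behind udp_preference_limit, as documented for MIT Kerberos, fits in a small function. This is a sketch of the documented behaviour, not code taken from any Kerberos library; other implementations, such as the JDK's Kerberos code that Kafka clients run on, may differ in details like the default value.

```python
# MIT krb5 documents 1465 bytes as the default udp_preference_limit.
DEFAULT_UDP_PREFERENCE_LIMIT = 1465


def kdc_transport_order(message_size, udp_preference_limit=DEFAULT_UDP_PREFERENCE_LIMIT):
    """Order in which transports are tried for one KDC request.

    Messages larger than the limit try TCP first, smaller ones try UDP first;
    the other transport is still used as a fallback if the first attempt fails.
    """
    if message_size > udp_preference_limit:
        return ["tcp", "udp"]
    return ["udp", "tcp"]
```

With the default limit, a typical few-hundred-byte request starts on UDP and therefore hits the unanswered 88/udp first; with udp_preference_limit = 1, any real request is larger than the limit and starts on TCP.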

In the end, this did the trick and the connection to the Kafka server was successfully established.