Kerberos is an authentication protocol notorious for its complexity. It would hardly be an exaggeration to say that it is the first result you would get if you searched for the most common authentication protocol for distributed systems. I recently had to fix a connection to a Kafka cluster secured by Kerberos, and along the way I used a couple of techniques that I hope you will find useful.
Let's review the architecture first:
Here we have the App, the KDC server, and the Kafka server, which the App connects to in order to find out how many topics there are.
Because of that complexity, it can be hard to grasp how Kerberos actually works, so I will leave that task to Mike Pound in this Computerphile video and instead outline the three main points essential to my case:
- To connect to Kafka (K), the App first needs Kerberos to grant it a time-limited token (AK, the App-to-Kafka token).
- If the App is known to the KDC, the token will be granted.
- The App will use the token AK to connect to Kafka and get a list of available topics.
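The three points above can be illustrated with a toy model. This is only a sketch and not real Kerberos: it signs the ticket with an HMAC instead of encrypting it, it skips the ticket-granting ticket and session keys, and the client hands its key to the KDC directly. The names `KDC_KEYS`, `kdc_issue_ticket` and `service_accepts` are hypothetical and exist only for this illustration.

```python
import hashlib
import hmac
import time

# Hypothetical long-term secrets that the toy KDC shares with each party.
KDC_KEYS = {"app": b"app-secret", "kafka": b"kafka-secret"}


def kdc_issue_ticket(client, client_key, service, lifetime_s=300, now=None):
    """Grant a time-limited ticket only if the KDC knows both client and service."""
    known = KDC_KEYS.get(client)
    if known is None or not hmac.compare_digest(known, client_key):
        raise PermissionError(f"{client} is not known to the KDC")
    if service not in KDC_KEYS:
        raise PermissionError(f"{service} is not known to the KDC")
    now = time.time() if now is None else now
    # The ticket names the client, the service and an expiry time.
    payload = f"{client}|{service}|{int(now + lifetime_s)}".encode()
    # Signed with the service's key, so only the KDC could have produced it.
    tag = hmac.new(KDC_KEYS[service], payload, hashlib.sha256).hexdigest()
    return payload, tag


def service_accepts(service_key, ticket, now=None):
    """The service checks the signature with its own key, then the expiry time."""
    payload, tag = ticket
    expected = hmac.new(service_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False
    expires = int(payload.decode().rsplit("|", 1)[1])
    now = time.time() if now is None else now
    return now < expires
```

In this model the App would call `kdc_issue_ticket("app", b"app-secret", "kafka")` to obtain AK and present it to Kafka, which runs `service_accepts` with its own key. An expired or tampered ticket is rejected without Kafka ever contacting the KDC.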
The connection to the Kafka broker was never established and eventually timed out.
The first thing I wanted to confirm was that the App could see the Kafka broker. The easiest way to do this is to probe Kafka's default port:
telnet kafka-server.org 9092
If this command returns something like Connected to ..., Kafka is visible to the App, which was the case. This suggested that the broker was up and reachable and that the problem lay somewhere in the authentication process.
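When telnet is not installed on the machine running the App, the same reachability check can be sketched in Python with the standard library. The function name `tcp_port_reachable` is hypothetical; the host and port in the comment are the ones from the command above.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example: tcp_port_reachable("kafka-server.org", 9092)
```

Like telnet, this only proves that the TCP handshake succeeds. It says nothing about whether authentication will work.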
The next most obvious suspect was an incorrect keytab and/or krb5.conf file. The krb5.conf file was correct, but I was not sure about the keytab. Just reading the contents of the keytab would not be enough, so I had to check that the keytab was actually valid.
Checking validity is pretty easy with the kinit command:
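For context, here is roughly what reading the contents of a keytab gives you. This is a minimal sketch that assumes the MIT keytab file format version 0x0502; `KeytabEntry` and `list_keytab_entries` are hypothetical names used only for illustration. It lists principals, key version numbers and encryption types, but it cannot tell whether the KDC still accepts those keys, which is why a real login attempt is needed.

```python
import struct
from typing import List, NamedTuple


class KeytabEntry(NamedTuple):
    principal: str
    kvno: int
    enctype: int
    timestamp: int


def _read_counted(buf: bytes, pos: int):
    """Read a big-endian uint16 length followed by that many bytes."""
    (length,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + length], start + length


def list_keytab_entries(data: bytes) -> List[KeytabEntry]:
    """List the entries of a keytab, assuming file format version 0x0502."""
    if data[:2] != b"\x05\x02":
        raise ValueError("this sketch only handles keytab format 0x0502")
    entries, pos = [], 2
    while pos + 4 <= len(data):
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        if size == 0:
            break
        if size < 0:  # hole left behind by a deleted entry
            pos += -size
            continue
        end = pos + size
        (n_components,) = struct.unpack_from(">H", data, pos)
        realm, p = _read_counted(data, pos + 2)
        parts = []
        for _ in range(n_components):
            part, p = _read_counted(data, p)
            parts.append(part.decode())
        _name_type, timestamp, kvno = struct.unpack_from(">IIB", data, p)
        p += 9
        (enctype,) = struct.unpack_from(">H", data, p)
        _key, p = _read_counted(data, p + 2)  # key bytes are read but not exposed
        if end - p >= 4:  # optional 32-bit kvno overrides the 8-bit one if non-zero
            (kvno32,) = struct.unpack_from(">I", data, p)
            kvno = kvno32 or kvno
        name = "/".join(parts) + "@" + realm.decode()
        entries.append(KeytabEntry(name, kvno, enctype, timestamp))
        pos = end
    return entries
```

Called as `list_keytab_entries(open("kafka.key", "rb").read())`, this would show whether the expected principal is in the file at all. A keytab can list the right principal and still be rejected, for example when the key was rotated on the KDC after the file was exported.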
KRB5_CONFIG=/etc/krb5.conf KRB5_TRACE=/dev/stdout kinit -kVt kafka.key kafka/kafka.server@SOME.REALM
Here I am setting the KRB5_TRACE environment variable and passing the -V argument to print the Kerberos logs:
2021-04-02T13:47:06 Plugin AppSSOLocatePlugin_macOS is signed by Apple
2021-04-02T13:47:06 Plugin AppSSOConfigPlugin_macOS is signed by Apple
2021-04-02T13:47:06 Plugin heimdalodpac is signed by Apple
2021-04-02T13:47:06 Plugin Reachability is signed by Apple
2021-04-02T13:47:06 Plugin SCKerberosConfig is signed by Apple
2021-04-02T13:47:07 set-error: -1765328242: Reached end of credential caches
2021-04-02T13:47:07 set-error: -1765328243: Principal kafka/kafka.server@SOME.REALM not found in any credential cache
2021-04-02T13:47:07 Adding PA mech: ENCRYPTED_CHALLENGE
2021-04-02T13:47:07 Adding PA mech: ENCRYPTED_TIMESTAMP
2021-04-02T13:47:07 krb5_get_init_creds: loop 1
2021-04-02T13:47:07 KDC sent 0 patypes
2021-04-02T13:47:07 fast disabled, not doing any fast wrapping
2021-04-02T13:47:07 Trying to find service kdc for realm SOME.REALM flags 0
2021-04-02T13:47:07 configuration file for realm SOME.REALM found
2021-04-02T13:47:07 submissing new requests to new host
2021-04-02T13:47:07 connecting to host: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:07 writing packet: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:08 Configuration exists for realm SOME.REALM, wont go to DNS
2021-04-02T13:47:08 out of hosts, waiting for replies
2021-04-02T13:47:18 retrying sending to: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:18 writing packet: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:29 retrying sending to: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:29 writing packet: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:40 host timed out: udp 10.20.207.14:88 (kdc.com) tid: 00000001
2021-04-02T13:47:40 no more hosts to send/recv packets to/from trying to pulling more hosts
2021-04-02T13:47:40 set-error: -1765328228: unable to reach any KDC in realm SOME.REALM, tried 1 KDC
2021-04-02T13:47:40 krb5_sendto_context SOME.REALM done: -1765328228 hosts 1 packets 3 wc: 33.118119 nr: 0.024906 kh: 0.001958 tid: 00000001
kinit: krb5_get_init_creds: unable to reach any KDC in realm SOME.REALM, tried 1 KDC
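A trace like this is long, and the interesting parts are which host and transport were tried and what the last error was. The following is a minimal sketch that pulls those out, assuming one trace entry per line and the message wording seen in the log above. Other Kerberos implementations word their traces differently, and `summarize_trace` is a hypothetical helper, not part of any Kerberos tooling.

```python
import re

# Patterns based on the wording of the trace shown above (an assumption).
HOST_RE = re.compile(r"(connecting to host|host timed out): (udp|tcp) (\S+):(\d+)")
ERROR_RE = re.compile(r"set-error: (-?\d+): (.+)")


def summarize_trace(trace: str) -> dict:
    """Collect connection attempts, timeouts and the last reported error."""
    attempts, timeouts, last_error = [], [], None
    for line in trace.splitlines():
        host = HOST_RE.search(line)
        if host:
            event, proto, address, port = host.groups()
            target = f"{proto} {address}:{port}"
            if event == "connecting to host":
                attempts.append(target)
            else:
                timeouts.append(target)
        error = ERROR_RE.search(line)
        if error:
            last_error = (int(error.group(1)), error.group(2).strip())
    return {"attempts": attempts, "timeouts": timeouts, "last_error": last_error}
```

For the trace above, the summary would contain a single attempt and a single timeout, both for `udp 10.20.207.14:88`, which is the clue that matters here.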
The logs made it evident that the initial connection to the KDC was not happening, specifically on port 88/udp.
telnet only speaks TCP, so it was useless for probing a UDP port. I chose a more advanced tool, nmap (with -sT for TCP and -sU for UDP scans), for this job. To cover all possible issues, I ran it against all the ports exposed by the KDC:
ftp           21/tcp           # Kerberos ftp and telnet use the
telnet        23/tcp           # default ports
kerberos      88/udp    kdc    # Kerberos V5 KDC
kerberos      88/tcp    kdc    # Kerberos V5 KDC
klogin        543/tcp          # Kerberos authenticated rlogin
kshell        544/tcp   cmd    # and remote shell
kerberos-adm  749/tcp          # Kerberos 5 admin/changepw
kerberos-adm  749/udp          # Kerberos 5 admin/changepw
krb5_prop     754/tcp          # Kerberos slave propagation
eklogin       2105/tcp         # Kerberos auth. & encrypted rlogin
krb524        4444/tcp         # Kerberos 5 to 4 ticket translator
If a port is open, the output should look like this:
> nmap -p 88 -sU kdc.com
PORT   STATE SERVICE
88/udp open  kerberos-sec
In my case, all ports were open and reachable except for 88/udp, whose state was open|filtered, meaning nmap received no reply and could not tell whether the port was open or a firewall was dropping the packets. I suspected this was due to the complexity of the company network, but that is just an assumption. Unfortunately, I was not able to put my finger on why exactly the connection to 88/udp was failing, so I decided to try a different approach.
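The reason a UDP probe is so much less conclusive than a TCP one can be shown with a small sketch. UDP has no handshake, so the only evidence is a reply, an ICMP "port unreachable" error, or silence. The function `udp_probe` below is hypothetical, handles IPv4 only, and sends a meaningless one-byte payload. A real KDC may silently ignore such a datagram, so silence does not prove that anything is being filtered.

```python
import socket


def udp_probe(host: str, port: int, payload: bytes = b"\x00", timeout: float = 2.0) -> str:
    """Classify a UDP port roughly the way a UDP scan does."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # Connecting the UDP socket lets the OS report ICMP errors on it.
            sock.connect((host, port))
            sock.send(payload)
            sock.recv(4096)
            return "open"  # something answered
        except socket.timeout:
            # No reply and no ICMP error: open but silent, or dropped on the way.
            return "open|filtered"
        except ConnectionRefusedError:
            # ICMP port unreachable came back: nothing is listening.
            return "closed"
```

With TCP, a completed handshake settles the question. Here the timeout branch covers both a healthy but quiet service and a firewall that drops packets, which is exactly the ambiguity behind the open|filtered state.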
This is where a krb5.conf property called udp_preference_limit came to my rescue. It sets a size limit in bytes: messages larger than the limit are sent to the KDC over TCP first rather than UDP. With the limit set to 1, every connection to the KDC uses TCP instead of UDP.
[libdefaults]
    default_realm = TEST.CONFLUENT.IO
    ticket_lifetime = 90000
    renew_lifetime = 432000
    renewable = true
    noaddresses = TRUE
    allow_weak_crypto = TRUE
    udp_preference_limit = 1
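The effect of udp_preference_limit can be sketched as a simple decision rule. This follows the behaviour described in the MIT krb5 documentation, where the default limit is 1465 bytes. Treating a message exactly at the limit as UDP is an assumption of this sketch, and other Kerberos implementations may behave differently. The function name `first_transport` is hypothetical.

```python
# Default documented by MIT krb5 when the option is not set.
DEFAULT_UDP_PREFERENCE_LIMIT = 1465


def first_transport(message_size: int,
                    udp_preference_limit: int = DEFAULT_UDP_PREFERENCE_LIMIT) -> str:
    """Return the transport tried first for a KDC request of the given size."""
    # Messages above the limit go to TCP first; smaller ones try UDP first.
    # Assumption: a message exactly at the limit still tries UDP first.
    return "tcp" if message_size > udp_preference_limit else "udp"
```

With the default limit, a 200-byte request would try UDP first, which matches the trace above. With `udp_preference_limit = 1`, any realistic request is larger than the limit, so `first_transport(200, 1)` returns `"tcp"`.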
In the end, this helped, and the connection to the Kafka server was successfully established.