ACM 2018-2019 CTF3 Writeup: Enterprisemail

Posted 07/24/19 #security #ctf #python #acm

ACM UMN ran a series of CTF (Capture The Flag) cybersecurity competitions during the 2018-2019 academic year. Of these, CTF3 was the most exciting: a live in-person Attack-Defense style competition. For this competition I wrote a number of challenges. This article breaks down some exploits against one of them: enterprisemail.


What does it do?

"Speedy, Secure, or Enterprise. We chose Enterprise."

enterprisemail is a dead simple "email" server. It allows users to create accounts, sign into them, send each other mail, and check their inboxes. Authentication was via PyNaCl Curve25519 keying: users generated a private key, its public key was associated with their account at creation, and authentication was a very simple challenge-response signing-request protocol. There was no crypto on the messages actually exchanged by the server.

Client ←→ server communication was over a DIY XML serialization protocol, with its own share of bugs. The server was implemented (badly) using the python stdlib socketserver module. Message storage used the filesystem: a storage folder with one folder per user account. Messages were internally a list of "parts", each of which was just a python object (in practice only strings were used, but the facilities for representing other datatypes had some serious security implications!) Each message was given an internal timestamp-derived ID that was used as the filename to store it under in the recipient user's folder. Additionally, each user had a file named key that held their public key, used for authentication.

A number of other facilities were required by the service's SLA: the server must provide the ability to list all users along with their public keys, and the server must keep a copy of every message that flows through it in case of an "audit." The former is simple but essential to leverage another exploit, and the latter's implementation contained some important bugs. The audit functionality was implemented by having a "shadow user" named audit that received a copy of every message. audit had their own mailbox, but no keyfile, and whenever mail was saved a copy was dropped there.

Flag storage was implemented against this server by having the gameserver create a new account, send itself a message, and then return at the end of the period to retrieve it. There was no automated enforcement of the audit facility, but we (the gamerunners) had some fun showing up at a team's workstation and demanding a mail accounting.

We shipped a simple client application for this protocol along with the server. The client was well-behaved, but intentionally easy to modify to be made to send malicious requests. There was one serious attack that could actually be carried out entirely from within the shipped client!


What does it do wrong?


Denial of Service


Non-threading/forking server implementation

The server was implemented using the python standard library socketserver toolkit. This can easily be configured to use either multiple threads or processes to handle connections, but the one we shipped didn't. As a result, simply dialing the server and leaving the connection open was enough to prevent any other client from being able to connect. We actually shipped the following piece of code as part of the server:

class SrvClass(socketserver.ThreadingMixIn, socketserver.TCPServer): pass
class SrvClass(socketserver.TCPServer): pass
...
server = SrvClass((HOST, PORT), make_handler(PrivateKey.generate().encode(AsciiHexEncoder)))
server.serve_forever()

At a glance, it might look to a defender as though the server was using the ThreadingMixIn, which would resolve this issue. In fact, the second SrvClass definition immediately shadows the first, so the server cannot handle more than one connection at a time.

Intended fix

An effective patch would've simply been to comment out the second shadowing definition.
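Here's a rough sketch of what the patched server class could look like. The EchoHandler below is a hypothetical stand-in for the real mail handler, and start_server and ask are helper names made up for this example. With the mixin in place, an idle connection no longer keeps other clients out:

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical stand-in for the real mail handler: echoes one line back."""
    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line)

class SrvClass(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # stuck connections don't block shutdown
    allow_reuse_address = True

def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = SrvClass((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def ask(address, payload, timeout=5.0):
    """Send one line and return the echoed reply."""
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(payload)
        return conn.makefile("rb").readline()
```

Each connection gets its own thread, so a client that connects and then goes silent only ties up that one thread.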

Competitive impact

Not much. It doesn't seem like anyone mounted an effective (intentional) attack against this. The client we shipped behaved well and avoided leaving connections open after it was closed. Judging from the uptime information it seems like some teams' servers got stuck, but this was usually solved with a systemctl restart instead of a real patch.


Broken two-stage authentication-handling code

Authentication was implemented as a very simple challenge-response signing protocol. The client starts with a signin_initiate command that the server replies to with some randomness to sign. The client signs it and sends a signin_complete command to finish the authentication. However, the server doesn't check that the signin_complete request comes after a signin_initiate request, and so can be made to read from variables that haven't been set, causing a crash:

if action == 'signin_initiate':
    authenticatingas = params['user']
...
elif action == 'signin_complete':
    correct_fingerprint =\
        getfingerprint(mail.get_user_key(authenticatingas))

If the server receives a signin_complete first, it will attempt to read authenticatingas and crash.

Another similar bug was that trying to authenticate as a user that doesn't exist would cause the server to attempt to read that user's public key file and crash when it did not exist:

def get_user_key(user):
    with open("store/"+user+"/key", 'r') as fd:
        return fd.read()

As a result, sending a nonexistent user in the signin_initiate step would cause a fatal error in the signin_complete step.

There were probably quite a few other logic bombs like this sitting around in the codebase, but these were the "biggest" ones.

Intended fix

Make sure that a previous signin_initiate request had been sent. A simple solution would be to set authenticatingas = None in the handler initialization and check it in the signin_complete branch before proceeding.

The second issue could be fixed just by checking that the user's folder and keyfile exist, and either reporting an error to the user or simply reporting a failed authentication.
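Both fixes can be sketched as a tiny per-connection state holder. This is not the shipped handler: the AuthSession class, its store parameter, and the returned status strings are all invented for illustration, and the actual signature check is left out.

```python
import os

class AuthSession:
    """Hypothetical per-connection auth state, not the shipped handler."""

    def __init__(self, store="store"):
        self.store = store
        self.authenticatingas = None   # only set by signin_initiate

    def key_path(self, user):
        return os.path.join(self.store, user, "key")

    def handle(self, action, params):
        if action == "signin_initiate":
            user = params.get("user", "")
            # Refuse unknown users up front instead of crashing later
            if not os.path.isfile(self.key_path(user)):
                return "auth_failed"
            self.authenticatingas = user
            return "challenge_sent"
        if action == "signin_complete":
            # Guard against signin_complete arriving first
            if self.authenticatingas is None:
                return "auth_failed"
            user, self.authenticatingas = self.authenticatingas, None
            return "verify_signature_for:" + user
        return "unknown_action"
```

Resetting authenticatingas after each signin_complete means a single signin_initiate can't be reused for repeated completion attempts.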

Competitive impact

Zero. I don't think anyone used anything other than the client we shipped, which uses the correct auth flow and didn't trip either of these bugs.


Billion Laughs

Yes, it loads XML using the python stdlib xml.etree, so it's vulnerable to our old friend the Billion Laughs Attack.

import xml.etree.ElementTree as ET
...
def load_xml_msg(message):
    root = ET.fromstring(message)

Intended fix

Using the threading or forking mixin would have made this a lot harder to exploit, and setting a maximum deserialization time or disabling entity support would've been a complete fix.
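One cheap way to disable entity support is to refuse any message that carries a DTD at all, since that's the only place entities can be declared. The sketch below assumes the message has already been decoded to a str; load_xml_msg_safely and UnsafeXMLError are names made up for this example.

```python
import xml.etree.ElementTree as ET

class UnsafeXMLError(ValueError):
    """Raised when a message carries a DTD or entity declaration."""

def load_xml_msg_safely(message):
    # Entities can only be declared inside a DOCTYPE, so refusing both
    # markers blocks entity-expansion payloads before the parser sees them.
    if "<!DOCTYPE" in message or "<!ENTITY" in message:
        raise UnsafeXMLError("DTDs and entity declarations are not accepted")
    return ET.fromstring(message)
```

This is deliberately blunt: it would also reject a message that happened to contain those markers inside a comment or CDATA section, which the shipped client never produced.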

Competitive impact

Zero. I don't think anyone used anything other than the client we shipped, which sends well-formed messages.



Disclosure


audit account is registrable

The audit facility in the server we shipped was implemented by sending a copy of every message to the inbox of a user called audit. This user didn't have a keyfile, and so couldn't be logged in as. However, it was perfectly possible to register a user named audit and set its key to one you control. This would then allow you to request the mail for this user and get a copy of every message that had flowed through the server, including messages from the gameserver with flags.

Relevant mail-sending snippet, where audit is added to the to-list of all messages:

def send_mail(fromusr, tousrs, subject, parts):
    timestamp = str(time.time()).split(".")[0]
    msg = Message(timestamp, fromusr, tousrs, subject, parts)
    for usr in tousrs + ['audit']:
        with open("store/"+usr+"/msg_"+str(timestamp), 'wb') as fd:
            pickle.dump(msg, fd)

Relevant account-registration snippet:

def create_account(user, key):
    try:
        open('store/'+user+'/key', 'r').close()
        print("account already exists")
        return False #User already exists
    except IOError:
        pass #user does not exist
    os.system("mkdir store/"+user)
    with open("store/"+user+"/key", 'w') as fd:
        fd.write(key)
    return True

Note how the keyfile, not the directory, is checked for. This means that although the audit user directory is created at setup time, it's still possible to register an account. Also note that the os.system call to create the directory doesn't have a return code check of any sort, so although it will fail because the audit directory already exists, the registration will proceed.

Intended fix

One could specifically check to see if the account being created or authenticated for was audit and disallow that. A quick-and-dirty solution would be to drop an empty key file into the audit directory, causing any authentication attempt to fail.
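A sketch of the first option, under a few assumptions: the RESERVED_USERS set and the store parameter are invented for this example, and the directory is created with os.mkdir instead of a shell call so that a pre-existing directory is reported as an error rather than silently ignored.

```python
import os

RESERVED_USERS = {"audit"}   # names that must never be registrable

def create_account(user, key, store="store"):
    """Return True if the account was created, False otherwise."""
    if user in RESERVED_USERS:
        return False
    userdir = os.path.join(store, user)
    try:
        # os.mkdir raises if the directory exists, unlike an unchecked shell call
        os.mkdir(userdir)
    except FileExistsError:
        return False
    with open(os.path.join(userdir, "key"), "w") as fd:
        fd.write(key)
    return True
```

Because the directory itself is now the existence check, the audit mailbox created at setup time blocks registration even without the reserved-name list.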

Competitive impact

A little. This took a long time to be discovered by anyone, and the team that finally did only got a few points from it since it was so close to the end and they did not build an automated solution.


Broken signature "fingerprint" comparison code

When a user completed the challenge-based auth scheme, the key they used to sign the challenge was compared with the key on file for the account they were trying to authenticate as:

correct_fingerprint = getfingerprint(mail.get_user_key(authenticatingas))
clientkey = params['key']
client_fingerprint = getfingerprint(clientkey.encode(AsciiHexEncoder))
signature = params['signature']
clientkey.verify(signature)
if client_fingerprint == correct_fingerprint:
    #Success...

You'll notice however that we are actually comparing the "fingerprint" of the keys. This is a completely useless step that actually makes the authentication scheme completely insecure - while using crypto-sounding language! Instead of computing a real fingerprint, we do the following:

def getfingerprint(key):
    return key[:4]

(If you're unfamiliar with Python's slice notation, this means take the first four characters of key.) key here holds a string of hexadecimal characters (not a PyNaCl VerifyKey object!) Taking the first four characters means that we are in effect only comparing 16 bits of the key instead of the full length (each hex digit encodes 4 bits, so 4 hex digits cover 16 bits, or two bytes). As a result, it's easy to brute force a key that shares the first 16 bits with the public key of a user that has already registered (for example, the gameserver's flag-rotation user!)

And luckily, part of the SLA says that the server must support the listusers command - which has to return their public keys as well! This means that it's easy to find users and generate an alternate key for them. Here's a minimal collision-finder:

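import nacl.signing
# AsciiHexEncoder is the hex encoder used by the service code; it is not defined in this snippet

tofind = input()  # first four hex characters of the target user's public key

while True:
    # Generate fresh keypairs until the public key's prefix matches the target
    k = nacl.signing.SigningKey.generate()
    if k.verify_key.encode(AsciiHexEncoder)[:4].startswith(tofind):
        print(k.encode(AsciiHexEncoder))  # the colliding signing (private) key
        break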

This takes less than a minute to find a collision on a Pi, and since rounds are 5 min at their fastest this would work fine in practice: wait for a key rotation, find the new account, bruteforce an alternate key, and slurp out the flag.
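To put a number on the search cost: each candidate key matches a given 4-character hex prefix with probability 1/16^4, so a collision takes about 65536 keypairs on average. The sketch below simulates the search with random bytes standing in for public keys (an assumption: it treats key prefixes as roughly uniform and doesn't touch PyNaCl at all). find_prefix_collision is a name made up for this example.

```python
import os

PREFIX_HEX_CHARS = 4
# Each hex character carries 4 bits, so a 4-character prefix is 16 bits
EXPECTED_TRIES = 16 ** PREFIX_HEX_CHARS   # 65536

def find_prefix_collision(target_prefix, max_tries=5_000_000):
    """Draw random 32-byte stand-in keys until one shares the hex prefix."""
    for tries in range(1, max_tries + 1):
        candidate = os.urandom(32).hex()
        if candidate.startswith(target_prefix):
            return candidate, tries
    raise RuntimeError("no collision found within max_tries")
```

Calling find_prefix_collision with any 4-character lowercase hex prefix returns a matching stand-in key along with the number of draws it took. Real keypair generation is slower per attempt than drawing random bytes, but the attempt count is the same.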

Intended fix

Just remove the calls to getfingerprint. There's no reason not to just compare the whole key. The whole key is stored, just not compared, so there's no transitional phase needed.
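A sketch of what the full comparison might look like, assuming both keys are available as hex strings. keys_match is a hypothetical helper, and the constant-time comparison and the empty-key rejection are extra precautions added here, not something the original code had.

```python
import hmac

def keys_match(stored_key_hex, presented_key_hex):
    """Compare complete hex-encoded public keys, not a short prefix."""
    stored = stored_key_hex.strip().lower().encode("utf-8")
    presented = presented_key_hex.strip().lower().encode("utf-8")
    if not stored:
        # An empty keyfile must never authenticate anyone
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(stored, presented)
```

Rejecting an empty stored key matters if a placeholder keyfile is ever used to lock an account.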

Competitive impact

Nobody found this ;-;.



Remote Code Execution


Absolute garbage fire of a serialization system

The communication protocol between client and server is a custom XML-based serialization format representing Python objects. XML nodes represent Python objects, including type data. This type data is used to retrieve a deserializing function that is run over the node:

def load_obj(ele):
    clsstr=ele.attrib['type']
    cls = {
        "int": int,
        "str": str,
        "list": list,
        "_pickle": lambda text: pickle.loads(bytes.fromhex(text)),
        "_pusigbkey": lambda text: VerifyKey(text, AsciiHexEncoder),
        "_bytes": lambda text: bytes.fromhex(text)
    }.get(clsstr)
    if cls is None: cls = eval(clsstr)
    if not len(ele) and cls is list: return []
    if not len(ele):
        t = ele.text
        if ele.attrib.get('strip', False):
            t = t.strip()
        val = cls(t)
    else:
        args = list(map(load_obj, ele))
        val = cls(args)
    return val

One issue that should jump right out is the use of eval as a fallback for unknown classnames. This is "intended" to make it so that other builtin Python classes (e.g. tuple) could be used if the implementer "forgot" to register them. In actuality, it means that you can simply submit a request with Python source code in the type field and it will be executed by the server. A second glaring flaw is the use of pickle. Pickle is not secure! There's even a big red warning on its docs page. Running pickle.loads on untrusted data (read: anything you get during a CTF!) can lead trivially to remote code execution.
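To see why pickle.loads is so dangerous, here is a harmless demonstration. A pickled object can name any importable callable and its arguments via __reduce__, and the unpickler calls it during loading. This example uses len as a benign stand-in. The Payload class is invented for illustration.

```python
import pickle

class Payload:
    def __reduce__(self):
        # On load, pickle calls len("pwned"); an attacker would instead
        # name something like os.system with a command string.
        return (len, ("pwned",))

blob = pickle.dumps(Payload())
# The "object" that comes back is whatever the chosen callable returned
result = pickle.loads(blob)
```

The server never needs to have the Payload class available: the blob only references the callable and its arguments.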

Intended fix

Just disable the eval-fallback and _pickle case. They aren't used by the gameserver's key-rotation client, and nothing beyond text support is required by the SLA. The _pusigbkey (a misspelling of _pubsigkey) and _bytes types are required for the authentication part of the protocol, but don't pose an RCE vulnerability.
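A sketch of a stripped-down loader with only an allow-list of types. To keep it standard-library only, it leaves out the _pusigbkey entry, which a real patch would keep; the ALLOWED_TYPES name and the ValueError on unknown types are choices made for this example.

```python
import xml.etree.ElementTree as ET

# Only explicitly registered types are accepted: no eval, no pickle.
# The key type from the original table is omitted to stay stdlib-only.
ALLOWED_TYPES = {
    "int": int,
    "str": str,
    "list": list,
    "_bytes": bytes.fromhex,
}

def load_obj(ele):
    clsstr = ele.attrib.get("type")
    cls = ALLOWED_TYPES.get(clsstr)
    if cls is None:
        raise ValueError("unsupported type: %r" % (clsstr,))
    if len(ele):
        # Container node: decode the children first
        return cls([load_obj(child) for child in ele])
    if cls is list:
        return []
    text = ele.text or ""
    if ele.attrib.get("strip", False):
        text = text.strip()
    return cls(text)
```

Anything not in the table, including a type attribute full of Python source, now fails with an error instead of being evaluated.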

Competitive impact

Zero. I don't think anyone used anything other than the client we shipped, which encodes requests using only the non-funky type field parameters.



Conclusion

enterprisemail was a fun little service to put together. I had a lot of fun designing the downright evil serialization format it used. Unfortunately, none of the flaws were widely exploited, and I don't think any teams did any real patching beyond periodic restart scripts to deal with accidental DoS cases.

In a future post I will discuss phipper, another Python service I wrote for ACM CTF3 - which saw much more attack and defence during the CTF!