[05-04-2021] Update: [03-05-2025] High Speed Methods For Blind SQL Injections Ruben V Piña https://nzt-48.org X: @ruben_v_pina Mastodon: @ruben_v_pina 1. Introduction ================ This paper presents optimized methods for exfiltrating information from databases with blind SQL injection attacks. Duality, one of the methods described in this paper, involves using inferencial algorithms that request only certain fragments of the information and deduce the missing data (which work 100% of the time with any given data-type). 2. High-speed injections ========================= 2.1 Overview of methods that have been publish as to this date 2.1.1. Bisection, the most used method so far. One of the most used techniques to extract information from a database using blind SQL injection attacks is the bisection method. This technique makes use of a binary search algorithm with which is possible to retrieve any character within the ASCII range with a maximum of 7 requests, according to the sqlmap documentation [1]. To extract a character within the UTF-8 Latin range it would take a maximum of 8 requests. Example: Finding a character between the a-z range: ?sqli=1' and (select mid(password,1,1) from users limit 1) REGEXP '^[a-m] TRUE response ?sqli=1' and (select mid(password,1,1) from users limit 1) REGEXP '^[a-g] FALSE response ?sqli=1' and (select mid(password,1,1) from users limit 1) REGEXP '^[g-j] TRUE response ?sqli=1' and (select mid(password,1,1) from users limit 1) REGEXP '^[g-i] TRUE response ?sqli=1' and (select mid(password,1,1) from users limit 1) REGEXP '^h TRUE response This method has threading limitations because the requests must be performed in a sequential manner; in order to know how to forge the following request the attacker needs to know what was the result of the previous request. However, threads can be used to exfiltrate different chunks of the data at the same time. 2.1.2. The change of paradigm Usually, performing blind SQL injections relies on guesswork that gathers boolean responses from the vulnerable application in order to narrow down the possibilities of what the desired character might be. A very useful paradigm to optimize this old technique is the realization that everything inside the machine is binary. From the same perspective, it can be concluded that all the information stored in the computer is, in essence, Boolean. So instead of trying to guess what the character might be, it is easier to convert all the information into its binary representation and then just ask directly for TRUE and FALSE answers bit by bit. Set bits are equal to TRUE responses and unset bits are equal to FALSE responses. By gathering all of these Boolean responses, it is possible to reconstruct the exact bits used to represent each character of the information desired to be extracted. This idea of extracting information in its binary representation can be implemented in many different ways, as demonstrated in various blog posts and text files[3][4][5][6], but it is not known who came up with this idea, although it probably was Jelmer de Hen. 2.1.3. sql-anding, former fastest method for boolean blind SQL injections Back in 2013, new SQL Injection optimization techniques were presented in Black Hat USA by Roberto Salgado[8]. One of these techniques was created by me. It would be helpful to explain this technique in order to have a better comprehension of the new methods explained later in further sections. The code of the injection is the following: 1 AND (SELECT ASCII(SUBSTRING(password, n, 1))FROM users LIMIT 1) & %d The first thing this injection does is to select a single character from the desired value to extract by using the SUBSTRING function. The next thing done by this injection is to convert the selected character to its ASCII numeric value by using the ASCII function. The last part of the injection performs an AND bitwise operation against %d, a placeholder that will iterate through 7 different values: %d = [ 1, 2, 4, 8, 16, 32, 64 ] The binary representations of these values are the following: 1 = 00000001 2 = 00000010 4 = 00000100 8 = 00001000 16 = 00010000 32 = 00100000 64 = 01000000 128 = 10000000 Since all of these numbers are powers of 2, all the bits are unset except for only one bit that is set and shifts its position to the left with each iteration. The desired character to extract (being 'a' in this case, with a binary value of 01100001) is used to perform the AND bitwise operation with each one of the previously listed powers of 2: 01100001 01100001 01100001 01100001 00000001 00000010 00000100 00001000 ======== ======== ======== ======== 00000001 00000000 00000000 00000000 = TRUE = FALSE = FALSE = FALSE 01100001 01100001 01100001 01100001 00010000 00100000 01000000 10000000 ======== ======== ======== ======== 00000000 00100000 01000000 00000000 = FALSE = TRUE = TRUE = FALSE If the vulnerable page responds with a TRUE response, then the bit being tested is equal to 1. If it is FALSE, the bit is equal to 0. By iterating through the mentioned powers of 2 and gathering the Boolean responses, all the bits which represent the desired character can be determined and used to reconstruct the bit string of the data being extracted. If it is known that the character being retrieved is contained in the ASCII range, only 7 requests need to be done because, for the ASCII range, the most significant bit is always 0. This technique has an advantage: in contrast to the bisection method, each bit can be retrieved regardless if the other bits are known or not, which means that these injections doesn't have to be sequential. Threads can be used to retrieve all the character’s bits at the same moment. This is the reason of why sql-anding.py is very fast. There is already a recorded demo of this technique, publicly available to watch [7]. 2.1.5. Overview of pos2bin method pos2bin is another method which implements the idea of working with bits to extract information in a faster way. It was presented in Black Hat USA 2013 by Roberto Salgado [8]. I edited the injection a little bit to have a briefer explanation: ?sqli=1' and IF( @x:=(SUBSTRING(BIN( POSITION( SUBSTRING((SELECT password FROM users LIMIT 1), 1, 1) IN (CHAR(48,49,50,51,52,53,54,55,56,57,65,66,67,68,69,70)) ) ), 1, 1) )!=space(0), /* THEN */ 2-@x, /* ELSE */ 0/0) - SUBSTRING is used to extract the first character of column 'password' from table 'users' - Then POSITION is used like this: - A character set is defined, from 0-9 to A-F: CHAR(48,49,50,...,69,70) - POSITION IN is used to determine in which position the password character happens to be in this set: if the password character is 1 (ASCII 49), then its position is '2' - The BIN function is used to convert the returned 2 into binary, so the result is '10' - The SUBSTRING function selects only the first bit of the resulted '10', so it returns 1 - The resulted '1' is stored in the @x variable - The IF functions returns 2-@x (@x can be either a 1 or 0) if @x is not NULL, as a result you can get 2 possible responses depending if the returned bit is 1 or 0 - If @x is empty the injection returns 0/0 which is an error response, meaning that all the bits of the character's position have already been instracted. Then the injection is performed one more time to return the second bit of the characters position. In this way, with only 3 requests character '1' was extracted. Even though this method is very efficient, it has some constraints that make it inconvenient in certain situations. The main optimization idea used in this technique is to use reduced character sets. This is highly useful when it is known that only digits or hexadecimal characters are expected. The problem is that if you're working with larger character sets, such as a DB that uses the CJK character set (Chinese, Japanese and Korean), the character set would become excessively large and the possible positions would approach 21,000. So not only the injection would have more than 21 thousand characters, the bits that represent such position would be around 16 bits. 2.2 Explanation of new extraction methods 2.2.1 Duality: optimization of bit-based blind SQL injection methods The purpose of Duality.py is to be able to optimize the bit extraction process without the need of error-based responses. The main idea behind Duality is that, it doesn't have to request all the bits that compose each character. Duality is able to request only certain bits of the information and then deduces the missing data, reducing the amount of requests to an average of 5 requests by character. In some cases, Duality is able to retrive a character with only 2 to 3 injections. Also, Duality.py works better with larger character sets. With a character set that contains 21,000 characters, it would take around 10 request to find a character. For maximum optimization, Duality.py contains different decision-making trees which are chosen depending on the character set used by the data being extracted. For example, if the attacker is extracting passwords hashed with the MD5 algorithm, there are only 16 hexadecimal characters expected to be found, shortening the range of possible characters. There is also a decision-making tree used for when only numeric digits are expected: 2.2.1.1 Extracting Numbers It is going to be demonstrated how it is possible to extract a character by knowing just 1 or 2 of its bits. The easier example to explain how Duality does this is by assuming that the data to be extracted contains only numbers. The extraction strategy is the following: The following table shows the binary representation of the ASCII values of all the numeric characters: Numeric character Binary representation --------------------------------------------- 0 0110000 1 0110001 2 0110010 3 0110011 4 0110100 5 0110101 6 0110110 7 0110111 8 0111000 9 0111001 It is apparent that the first 3 bits of the binary strings are always the same: ‘011’. Since this bitstring is present in all numbers, the first step is to remove it so that only the last 4 bits remain. Normally, each one of the 4 remaining bits would have to be extracted, but Duality reduces the number of bits needed. This method starts by using the bitwise function BIT_COUNT() to determine how many bits are set in the character being extracted. Numeric character Binary BIT_COUNT ------------------------------------------------- 0 0000 0 1 0001 1 2 0010 1 3 0011 2 4 0100 1 5 0101 2 6 0110 2 7 0111 3 8 1000 1 9 1001 2 By using BIT_COUNT(), it becomes possible to classify all the characters into different groups depending on their quantity of set bits. This reduces the number of possible bit combinations to be found. For instance, if BIT_COUNT returns 3, it can be determined right away that the numeric character is ‘7’, because it is the only number which has a total of 3 set bits; only one request was made to find the character. Continuing with the example, if BIT_COUNT returns a value of 1 instead, the possibilities are reduced to the following characters: Numeric character Binary representation BIT_COUNT ----------------------------------------------------------------- 1 0001 1 2 0010 1 4 0100 1 8 1000 1 A good strategy would be to test the most significant bit. If it is set, then it would be revealed that the character is ‘8’, because it is the only character in this BIT_COUNT group whose most significant bit is set. The rest of the bits don’t need to be extracted anymore. If the bit is unset, the value ‘8’ can be discarded and the next step would be to test if the second most significant bit is set, because the only bitstring of this group whose second most significant bit is set is ‘4’. This would reveal that the character is ‘4’ without the need to extract the rest of the bits. If the number is not ‘8’ nor ‘4’, the possible values left are ‘0001’ and ‘0010’. By retrieving the value of just the last bit, it would be revealed which one of these 2 remaining characters is the one being extracted, without the need of requesting both bits. The sequence of how the bits are tested depends on the character range of the data wanted to be extracted. Duality’s code has multiple decision trees for the different BIT_COUNTs in each different character range (digits, hex, alpha, alnum, full-ascii). These decision-making trees have been optimized for each one of these specific cases by determining which sequences are the most likely to be encountered and what is the best way to identify unique patterns in the bitstrings in order to select and discard the optimum amount of bits. 2.2.1.2 Dealing with words When dealing with cases in which the data to be extracted consists of letters only, several optimization ideas can be used. The first thing to notice is that, for letter characters in the ASCII range, the first two significant bits are always ‘010’. These bits can be removed from the bitstrings since their values are already known and only 5 bits of the bitstring will remain. The second thing to do is to use BIT_COUNT to find out how many of the last 5 bits are set. For instance, if the result of BIT_COUNT is 3, the possible characters would be reduced to only 5 possible characters: Character Binary representation BIT_COUNT --------------------------------------------------------- G 0 0111 3 K 0 1011 3 M 0 1101 3 N 0 1110 3 W 1 0111 3 It is evident that there is a sequence of bits that appears twice in the table: ‘0111’ is used to represent the characters ‘G’ and ‘W’. Since this is the only sequence that is repeated, it has the higher chance of happening, so this is the first possibility that should be tested. It is also evident that this sequence of bits is the only one whose second bit is ‘0’. This means that with only 1 request to extract the second bit, the 5 possibilities can be reduced to only 2 characters if the result is ‘0’, and just 1 more request would be needed to determine if the value is ‘G’ or ‘W’. This kind of logic can be used in every scenario to start testing for the most probable values, and then finding bit sequences that distinguish these characters from the rest to keep discarding the greatest amount of possibilities until the character in question is found with the least amount of requests. 2.2.1.3 Dealing with the full ASCII range In situations when it is needed to deal with the full ASCII range, all the bits are used, in contrast with the character class specific scenarios where some bit combinations never happen because the character range doesn't contain all possible characters. The decision tree used when the BIT_COUNT is equal to 2 is the following one: .---------------------. | BIT_COUNT = 0x02 | '---------------------' | | 0 <------------------------------R2#0---------------------------------> 1 | | | | v v R2H ----> 1 ----> 000011 R2H#0 | | | | v 0 <-------------'-----> 1 0 | | | | | | .-----R0#0 ----> 1 v v | | R0#0 R1#0 ---------> 1 | | | | | v v | | | R1L R0L 0 <-----'--> 1 v v .-------' | || | | R1L R1L v | 0 <-----'| v | | | 0 v | v R1L v | | | 1 | 1 | R0L | 0 <----'---> 1 | | | | | | 0 <-'---> 1 | | v v | v 0<-'-> 1 | | | | | 001001 000101 | 010001 | | | | | v v | | | | v | R0L R0L 100001 | | | 110000 | | | | | 0 <---| | | | | | | 1 | | | v | | | 001100 | | 001010 | | | v v | | v 101000 011000 v | 010010 000110 | | v 100010 The previous tree is used when the bitstring has 2 bits set. The tree extracts specific pairs of bits and then, based on the results it gets, other groups of bits are strategically selected with the objective of finding the quickest path to extract each possible value. When certain routes of the tree are taken, other possibilities are discarded. The core of the technique is to logically select specific groups of bits to deduce others. This table shows how many requests must be done for finding a character with the previous example (neglecting the BIT_COUNT): -------------------------------- 2 requests: 1 possibility 4 requests: 10 possibiilities 5 requests: 4 possibilities ---------------------------------- To emphasize the optimization it would be worth mentioning that sqlmap takes always 7 requests to find a character. So the worst case scenario of this method is even more convenient than sqlmap's best case scenario. 2.2.1.4 Binary Tree Inversion This is an example of another optimization technique that can be used when dealing with the full ASCII range. For instance. if the BIT_COUNT of a character is equal to 4, then it means that the possibility of finding a ‘1’ is of 4/6 and the possibility of finding a ‘0’ is 2/6; it is more probable to find a 1 than a 0, looking for 1s is faster than lookin for 0s. In contrast, if the BIT_COUNT of a character is equal to 2, then the possibility of finding a ‘1’ is equal to 2/6, meaning that it is more likely to find the location of bits equal to ‘0’ (4/6). To optimize the extraction process, if the BIT_COUNT is 2. The values of the bits should be inverted meaning that instead of having 4 bits with a value of ‘0’, the bits are going to be inverted leaving a total of 4 bits with a value of ‘1’, this way the probability of finding bits that are set will be increased, making the process faster and shorter. Once the set bits are found, the bits of the entire string will be inverted once again to obtain the original string of bits. In this way, the decision-making trees can be reused in different scenarios, making the code shorter. A quick test-case was made to determine the difference in requests and time when a normal tree is used against when the tree is inverted: Method Number of requests Time -------------------------------------------------- Normal tree 9090 71 seconds Tree inversion 8734 61 seconds 2.2.2. Bit superposition: returning 2 bits at once with only 1 injection Since each blind injection can only result in 1 of 2 possible boolean responses (TRUE or FALSE), it is assumed that it would be impossible to have 4 possible sequences of bits (00, 01, 10, 11) and determine with only 1 response which one of the four sequences is being selected. Regardless of this belief, it is possible to retrieve 2 bits at a time with only 1 boolean request. It works like this: ?id=1 AND (CASE WHEN (@:=(select mid(bin(ascii(mid(password,1,1))),1,2) from usuarios limit 1))='00' THEN 0 WHEN @='01' THEN 1 WHEN @='10' THEN (0 OR NOT SLEEP(0.5)) ELSE (1 AND NOT SLEEP(0.5)) END) - If the bit sequence is ‘00’, the query returns FALSE. - If the bit sequence is ‘01’, the query returns TRUE. - If the bit sequence is ‘10’, the query returns FALSE but after a delay since a SLEEP is used. - If the bit sequence is ‘11’, the query returns TRUE after a delay because of the SLEEP instruction. Even if the application can only reply with only 2 possible responses, 4 possible bit sequences can be found with just 1 injection. This technique can be used for scenarios in which it is more convenient to do as less requests as possible (for example, when you want to be stealthy by doing big intervals of time between each request to avoid detection). It would take half the requests. 2.2.3. Taking Advantage Of Non-Boolean Blind SQL Injections (Lightspeed.py) OWASP defines a blind SQL injection as the following: “Blind SQL (Structured Query Language) injection is a type of SQL Injection attack that asks the database true or false questions and determines the answer based on the applications response.” [8] The most common example of such kind of security flaw is a vulnerable login form that provides authentication functionality for a web application. Attacking this type of component is generally considered blind because only 2 types of boolean responses are possible: the user either logs in, or not. However, this paper presents a method to allow the data extraction process to become significantly faster. Consider the following vulnerable code: $login = sql_query("SELECT * FROM users where user='$username'") This vulnerablity can allow attackers to bypass the login. But it can also be exploited to query information from the database and get as a result a ACCESS GRANTED or ACCESS DENIED. Nevertheless, the reality is that in most applications this isn’t a blind injection. Consider the following SQL injection: ' or user=(case when (@:=(select mid(password, 1, 1) from users limit 0,1))='A' then 'zero-cool' when @='B' then 'crash-overrride' when @='C' then 'acid-burn' end)-- - It is possible to test multiple conditions in only 1 request which causes the client to log in with different users depending on the result of the conditions. By determining which user was authenticated, it is possible to know which of the multiple condition was met. This injection demonstrates that the application can respond with not only one TRUE response, but actually multiple types of different TRUE responses. By being able to login with different users, it is possible to gather information faster by testing many conditions with just 1 request. This can accelerate the data exfiltration process significantly. It would be a waste of resources to create a lot of accounts to test as many conditions as possible. The method that will be presented in this section retrieves big amounts of information with only 3 accounts needed. Two bits of the data will be returned at once. Think of the binary representation of the numbers 0 to 3: Binary bitstring Decimal Representation ---------------------------------------------- 00 0 01 1 10 2 11 3 To perform this technique, the attacker must have control of 3 different accounts. Consider the following vulnerable login: $login = sql_query("SELECT * FROM users where user='$username'") The injection to attack this component, looks like this: ' or user=(case when(@:=(select mid(bin(mid(password, 1, 1),2,1) from users limit 0,1)) ='00' then 'zero-cool' when @='01' then 'crash-override' when @='10' then 'acid-burn' else '' end)-- - This injection retrieves 2 bits at once, by seeing which user logged-in or if no user was logged-in at all, it is possible to determine which of the 4 bit combinations was found. Since the requests don't have to be sequential as each query is independent from the previous one, threading can be used to retrive all 8 bits in only 4 instantaneous requests. 7 accounts could be used to retrieve all 8 bits in 3 requests. 4. Extraction Strategies 4.1. Comparison between compressed string lengths To further accelerate the data extraction process, the built-in compress() function can be used. group_concat() is used for concatenating all the rows in the result into a single row. The limitation is that group_concat can only contain a maximum of 1024 characters. New DBMS now support the json_array() function, which can also concatenate all the rows but with a maximum of more than 1 million characters. The following table shows the corresponding lengths of group concatenated strings and their respective compressed lengths. Query Length compress() ----------------------------------------------------------------- group_concat(table_name) 1024 440 json_arrayagg(table_name) 1316 447 group_concat(column_name) 1024 458 json_arrayagg(column_name) 56897 8313 5. References [1] https://github.com/sqlmapproject/sqlmap/wiki/Techniques [2] https://media.blackhat.com/us-13/US-13-Salgado-SQLi-Optimization-and-Obfuscation-Techniques-Slides.pdf [3] https://enderspub.kubertu.com/blind-mysql-injection-using-bit-shifting [4] https://stealingthe.network/efficient-time-based-blind-sql-injection-using-mysql-bit-functions-and-operators/ [5] https://ajxchapman.github.io/security/2017/01/14/blind-sql-injection.html [6] http://blog.k3170makan.com/2012/01/bit-shifting-blind-injection-simplified.html [7] https://youtu.be/HMzq_KoTFxU [8] https://owasp.org/www-community/attacks/Blind_SQL_Injection [9] https://youtu.be/_hvv5OBWed4 [10]https://www.websec.ca/publication/Blog/solutions-challenge-2B