Implementing Ethereum trading front-runs on the Bancor exchange in Python

FLy_鵬程萬里發表於2018-06-24


                            Launching the attack: the green letters look just like on TV

This post is a deep-dive into programmatically trading on the Ethereum / Bancor exchange and exploiting a game-theoretic security flaw in Bancor, a high-profile smart contract on the Ethereum blockchain. The full code can be found athttps://github.com/bogatyy/bancor. We collaborated with the Bancor team to make sure the current exploit is protected against, although for a little while there would still be a chance to make some beer money for educational purposes.

Imagine trying to hack Bank of America — except you can read all of their code in advance, all of their transactions are public, and if you steal the money it’s irreversible. Sounds like a paranoid worst-case scenario? Well, this is exactly the setup Ethereum smart contract developers have to deal with every day. Bitcoin and the blockchain technology unlocked tremendous possibilities in international payments, and the Ethereum further magnified it by allowing to manage these payments through programs called smart contracts. However, smart contracts also give hackers a much easier setup for attacks.

Front-running is one such attack. The term originated in the stock market, back in the days when trades were executed on paper, carried by hand between the trading desks. A broker would receive an order from a client to buy a certain stock, but then place a buy order for themselves in front. That way the broker benefits from the price increase at the expense of their client. Naturally, the practice is unfair and was outlawed.

On the blockchain, the problem becomes a lot more severe. First, all the transactions are broadcast publicly. More importantly, blockchain participants across the world are not bound by the same relationship as a broker and their client, so attackers can exploit their knowledge of a pending transaction with impunity.


Several months ago, researchers at Cornell uncovered that Bancor, an ICO that spectacularly raised over $150M in funding over a few minutes, was vulnerable to front-running. They pointed out that miners would be able to front-run any transactions on Bancor, since miners are free to re-order transactions within a block they’ve mined. While the Bancor team gave athoughtful response, up until very recently, there has not been any progress on fixing the issue (more on that later).

Our research goes a step further. In fact, we show that it is both possible and practical to front-run Bancor as a non-miner. Which means you don’t need to the lucky miner who happens to mine the block with a Bancor trade to profit from front-running. You simply need to be a regular user monitoring the blockchain to perform this attack.

Surprisingly, the vulnerability does not seem to have been exploited so far (front-running is readily identifiable on the blockchain), so in this post we’ll examine exactly how one implements such an attack. Turns out, all it takes is about 150 lines of Python to get a working front-running algorithm. We also ran simulations to determine how much money one could make from front-running consistently (spoiler: an attacker could have had a ~117% ROI on the money they invested into the attack over July and August, chipping away from other Bancor users). Finally, I executed the attack against a single trade, making~$150 net of all fees, after which I returned the money to the person I front-ran and stopped the program.

Now, I know that relinquishing a working trading strategy would be a cardinal sin to any trader, but as it turns out, I am more curious than greedy. Implementing and countering attacks is not only a fascinating game, but also the cornerstone of advancing cryptographic security. Most importantly, I believe in the long-term impact of the blockchain ecosystem, and for the blockchain economy to fully develop, vulnerabilities like this need to be understood and protected against.

So let’s dig in.

Background

Bancor is a protocol for trading and pricing Ethereum ERC-20 tokens, as well as the eponymous token, abbreviated as BNT, with the current market capitalization of approximately 180 million dollars. The core problem Bancor solves is as follows: normally, for a trade to happen, there has to be a buyer and a seller, having opposing desires (to buy and to sell) at the same moment in time. This limitation may be fine for large publicly traded stocks, but for long-tail crypto tokens this can create a serious inconvenience.

Bancor solves this problem by allowing anyone to trade against a public smart contract, which offers an automatically calculated token price following a precise formula. Essentially, Bancor is fulfilling the role of market-makers in traditional finance. The smart contract has an Ethereum reserve, and as more people buy the token, reserves grow and the price goes up. Consequently, when people sell, the contract adjusts the price to go down, so that the reserve is never depleted entirely. Unlike most other exchanges, where trades are managed off-chain, with Bancor every order is a self-contained Ethereum transaction (money + data).

Unfortunately, the current setup contains a flaw, allowing anyone to front-run large transactions and make guaranteed profit. Let us expand on what makes the attack possible. When somebody broadcasts a transaction on the Ethereum network, it becomes available to other nodes almost immediately as a pending transaction and is added to the common queue, but it is not confirmed until the block confirmation hash is found by some miner (thus confirming all the transactions in that block), which tends to happen once every ~20 seconds. Further, up until the block is confirmed, the order of the pending transactions is up for grabs, and miners basically sort transactions by how much they’re paid per gas (that is, per unit of computation they’ll have to perform).

This discrepancy creates an attack vector: any user running a full-node Ethereum client can spot a pending transaction and insert their own transaction in front of it by paying more per gas. If you see a large BUY is about to happen, you know the BNT price will increase (following their deterministic formula), so if you buy in before that transaction you get an instant appreciation of your tokens and a guaranteed return on your investment. Similarly, if somebody sent out a pending SELL, an attacker can sell their tokens in front.


Given we will be implementing our attack using Ethereum client API, now would be a good time to take a step back and give a general overview of how the Ethereum distributed applications (or DApps for short) landscape looks like. At a high level, implementing DApps is fairly similar to regular Web applications. The backend is a smart contract, running on the Ethereum blockchain, typically implemented in Solidity and then deployed to the network. Then there is client software, which interacts with the backend by sending transactions.

Just like in the regular world, smart contracts (the backends) receive most of the attention, with many high-profile smart contracts and several high-quality developer guides appearing recently (personally, what I’ve found most useful was fiddling with the examples from Solidity official docs, as well as great intro guides by Hudson and Karl). However, on the front-end side, I currently do not know of any non-trivial client-side applications (an example of such an application would be a decentralized poker client, where the majority of compute has to happen off-chain, on the players’ machines). Right now, the only way for users to interact with smart contracts is to send transactions manually, either by running their own full node (for example, by using the geth client), or by relying on third-party web services (like MyEtherWallet). Clearly, this would have to change: the current situation is about as convenient as manually sending POST requests through Telnet every time you wanted to browse the Web.


Easy Mode: high-frequency trading by hand

The simplest way to confirm the vulnerability is by hand. We will not need any tools except a Web browser and a wallet with some Ether.

First, separate your Ether between two wallets equally (it will not work from a single wallet). Second, go to MyEtherWallet and, following Bancor purchase instructions, set up two BUY transactions (do not click Send yet!). You should prepare two equivalent transactions from both wallets, but make sure gas price on the first wallet is lower than on the second.


Sending ETH to the Bancor purchase contract automatically returns BNT to your account. Note: mind the gas!

Now, when it’s set up, click “Send” on the first wallet (with the lower gas price), and then “Send” on the second wallet, with the higher gas price. If you did everything right, the transaction that was submitted second would actually be processed first, and get more BNT per same deposit!

Transaction 1 (transaction from wallet 1, submitted first, fulfilled second)

BNT tokens received: 11.014424733254973428

Transaction 2 (transaction from wallet 2, fulfilled first, should get better price)

BNT tokens received: 11.014423186864343663

Notice the letdown: while the front-running transaction should’ve gotten a better price, it actually got a slightly worse one. Upon careful investigation, I realized this is because the rather arcane implementation of the Bancor formula can deviate quite significantly from the theoretical formula, especially for smaller amounts (specifically, error in 4th digit for 0.1 ETHor roughly $20 transactions). After querying the BancorFormula contract for a bit, I learned that even for 1 ETH the precision is not good enough, but if we’re willing to make 10 ETH-sized transactions or larger, we’d actually observe a better price as predicted. The Bancor team recently updated the formula and mentioned it is a lot more accurate. Ultimately, I decided to skip re-doing this experiment with larger transaction sizes or the new formula in favor of the actual front-run.


Hard Mode: trading automatically

Now, unless we are ready to sit in front of a computer all day and hit refresh on etherscan.io, the process needs to be automated. Luckily, most Ethereum clients provide a JSON RPC to interact with the blockchain and automate away the low-level details of interacting with the blockchain.

You just need to run a full node client and send API requests tolocalhost:8545.

$ sudo apt-get install software-properties-common
$ sudo add-apt-repository -y ppa:ethereum/ethereum
$ sudo apt-get update
$ sudo apt-get install ethereum
$ geth --rpc

Here is an example curl request that looks up a transaction by hash (in this case, a huge BUY order on Bancor):


$ curl -X POST --data \
    '{"jsonrpc":"2.0","method":"eth_getTransactionByHash", \
    "params":["0x314e0246cfc55bc0882cbf165145c168834e99924e3ff7619ebd8290e713386d"], \
    "id":123}' localhost:8545

{"jsonrpc":"2.0","id":123, "result":{
"blockHash":"0x42ea1578c23b159186853961dfbdfcdec6b40d23d8f1d971827412bc6948386b",
"blockNumber":"0x3dfb88",
"From":"0xcaf82fcb3a0323566c0f306684376e3e66d6284b",
# this is the Bancor Purchase contract address
"To":"0x77a77eca75445841875ebb67a33d0a97dc34d924",
# converting hex to decimal, this is 50 gwei, making it a 2$ fee on half-million dollar transaction
"gasPrice":"0xba43b7400",
"hash":"0x314e0246cfc55bc0882cbf165145c168834e99924e3ff7619ebd8290e713386d",
# converting hex to decimal, this is 2*10^21, which is 2000 ETH, or half-million USD
"value":"0x6c6b935b8bbd400000",
...}}
Now let’s send the same request using Python:


$ pip install requests
$ python

import json
import requests

def get_transaction(tx_hash):
  url = 'http://localhost:8545'
  headers = {'content-type': 'application/json'}
  request = {
      'jsonrpc': '2.0', 'id': 1,
      'method': 'eth_getTransactionByHash',
      'params': [tx_hash]
  }
  return requests.post(
          url, data=json.dumps(request), headers=headers).json()

print get_transaction('0x314e0246cfc55bc0882cbf165145c168834e99924e3ff7619ebd8290e713386d')

If you got the same output as from curl, congratulations! The hardest part of learning to programmatically interact with the blockchain is already over.

Now implementing the front-running trader becomes a matter of putting together a few common API requests (pseudocode below for brevity). If the goal is to be maximally efficient, it is better to avoid cashing out in between transactions: front-running can be done in both directions, and it would be the most profitable to only sell once we see a pending sell, and only buy once we see a pending buy:

filter_id = requests.post(...Set up pending transactions filter...)
while True:
	new_pending_tx = requests.post(...Listen to filter_id...)
       if holding_bnt and is_large_sell(new_pending_tx):
		requests.post(...Sell everything with a larger gas price...)
		holding_bnt = False
       elif (not holding_bnt) and is_large_buy(new_pending_tx):
		requests.post(...Buy everything with a larger gas price...)
		holding_bnt = True

In my case I only wanted to prove that the idea works (and yields non-negligible amounts of money), so I did the front-run once and sold immediately. Better yet, in this case the “loser” of the trade is easy to pin-point (the only person losing money is the owner of the trade being front-ran), so it was easy to return that money too. More details to be found in thefull code on my GitHub.

The results of the front-run (transactions in reverse chronological order):

We made 0.477 ETH, or a ~0.5% return in less than a minute! I calculated the amounts (my own principal and the threshold for front-running) so that the profit would be at least $100, which I deemed convincing enough for the purposes of my post. Noteworthy, just a few days later there was a whopping5856 ETH trade purchasing Bancor, which would have yielded an approximately 9% return, or about $3000 given the same 100 ETHprincipal!

The next part will explain where do these numbers come from and how to calculate the return from front-running a given trade.



Simulations & ROI evaluation

Let us describe the two core assumptions behind the Bancor pricing system, and the exact formula that is derived from those assumptions. First, Bancor maintains a constant ratio between the traded token market capitalization and total reserve token value. As of right now, the only traded token isBNT, and the only reserve token is ETH. Assume there is approximately 70K ETH in reserves (which is roughly the case) and the reserve ratio is 10%. This means the whole market cap of the system is implied to be 700K ETH, and the price per BNT is determined from this total market cap, divided by totalBNT supply. If somebody buys BNT with ETH, the whole ETH amount is added to the reserves, thus pushing the price up (there was more value added to the reserves than tokens issued), increasing the value of everybody else’s tokens. Conversely, selling BNT reduces the reserve and thus the price per token (this way, the reserve never gets depleted, you just get less and less of it per token).

The second assumption is that a large trade should be equivalent to making a set of smaller trades of the same size (so to determine the final price, one would have to calculate an integral of Price d(Size)). Turns out these two assumptions are sufficient to uniquely define the behavior of the system in all cases. For those with a strong mathematical inclination, there is proofavailable. Here are the resulting formulas:



For our practical purposes though, the exact formulas are not really necessary (and as we have learned, the actual implementation is not very precise anyway). Approximating the pricing formula with the linear part of its Taylor’s series, we get:

NEW_PRICE ~= OLD_PRICE * (1 + DEPOSIT / RESERVE_BALANCE)

Interestingly, the RESERVE_RATIO is not a part of the approximation. I’ll skip the full derivations here, but basically, it cancels out in the numerator and the denominator, so the only thing that ends up mattering isRESERVE_BALANCE = TOTAL_MONEY * RESERVE_RATIO. This pricing formula is a good approximation as long as the deposit is much smaller than the total reserve. So if BNT has 70K ETH in reserves, and somebody invests 700 ETH, the price goes up by 700 / 70K = 1%. If somebody withdraws 700 ETH out of the system, the price drops 1%. Thus in both cases front-running those transactions would give us instant 1% ROINote that our theoretical approximation matched practice quite well: a 350 ETH trade yielded a 0.477% return, against approximately 0.5% predicted.

Based on these calculations, I wrote some code to evaluate how much money could have been made front-running Bancor in July and August. Assuming we leave small transactions alone and only go after big ones (> 100 ETH) so that gas prices don’t really matter, we get:

$ python simulation.py
...
ROI for front-running all transaction >= 100 ETH:
July 88.7%
August 28.6%
With a principal of 100 ETH, that would make you $35190

Not bad: an attacker can more than double the money they invested into the attack over a couple months, chipping away from other Bancor users. In practice, the profitability threshold is smaller than 100 ETH (the total fees one would need to beat come out to a few dollars), though it also depends on the principal invested by the attacker.

Front-running other Bancor-exchangeable tokens

While we did show that it was possible, BNT itself is relatively hard to front-run because of the large reserves: its price changes very little between transactions. But for any smaller token the 1 / RESERVE_BALANCE fraction would skyrocket an attacker’s profits and rob honest investors very fast. To prove that, I have deployed my own token following the Bancor Protocol (addresscode) and made instant 2X profit front-running a large transaction.

Our FBT (short for Front-Runnable Bancor Token) was initialized to have a total supply of 1M wei (where wei is the smallest possible unit in Ethereum, equal to 10^-18 ETH ), and the total initial supply of 2M tokens. Given the 10% reserve rate ratio, the implied market capitalization of our token becomes 10M wei , and the price per token 10M / 2M = 5 wei . Now what happens when the front-runner makes a deposit before a large “honest” BUY, but sells after?

TX1, front-run:1000 wei gives 199 FBT , in line with the price we calculated

TX2: very large buy, increasing the price roughly 2X

TX3: attacker withdraws 199 FBT for 1910 wei, doubling their money

Since the intended use of Bancor is to serve the lesser known tokens (which may not have enough demand and liquidity for regular exchanges), these tokens would naturally have smaller reserves, making this vulnerability especially dangerous. In an extreme case where the reserve is very low and an attacker has a lot of money, they can extract more value from an honest investor’s deposit than the investor would get themselves!

Further, all of this is possible as a mere full node (that is, a person with a decent laptop). Miners, and especially miner pools, are in a privileged position and can do an order of magnitude more damage. Full-node attackers have to broadcast their transactions and thus risk their principal, whereas miners can mine blocks with their own front-run included, but never reveal it publicly unless they do mine the block. That way, they can profit at no risk or cost to themselves. Further, they can rearrange transactions within a block in whatever way they want, arbitrarily creating winners and losers out of other participants.

Ethics

Like any new technology, this situation raises very interesting ethical questions. Is the strategy we discussed “hacking”? Is it high-frequency trading? Is it simply an ability to make informed decisions based on public information faster than other investors? Ultimately, is this kind of trading “bad”?

Personally, I took an easy way out and made the decision that lets me sleep the most soundly: returning the money after proving the point and stopping the program. Nevertheless, I would not have blamed someone in a similar spot if they decided to do otherwise.

Solutions

Over the past month, Haseeb Qureshi and I discussed several solutions with the Bancor team, making sure the vulnerability is contained. Yesterday, the team released a fix that handles most of the risk in practice. Long-term, theoretically robust solutions are fairly complex, and while analyzing those can be a whole different post, I want to briefly mention the options available.

TL;DR don’t send large transactions and use Bancor Web3 UIthat sets a minReturn for you.

One partial solution is to set a minReturn on trades, basically canceling your order if you realize someone squeezed in front of you. This does prevent attackers from making guaranteed money instantly, but raises the question: what is a buyer supposed to do next? The order just revealed their intention to buy, and presumably they still want their Bancor, so they’ll place another buy order sometime in the near future, which will eventually raise the Bancor price and, on average, will profit the attacker just the same.

Yesterday, the Bancor team released a Web3 interface that implements theminReturn solution. Longer-term, and assuming perfectly intelligent adversaries, this might lead to some curious Nash equilibria (e.g. front-runners might block trades unless they are allowed a small profit margin, though a lower one than if they were front-running entirely naive users), but right now, this should solve most of the practical risk.

The Bancor team also suggested setting a universal maxGasPrice, to make sure non-miner front-runners cannot bid higher. This would fully protect the users from non-miner attacks (at a cost of lower liquidity during network congestion), although the original front-runs by miners would not be affected. This fix will be out soon too.

More robust solutions would involve a version of the commit-reveal scheme, one of the go-to instruments in a cryptographer’s toolbox. Submarine sendsby Cornell researchers proposes the most beautiful and general solution that I know of. However, given the specifics of the Bancor protocol, a much smaller solution could suffice: do a commit-reveal with a penalty for non-revealing in Bancor tokens themselves. Basically, whenever someone commits a hash of a trade and doesn’t reveal, a percentage of both their Bancor and ERC-20 Ethereum tokens is burned (note that it works irrespective of the direction of the trade, thus revealing no information). If designed with careful attention to detail (for example, making sure that reveals are only accepted for commits in a previous block), this can fully solve the front-running problem, including front-running by miners.

Of course, even the simplified scheme is so complicated that almost no one would want to perform it by hand in MyEtherWallet. As the ecosystem grows, more sophisticated client programs (like the auto-trader we just implemented, or the commit-reveal client) would have to take that role on behalf of the users. Again: people aren’t sending their own Telnet requests every time they want to browse the Web, so why should the crypto world be any different?

Acknowledgements

This project started in collaboration with Haseeb Qureshi and Preethi Kasireddy at the IC3 Ethereum bootcamp under guidance from Ari Juels,Iddo Bentov and Phil Daian. The post was rewritten considerably with massive help from Haseeb and Nader Al-Naji, who is about to take over the world with BaseCoin.

A disclaimer just in case: this is a personal project and it does not represent my employer or anyone else’s opinions. Since I worked on it in my spare weekends, the timeline ended up being very protracted, and I hope the attentive readers will forgive some small discrepancies (the Ethereum price may oscillate across the post, and some Bancor contract addresses had to be updated).




相關文章