Deep reinforcement learning for market making in corporate bonds: beating the curse of dimensionality


deep reinforcement learning

Before any estimates can be given, both estimators need to have their buffers filled. By default, each buffer holds 200 ticks. For the trading_intensity estimator, only order book snapshots that differ from the preceding snapshot count as valid ticks.
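The buffer-filling rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `TickBuffer`, its `skip_duplicates` flag, and the tuple snapshot format are hypothetical and are not taken from the strategy's actual code.

```python
from collections import deque

_NO_SNAPSHOT = object()  # sentinel meaning "nothing seen yet"


class TickBuffer:
    """Hypothetical fixed-length tick buffer (not the strategy's real class)."""

    def __init__(self, length=200, skip_duplicates=False):
        self._ticks = deque(maxlen=length)  # oldest ticks fall off automatically
        self._skip_duplicates = skip_duplicates
        self._last = _NO_SNAPSHOT

    def add(self, snapshot):
        """Store a snapshot; return True if it counted as a valid tick."""
        if self._skip_duplicates and snapshot == self._last:
            return False  # unchanged order book: not a valid tick
        self._last = snapshot
        self._ticks.append(snapshot)
        return True

    @property
    def is_full(self):
        """Estimates should only be produced once this is True."""
        return len(self._ticks) == self._ticks.maxlen


# Usage: the repeated snapshot is ignored, so four updates yield three ticks.
buf = TickBuffer(length=3, skip_duplicates=True)
for snap in [(100, 101), (100, 101), (100, 102), (99, 102)]:
    buf.add(snap)
print(buf.is_full)
```

With `skip_duplicates=False` every update counts, which would correspond to an estimator that accepts all ticks.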

To obtain MDA values we applied a random forest classifier to the dataset split in 4 folds.
With the same assumptions and quadratic utility function as in Case 1 in Sect.
Kumar, who uses Spooner’s RL algorithm as a benchmark, proposes using deep recurrent Q-networks as an improved alternative to DQNs for a time-series data environment such as trading.
The solution will be based on two different choices of utility functions, quadratic and exponential, in the sequel.

The successive orders generated by this procedure maximize the expected exponential utility of the trader’s profit and loss (P&L) profile at a future time, T, for a given level of agent inventory risk aversion. Inventory management is therefore central to market making strategies, and particularly important in high-frequency algorithmic trading. In an influential paper, Avellaneda and Stoikov expounded a strategy addressing market maker inventory risk. The optimal bid and ask quotes are obtained from a set of formulas built around these parameters. These formulas prescribe the AS strategy for placing limit orders.
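For reference, the closed-form approximations commonly quoted from the Avellaneda-Stoikov paper can be written as follows. The notation is an assumption, since the text above does not define it: s is the mid price, q the current inventory, γ the risk aversion, σ the volatility of the mid price, κ the order book liquidity parameter, and T − t the time remaining in the session.

```latex
\begin{aligned}
r(s, q, t) &= s - q\,\gamma\,\sigma^{2}\,(T - t), \\
\delta^{a} + \delta^{b} &= \gamma\,\sigma^{2}\,(T - t)
    + \frac{2}{\gamma}\,\ln\!\left(1 + \frac{\gamma}{\kappa}\right), \\
p^{b} &= r - \tfrac{1}{2}\left(\delta^{a} + \delta^{b}\right), \qquad
p^{a} = r + \tfrac{1}{2}\left(\delta^{a} + \delta^{b}\right).
\end{aligned}
```

Here r is the reservation price, δ^a + δ^b is the total spread, and p^b and p^a are the bid and ask quotes placed symmetrically around r. A positive inventory pushes r below the mid price, so both quotes shift down and sells become more likely to fill.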

🛠️ Strategy configs

The RL agents (Alpha-AS) developed to use the Avellaneda-Stoikov equations to determine their actions are described in Section 4.1. An agent that simply applies the Avellaneda-Stoikov procedure with fixed parameters (Gen-AS), and the genetic algorithm used to obtain those parameters, are presented in Section 4.2.

When setting the risk_factor, it is important to observe where the reservation price sits relative to the mid price. If the user wants the distance between these two prices to be wider, the risk factor should be set to a higher value. The further the reservation price is from the mid price, the more aggressively the strategy pursues its target portfolio allocation, because orders on one side will be far more likely to be filled than on the other.
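The effect of the risk factor can be illustrated numerically. The sketch below assumes that risk_factor plays the role of the risk aversion γ in the standard closed-form AS approximations (reservation price = mid − q·γ·σ²·(T − t)); the function name `as_quotes` and all numeric inputs are made up for illustration, and the strategy's real calculation may scale these terms differently.

```python
import math


def as_quotes(mid, inventory, risk_factor, sigma, kappa, time_left):
    """Return (reservation_price, bid, ask) from the closed-form AS approximations."""
    # Reservation price: the mid price skewed against the current inventory.
    reservation = mid - inventory * risk_factor * sigma ** 2 * time_left
    # Total spread, split evenly around the reservation price.
    spread = risk_factor * sigma ** 2 * time_left + (2.0 / risk_factor) * math.log(
        1.0 + risk_factor / kappa
    )
    return reservation, reservation - spread / 2.0, reservation + spread / 2.0


# Illustrative inputs only: long 5 units, mid price 100.
for gamma in (0.1, 0.5, 1.0):
    r, bid, ask = as_quotes(
        mid=100.0, inventory=5, risk_factor=gamma, sigma=2.0, kappa=1.5, time_left=1.0
    )
    print(f"risk_factor={gamma}: reservation={r:.2f} bid={bid:.2f} ask={ask:.2f}")
```

With a positive inventory, a larger risk_factor moves the reservation price further below the mid price, so the ask sits closer to the market than the bid and the position is worked off faster.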


Furthermore, the signal thresholds can be adjusted according to investors’ risk aversion. This type of labeling closely reflects actual transactions and earnings. High-frequency trading (HFT) is a popular form of algorithmic trading that leverages electronic trading tools and high-frequency financial data. A typical HFT algorithm is based on limit order book (LOB) data (Baldauf and Mollner, 2020, Brogaard et al., 2014, Kirilenko et al., 2017). Fig. 1 illustrates the bid and ask prices and their 5-level queues for a stock at two consecutive time points. In this study, we implement a LOB trading strategy to enter and exit the market by processing LOB data.

2 Gen-AS: Avellaneda-Stoikov model with genetically tuned parameters

On the other hand, she does not face liquidation risk at negative inventory levels but wants to receive a higher amount for selling the assets. Since the state process is Markovian, the optimization problem can be solved using the stochastic control approach (Bates 2016; Björk 2012; Pham 2009). The role of a dealer in securities markets is to provide liquidity on the exchange by quoting bid and ask prices at which he is willing to buy and sell a specific quantity of assets.


So, if we’re modeling more or less the same things, all good, accurate models tend to be analogous; they must correspond to one another if they each correspond to the same underlying physical phenomena.

Comparison of values for Max DD and P&L-to-MAP between the Gen-AS model and the Alpha-AS models (αAS1 and αAS2). Table 8 provides further insight combining the results for Max DD and P&L-to-MAP.

IEEE Transactions on Knowledge and Data Engineering

For the case of exponential utility function, now we explore the results of optimal controls obtained by solving the HJB Eq. Is the set of the admissible strategies, F and G are the instantaneous and terminal reward functions, respectively. Are the related depths at which the market maker posts the limit orders.


Finally, the best-performing model overall, with its corresponding parameter values contained in its chromosome, is retained for subsequent application to the problem at hand. In our case, it will be the AS model used as a baseline against which to compare the performance of our Alpha-AS model. A wide variety of RL techniques have been developed to allow the agent to learn from the rewards it receives as a result of its successive interactions with the environment.
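The retain-the-best step can be sketched with a minimal genetic algorithm. Everything here is an assumption made for illustration: the function `evolve`, the two-gene chromosome, the parameter bounds, and especially `toy_fitness`, which stands in for what would in practice be a backtest of the AS model with the chromosome's parameter values.

```python
import random


def evolve(fitness, bounds, pop_size=20, generations=30, mutation_scale=0.1, seed=0):
    """Minimal genetic algorithm; returns the best chromosome seen overall."""
    rng = random.Random(seed)

    def random_chromosome():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each gene comes from one of the two parents.
        return [rng.choice(pair) for pair in zip(a, b)]

    def mutate(chrom):
        # Gaussian perturbation, clipped back into the allowed range.
        return [
            min(hi, max(lo, g + rng.gauss(0.0, mutation_scale * (hi - lo))))
            for g, (lo, hi) in zip(chrom, bounds)
        ]

    population = [random_chromosome() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # the fitter half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
        best = max(population + [best], key=fitness)  # retain the best overall
    return best


def toy_fitness(chrom):
    """Placeholder objective with a known optimum at (0.3, 1.5)."""
    risk_aversion, liquidity = chrom
    return -((risk_aversion - 0.3) ** 2 + (liquidity - 1.5) ** 2)


BOUNDS = [(0.01, 1.0), (0.1, 3.0)]  # hypothetical parameter ranges
best = evolve(toy_fitness, BOUNDS)
print(best, toy_fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.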

The actions performed by our RL agent are the setting of the AS parameter values for the next execution cycle. With these values, the AS model will determine the next reservation price and spread to use for the following orders. In other words, we do not entrust the entire order placement decision process to the RL algorithm, learning through blind trial and error. Rather, taking inspiration from Teleña, we mediate the order placement decisions through the AS model (our “avatar”, taking the term from ), leveraging its ability to provide quotes that maximize profit in the ideal case. In humble homage to Google’s AlphaGo programme, we will refer to our double DQN algorithm as Alpha-Avellaneda-Stoikov (Alpha-AS).

One of the most active areas of research in algorithmic trading is, broadly, the application of machine learning algorithms to derive trading decisions based on underlying trends in the volatile and hard-to-predict activity of securities markets.
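The double DQN idea named above can be sketched without any deep learning library: the online network chooses the next action and the target network evaluates it. The function name, the plain-list Q-values, and the `ACTIONS` table are all hypothetical; the text only states that each action sets AS parameter values for the next cycle.

```python
def double_dqn_target(reward, next_q_online, next_q_target, discount=0.99, done=False):
    """Double DQN learning target for one transition.

    next_q_online / next_q_target: Q-values of every action in the next state,
    as estimated by the online and the target network respectively.
    """
    if done:
        return reward  # no bootstrapping past the end of an episode
    # Action selection uses the online network...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...but its value is read from the target network.
    return reward + discount * next_q_target[best_action]


# Hypothetical discrete action set: each action is a risk-aversion value
# that would be handed to the AS formulas for the next execution cycle.
ACTIONS = [0.1, 0.5, 1.0]

target = double_dqn_target(
    reward=1.0,
    next_q_online=[0.2, 0.9, 0.5],
    next_q_target=[0.4, 0.1, 0.8],
    discount=0.5,
)
print(target)
```

Separating selection from evaluation is what distinguishes this target from the plain DQN target, which would take the maximum of the target network's values directly (0.8 in this example rather than 0.1).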

The results indicate that the proposed ranking methods yield considerably more encouraging insights than recent state-of-the-art works and can be adopted for ranking cricket teams.

We also plan to compare the performance of the Alpha-AS models with that of leading RL models in the literature that do not work with the Avellaneda-Stoikov procedure. Post-hoc Mann-Whitney tests were conducted to analyse selected pairwise differences between the models regarding these performance indicators. For every day of data, the number of ticks occurring in each 5-second interval had a positively skewed, long-tailed distribution. The means of these thirty-two distributions ranged from 33 to 110 ticks per 5-second interval, the standard deviations from 21 to 67, the minimums from 0 to 20, the maximums from 233 to 1338, and the skew from 1.0 to 4.4.

Cricket teams are ranked to indicate their supremacy over their counterparts. Various authors have proposed different statistical techniques for evaluating teams. However, these techniques do not capture the consistency of the teams’ performance well. With this aim, effective features are constructed for evaluating the bowling and batting precedence of teams over others. These features are then integrated to formulate the Consistency Index Rank for ranking cricket teams. The performance of the proposed methodology is compared with recent state-of-the-art works and International Cricket Council rankings using the Spearman Rank Correlation Coefficient for all three formats of cricket, i.e., Test, One Day International, and Twenty20.

In that case, the user is taking more inventory risk, because there will be no skew in the order positions aimed at reaching the inventory_target_base_pct. The min_spread parameter has a different meaning, and the risk_factor parameter is used differently in the calculations and therefore takes a different range of values.

Continuous-time stochastic control and optimization with financial applications. Risk metrics and fine tuning of high frequency trading strategies.


The results obtained in this fashion encourage us to explore refinements such as models with continuous action spaces. The logic of the Alpha-AS model might also be adapted to exploit alpha signals.

In the framework of the optimal trading strategy for high-frequency trading in a LOB, many papers have followed the early studies of Grossman and Miller and of Ho and Stoll. Avellaneda and Stoikov revised the study of Ho and Stoll, building a practical model that considers a single dealer trading a single stock and facing a stochastic demand modeled by a continuous-time Poisson process. The literature on the optimal market making problem has been burgeoning since 2008 with the work of Avellaneda and Stoikov, which inspired Guilbaud and Pham to derive a model involving limit and market orders with optimal stochastic spreads. Bayraktar and Ludkovski considered the optimal liquidation problem, modeling the order arrivals with intensities that depend on the liquidation price.
