Algorithm for Finding the Best Trading Strategy Quickly

Suppose you signed up for Interactive Brokers and wanted to find the best Algo in the most efficient (fastest) way possible—spending as little time testing the inferior strategies as humanly possible while being assured of finding the best Algo.

Also assume you want to try only one Algo at a time and you have a binary criteria for success/failure. Something like: a success is a complete fill below the market open. Then your problem is simple.

This is PROVEN to get you to the best strategy in the most efficient way possible with the THEORETICAL MINIMUM REQUIREMENT FOR TESTING INFERIOR STRATEGIES (exploring). Unless you are psychic, of course.

You needed to use Python for this and update the number of successes and failure of each strategy as you go (with this being just one possible example of successes/failures for each strategy):

strategy1_successes = 3
strategy1_failures = 2
strategy2_successes = 4
strategy2_failures = 2
import numpy as np
a1=strategy1_successes +1
b1=strategy1_failures + 1
a2=strategy2_successes + 1
b2=strategy2_failures + 1
trade_strategy1 = np.random.beta(a1, b1)
trade_strategy2 = np.random.beta(a2, b2)
if trade_strategy1 > trade_strategy2:

I have tried this a large number of times. Python can run it a HUGE number of times with hypothetical numbers, automatically. When there is a significant difference in the strategies it gets you using the best strategy very quickly.

This is an implementation of Thompson Sampling and has been proven to be optimal for binary (success/failure) decisions. It is an example of explore/exploit algorithms. In this context, explore new trading Algos efficiently but get to the business of exploiting the best algorithm a soon as possible.

There is a series of 3 YouTube videos that go into this is some detail with expanded Python code. This is probably the most useful resource (since the code works well and it sticks to practical uses) for anyone who wants to explore this deeper: [url=][/url]



I just wanted to provide some additional code so you can test this method (multiple times with different parameters) and see how well it works. See for yourself if this interests you.

I guess I am posting because I am frankly astonished at how well it works. The end of classical statistical testing?

There is serious discussion about using this in medicine. And it has been used in medicine–saving lives in the placebo control group when it has been tried. The purpose of using this would be keep the use of an inferior treatment to a minimum (e.g., an inferior vaccine or placebo). Possibly reducing the number of deaths for serious diseases–by keeping the use of an ineffective treatment to a minimum.

So this code uses (Bernoulli) payouts for multiple slot machines (i.e., the classic Bernoulli multi-armed bandit problem). But can be replaced with treatment method (or trade strategy).

I can guarantee that you have participated in such trials numerous times. Websites frequently present different pages (or ads) using this algorithm to determine which page (or ad) to present to the next visitor to the site: you are the test subject.

Anyway, if interested see for yourself. Modify the number of “slot machines” (or trade strategies), the number of trials (turns), the range of conversation rates (probability of vaccine successes or trade wins) and add a random seed if you wish (remove the #s). Decide for yourself. But I am absolutely amazed at how quickly and effectively it finds the best machine (treatment or trade strategy).


import numpy as np
number_of_turns = 1000
number_of_slot_machines = 5
conversion_rates = np.random.uniform(0.1, 0.9, number_of_slot_machines)
for i in range(number_of_slot_machines):
print(‘Conversion rate for slot machine {0}: {1: .2%}’ .format(i, conversion_rates[i]))
number_of_positive_rewards = np.zeros(number_of_slot_machines)
number_of_negative_rewards = np.zeros(number_of_slot_machines)
outcomes = np.zeros((number_of_turns, number_of_slot_machines))
for turn_index in range(number_of_turns):
for slot_machine_index in range(number_of_slot_machines):
if np.random.rand() <= conversion_rates[slot_machine_index]:
outcomes[turn_index][slot_machine_index] = 1
#print(outcomes[0:15, 0:6])
for i in range(number_of_slot_machines):
print(‘Mean for column {0}: {1: .2%}’ .format(i, np.mean(outcomes[:, i])))
for turn_index in range(number_of_turns):
index_of_machine_to_play = -1
max_beta = -1
for slot_machine_index in range(number_of_slot_machines):
a = number_of_positive_rewards[slot_machine_index] + 1
b = number_of_negative_rewards[slot_machine_index] + 1
random_beta= np.random.beta(a,b)
if random_beta > max_beta:
max_beta = random_beta
index_of_machine_to_play = slot_machine_index
if outcomes[turn_index][index_of_machine_to_play] == 1:
number_of_positive_rewards[index_of_machine_to_play] +=1
number_of_negative_rewards[index_of_machine_to_play] += 1
number_of_times_played = number_of_positive_rewards + number_of_negative_rewards
for slot_machine_index in range(number_of_slot_machines):
print(‘Slot machine {0} was played {1} times’ .format(slot_machine_index, number_of_times_played[slot_machine_index]))

print(‘\nOverall Conclusion: The best slot machine to play is machine{}!’ .format(np.argmax(number_of_times_played)))

Interesting. It’s a great algorithm, especially in medicine, although I don’t have much hope that it will be widely used in the near future.


Thank you.

I could not agree more. I was introduced to this type of algorithm in this book. Even thought the specific algorithm above is never mentioned: Algorithms to Live By: The Computer Science of Human Decisions

The book makes your point very well. Here is a small excerpt that makes your point with one study of extracorporeal membrane oxygenation (ECMO):

“The team was clear that they wanted to address, as they put it, “the ethical issue of withholding an unproven but potentially lifesaving treatment,” and were “reluctant to withhold a lifesaving treatment from alternate patients simply to meet conventional random assignment technique.” Hence they turned to Zelen’s algorithm. The strategy resulted in one infant being assigned the “conventional” treatment and dying, and eleven infants in a row being assigned the experimental ECMO treatment, all of them surviving.”

But Chaim, exactly as you say, the medical community did not accept this and went on to do another study using classical statistics with many moire infants dying. The study using classical statistics confirmed the original study and ECMO is now the standard of care.

The book actually discusses Gittins Index that is similar to the algorithm above (in its goal) but it can be looked up in a table. You can Google it. Interestingly the Gittens Index requires a discount factor–like the discount dividend model–which may fit in well with many financial problems.

The book has a lot of other great algorithms. Like the optimal way to find the best secretary (or spouse) under certain constrained assumptions. This is a class of algorithms called early stopping.

And one can get a greater appreciation for what P123 does. Even sorting is not a trivial thing: does P123 use the Bubble Sort? I am guessing not after reading the book.

The book has no equations (or code). It is for the layperson and perhaps for programmers that want some light bedtime reading. I highly recommend it.

But Chaim, even if you do not read the book it pretty much proves your point.



Which book is this?

One reason why I don’t expect this to catch on in the medical community is because people like to keep doing things the way they were trained, even when the data changes.

For example, to disprove conventional training on the cause (and treatment) for ulcers, two doctors needed to go to extraordinary measures to convince the medical community. Doctors continued to treat ulcers with invasive surgery that was not very effective.

They had to pull off a stunt of swallowing this bacteria, getting ulcers, and then curing themselves with this simple treatment of Pepto Bismol and an antibiotic before the medical community took them seriously.

This discovery took place back in the 1980’s. Yet, I know a man who had surgery done for his ulcer 20 years after this discovery became well known.

This is just one of the obstacles faced in changing standard medical study methodology. There are a number of other obstacles that are even more difficult to overcome.

P.S. After thinking this over, I realized that there are situations when Thompson Sampling is not ideal. For example, in guiding the development of refinements to the treatment (or trading algo). If you can prove, for example, that Ivermectin improves mortality significantly in COVID-19 patients, it could give a clue to try other similar drugs as well, to see if they might perhaps work even better. But if you do Thompson Sampling, it’s a good way to choose between existing treatment options but it doesn’t help guide future research as well. At least as far as I understand it.

Algorithms to Live By: The Computer Science of Human Decisions

Thompson sampling is used pretty extensively. Microsoft uses Thompson sampling in Bing to help decide which ad you will see for example. But it is just the beginning of the subject.

Thompson sampling can be done by us because of Python’s (Numpy’s) np.random.beta(a,b) function. Upper confidence bound algorithm equations can be entered into an Excel Spreadsheet in a pinch.

But the fact that these algorithms are easy enough that we can use them does not mean they will answer every problem we encounter.

For any good/professional programmers out there I would pay for an iPhone app that was essentially my own personal recommender system. Should I download another Lee Child book from or might I be better off exploring a new author? What restaurant tonight? A new wine or my favorite (for now)? I can’t have my laptop with me all of the time :wink:


So obviously, one might want to do this with continuous variables. Or at least I would like to program several promising technical trading strategies into a computer and have the computer find the strategy that works best–or even just works. Find the strategy with the best returns (returns is a continuous variable) efficiently and automatically.

In other words, it would be nice to use some reinforcement learning to find the trading strategies that maximize returns.

There are lots of papers on how to do this with continuous variables. Frankly, most are not that good. Part of the reason for this is that all of the publications try to prove an ‘upper bound on regret’ with mathematical theorems. This limits what they can try in practical terms. Basically, there has to be an upper bound on the returns to be able to prove the theorems: as there is with the Bernoulli Thompson Sampling example above.

Here is a nice article that tries to extend Thompson sampling to continuous variables:Thompson Sampling using Conjugate Priors

Unfortunately they do not quite get it right! The best charging “socket” in their example was used 1111 times out of 2000. It did find the best socket and exploited it most of the time. So nice start. But it is not really Thompson Sampling yet. And my strategy does better.

Their mistake (in frequentist terms) is that they use the standard deviation for the probability distributions rather than the standard error of the mean. They fail to use the improving knowledge of the best estimate of the mean that develops toward the end of 2000 trials (that the standard error of the mean provides).

Also I love Bayesian statistics but I question its use here. No one would argue that it doesn’t add bias here (which may or may not be useful at times).

So anyway. I coded my own method. Running the code just now, the program exploited the best socket (best trading strategy for us) 1675 times (the link above used it just 1111 times). There is some variance from trial to trial but, basically, I aways do better than the link above. And clearly should do better. They did not get it quite right.

A few thing about the code. I run each strategy 5 times to get an initial estimate that will not be too affected by any wild outliers. This is better than using a Bayesian prior, IMHO. You can play with the number of initial trials. The best number of initial trails will depend on the data to some extent, I believe. But even 2 or 3 initial trials works well with this data–as does 20 or 30 trials.

Change the mean and the variance for the data if you want. This is the same means and variances used in the link above (to start with). And the article has some nice graphics to supplement my ideas: since the I use the same variables as the article.

As you can see I am not the best coder in the world! A good coder would have used np.argmax() at least once and indexed the ‘sockets’, I am sure. And use a few more ‘for loops’ no doubt.

But code LIKE this is the best for continuous variables that I have found. I think it would work for what I would like to do with a self-learning trading program. No doubt one could improve an what I coded this morning (after considerable reading and thought on the subject). Please let me know what you think (if you got this far in this post):

for i in range(5):
for i in range(5):
for i in range(5):
for i in range(5):
for i in range(5):
import numpy as np
for i in range(1975):
mean1 = np.mean(r1)
mean2 = np.mean(r2)
mean3 = np.mean(r3)
mean4 = np.mean(r4)
mean5 = np.mean(r5)
se1 = np.std(r1)/trials1**.5
se2 = np.std(r2)/trials2**.5
se3 = np.std(r3)/trials3**.5
se4 = np.std(r4)/trials4**.5
se5 = np.std(r5)/trials5**.5
mr1 =np.random.normal(mean1,se1)
mr2 =np.random.normal(mean2,se2)
mr3 =np.random.normal(mean3,se3)
mr4 =np.random.normal(mean4,se4)
mr5 =np.random.normal(mean5,se5)
if mr1==maxmr:
trials1 =trials1+1
elif mr2==maxmr:
i= np.random.normal(10,5)
trials2 =trials2+1
elif mr3==maxmr:
i= np.random.normal(8,3)
trials3 =trials3+1
elif mr4==maxmr:
i= np.random.normal(12,1)
trials4 =trials4+1
elif mr5==maxmr:
i= np.random.normal(11,6)
trials5 =trials5+1

print(‘mean1:’, mean1)
print(‘se1:’, se1)
print(‘trials1:’, trials1)
print(‘se2’, se2)
print(‘trials2:’, trials2)
print(‘mean3:’, mean3)
print(‘se3:’, se3)
print(‘trials3:’, trials3)
print(‘mean4:’, mean4)
print(‘se4:’, se4)
print(‘trials4:’, trials4)
print(‘mean5:’, mean5)
print(‘se5:’, se5)
print(‘trials5:’, trials5)