So obviously, one might want to do this with continuous variables. Or at least I would like to program several promising technical trading strategies into a computer and have the computer find the strategy that works best–or even just works. That is, find the strategy with the best returns (returns being a continuous variable) efficiently and automatically.

In other words, it would be nice to use some reinforcement learning to find the trading strategies that maximize returns.

There are lots of papers on how to do this with continuous variables. Frankly, most are not that good. Part of the reason is that the publications all try to prove an ‘upper bound on regret’ with mathematical theorems, which limits what they can try in practical terms. Basically, there has to be an upper bound on the returns for the theorems to go through–as there is in the Bernoulli Thompson Sampling example above.

Here is a nice article that tries to extend Thompson sampling to continuous variables: Thompson Sampling using Conjugate Priors

Unfortunately, they do not quite get it right! The best charging “socket” in their example was used 1111 times out of 2000. Their method did find the best socket and exploited it most of the time, so it is a nice start. But it is not really Thompson Sampling yet, and my strategy does better.

Their mistake (in frequentist terms) is that they use the standard deviation of the rewards for the probability distributions rather than the standard error of the mean. As a result, they fail to exploit the increasingly precise estimate of each mean that develops over the 2000 trials–precision that the standard error of the mean captures.
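To make the distinction concrete, here is a minimal sketch (the variable names and the sample size of 100 are my own, not from the article) of the two quantities for one socket:

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = rng.normal(10, 5, size=100)  # 100 observed rewards from one socket

std = rewards.std()                # spread of individual rewards (what the article sampled with)
se = std / np.sqrt(len(rewards))   # uncertainty of the estimated mean (what we should sample with)

# The standard error shrinks as 1/sqrt(n), so exploration naturally dies down
# as evidence accumulates; the standard deviation does not shrink at all.
print(std, se)
```

With 100 samples the standard error is ten times smaller than the standard deviation, so draws centered on the mean become much more concentrated as a socket accumulates trials.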

Also, I love Bayesian statistics, but I question its use here. No one would argue that the prior doesn’t add bias here (which may or may not be useful at times).

So anyway, I coded my own method. Running the code just now, the program exploited the best socket (the best trading strategy, for us) 1675 times out of 2000 (the article above used it just 1111 times). There is some variance from run to run but, basically, I always do better than the article above–and clearly should, since they did not get it quite right.

A few things about the code. I run each strategy 5 times to get an initial estimate that will not be too affected by any wild outliers. This is better than using a Bayesian prior, IMHO. You can play with the number of initial trials; the best number will depend on the data to some extent, I believe. But even 2 or 3 initial trials work well with this data–as do 20 or 30.

Change the means and variances for the data if you want. To start with, these are the same means and variances used in the article above. And the article has some nice graphics to supplement my ideas, since I use the same variables as the article.

As you can see, I am not the best coder in the world! A good coder would have used np.argmax() at least once and indexed the ‘sockets’, I am sure. And used a few more ‘for loops’, no doubt.

But code LIKE this is the best I have found for continuous variables. I think it would work for what I would like to do with a self-learning trading program. No doubt one could improve on what I coded this morning (after considerable reading and thought on the subject). Please let me know what you think (if you got this far in this post):

import numpy as np

# Draw 5 initial rewards from each "socket" (for us: each trading strategy)
# so the first estimate of each mean is not ruined by a single wild outlier.
r1 = []
for i in range(5):
    r1.append(np.random.normal(6, 2))

r2 = []
for i in range(5):
    r2.append(np.random.normal(10, 5))

r3 = []
for i in range(5):
    r3.append(np.random.normal(8, 3))

r4 = []
for i in range(5):
    r4.append(np.random.normal(12, 1))

r5 = []
for i in range(5):
    r5.append(np.random.normal(11, 6))

trials1 = 5
trials2 = 5
trials3 = 5
trials4 = 5
trials5 = 5

# The remaining 1975 of the 2000 total pulls.
for i in range(1975):
    mean1 = np.mean(r1)
    mean2 = np.mean(r2)
    mean3 = np.mean(r3)
    mean4 = np.mean(r4)
    mean5 = np.mean(r5)

    # Standard error of the mean, not the standard deviation:
    # this is what shrinks as a socket accumulates trials.
    se1 = np.std(r1) / trials1**.5
    se2 = np.std(r2) / trials2**.5
    se3 = np.std(r3) / trials3**.5
    se4 = np.std(r4) / trials4**.5
    se5 = np.std(r5) / trials5**.5

    # One Thompson-style draw per socket from N(mean, se).
    mr1 = np.random.normal(mean1, se1)
    mr2 = np.random.normal(mean2, se2)
    mr3 = np.random.normal(mean3, se3)
    mr4 = np.random.normal(mean4, se4)
    mr5 = np.random.normal(mean5, se5)

    maxmr = max(mr1, mr2, mr3, mr4, mr5)

    # Pull the socket whose draw won and record its reward.
    if mr1 == maxmr:
        r1.append(np.random.normal(6, 2))
        trials1 = trials1 + 1
    elif mr2 == maxmr:
        r2.append(np.random.normal(10, 5))
        trials2 = trials2 + 1
    elif mr3 == maxmr:
        r3.append(np.random.normal(8, 3))
        trials3 = trials3 + 1
    elif mr4 == maxmr:
        r4.append(np.random.normal(12, 1))
        trials4 = trials4 + 1
    elif mr5 == maxmr:
        r5.append(np.random.normal(11, 6))
        trials5 = trials5 + 1

print('mean1:', mean1)
print('se1:', se1)
print('trials1:', trials1)
print('mean2:', mean2)
print('se2:', se2)
print('trials2:', trials2)
print('mean3:', mean3)
print('se3:', se3)
print('trials3:', trials3)
print('mean4:', mean4)
print('se4:', se4)
print('trials4:', trials4)
print('mean5:', mean5)
print('se5:', se5)
print('trials5:', trials5)
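As a footnote, here is a compact sketch of the same idea using lists of lists and np.argmax() (my own rewrite, not the article's code, with the same five means and standard deviations):

```python
import numpy as np

mus = [6, 10, 8, 12, 11]   # true means of the five sockets
sigmas = [2, 5, 3, 1, 6]   # true standard deviations

# 5 warm-up pulls per socket, as in the longhand version.
rewards = [[np.random.normal(m, s) for _ in range(5)] for m, s in zip(mus, sigmas)]

# The remaining 1975 of the 2000 total pulls.
for _ in range(1975):
    means = np.array([np.mean(r) for r in rewards])
    ses = np.array([np.std(r) / np.sqrt(len(r)) for r in rewards])
    best = np.argmax(np.random.normal(means, ses))  # one Thompson-style draw per socket
    rewards[best].append(np.random.normal(mus[best], sigmas[best]))

for k, r in enumerate(rewards, 1):
    print('socket', k, 'mean:', np.mean(r), 'trials:', len(r))
```

The behavior is the same as the longhand version; the trial counters are simply the lengths of the reward lists.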