All,
I asked ChatGPT: “write python code for Thompson sampling of a 2-armed bandid.” (Note that I even misspelled “bandit.”)
I think it got it right, but I thought it could have added a summary of the results…
Me: “please add a summary of the results in the code.” In short, it did.
But I also wanted the probabilities associated with each arm so I asked it: “please add the probability associated with each arm.”
Here is the final output of the code run in my Jupyter Notebook without any changes: "Arm 1 was pulled 7 times and obtained a total reward of 0. The probability of success for arm 1 is 0.11
Arm 2 was pulled 993 times and obtained a total reward of 499. The probability of success for arm 2 is 0.50"
This is basically an online method for performing an A/B test. The advantage of this method is that you do not have to go through all 1000 samples before starting to pick the bandit (or trading strategy) with the best payout more frequently.
The inferior strategy was tried only 7 times here.
Here is a second run: “Arm 1 was pulled 44 times and obtained a total reward of 16. The probability of success for arm 1 is 0.37
Arm 2 was pulled 956 times and obtained a total reward of 480. The probability of success for arm 2 is 0.50”
More generally, ChatGPT does some fairly advanced machine learning. Impressive I think.
Jim
The code generated:
import numpy as np

# Define the parameters of the bandit
p1 = 0.3  # Probability of getting a reward for arm 1
p2 = 0.5  # Probability of getting a reward for arm 2

# Define the prior distribution for each arm
a1, b1 = 1, 1  # Beta distribution parameters for arm 1
a2, b2 = 1, 1  # Beta distribution parameters for arm 2

# Define the number of times to pull the arms
num_pulls = 1000

# Track the number of times each arm is pulled and the total reward obtained
num_pulls_arm1 = 0
num_pulls_arm2 = 0
total_reward_arm1 = 0
total_reward_arm2 = 0

# Pull the arms using Thompson Sampling
for i in range(num_pulls):
    # Sample a probability of success for each arm from its posterior distribution
    theta1 = np.random.beta(a1, b1)
    theta2 = np.random.beta(a2, b2)

    # Select the arm with the highest sampled probability of success
    if theta1 > theta2:
        arm = 1
    else:
        arm = 2

    # Sample a reward from the Bernoulli distribution with the corresponding probability of success
    if arm == 1:
        reward = np.random.binomial(1, p1)
        num_pulls_arm1 += 1
        total_reward_arm1 += reward
        # Update the posterior distribution for arm 1
        a1 += reward
        b1 += 1 - reward
    else:
        reward = np.random.binomial(1, p2)
        num_pulls_arm2 += 1
        total_reward_arm2 += reward
        # Update the posterior distribution for arm 2
        a2 += reward
        b2 += 1 - reward

    # Print the results of each pull
    print(f"Pull {i + 1}: Arm {arm}, Reward {reward}")

# Calculate the probability of success for each arm based on the posterior distribution
prob_arm1 = a1 / (a1 + b1)
prob_arm2 = a2 / (a2 + b2)

# Print the final results
print(f"Arm 1 was pulled {num_pulls_arm1} times and obtained a total reward of {total_reward_arm1}. The probability of success for arm 1 is {prob_arm1:.2f}")
print(f"Arm 2 was pulled {num_pulls_arm2} times and obtained a total reward of {total_reward_arm2}. The probability of success for arm 2 is {prob_arm2:.2f}")
In this updated version, we calculate the probability of success for each arm based on the posterior distribution using the formula a / (a + b). We then print the final results, which show the number of times each arm was pulled, the total reward obtained for each arm, and the probability of success for each arm. This allows us to evaluate the performance of the Thompson Sampling algorithm and compare the probability of success for each arm.
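One caveat worth adding: the posterior mean a / (a + b) for the under-pulled arm is based on very few observations, so it can be far from the true probability (0.11 from 7 pulls for arm 1, versus the true 0.3). A short sketch of how one might quantify that uncertainty, using posterior parameters taken from the first run above (this is my own extension, not part of the generated code; the credible intervals are estimated by Monte Carlo draws from each Beta posterior):

```python
import numpy as np

rng = np.random.default_rng(0)

# Posterior parameters after a run like the first one above:
# arm 1: 7 pulls, 0 rewards -> Beta(1, 8); arm 2: 993 pulls, 499 rewards -> Beta(500, 495)
posteriors = {1: (1, 8), 2: (500, 495)}

for arm, (a, b) in posteriors.items():
    mean = a / (a + b)                    # posterior mean, the a / (a + b) formula above
    draws = rng.beta(a, b, size=100_000)  # Monte Carlo draws from the Beta posterior
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"Arm {arm}: mean {mean:.3f}, 95% credible interval [{lo:.3f}, {hi:.3f}]")
```

The interval for arm 1 comes out very wide, which is the honest answer: Thompson sampling abandoned that arm early, so we know little about it beyond "probably worse than arm 2."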