so I went and looked at that game theory (the field of study) textbook that had algorithms to solve payoff matrices, and I took the time to actually understand the estimation algorithm. it turns out the estimation algorithm is very computationally feasible on a GBA; you could probably run it even on a Game Boy. so, now that actually solving a payoff matrix on a Game Boy is possible, this is how you would design a slightly more complex AI:

- firstly, we need to explain payoff matrices:
  - this is an example of a payoff matrix: https://cdn.bulbagarden.net/upload/c/ca/GameTheory2.png
  - player 1 is represented by the vertical strategies, while player 2 is represented by the horizontal strategies
  - (note: the vertical move user is tyranitar (you), and the horizontal move user is alakazam (the opponent))
  - each cell represents the payoff (in this case, the effective base power of the move) each player gets based on the strategies they both choose
  - for example, if tyranitar chooses earthquake and alakazam chooses HP fire, then player 1 would have a base power of 100 and player 2 would have a base power of 35
- there are algorithms which "solve" payoff matrices by finding the nash equilibrium: a state where neither player can do better by changing their own strategy while the other player's strategy stays fixed
- there are two types of strategy profiles: pure strategies (always pick this one strategy) and mixed strategies (pick among several strategies, each with its own probability of being picked)
- in addition, there is something called "expected payoff", which is the payoff you would expect given your strategy
  - if both players are playing pure strategies, then the expected payoff is exactly the payoff you'd get
  - if both players are playing mixed strategies, then the expected payoff is the average payoff over a long (infinite) run of games
- for this specific example, assuming that all you care about is effective base power, the nash equilibrium would
be for player 1 to always pick pursuit and for player 2 to always pick shock wave
- however, effective base power is not the only measure of a good choice in pokemon
- if we look at pokemon hypothetically, the only reward of a battle is winning (excluding side effects like levelling up)
- so if we look at a hypothetical turn before the battle ends (assuming that battles are finite), you either win or lose
- the expected payoff of "win" or "lose" is either an exact value or an average; either way, it represents the probability of winning
- now, with our probability of winning on that final turn, we can go back to the turn before, where each cell would have its own probability
- thus, we can work our way back all the way to turn 1 to get the probability of winning
- note that this is somewhat of an oversimplification, but I think the general concept is correct
- of course, you wouldn't be able to search deep enough to reach "all possible endgame states", as you'd run out of memory
- but this idea tells us that, hypothetically, the payoffs are probabilities of winning
- of course, the issue is that estimating the probability of winning from turn 1 is really hard, and the probabilities might be around 50% for each action
- and even if there were a way to get very accurate probability estimates, just playing the computed probabilities isn't the only key to a good AI: exploitation is what a good AI does
- for example, there are rock paper scissors AIs which get a higher win % than would be expected (33% win, 33% tie, 33% lose), because they exploit the fact that humans are terrible random number generators
- e.g.
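as a toy illustration of that kind of exploitation (my own sketch, not something from the textbook or the RPS AIs mentioned above), here's about the simplest possible exploiting bot: it assumes the human will repeat their historically most frequent move and plays the counter to it. the function name and the frequency heuristic are made up for illustration:

```python
import random

BEATS = {"R": "P", "P": "S", "S": "R"}  # what beats each move

def exploiter_move(history):
    """Pick a rock-paper-scissors move given the opponent's past moves.

    Toy heuristic: assume the human repeats their most frequent move so
    far and counter it; with no history, fall back to uniform random.
    """
    if not history:
        return random.choice("RPS")
    counts = {m: history.count(m) for m in "RPS"}
    predicted = max("RPS", key=counts.get)  # opponent's most common move
    return BEATS[predicted]
```

a real exploiter would do better by weighting recent moves more heavily or looking for sequential patterns (e.g. "humans rarely repeat the same move three times"), but even this frequency count beats a non-uniform human in the long run.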
try playing 50 rounds of rock paper scissors (WITHOUT USING A RANDOM NUMBER GENERATOR): https://www.afiniti.com/corporate/rock-paper-scissors
- also, human players are not necessarily rational, are not able to evaluate a large number of options, and we don't necessarily know the optimal estimation strategy ourselves
- therefore, an AI that exploits would recognize common patterns in how human players act and factor them into its calculations
- for example, you can assume that on turn 1, if the player has a status move (e.g. toxic), then they are likely to use it; made-up probabilities could be 75% status move, 25% damaging move for the player
- an even better AI would profile the player, using their past actions as an indicator of what they would do
- one way to do this would be to create preset profiles of how most pokemon players play games, and try to guess which profile the player matches the most
- note that I don't have an algorithm to determine the best probabilities to pick a move given that we know how the player will choose theirs, but I am guessing that it exists somewhere and is simple to implement
- so the tl;dr is:
  - make a payoff matrix given every possible action of the player and the AI, and guess the probability of winning for each combination of strategies
  - find the nash equilibrium of the payoff matrix
  - out of the possible strategies, choose each strategy with the probability provided
- even more simplified tl;dr:
  - do something like the image, except the numbers are "probability of winning"
  - the numbers don't necessarily need to be "probability of winning"; if you find something that works better, then use it

sample code for the algorithm (no context, so it probably won't help unless you already have a basic understanding of game theory): https://gist.github.com/luckytyphlosion/e7c520d34dd7db6fa904d02df44a8205

Textbook referenced: https://www.math.ucla.edu/~tom/Game_Theory/mat.pdf
Algorithm taken from "4.7 Approximating the Solution: Fictitious Play"
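to give a feel for what fictitious play roughly looks like, here's my own sketch (a paraphrase of the idea in section 4.7, not the gist's actual code) for a zero-sum matrix game where the row player maximizes the payoff (e.g. its win probability) and the column player minimizes it. each player repeatedly plays a best response to the opponent's empirical mix of moves so far, and the long-run play frequencies approximate the equilibrium mixed strategies:

```python
def fictitious_play(A, iterations=10000):
    """Approximate equilibrium mixed strategies of the zero-sum game with
    payoff matrix A (row player maximizes, column player minimizes).

    Returns (row_mix, col_mix): each player's empirical strategy frequencies.
    """
    m, n = len(A), len(A[0])
    row_counts = [0] * m  # how often the row player picked each row
    col_counts = [0] * n
    # total payoff each pure strategy would have earned against the
    # opponent's moves so far (argmax/argmin of these = best response)
    row_scores = [0.0] * m
    col_scores = [0.0] * n
    for _ in range(iterations):
        i = max(range(m), key=lambda r: row_scores[r])  # row best response
        j = min(range(n), key=lambda c: col_scores[c])  # col best response
        row_counts[i] += 1
        col_counts[j] += 1
        for r in range(m):
            row_scores[r] += A[r][j]
        for c in range(n):
            col_scores[c] += A[i][c]
    row_mix = [c / iterations for c in row_counts]
    col_mix = [c / iterations for c in col_counts]
    return row_mix, col_mix
```

for example, on a matching-pennies style matrix like [[1.0, 0.0], [0.0, 1.0]] (payoffs as the row player's "probability of winning"), both mixes drift toward the 50/50 equilibrium. the reason this is cheap enough for a Game Boy is that each iteration is just a couple of argmax passes and row/column additions on a tiny matrix, with no division or matrix inversion needed; convergence is slow in theory, but a few thousand iterations on a 4x4-ish matrix is plenty for picking one move.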