
Learn Game AI

I've always wanted to know how bots for simple games are developed. We don't always have a friend to play with us, and a good bot can define the user experience for a game. The focus of this project is how to build bots we play against.

Common Methods

This is such a large space, and these are just the methods I have happened to come across. Submit a PR with additional information if you would like to clarify or expand this section.

Simple Rules Based

The set of instructions that the bot follows is manually coded. It doesn't "think" ahead; it only reacts to the current board. This assumes we can write down a set of rules for what the bot should do in a particular state, and that we can query what state the game will be in after taking an action.

  • Random Bot: The bot looks at all legal moves and picks one at random.
  • Greedy Bot: The bot evaluates the immediate result of every legal move (e.g., "Which move gains me the most points right now?") and picks the best one (see the sketch after this list).
  • Weighted Heuristics: You assign values to specific conditions. For example, in Monopoly, a bot might have rules like:
    # Rules are checked top to bottom; the first match decides the action.
    if landed_on_unowned_property and cash > property_cost + 500:
        buy_property()
    elif own_monopoly and cash > house_cost:
        buy_house()
    elif in_jail:
        pay_fine()
    else:
        end_turn()
    
  • Scripted/Pattern-Based Bot: Uses predefined strategies or "playbooks." For example, in a deck-building game, it might follow "Always buy cards costing 5+ if you have the money." More sophisticated than random, but still no lookahead.
  • Lookup Table Bot: Pre-computed best moves for common positions (used in solved or partially-solved games like Connect Four). Essentially a giant cheat sheet.
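
To make the Random Bot and Greedy Bot bullets concrete, here is a minimal Python sketch. The legal_moves, apply_move, and score helpers are placeholders for whatever your game engine actually provides.

    import random

    def random_bot(state, legal_moves):
        # Pick any legal move with uniform probability.
        return random.choice(legal_moves(state))

    def greedy_bot(state, legal_moves, apply_move, score):
        # Score the position one move ahead for every legal move and keep the best.
        # There is no lookahead beyond that single move.
        best_move, best_score = None, float("-inf")
        for move in legal_moves(state):
            value = score(apply_move(state, move))
            if value > best_score:
                best_move, best_score = move, value
        return best_move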

Best when:

  • Decisions are atomic (one choice per turn, no multi-step plans)
  • The game state is fully evaluated each turn anyway (like most board games)
  • Logic is simple enough that hierarchy doesn't add value
  • You want maximum transparency—easier to debug a 20-line if-else chain than a 50-node behavior tree

Intermediate

Attempts to calculate into the future. Classic AI for games like chess and checkers generally uses these methods.

  • Minimax Algorithm: The bot assumes the player will play perfectly. It explores a "tree" of possibilities, trying to maximize its own score while assuming the player will try to minimize it.
    • Heuristic Evaluation Function: Some games, like chess, can't be simulated to the very end. Instead, the bot stops at a certain depth and uses a formula to guess who is winning. For example, a bot simulates a chess game to a depth of 6 turns and uses a formula to determine which player is ahead based on the pieces left on the board. This method is best for turn-based games where both players see everything (aka "perfect information").
    • Alpha-Beta Pruning: An optimization for Minimax. It "prunes" branches of the tree that are clearly worse than paths already explored, allowing the bot to look much deeper into the future without slowing down (see the combined sketch after this list).
  • Expectimax: Extension of Minimax for games with randomness (dice, card draws). Instead of assuming perfect opponent play, it calculates expected values over random events.
  • Negamax: A simplified implementation of Minimax that exploits symmetry in zero-sum games. Easier to code, same result.
  • Iterative Deepening: Runs Minimax/Alpha-Beta repeatedly at increasing depths (depth 1, then 2, then 3...) until time runs out. Ensures you always have some answer even with strict time limits.
  • Beam Search: Explores only the N most promising moves at each level, ignoring others. Much faster than full Minimax but can miss optimal moves.
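
Below is a combined sketch of depth-limited Minimax with Alpha-Beta pruning. It assumes a hypothetical game object exposing legal_moves, apply, is_over, and a heuristic evaluate function; the real interface depends on your engine.

    def minimax(game, state, depth, alpha, beta, maximizing):
        # Stop at the depth limit (or a finished game) and use the heuristic
        # evaluation function to guess who is ahead.
        if depth == 0 or game.is_over(state):
            return game.evaluate(state)
        if maximizing:
            value = float("-inf")
            for move in game.legal_moves(state):
                value = max(value, minimax(game, game.apply(state, move),
                                           depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if beta <= alpha:  # prune: the opponent will never allow this branch
                    break
            return value
        else:
            value = float("inf")
            for move in game.legal_moves(state):
                value = min(value, minimax(game, game.apply(state, move),
                                           depth - 1, alpha, beta, True))
                beta = min(beta, value)
                if beta <= alpha:
                    break
            return value

Iterative Deepening then just calls something like minimax(game, state, d, float("-inf"), float("inf"), True) for d = 1, 2, 3... and keeps the best move found before the time budget runs out.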

Advanced

  • Behavior Trees: Hierarchical structure used in video game AI. The bot follows a tree of conditions and actions, allowing complex adaptive behavior without deep learning. You may notice this sounds like the rules-based strategies from above; many rules-based methods can be thought of as simple, shallow behavior trees. You would probably forgo a behavior tree for most board games, unless you are building something like a 4X strategy game (a minimal sketch follows this list).
    • Behavior Trees are best when:
      • You need stateful actions (multi-turn plans like "gather resources → build barracks → train units")
      • You want reusable components (the "flee when low health" subtree works for any unit type)
      • The game has interruptions (you're executing a plan, but a new threat appears—the tree can elegantly switch branches)
      • Designers (not just programmers) need to tweak AI behavior—visual tree editors are easier than code
  • Planning Algorithms (PDDL solvers, Hierarchical Task Networks): Used for games with complex action sequences or resource management. The bot builds explicit plans to achieve goals.
  • Monte Carlo Tree Search (MCTS): The algorithm starts at the current state and runs many simulated "playouts" (Monte Carlo simulations) from that specific state to see which action looks best, building out a tree of possibilities as it goes. It uses these simulations to decide on the single next move in the real game. Once it takes that move and reaches a new state, it often throws away the old tree and starts the process over from the new state (see the MCTS sketch after this list).
    • When it learns: only at the end of an episode (a complete playout).
    • Data used: actual returns from the whole episode.
    • Environment: needs episodic tasks (games with an end).
    • Decision focus: global (averaging many complete playouts).
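
To make the behavior-tree idea concrete, here is a minimal sketch using only Selector, Sequence, and Leaf node types and a plain dict as the blackboard. The node and function names are illustrative, not taken from any particular library.

    class Sequence:
        # Runs children in order; fails as soon as one child fails.
        def __init__(self, *children):
            self.children = children

        def tick(self, blackboard):
            return all(child.tick(blackboard) for child in self.children)

    class Selector:
        # Tries children in order; succeeds as soon as one child succeeds.
        def __init__(self, *children):
            self.children = children

        def tick(self, blackboard):
            return any(child.tick(blackboard) for child in self.children)

    class Leaf:
        # Wraps a condition or action function that reads and writes the blackboard.
        def __init__(self, fn):
            self.fn = fn

        def tick(self, blackboard):
            return self.fn(blackboard)

    def is_low_health(bb):
        return bb["health"] < 20

    def flee(bb):
        bb["action"] = "flee"
        return True

    def attack(bb):
        bb["action"] = "attack"
        return True

    # "Flee when low health, otherwise attack" as a tiny reusable subtree.
    tree = Selector(Sequence(Leaf(is_low_health), Leaf(flee)), Leaf(attack))
    blackboard = {"health": 12}
    tree.tick(blackboard)
    print(blackboard["action"])  # -> flee

And a compact MCTS sketch. It assumes a game object with legal_moves, apply, is_over, and result methods (placeholder names, not a real library), runs a fixed number of random playouts, and skips the reward-perspective flip a real two-player implementation would apply during backpropagation.

    import math
    import random

    class Node:
        def __init__(self, state, parent=None, move=None):
            self.state, self.parent, self.move = state, parent, move
            self.children = []
            self.untried = None  # moves not yet expanded into children
            self.visits = 0
            self.wins = 0.0

    def mcts(game, root_state, iterations=1000, c=1.4):
        root = Node(root_state)
        root.untried = list(game.legal_moves(root_state))
        for _ in range(iterations):
            node = root
            # 1. Selection: descend with UCT while the node is fully expanded.
            while not node.untried and node.children:
                node = max(node.children,
                           key=lambda ch: ch.wins / ch.visits
                           + c * math.sqrt(math.log(node.visits) / ch.visits))
            # 2. Expansion: add one unexplored child.
            if node.untried:
                move = node.untried.pop()
                child = Node(game.apply(node.state, move), parent=node, move=move)
                child.untried = list(game.legal_moves(child.state))
                node.children.append(child)
                node = child
            # 3. Simulation: random playout to the end of the game.
            state = node.state
            while not game.is_over(state):
                state = game.apply(state, random.choice(game.legal_moves(state)))
            reward = game.result(state)  # e.g. 1.0 win, 0.5 draw, 0.0 loss
            # 4. Backpropagation: update statistics along the visited path.
            while node is not None:
                node.visits += 1
                node.wins += reward
                node = node.parent
        # Play the move that was explored the most.
        return max(root.children, key=lambda ch: ch.visits).move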

Deep Reinforcement Learning

  • Temporal Difference (TD): TD methods learn "online," or step by step. They take an action, see the immediate reward and the new state, and immediately update their knowledge. They "bootstrap," meaning they use their current guess of the next state's value to update the current state's value. Unlike minimax, which requires a person to write a formula to score the board, temporal difference "discovers" the score on its own. It plays a game, realizes a certain board state usually leads to a loss, and lowers that state's score automatically.
    • When it learns: after every single step.
    • Data used: immediate reward + a guess of the future.
    • Environment: can work on "infinite" or continuous tasks.
    • Decision focus: local (optimizing the very next transition).
    • Algorithms:
      • Q-Learning: Off-policy TD method. Learns the value of state-action pairs. Works well for discrete action spaces (see the sketch after this list).
      • SARSA: On-policy TD method. More conservative than Q-Learning.
      • TD(λ)
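
As an illustration of the Q-Learning bullet above, here is a minimal tabular sketch. How states and actions are encoded is up to your game; the alpha, gamma, and epsilon values are just common defaults, not tuned numbers.

    import random
    from collections import defaultdict

    Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term value

    def choose_action(state, actions, epsilon=0.1):
        # Epsilon-greedy: usually exploit the best known action, sometimes explore.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state, next_actions,
                 alpha=0.1, gamma=0.99):
        # Off-policy TD update: bootstrap from the best estimated value of the
        # next state, regardless of which action will actually be taken there.
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

Calling q_update after every move is the "learns after every single step" behavior described above; SARSA would instead bootstrap from the action its own policy actually chooses next.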

Source: [Learning to Play, Imitate and Collaborate with Pesky Humans]

Resources