Poker Copilot V2 Data Architecture

Continuing my analysis of V2 requirements, here are some thoughts on data handling. The first piece of the puzzle is a data hub: the table(s) containing the raw, parsed input. In some scenarios there could be multiple hub tables, such as one for cash games and another for tournament games. While a single table is simple to implement, splitting things out might be more efficient for downstream queries.

Some aspects of tournament play require additional columns for data like starting stack size and blind speed parameters, and all of that might best be stored in a tournament-specific table. Additionally, should you decide to add Omaha Hi and Omaha Hi/Low games in V3, you would definitely need to keep the raw data in distinct tables. So it makes sense to me to begin implementing that base architecture for the parse slurp now. The complication comes when you want to analyze profitability across all games, but that's really a rather trivial join across several summary views or tables.
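To make the split-hub idea concrete, here is a minimal sketch using SQLite from Python. Every table, view, and column name below (cash_hands, tourney_hands, net_won, starting_stack, blind_speed and so on) is a made-up placeholder, not anything from the actual Poker Copilot schema, and I've combined the per-game summaries with a UNION ALL rather than a literal join because that seemed the simplest way to show it.

```python
import sqlite3


def build_hubs(conn):
    """Create hypothetical cash / tournament hub tables plus summary views."""
    conn.executescript("""
        -- Hub for raw, parsed cash game hands (illustrative columns only).
        CREATE TABLE cash_hands (
            hand_id TEXT NOT NULL,
            player  TEXT NOT NULL,
            net_won REAL NOT NULL,
            PRIMARY KEY (hand_id, player)
        );
        -- Tournament hub carries the extra tournament-only columns.
        CREATE TABLE tourney_hands (
            hand_id        TEXT NOT NULL,
            player         TEXT NOT NULL,
            net_won        REAL NOT NULL,
            starting_stack INTEGER NOT NULL,
            blind_speed    TEXT NOT NULL,
            PRIMARY KEY (hand_id, player)
        );
        -- One summary view per hub.
        CREATE VIEW cash_summary AS
            SELECT player, SUM(net_won) AS net FROM cash_hands GROUP BY player;
        CREATE VIEW tourney_summary AS
            SELECT player, SUM(net_won) AS net FROM tourney_hands GROUP BY player;
        -- Cross-game profitability: stack the summaries and total them.
        CREATE VIEW overall_summary AS
            SELECT player, SUM(net) AS net FROM (
                SELECT player, net FROM cash_summary
                UNION ALL
                SELECT player, net FROM tourney_summary
            ) GROUP BY player;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_hubs(conn)
    conn.execute("INSERT INTO cash_hands VALUES ('c1', 'hero', 10.0)")
    conn.execute("INSERT INTO tourney_hands VALUES ('t1', 'hero', 5.0, 1500, 'turbo')")
    print(conn.execute("SELECT * FROM overall_summary").fetchall())
```

Adding an Omaha hub later would mean one more table, one more summary view, and one more branch in the overall view, which is the main attraction of starting with this layout.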

Parsing comes in three flavors: program data initialization, game play hand addition, and data import from other sources, such as the files you can buy from Player Table Ratings. One caveat about supporting the import of external data files is that it may create an issue with Full Tilt and Poker Stars. Both sites' terms of service specify that a player may only use data from their own play sessions.

The program data initialization mode should begin with a simple parse and write process. The user should be warned at initial program startup to close all active game sessions and shut down FT / PS before continuing. From my experience with V2 so far, it appears that triggering a mass data update at the same time as you are capturing changes to active hand files causes the system to overload and freeze.

After all the files have been parsed and imported, you then begin the query process to build the HUD stat table or tables. Separate data sets for cash and tourney stats (and play money stats if you choose to support them) should be implemented. During any mass load process those tables should only be updated after the data slurp is completed. During game play those tables update after each new hand parse, with the user having access to frequency settings in a preference pane similar to that of V1. Whenever the startup checks find a bunch of new hand files to import, warn the user not to start playing until the process is complete.

Data handling for Game Analysis should use a similar incremental build approach. Selecting the Game Analysis tab triggers a load / update sequence for the core analysis tables. Again, use separate core tables for tourney, cash game, and play money data. Then, as each second-level Game Analysis tab is selected, a query process runs to populate a reporting table for that specific screen. A table is generally better than a view for the reporting tables, in the sense that a user would only have to wait for a specific query to run once during each analysis session. These sorts of view / table data architecture decisions would be best made during alpha testing, using a data set of at least 100,000 hands. Be prepared to go to the well multiple times to get a really good handle on it lolz.
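Here is a rough sketch of the "run the query once per analysis session" idea, using a small Python class over SQLite. The class name AnalysisSession, the report_ table prefix, and the query_runs counter are all hypothetical; the point is only that the first visit to a tab materializes a reporting table and later visits read from it.

```python
import sqlite3


class AnalysisSession:
    """Builds each reporting table at most once per analysis session."""

    def __init__(self, conn):
        self.conn = conn
        self.built = set()    # tab names already materialized this session
        self.query_runs = 0   # how many times an expensive query actually ran

    def open_tab(self, name, select_sql):
        # The tab name becomes part of a table name, so only accept
        # plain identifiers from trusted application code.
        if not name.isidentifier():
            raise ValueError(f"bad tab name: {name!r}")
        table = f"report_{name}"
        if name not in self.built:
            with self.conn:
                self.conn.execute(f"DROP TABLE IF EXISTS {table}")
                self.conn.execute(f"CREATE TABLE {table} AS {select_sql}")
            self.built.add(name)
            self.query_runs += 1
        # Later visits to the same tab only read the stored rows.
        return self.conn.execute(f"SELECT * FROM {table}").fetchall()
```

With a view, the underlying query would run again on every visit to the tab; with this approach the user waits once and the trade-off is that the report can go stale until the next session. That is exactly the sort of thing to measure against a large data set.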

As for Hand Replay, that seems to me to be entirely different in terms of scope and process. Rather than using the static database, it seems better to throw up a standard file selection popup and have the user select the hand history file they want to review. If you use that architecture, then you do a very quick slurp of the file and trigger a video-playback sort of screen to step through the session and hand. With that approach you avoid all the mess of having tied yourself to a static data architecture, which may in fact be modified from time to time and thus is not really static at all. What you would lose with that approach is a more free-form data selection process: the ability to queue up all the hands played with a specific player, for example. Personally I would save those sorts of deep data hooks for V3, because I guarantee you that after working with V2 in production for a year you're going to significantly re-architect the data layer for V3 lolz.

Thanks for all your hard work!
 