152 lines
5.8 KiB
Plaintext
152 lines
5.8 KiB
Plaintext
1. Title: Poker Hand Dataset
|
||
|
||
2. Source Information
|
||
|
||
a) Creators:
|
||
|
||
Robert Cattral (cattral@gmail.com)
|
||
|
||
Franz Oppacher (oppacher@scs.carleton.ca)
|
||
Carleton University, Department of Computer Science
|
||
Intelligent Systems Research Unit
|
||
1125 Colonel By Drive, Ottawa, Ontario, Canada, K1S5B6
|
||
|
||
c) Date of release: Jan 2007
|
||
|
||
3. Past Usage:
|
||
1. R. Cattral, F. Oppacher, D. Deugo. Evolutionary Data Mining
|
||
with Automatic Rule Generalization. Recent Advances in Computers,
|
||
Computing and Communications, pp.296-300, WSEAS Press, 2002.
|
||
- Note: This was a slightly different dataset that had more
|
||
classes, and was considerably more difficult.
|
||
|
||
- Predictive attribute: Poker Hand (labeled ‘class’)
|
||
- Found to be a challenging dataset for classification algorithms
|
||
- Relational learners have an advantage for some classes
|
||
- The ability to learn high level constructs has an advantage
|
||
|
||
4. Relevant Information:
|
||
Each record is an example of a hand consisting of five playing
|
||
cards drawn from a standard deck of 52. Each card is described
|
||
using two attributes (suit and rank), for a total of 10 predictive
|
||
attributes. There is one Class attribute that describes the
|
||
“Poker Hand”. The order of cards is important, which is why there
|
||
are 480 possible Royal Flush hands as compared to 4 (one for each
|
||
suit – explained in more detail below).
|
||
|
||
5. Number of Instances: 25010 training, 1,000,000 testing
|
||
|
||
6. Number of Attributes: 10 predictive attributes, 1 goal attribute
|
||
|
||
7. Attribute Information:
|
||
1) S1 “Suit of card #1”
|
||
Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}
|
||
|
||
2) C1 “Rank of card #1”
|
||
Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)
|
||
|
||
3) S2 “Suit of card #2”
|
||
Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}
|
||
|
||
4) C2 “Rank of card #2”
|
||
Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)
|
||
|
||
5) S3 “Suit of card #3”
|
||
Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}
|
||
|
||
6) C3 “Rank of card #3”
|
||
Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)
|
||
|
||
7) S4 “Suit of card #4”
|
||
Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}
|
||
|
||
8) C4 “Rank of card #4”
|
||
Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)
|
||
|
||
9) S5 “Suit of card #5”
|
||
Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}
|
||
|
||
10) C5 “Rank of card 5”
|
||
Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)
|
||
|
||
11) CLASS “Poker Hand”
|
||
Ordinal (0-9)
|
||
|
||
0: Nothing in hand; not a recognized poker hand
|
||
1: One pair; one pair of equal ranks within five cards
|
||
2: Two pairs; two pairs of equal ranks within five cards
|
||
3: Three of a kind; three equal ranks within five cards
|
||
4: Straight; five cards, sequentially ranked with no gaps
|
||
5: Flush; five cards with the same suit
|
||
6: Full house; pair + different rank three of a kind
|
||
7: Four of a kind; four equal ranks within five cards
|
||
8: Straight flush; straight + flush
|
||
9: Royal flush; {Ace, King, Queen, Jack, Ten} + flush
|
||
|
||
|
||
8. Missing Attribute Values: None
|
||
|
||
9. Class Distribution:
|
||
|
||
The first percentage in parenthesis is the representation
|
||
within the training set. The second is the probability in the full domain.
|
||
|
||
Training set:
|
||
|
||
0: Nothing in hand, 12493 instances (49.95202% / 50.117739%)
|
||
1: One pair, 10599 instances, (42.37905% / 42.256903%)
|
||
2: Two pairs, 1206 instances, (4.82207% / 4.753902%)
|
||
3: Three of a kind, 513 instances, (2.05118% / 2.112845%)
|
||
4: Straight, 93 instances, (0.37185% / 0.392465%)
|
||
5: Flush, 54 instances, (0.21591% / 0.19654%)
|
||
6: Full house, 36 instances, (0.14394% / 0.144058%)
|
||
7: Four of a kind, 6 instances, (0.02399% / 0.02401%)
|
||
8: Straight flush, 5 instances, (0.01999% / 0.001385%)
|
||
9: Royal flush, 5 instances, (0.01999% / 0.000154%)
|
||
|
||
The Straight flush and Royal flush hands are not as representative of
|
||
the true domain because they have been over-sampled. The Straight flush
|
||
is 14.43 times more likely to occur in the training set, while the
|
||
Royal flush is 129.82 times more likely.
|
||
|
||
Total of 25010 instances in a domain of 311,875,200.
|
||
|
||
Testing set:
|
||
|
||
The value inside parenthesis indicates the representation within the test
|
||
set as compared to the entire domain. 1.0 would be perfect representation,
|
||
while <1.0 are under-represented and >1.0 are over-represented.
|
||
|
||
0: Nothing in hand, 501209 instances,(1.000063)
|
||
1: One pair, 422498 instances,(0.999832)
|
||
2: Two pairs, 47622 instances, (1.001746)
|
||
3: Three of a kind, 21121 instances, (0.999647)
|
||
4: Straight, 3885 instances, (0.989897)
|
||
5: Flush, 1996 instances, (1.015569)
|
||
6: Full house, 1424 instances, (0.988491)
|
||
7: Four of a kind, 230 instances, (0.957934)
|
||
8: Straight flush, 12 instances, (0.866426)
|
||
9: Royal flush, 3 instances, (1.948052)
|
||
|
||
Total of one million instances in a domain of 311,875,200.
|
||
|
||
|
||
10. Statistics
|
||
|
||
Poker Hand # of hands Probability # of combinations
|
||
Royal Flush 4 0.00000154 480
|
||
Straight Flush 36 0.00001385 4320
|
||
Four of a kind 624 0.0002401 74880
|
||
Full house 3744 0.00144058 449280
|
||
Flush 5108 0.0019654 612960
|
||
Straight 10200 0.00392464 1224000
|
||
Three of a kind 54912 0.02112845 6589440
|
||
Two pairs 123552 0.04753902 14826240
|
||
One pair 1098240 0.42256903 131788800
|
||
Nothing 1302540 0.50117739 156304800
|
||
|
||
Total 2598960 1.0 311875200
|
||
|
||
The number of combinations represents the number of instances in the entire domain.
|
||
|