Datasets
[This list is incomplete]
Formatted datasets: each dataset strictly follows the same formats (CSV and compact), detailed in the next sections. The compact format makes the datasets usable in size-constrained environments (e.g. in browsers).
Note: use the raw files to access data not included in the current formats.
- CMU (Source) [Raw] [Same-Text / 51 users / 400 entries per user]
- GREYC-Keystroke (Source) [Raw] [Same-Text / 133 users / 5 to 107 entries per user]
- GREYC-NISLAB (Source) [Raw] [Same-Text / 110 users / 20 entries per user]
- GREYC-Web (Source) [Raw] [Same-Text / 103 to 117 users / 2 to 456 entries per user]
  - Meta-data: age, gender (M/F)
  - laboratoire greyc [CSV] [Compact]
    /!\ 46 incoherent entries removed.
    /!\ 3 very long values truncated to 0x7FFF (32767 ms) in the compact format.
  - sésame [CSV] [Compact]
    /!\ 2639 incoherent entries removed.
- KPPDW
- Other datasets coming soon.
Other sources:
More keystroke datasets can be found on Vinnie Monaco's website. GREYC keystroke datasets can be found on Christophe Rosenberger's website.
CSV format description
- First line: header
- Column 1: User's name
- Column 2: Typed text
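- Columns 3 to N: User meta-data
- Column N+2k+1, for the k-th typed character (k = 0, 1, ...): Press to Release duration [=A->a] (integer, in ms)
- Column N+2k+2: Press to Press duration between the k-th character and the next typed character [=A->B] (integer, in ms); the last character has no such column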
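The two duration definitions above can be illustrated with a short Python sketch that turns key events into the value order used by the timing columns ([=A->a], [=A->B], [=B->b], ...). The input shape (one `(press_ms, release_ms)` pair per typed character, in typing order) and the function name `durations_from_events` are hypothetical.

```python
from typing import List, Tuple


def durations_from_events(events: List[Tuple[int, int]]) -> List[int]:
    """Return [A->a, A->B, B->b, ...] for one entry.

    `events` is a hypothetical input: one (press_ms, release_ms) pair per
    typed character, in typing order.
    """
    values: List[int] = []
    for k, (press, release) in enumerate(events):
        # Press to Release duration of character k [=A->a]
        values.append(release - press)
        if k + 1 < len(events):
            # Press to Press duration with the next typed character [=A->B]
            values.append(events[k + 1][0] - press)
    return values


# "ab": A pressed at 0 ms, released at 1 ms; B pressed at 2 ms, released at 5 ms
print(durations_from_events([(0, 1), (2, 5)]))  # [1, 2, 3]
```

A text of L characters therefore yields 2L - 1 values, which is the number of timing columns in each CSV row.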
Example:
UserID  Text  gender  A->a  A->B  B->b
1       ab    M       1     2     3
1       ab    M       1     2     3
2       ab    F       1     2     3
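A data row can be split back into its parts with a sketch like the one below. It assumes the row has already been split into fields (the example above is space-separated for readability; `csv.reader` can do the splitting if the files turn out to be comma-separated). When the number of meta-data columns is not passed in, it is inferred from the rule that a text of L characters has 2L - 1 timing columns. `parse_row` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def parse_row(
    fields: List[str], n_meta: Optional[int] = None
) -> Tuple[str, str, List[str], List[int], List[int]]:
    """Split one CSV data row into (user, text, meta, press_release, press_press)."""
    user, text = fields[0], fields[1]
    n_timings = max(0, 2 * len(text) - 1)
    if n_meta is None:
        # Assumption: infer the meta-data column count (columns 3 to N) from the text length
        n_meta = len(fields) - 2 - n_timings
    if n_meta < 0 or len(fields) != 2 + n_meta + n_timings:
        raise ValueError("unexpected number of columns for text %r" % text)
    meta = fields[2:2 + n_meta]
    timings = [int(v) for v in fields[2 + n_meta:]]
    # Columns N+2k+1 (even offsets) are A->a, columns N+2k+2 (odd offsets) are A->B
    return user, text, meta, timings[0::2], timings[1::2]


print(parse_row("1 ab M 1 2 3".split()))
# ('1', 'ab', ['M'], [1, 3], [2])
```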
Compact format description
Each compact file has the following layout (encoded in UTF-16):
- The typed text
- 0x0000
- The users' meta-data array (JSON), one object per user
- 0x0000
- For each user, its number of entries (2-byte integer)
- 0x0000
- For each user:
  - For each of the user's entries:
    - For each typed character:
      - Press to Release duration [=A->a] (2-byte integer, in ms)
      - Press to Press duration with the next typed character [=A->B] (2-byte integer, in ms), omitted for the last character
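The layout above can be produced with the following encoder sketch. Several details are assumptions not stated in the description: little-endian byte order without a byte order mark, JSON written without spaces (as in the example), and clamping durations to 0x7FFF, mirroring the truncation noted for GREYC-Web. `encode_units`, `encode_compact` and the `(meta, entries)` input shape are hypothetical.

```python
import json
import struct
from typing import Dict, List, Tuple

SEPARATOR = 0x0000
MAX_VALUE = 0x7FFF  # assumption: longer durations are clamped, as noted for GREYC-Web


def _utf16_units(s: str) -> List[int]:
    """Return the UTF-16 code units of `s`."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def encode_units(text: str, users: List[Tuple[Dict, List[List[int]]]]) -> List[int]:
    """Lay out one dataset as 16-bit units: text, meta-data, counts, entries."""
    units = _utf16_units(text) + [SEPARATOR]
    metas = [meta for meta, _ in users]
    units += _utf16_units(json.dumps(metas, separators=(",", ":"))) + [SEPARATOR]
    units += [len(entries) for _, entries in users] + [SEPARATOR]
    for _, entries in users:
        for entry in entries:
            for value in entry:
                if value < 0:
                    raise ValueError("negative duration: %d" % value)
                units.append(min(value, MAX_VALUE))
    return units


def encode_compact(text: str, users: List[Tuple[Dict, List[List[int]]]]) -> bytes:
    """Serialize the units as 2-byte integers (assumed little-endian)."""
    units = encode_units(text, users)
    return struct.pack("<%dH" % len(units), *units)


users = [({"gender": "M"}, [[1, 2, 3], [1, 2, 3]]), ({"gender": "F"}, [[1, 2, 3]])]
units = encode_units("ab", users)
print(units[:3], units[-13:])  # [97, 98, 0] [0, 2, 1, 0, 1, 2, 3, 1, 2, 3, 1, 2, 3]
```

Since the number of users is the length of the meta-data array and each entry length follows from the text, no extra length fields are needed in this sketch.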
Example:
ab # typed text.
0x0000 # Separator before the users' meta-data.
[{"gender":"M"},{"gender":"F"}] # 2 users, user 1 is male, user 2 is female.
0x0000 # Separator before the users' entry counts.
0x0002 # 2 entries for user 1.
0x0001 # 1 entry for user 2.
0x0000 # Separator before the users' entries.
0x0001 0x0002 0x0003 # User 1 first entry.
0x0001 0x0002 0x0003 # User 1 second entry.
0x0001 0x0002 0x0003 # User 2 first entry.
# 0x0001 => A->a = Release(A) - Press(A) = 1ms
# 0x0002 => A->B = Press(B) - Press(A) = 2ms
# 0x0003 => B->b = Release(B) - Press(B) = 3ms
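Reading a compact file back follows the same layout. The sketch below assumes little-endian byte order with no byte order mark, that each entry holds 2L - 1 values for a text of L characters (as in the example, where "ab" gives three values), and that the number of users equals the length of the meta-data array. `decode_compact` is a hypothetical helper name.

```python
import json
import struct
from typing import Dict, List, Tuple


def decode_compact(data: bytes) -> Tuple[str, List[Tuple[Dict, List[List[int]]]]]:
    """Decode compact-format bytes into (text, [(meta, entries), ...])."""
    # Assumption: little-endian 2-byte units, no byte order mark
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    pos = 0

    def read_string() -> str:
        # Read UTF-16 units up to the next 0x0000 separator, then skip it
        nonlocal pos
        end = units.index(0x0000, pos)
        raw = struct.pack("<%dH" % (end - pos), *units[pos:end])
        pos = end + 1
        return raw.decode("utf-16-le")

    text = read_string()
    metas = json.loads(read_string())
    # One entry count per user, followed by a 0x0000 separator
    counts = units[pos:pos + len(metas)]
    pos += len(metas)
    if units[pos] != 0x0000:
        raise ValueError("missing separator before entries")
    pos += 1

    entry_len = max(0, 2 * len(text) - 1)  # A->a, A->B, B->b, ...
    users = []
    for meta, count in zip(metas, counts):
        entries = []
        for _ in range(count):
            entries.append(list(units[pos:pos + entry_len]))
            pos += entry_len
        users.append((meta, entries))
    return text, users
```

Applied to the bytes of the example above, this returns `"ab"` and three entries `[1, 2, 3]`: two for the user with `{"gender": "M"}` and one for the user with `{"gender": "F"}`.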