H10 Database Specification

Introduction
The iriver H10 media database, which is required for playback and navigation in MUSIC mode, consists of a number of files: H10DB.hdr, H10DB.dat, and H10DB_*.idx (typically H10DB_@DEV.idx, H10DB_FPTH.idx, H10DB_FNAM.idx, H10DB_FRMT.idx, H10DB_TIT2.idx, H10DB_TPE1.idx, H10DB_TALB.idx, H10DB_TCON.idx, H10DB_@DU3.idx, H10DB_@DU4.idx, H10DB_@DU5.idx, H10DB_TRCK.idx, H10DB_@DU1.idx, and H10DB_@DU2.idx). These files are generated by EasyH10, iriver Plus, or Windows Media Player 10 and stored in the \System\DATA folder on the H10 player's hard disk drive. The H10 player itself also updates the media database during playback to store dynamic information such as the most recent playback time, user ratings, and play counts.

H10DB.dat is a list of records that store music information. Each record has 24 (= 2 + 22) fields such as track title, artist, album, genre, year, rating, recent play time, play count, etc. The MUSIC mode of the iriver H10 player does not read music information from the actual files but relies only on these records for displaying music information. When a user listens to music on the player, the number of records grows: whenever a music file is about to be played, the H10 player duplicates the record for the file, appends the duplicate to the list, and inactivates the original record. Therefore, H10DB.dat may contain a number of inactive records.

H10DB.hdr stores general information about the H10 media database, such as: the number of records in H10DB.dat; the number of inactive records; the path to H10DB.dat; offsets to all records and fields in H10DB.dat (for quick access); and the field descriptors that define the 22 fields in an H10DB.dat record. The structure of H10DB.hdr differs slightly depending on the firmware type (UMS or MTP) and HDD capacity (5/6GB or 20GB).

H10DB_*.idx (typically 14 files) store sorted indexes that point to H10DB.dat records. Each file has a different sorting order. For example, H10DB_FPTH.idx sorts records in alphabetical order of file paths; H10DB_TIT2.idx in alphabetical order of track titles; H10DB_TPE1.idx in alphabetical order of track artists; H10DB_@DU3.idx in numerical order of user ratings; H10DB_TRCK.idx in numerical order of track number, etc. These sorted indexes are used for displaying music tracks in different orders: e.g., alphabetical order of track title/artist/album; and numerical order of track number in an album. Since the H10 firmware seems to implement an efficient search method such as binary search, all indexes must be arranged in proper orders as the firmware assumes. If not, a player may lose sight of some music files, show duplicated entries, or show wrong information because of the inconsistency.

This document describes the structure of the H10 media database. Since the information was obtained through reverse engineering and analysis, it may not be correct. We have already developed a tool (EasyH10 CUI version with the -D option) that dumps the content of a database as text. If you are interested, we suggest using it to inspect an actual database at hand.

Typedefs
This document defines four types for explanation. Values in uint16_t, uint32_t, and ucs2_char_t are stored in little endian byte-order.


 * uint8_t : 8 bit unsigned integer
 * uint16_t : 16 bit unsigned integer (little endian)
 * uint32_t : 32 bit unsigned integer (little endian)
 * ucs2_char_t : 16 bit UCS-2 character (little endian)
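
Because every multi-byte value is stored in little-endian order, a reader that should work on any host can assemble values byte by byte instead of casting buffer pointers. The following is a minimal sketch of that idea; the helper names read_le16 and read_le32 are hypothetical and do not come from EasyH10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ucs2_char_t;   /* one UCS-2 code unit, as defined above */

/* Read a 16-bit little-endian value (uint16_t or ucs2_char_t) from a buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit little-endian value from a buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For example, the bytes 28 04 00 00 decode to 0x00000428, and the bytes 41 00 decode to the UCS-2 character U+0041.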

Constants
The document uses the following constants.


#define	H10DB_MAX_PATH		256

#if	H10_MODEL == INTERNATIONAL			/* H10 International models. */
#define	H10DB_HDR_PADDING	1032
#elif	H10_MODEL == NORTHAMERICAN			/* H10 North American models. */
#define	H10DB_HDR_PADDING	0
#endif

#if	H10_CAPACITY == 5000 || H10_CAPACITY == 6000	/* H10 5GB and 6GB models. */
#define	H10DB_MAX_DAT_ENTRIES	4000
#elif	H10_CAPACITY == 20000				/* H10 20GB models. */
#define	H10DB_MAX_DAT_ENTRIES	8000
#endif

H10DB.hdr
H10DB.hdr stores general information about the H10 media database, such as: the number of records in H10DB.dat; the number of inactive records; the path to H10DB.dat; offsets to all records and fields in H10DB.dat (for quick access); and the field descriptors that define the 22 fields in an H10DB.dat record. The structure of H10DB.hdr differs slightly depending on the firmware type (UMS or MTP) and HDD capacity (5/6GB or 20GB).

Definition
typedef struct {
	uint32_t	id;
	uint32_t	field_type;
	uint32_t	max_length;
	uint32_t	unknown5;
	uint32_t	unknown6;
	uint32_t	has_index;
	uint32_t	unknown7;
	uint32_t	unknown8;
	ucs2_char_t	idx_pathname[H10DB_MAX_PATH];
} h10db_fd_t;

typedef struct {
	uint32_t	unknown1;
	uint32_t	unknown2;
	ucs2_char_t	pathname_dat[H10DB_MAX_PATH];
	uint32_t	unknown3;
	ucs2_char_t	pathname_hdr[H10DB_MAX_PATH];
	uint32_t	unknown4;
	uint32_t	num_dat_records;
	uint32_t	num_dat_inactive_records;
	uint32_t	num_dat_fields;
	h10db_fd_t	fd[num_dat_fields];
	uint32_t	max_dat_field_offsets[num_dat_fields];
	uint32_t	dat_size;
	uint8_t		padding[H10DB_HDR_PADDING];
	uint16_t	dat_field_offset[num_dat_fields][H10DB_MAX_DAT_ENTRIES];
	uint32_t	dat_record_offset[H10DB_MAX_DAT_ENTRIES+1];
} h10db_hdr_t;

h10db_hdr_t hdr;

h10db_hdr_t

 * unknown1 : an integer value reserved as zero.
 * unknown2 : an integer value reserved as zero.
 * pathname_dat : a UCS-2 null-terminated string that specifies the path and file name of H10DB.dat. The typical value of this field is "System\DATA\H10DB.dat".
 * unknown3 : an integer value reserved as one.
 * pathname_hdr : a UCS-2 null-terminated string that specifies the path and file name of H10DB.hdr. The typical value of this field is "System\DATA\H10DB.hdr".
 * unknown4 : an integer value reserved for an unknown purpose. The typical value of this field is 0x00000428 (for the international players) or 0x00000D10 (for the North American players).
 * num_dat_records : an integer value that indicates the number of records in H10DB.dat. This figure includes inactivated records.
 * num_dat_inactive_records : an integer value that indicates the number of inactive (unused) records in H10DB.dat. The number of actual music files is calculated by (num_dat_records - num_dat_inactive_records).
 * num_dat_fields : an integer value that specifies the number of fields in a database record. The value seems to be fixed at 22.
 * fd : an array of 22 field descriptors. See the semantics of field descriptor (h10db_fd_t).
 * max_dat_field_offsets : an array of 22 integer values. For a field index i, max_dat_field_offsets[i] is the largest possible offset of field #i within a record, i.e., its offset when every preceding field occupies its maximum length. These values can be calculated as follows:
 * max_dat_field_offsets[0] = 8;
 * max_dat_field_offsets[i+1] = max_dat_field_offsets[i] + (fd[i].field_type == 1 ? fd[i].max_length*sizeof(ucs2_char_t) : fd[i].max_length);
 * The international H10 players have the following values:
 * max_dat_field_offsets = {8, 12, 268, 524, 528, 608, 688, 768, 808, 812, 816, 820, 824, 828, 832, 836, 840, 844, 848, 852, 932, 936};


 * dat_size : an integer value that indicates file size of H10DB.dat
 * padding : filled by zero for the international players.
 * dat_field_offset : a two-dimensional array of integer values that indicate the offset positions of the fields of each record. See the explanation in dat_record_offset.
 * dat_record_offset : an array of (H10DB_MAX_DAT_ENTRIES+1) integer values that indicate offset position for each record. Given an integer value i no larger than num_dat_records, hdr.dat_record_offset[i] represents the offset position where record #i begins in H10DB.dat. hdr.dat_record_offset[num_dat_records] must be equal to dat_size. Given an integer value j, (hdr.dat_record_offset[i] + hdr.dat_field_offset[i][j]) represents the offset position where field #j of record #i begins in H10DB.dat.

h10db_fd_t

 * id : an integer value that represents an identifier of a field.
 * field_type : an integer value that specifies the field type (1: UCS-2; 2: uint32_t). This value also indicates calculation method of check value used in H10DB_*.idx (1: CRC-32; 2: copy of actual value).
 * max_length : an integer value that specifies the maximum size or length of the field. If (field_type == 1), this value must be interpreted as maximum number of UCS-2 characters. If (field_type == 2), it must be interpreted as size in bytes. Therefore, if (field_type == 1), multiplying it by sizeof(ucs2_char_t) gives the size in bytes.
 * unknown5 : an integer value reserved as zero.
 * unknown6 : an integer value reserved as zero.
 * has_index : an integer value that implies the existence of sorted index (H10DB_*.idx) for this field.
 * unknown7 : an integer value reserved as zero.
 * unknown8 : an integer value reserved as zero.
 * idx_pathname : a UCS-2 null-terminated string that represents a path/file name to an index file. This value is empty string if (has_index == 0).

The following code shows the actual field descriptor used in the international players.

h10db_fd_t fd[] = {
	{0x0000F001, 2,   4, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_@DEV.idx"},
	{0x0000F002, 1, 128, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_FPTH.idx"},
	{0x0000F003, 1, 128, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_FNAM.idx"},
	{0x0000F00A, 2,   4, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_FRMT.idx"},
	{0x0000002E, 1,  40, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_TIT2.idx"},
	{0x0000003C, 1,  40, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_TPE1.idx"},
	{0x0000001C, 1,  40, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_TALB.idx"},
	{0x0000001F, 1,  20, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_TCON.idx"},
	{0x0000E002, 2,   4, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_@DU3.idx"},
	{0x0000E003, 2,   4, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_@DU4.idx"},
	{0x0000E004, 2,   4, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_@DU5.idx"},
	{0x0000E005, 2,   4, 0, 0, 0, 0, 0, L""},
	{0x00000043, 2,   4, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_TRCK.idx"},
	{0x0000004E, 2,   4, 0, 0, 0, 0, 0, L""},
	{0x0000F009, 2,   4, 0, 0, 0, 0, 0, L""},
	{0x0000F007, 2,   4, 0, 0, 0, 0, 0, L""},
	{0x0000F006, 2,   4, 0, 0, 0, 0, 0, L""},
	{0x0000F005, 2,   4, 0, 0, 0, 0, 0, L""},
	{0x0000E000, 2,   4, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_@DU1.idx"},
	{0x0000E001, 1,  40, 0, 0, 1, 0, 0, L"System\\DATA\\H10DB_@DU2.idx"},
	{0x00000083, 2,   4, 0, 0, 0, 0, 0, L""},
	{0x00000084, 1,  64, 0, 0, 0, 0, 0, L""},
};
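
The recurrence given for max_dat_field_offsets in the h10db_hdr_t semantics can be checked against the international descriptor values. The sketch below copies only the field_type and max_length columns of the descriptor into a trimmed structure (fd_lite_t and compute_max_offsets are hypothetical names introduced here for illustration) and recomputes the 22 offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_FIELDS 22

typedef uint16_t ucs2_char_t;

/* Hypothetical trimmed view of h10db_fd_t: only the members used here. */
typedef struct {
    uint32_t field_type;   /* 1: UCS-2 string, 2: uint32_t */
    uint32_t max_length;   /* characters if field_type == 1, bytes otherwise */
} fd_lite_t;

/* field_type / max_length pairs of the international field descriptor. */
static const fd_lite_t fd_international[NUM_FIELDS] = {
    {2, 4}, {1, 128}, {1, 128}, {2, 4}, {1, 40}, {1, 40}, {1, 40}, {1, 20},
    {2, 4}, {2, 4},   {2, 4},   {2, 4}, {2, 4},  {2, 4},  {2, 4},  {2, 4},
    {2, 4}, {2, 4},   {2, 4},   {1, 40}, {2, 4}, {1, 64},
};

/* Apply the documented recurrence; out must have room for n values. */
static void compute_max_offsets(const fd_lite_t *fd, size_t n, uint32_t *out)
{
    assert(fd != NULL && out != NULL && n > 0);
    out[0] = 8;   /* status and unknown1 (2 x 4 bytes) precede field #0 */
    for (size_t i = 0; i + 1 < n; i++) {
        uint32_t bytes = fd[i].max_length;
        if (fd[i].field_type == 1) {
            bytes *= (uint32_t)sizeof(ucs2_char_t);   /* characters to bytes */
        }
        out[i + 1] = out[i] + bytes;
    }
}
```

Calling compute_max_offsets(fd_international, NUM_FIELDS, out) reproduces the documented sequence, which starts 8, 12, 268, 524 and ends 932, 936.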

H10DB.dat
H10DB.dat is a list of records that store music information. Each record has 24 (= 2 + 22) fields such as track title, artist, album, genre, year, rating, recent play time, play count, etc. The MUSIC mode of the iriver H10 player does not read music information from the actual files but relies only on these records for displaying music information. When a user listens to music on the player, the number of records grows: whenever a music file is about to be played, the H10 player duplicates the record for the file, appends the duplicate to the list, and inactivates the original record. Therefore, H10DB.dat may contain a number of inactive records.

A UCS-2 string in H10DB.dat has a variable size (length) to save disk space. Its actual size (in bytes) should be calculated from hdr.dat_field_offset. For example, given a record #i, the size of the file_path (field #1) UCS-2 string field is calculated as follows:
 * (hdr.dat_field_offset[i][1+1] - hdr.dat_field_offset[i][1])

Similarly, the size of the title (field #4) UCS-2 string is calculated as follows:
 * (hdr.dat_field_offset[i][4+1] - hdr.dat_field_offset[i][4])
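
The size calculation above, together with the position rule from the dat_record_offset description, can be sketched as two small helpers. The function names are hypothetical, and the row argument stands for the field offsets of one record (hdr.dat_field_offset[i][...] in the notation of this document). The handling of the last field (#21), which has no following field, is an assumption: the sketch takes the record size, i.e. hdr.dat_record_offset[i+1] - hdr.dat_record_offset[i], as its end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of field #j of one record.
 * row         : the field offsets of that record
 * num_fields  : hdr.num_dat_fields (22 in practice)
 * record_size : size of the whole record in bytes; used only for the last
 *               field (ASSUMPTION, not stated in the specification). */
static uint32_t field_size(const uint16_t *row, uint32_t num_fields,
                           uint32_t j, uint32_t record_size)
{
    assert(row != NULL && j < num_fields);
    uint32_t end = (j + 1 < num_fields) ? row[j + 1] : record_size;
    return end - row[j];
}

/* Absolute position of field #j in H10DB.dat:
 * hdr.dat_record_offset[i] + hdr.dat_field_offset[i][j]. */
static uint32_t field_position(uint32_t record_offset,
                               const uint16_t *row, uint32_t j)
{
    assert(row != NULL);
    return record_offset + row[j];
}
```

With a made-up four-field row {8, 12, 40, 60} and a record of 64 bytes starting at offset 1000, field #1 is 28 bytes long and field #2 begins at offset 1040.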

Definition
typedef struct {
	uint32_t	status;
	uint32_t	unknown1;
	uint32_t	unknown2;		/* field # 0 */
	ucs2_char_t	file_path[];		/* field # 1 */
	ucs2_char_t	file_name[];		/* field # 2 */
	uint32_t	media_type;		/* field # 3 */
	ucs2_char_t	title[];		/* field # 4 */
	ucs2_char_t	artist[];		/* field # 5 */
	ucs2_char_t	album[];		/* field # 6 */
	ucs2_char_t	genre[];		/* field # 7 */
	uint32_t	rating;			/* field # 8 */
	uint32_t	revision;		/* field # 9 */
	uint32_t	recent_play;		/* field #10 */
	uint32_t	unknown3;		/* field #11 */
	uint32_t	number;			/* field #12 */
	uint32_t	year;			/* field #13 */
	uint32_t	filesize;		/* field #14 */
	uint32_t	duration;		/* field #15 */
	uint32_t	samplerate;		/* field #16 */
	uint32_t	bitrate;		/* field #17 */
	uint32_t	unknown4;		/* field #18 */
	ucs2_char_t	unknown5[];		/* field #19 */
	uint32_t	unknown6;		/* field #20 */
	ucs2_char_t	unknown7[];		/* field #21 */
} h10db_dat_t;

h10db_dat_t dat[hdr.num_dat_records];

Semantics

 * status : an integer that specifies whether this element is active (0) or not (1).
 * unknown1 : an integer value reserved as zero.
 * unknown2 : an integer value reserved as zero.
 * file_path : a UCS-2 string (variable-length) that represents path to a music file.
 * file_name : a UCS-2 string (variable-length) that represents name of a music file.
 * media_type : an integer value reserved as zero.
 * title : a UCS-2 string (variable-length) that represents track title.
 * artist : a UCS-2 string (variable-length) that represents track artist.
 * album : a UCS-2 string (variable-length) that represents track album.
 * genre : a UCS-2 string (variable-length) that represents track genre.
 * rating : an integer value that indicates rating set by a user.
 * revision : an integer value that represents the play count.
 * recent_play : an integer value that represents the most recent playback time. This value is the number of seconds elapsed since "Sat Jan 01 00:00:00 2000".
 * unknown3 : an integer value reserved as zero.
 * number : an integer value that represents track number.
 * year : an integer value that represents track year.
 * filesize : an integer value that represents file size in bytes.
 * duration : an integer value that represents track length in seconds.
 * samplerate : an integer value that represents sample rate in [Hz].
 * bitrate : an integer value that represents bitrate in [bps].
 * unknown4 : an integer value reserved as zero.
 * unknown5 : a UCS-2 string (variable-length) used for unknown purpose. The typical value is empty string.
 * unknown6 : an integer value reserved as zero.
 * unknown7 : a UCS-2 string (variable-length) used for unknown purpose. The typical value is empty string.
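
Since recent_play counts seconds from "Sat Jan 01 00:00:00 2000", it can be converted to a Unix timestamp by adding a constant. The sketch below assumes that the H10 epoch is expressed in UTC, which the specification does not state (the player may well use local time), and it assumes a platform where time_t counts seconds since 1970. The names H10_EPOCH_UNIX, h10_time_to_unix, and unix_to_h10_time are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* 2000-01-01 00:00:00 UTC expressed as seconds since 1970-01-01 00:00:00 UTC.
 * ASSUMPTION: the H10 epoch is in UTC. */
#define H10_EPOCH_UNIX 946684800L

/* Convert a recent_play value to a Unix timestamp. */
static time_t h10_time_to_unix(uint32_t recent_play)
{
    return (time_t)H10_EPOCH_UNIX + (time_t)recent_play;
}

/* Convert a Unix timestamp (not earlier than the H10 epoch) to recent_play. */
static uint32_t unix_to_h10_time(time_t t)
{
    assert(t >= (time_t)H10_EPOCH_UNIX);
    return (uint32_t)(t - (time_t)H10_EPOCH_UNIX);
}
```

The result of h10_time_to_unix can be passed to gmtime() or localtime() for display.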

H10DB_*.idx
H10DB_*.idx (typically 14 files) store sorted indexes that point to H10DB.dat records. Each file has a different sorting order. For example, H10DB_FPTH.idx sorts records in alphabetical order of file paths; H10DB_TIT2.idx in alphabetical order of track titles; H10DB_TPE1.idx in alphabetical order of track artists; H10DB_@DU3.idx in numerical order of user ratings; H10DB_TRCK.idx in numerical order of track number, etc. These sorted indexes are used for displaying music tracks in different orders: e.g., alphabetical order of track title/artist/album; and numerical order of track number in an album. Since the H10 firmware seems to implement an efficient search method such as binary search, all indexes must be arranged in proper orders as the firmware assumes. If not, a player may lose sight of some music files, show duplicated entries, or show wrong information because of the inconsistency.

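The index files, their sorting keys, and their check_value types for the H10 [5GB] International firmware 2.05 can be obtained from the field descriptor in H10DB.hdr. Reading the descriptor listed in the H10DB.hdr section (has_index, idx_pathname, and field_type of each field) gives the following list:

 * H10DB_@DEV.idx : field #0 (unknown2); check_value is the actual value.
 * H10DB_FPTH.idx : field #1 (file_path); check_value is CRC-32.
 * H10DB_FNAM.idx : field #2 (file_name); check_value is CRC-32.
 * H10DB_FRMT.idx : field #3 (media_type); check_value is the actual value.
 * H10DB_TIT2.idx : field #4 (title); check_value is CRC-32.
 * H10DB_TPE1.idx : field #5 (artist); check_value is CRC-32.
 * H10DB_TALB.idx : field #6 (album); check_value is CRC-32.
 * H10DB_TCON.idx : field #7 (genre); check_value is CRC-32.
 * H10DB_@DU3.idx : field #8 (rating); check_value is the actual value.
 * H10DB_@DU4.idx : field #9 (revision); check_value is the actual value.
 * H10DB_@DU5.idx : field #10 (recent_play); check_value is the actual value.
 * H10DB_TRCK.idx : field #12 (number); check_value is the actual value.
 * H10DB_@DU1.idx : field #18 (unknown4); check_value is the actual value.
 * H10DB_@DU2.idx : field #19 (unknown5); check_value is CRC-32.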

Definition
Each H10DB_*.idx file consists of an array whose element is described as h10db_idx_t:

typedef struct {
	uint32_t	status;
	uint32_t	entry_index;
	uint32_t	check_value;
} h10db_idx_t;

h10db_idx_t idx[hdr.num_dat_records];

Semantics

 * status : an integer value that specifies whether this element is active (0) or not (1).
 * entry_index : a zero-based index value (integer) that points to a record in the media database (i.e., dat[entry_index]).
 * check_value : an integer value that stores either CRC-32 (for UCS-2 string field) or actual value (for uint32_t field) of a relevant field in a record.
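
Each index element therefore occupies 12 bytes on disk. As a minimal sketch, an element can be decoded from a raw buffer and sanity-checked before its entry_index is used to address dat[]. The helper names are hypothetical, and the validity check only reflects the values documented above (status 0 or 1, entry_index within hdr.num_dat_records); the firmware may apply other rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define H10DB_IDX_ENTRY_SIZE 12   /* three uint32_t values */

typedef struct {
    uint32_t status;        /* 0: active, 1: inactive */
    uint32_t entry_index;   /* zero-based index into dat[] */
    uint32_t check_value;   /* CRC-32 or copy of the field value */
} h10db_idx_t;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one index element from H10DB_IDX_ENTRY_SIZE raw bytes. */
static h10db_idx_t parse_idx_entry(const uint8_t *p)
{
    assert(p != NULL);
    h10db_idx_t e;
    e.status      = read_le32(p);
    e.entry_index = read_le32(p + 4);
    e.check_value = read_le32(p + 8);
    return e;
}

/* Return nonzero if the element looks consistent with the header. */
static int idx_entry_is_valid(const h10db_idx_t *e, uint32_t num_dat_records)
{
    assert(e != NULL);
    return (e->status == 0 || e->status == 1)
        && e->entry_index < num_dat_records;
}
```

Element #k of an index file starts at byte offset k * H10DB_IDX_ENTRY_SIZE.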

How to calculate CRC-32 value
The CRC-32 value of a UCS-2 string is calculated with the following parameters. Before calculating the CRC value, all UCS-2 characters in the string must be lower-cased.

#define POLYNOMIAL            0x04C11DB7
#define INITIAL_REMAINDER     0x00000000
#define FINAL_XOR_VALUE       0x00000000
#define REFLECT_DATA          1
#define REFLECT_REMAINDER     1
#define CHECK_VALUE           0x2DFD2D88
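
A bitwise implementation of a CRC with these parameters is sketched below. With both data and remainder reflected, the polynomial 0x04C11DB7 is applied in its bit-reversed form 0xEDB88320. The function names are hypothetical. How the string is fed to the CRC is not spelled out in this document, so h10_crc32_ucs2 makes two labeled assumptions: each character contributes its low byte first (matching the little-endian storage), and the terminating null character is excluded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ucs2_char_t;

/* Reflected CRC-32: polynomial 0x04C11DB7 (0xEDB88320 bit-reversed),
 * initial remainder 0 and final XOR 0 as listed above. */
static uint32_t h10_crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

/* Lower-case US-ASCII letters only, as described for EasyH10. */
static ucs2_char_t ascii_lower(ucs2_char_t ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (ucs2_char_t)(ch + 32) : ch;
}

/* CRC of a null-terminated UCS-2 string.
 * ASSUMPTIONS: low byte first, terminator not included. */
static uint32_t h10_crc32_ucs2(const ucs2_char_t *s)
{
    assert(s != NULL);
    uint32_t crc = 0;   /* INITIAL_REMAINDER */
    for (; *s != 0; s++) {
        ucs2_char_t ch = ascii_lower(*s);
        uint8_t bytes[2] = { (uint8_t)(ch & 0xFF), (uint8_t)(ch >> 8) };
        crc = h10_crc32_update(crc, bytes, 2);
    }
    return crc;         /* FINAL_XOR_VALUE is zero */
}
```

Running h10_crc32_update(0, ...) over the nine ASCII bytes "123456789" yields 0x2DFD2D88, which matches CHECK_VALUE above.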

How to lower-case UCS-2 characters
Converting upper-case characters into lower-case characters is ambiguous with respect to the treatment of European characters (e.g., À and à; Δ and δ). Based on the results of our experiments, EasyH10 implements the following function, which converts only US-ASCII characters.

#include <ctype.h>

ucs2_char_t ucs2lower(ucs2_char_t ch)
{
	/* iriver only converts what-is-called US-ASCII characters. */
	if (!(ch & 0xFF80)) {
		ch = tolower(ch);
	}
	return ch;
}

Note for sorting UCS-2 strings
The following function (ucs2comp) implements the comparison function used for sorting UCS-2 string values. In addition to converting lower-case characters to upper-case before comparing, it must place empty strings after all other values.

#include <ctype.h>

#define	COMP(a, b)	(((a)>(b))-((a)<(b)))

ucs2_char_t ucs2upper(ucs2_char_t ch)
{
	/* iriver only converts what we call one-byte characters. */
	/* We must limit the range of case conversion. */
	if (!(ch & 0xFF80)) {
		ch = toupper(ch);
	}
	return ch;
}

int ucs2icmp(const ucs2_char_t* x, const ucs2_char_t* y)
{
	ucs2_char_t a, b;

	do {
		a = ucs2upper(*x);
		b = ucs2upper(*y);
		if (!*x || !*y) {
			break;
		}
		x++;
		y++;
	} while (a == b);
	return COMP(a, b);
}

int ucs2comp(ucs2_char_t* x, ucs2_char_t* y)
{
	/* It seems to be safer to move elements that have an empty value to the end. */
	/* We must ignore case during sorting, or H10 fails to recognize music files. */
	if (!x || !*x) {
		return ((!y || !*y) ? 0 : 1);
	} else {
		return ((!y || !*y) ? -1 : ucs2icmp(x, y));
	}
}
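
To illustrate how such a comparison might drive an actual sort, the sketch below restates the same ordering rules in a self-contained form and plugs them into the standard qsort() through an adapter. The names compare_titles, qsort_adapter, and sort_titles are hypothetical, and sorting an array of string pointers is a simplification: building a real index file would sort h10db_idx_t elements by the strings of the records they point to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t ucs2_char_t;

/* Upper-case US-ASCII letters only. */
static ucs2_char_t upper_ascii(ucs2_char_t ch)
{
    return (ch >= 'a' && ch <= 'z') ? (ucs2_char_t)(ch - 32) : ch;
}

/* Case-insensitive comparison; empty or NULL strings sort last. */
static int compare_titles(const ucs2_char_t *x, const ucs2_char_t *y)
{
    int x_empty = (!x || !*x);
    int y_empty = (!y || !*y);
    ucs2_char_t a, b;

    if (x_empty || y_empty) {
        return x_empty - y_empty;   /* 0 if both are empty, 1 if only x is */
    }
    do {
        a = upper_ascii(*x);
        b = upper_ascii(*y);
        if (!*x || !*y) {
            break;
        }
        x++;
        y++;
    } while (a == b);
    return (a > b) - (a < b);
}

/* qsort() passes pointers to the array elements, which are themselves
 * pointers to strings, so one extra dereference is needed. */
static int qsort_adapter(const void *p, const void *q)
{
    const ucs2_char_t *const *x = p;
    const ucs2_char_t *const *y = q;
    return compare_titles(*x, *y);
}

static void sort_titles(const ucs2_char_t **titles, size_t n)
{
    qsort(titles, n, sizeof *titles, qsort_adapter);
}
```

Sorting the three strings "beta", "" and "Alpha" with sort_titles gives the order "Alpha", "beta", "".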

Acknowledgement
I thank Badger for the structure information of H10DB.hdr, Toby Corkindale for the experimental implementation of database construction written in Perl (iriver.pm), and Jevin for the CRC calculation in a MisticRiver thread, H10 database reverse ingeneering.

License
This document was initially written by Nyaochi, an author of EasyH10 software. It is released under GNU Free Documentation License (GFDL).