7.3 Delimiters
7.3.1 Record Delimiter
Each Sales/Usage Report created in accordance with the DSR standard shall be separated into individual Records with each Record being placed into one line terminated by a line feed (Unicode U+000A) or a carriage return and line feed pair (Unicode U+000D 000A).
7.3.2 Primary Delimiter
Cells within a Record are separated by tab characters (Unicode U+0009). The Sales/Usage Reports created in accordance with the DSR standard are therefore TSV files and have a .tsv file extension.
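The Record and Cell separation described in Clauses 7.3.1 and 7.3.2 can be sketched in Python as follows. This is a minimal illustration, not part of the standard: the function names split_records and split_cells are hypothetical, and the Cell split assumes that no Cell contains an escaped tab character (see Clause 7.3.7). Python's str.splitlines() is deliberately avoided because it also splits on characters other than LF and CRLF.

```python
from typing import List


def split_records(report_text: str) -> List[str]:
    """Split report text into Records, treating only LF and CRLF as terminators."""
    records = []
    for line in report_text.split("\n"):
        # A CRLF-terminated line leaves a trailing CR after splitting on LF.
        if line.endswith("\r"):
            line = line[:-1]
        records.append(line)
    # The terminator of the last Record yields one empty trailing element; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


def split_cells(record: str) -> List[str]:
    """Naive split on the tab character; assumes the Record has no escaped tabs."""
    return record.split("\t")
```

A caller would typically apply split_cells to each element returned by split_records.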
7.3.3 Secondary Delimiter
Should a single Cell contain two or more data elements, these data elements shall be separated by a pipe character (Unicode U+007C).
All data elements in a multi-value Cell shall be of the same primitive data type (see Clause 7.4).
7.3.4 Namespace Delimiter
Should a Cell contain a data element whose origin needs to be provided, the data element shall be preceded by a string that provides a "namespace" and two colon characters (Unicode U+003A).
For example, a party identifier can be communicated as ISNI::0000000081266409, indicating that the identifier (0000000081266409) is an International Standard Name Identifier (ISNI).
The licensee creating a Sales/Usage Report in accordance with the DSR standard should ensure that the licensor can, for each specific namespace, ingest data in this form.
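A recipient might separate the namespace from the identifier along the following lines. This is a sketch only: the function name split_namespace is hypothetical, and it assumes that the first occurrence of two consecutive colon characters in a data element is the Namespace Delimiter.

```python
from typing import Optional, Tuple


def split_namespace(element: str) -> Tuple[Optional[str], str]:
    """Return (namespace, value); namespace is None when no '::' is present."""
    namespace, separator, value = element.partition("::")
    if not separator:
        # No Namespace Delimiter found: the whole element is the value.
        return None, element
    return namespace, value
```

Whether a given namespace string is acceptable is a matter for agreement between licensee and licensor and is not checked here.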
7.3.5 Spaces and Delimiters
Delimiters shall not be surrounded by extra space characters.
For example, the writer pair Lennon/McCartney should be communicated as Lennon|McCartney and not as Lennon⎵|⎵McCartney.
7.3.6 Received spaces and Delimiters
If a licensee compiling a Sales/Usage Report in accordance with the DSR standard has received data from a third party that needs to be included in that Sales/Usage Report, and that data contains extra white space characters, the licensee is encouraged to trim any such extra white space characters when compiling the Sales/Usage Report. For example, if the licensee received data with the writer Lennon as “Lennon⎵” and McCartney as “McCartney⎵”, then the writer pair should be communicated by the sender as Lennon|McCartney.
However, it is also permitted, for a licensee that received data with the writers Lennon as “Lennon⎵” and McCartney as “McCartney⎵”, to communicate the writer pair as Lennon⎵|McCartney⎵ if the licensee is required to provide data “as received” from third parties.
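The two permitted behaviours of Clauses 7.3.5 and 7.3.6 can be illustrated with a small helper. This is a sketch under stated assumptions: the name join_values and its trim flag are hypothetical, the values are assumed to contain no characters that need escaping (see Clause 7.3.7), and str.strip() removes all leading and trailing white space, not only space characters.

```python
from typing import List


def join_values(values: List[str], trim: bool = True) -> str:
    """Join data elements with the Secondary Delimiter, optionally trimming each."""
    if trim:
        # Encouraged behaviour: remove extra white space received from third parties.
        values = [value.strip() for value in values]
    # No space characters are added around the pipe itself.
    return "|".join(values)
```

With the received values "Lennon " and "McCartney " (each with one trailing space), join_values returns Lennon|McCartney by default, and keeps the trailing spaces when called with trim=False for "as received" reporting.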
7.3.7 Communicating Delimiters
To communicate a Delimiter as data in a Cell, such a Cell shall not be enclosed in double quote characters. Instead the Delimiter shall be immediately preceded by an escaping code as follows:
To escape a tab character contained in a text string, the escaping code is the backslash character (Unicode U+005C). Therefore, the string A[TAB]B would have to be communicated as A\[TAB]B (with [TAB] representing the tabulator);
To escape a pipe character contained in a text string, the escaping code is a double backslash character (Unicode U+005C). Therefore, the string A|B would have to be communicated as A\\|B; and
To communicate a backslash character, the escaping code is a triple backslash character. Therefore, the string A\B would have to be communicated as A\\\\B.
These escaping mechanisms must be used for all special characters in all Cells, whether those Cells allow multiple values or not. A non-escaped pipe character in a single-value Cell is, consequently, an error.
For the avoidance of doubt, escaping a character that should not be escaped, or not escaping a character that should have been escaped, will lead to an invalid Sales/Usage Report.
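The escaping rules of this clause can be sketched as a small encoder and decoder. This is an illustration under stated assumptions, not a reference implementation: the names escape_value, format_record and parse_record are hypothetical, and the decoder assumes that escape sequences are read left to right, testing for an escaped backslash (four backslashes) before an escaped pipe or tab. Any backslash that is not part of one of the three sequences is treated as an error.

```python
from typing import List

TAB, PIPE, BACKSLASH = "\t", "|", "\\"

# Each sequence is the escaping code followed by the escaped character.
ESCAPES = {
    BACKSLASH: BACKSLASH * 3 + BACKSLASH,  # triple backslash + backslash
    PIPE: BACKSLASH * 2 + PIPE,            # double backslash + pipe
    TAB: BACKSLASH + TAB,                  # single backslash + tab
}


def escape_value(value: str) -> str:
    """Escape tab, pipe and backslash characters in one data element."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def format_record(cells: List[List[str]]) -> str:
    """Build one Record (without terminator) from Cells of data elements."""
    return TAB.join(PIPE.join(escape_value(v) for v in cell) for cell in cells)


def parse_record(record: str) -> List[List[str]]:
    """Split a Record into Cells and data elements, resolving escapes."""
    cells, values, buf = [], [], []
    i = 0
    while i < len(record):
        if record.startswith(ESCAPES[BACKSLASH], i):
            buf.append(BACKSLASH)
            i += 4
        elif record.startswith(ESCAPES[PIPE], i):
            buf.append(PIPE)
            i += 3
        elif record.startswith(ESCAPES[TAB], i):
            buf.append(TAB)
            i += 2
        elif record[i] == BACKSLASH:
            # A backslash outside the three sequences is an escaping error.
            raise ValueError(f"invalid escape at position {i}")
        elif record[i] == PIPE:
            values.append("".join(buf))
            buf = []
            i += 1
        elif record[i] == TAB:
            values.append("".join(buf))
            cells.append(values)
            values, buf = [], []
            i += 1
        else:
            buf.append(record[i])
            i += 1
    values.append("".join(buf))
    cells.append(values)
    return cells
```

The parser does not know which Cells are single-value; a caller that does would additionally reject any such Cell for which more than one data element is returned, since a non-escaped pipe character there is an error.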