Neuroinformatics Research Group

Anonymization scripts

Besides allowing users to edit DICOM attributes by hand, DicomBrowser allows batch processing of attribute changes via anonymization scripts. Users can apply a script by using the "Apply anonymization script..." item in the "Edit" menu.

Anonymization scripts are plain text files containing commands in a simple language designed for modifying DICOM attributes. We recommend that script file names end in .das, since DicomBrowser looks for that suffix.

An attribute is identified by its 32-bit DICOM tag, expressed as two 16-bit hexadecimal numbers, separated by a comma and surrounded by parentheses—for example, (0008,0080) is the attribute tag for Institution Name.
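
To make the tag notation concrete, here is a minimal Java sketch that converts the textual form into a single 32-bit value and back. The class and method names (DicomTagExample, parseTag, formatTag) are hypothetical and are not part of DicomBrowser; the sketch assumes only that the group occupies the high 16 bits and the element the low 16 bits.

```java
final class DicomTagExample {
    /** Parses a tag written as "(gggg,eeee)" into one 32-bit value (hypothetical helper). */
    static int parseTag(String text) {
        String t = text.trim();
        // Exactly four hex digits, a comma, four hex digits, wrapped in parentheses.
        if (!t.matches("\\(\\p{XDigit}{4},\\p{XDigit}{4}\\)")) {
            throw new IllegalArgumentException("not a DICOM tag: " + text);
        }
        int group = Integer.parseInt(t.substring(1, 5), 16);
        int element = Integer.parseInt(t.substring(6, 10), 16);
        return (group << 16) | element; // group in the high half, element in the low half
    }

    /** Renders a 32-bit tag value in the "(gggg,eeee)" notation used by scripts. */
    static String formatTag(int tag) {
        return String.format("(%04X,%04X)", tag >>> 16, tag & 0xFFFF);
    }

    public static void main(String[] args) {
        int institutionName = parseTag("(0008,0080)");
        System.out.println(String.format("0x%08X", institutionName)); // prints 0x00080080
        System.out.println(formatTag(institutionName));               // prints (0008,0080)
    }
}
```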

There are two types of "phrase" in the language: operations specify what changes should be made to an attribute within a DICOM object, while constraints specify to which objects an operation should be applied.

Operations

An operation specifies a change to an attribute value. It may optionally be preceded by a constraint, which limits the objects to which the operation applies. There are two types of operation: assignment and deletion. (DicomBrowser also allows a Clear operation, which is just assignment to the empty value "".)

Assignment

Here is an example of assigning a new value to an attribute:

(0008,0080) := "Washington University School of Medicine"

The value string is actually a Java Formatter string. The least interesting consequence is that if you want to include a percent sign % in a value, you must "escape" it by preceding it with another percent sign—for example, "a percent sign: %%". The format string can also be used to specify a value that depends on attribute values in the same DICOM object:

(0008,1030) := "%1$s (%2$s)" (0020,0010), (0008,1030)

In this example, the Study Description (0008,1030) is assigned the value of Study ID (0020,0010), with the original Study Description following in parentheses.

Whatever DICOM Value Representation an attribute has, the format string should always treat it as a java.lang.String input, i.e., the last character of every format specifier must be s. Never put a comma between the format string and the first attribute value. Always list at least as many attributes after the format string as there are format specifiers, or the formatting machinery will throw an uncaught exception.
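
The rules above follow from how the Java formatter behaves, which can be checked outside DicomBrowser. The sketch below calls String.format directly; the render helper and the sample attribute values are hypothetical stand-ins, and how DicomBrowser itself invokes the formatter is not shown here.

```java
import java.util.MissingFormatArgumentException;

final class FormatValueExample {
    /** Hypothetical stand-in: formats a value string with attribute values passed as Strings. */
    static String render(String format, String... attributeValues) {
        return String.format(format, (Object[]) attributeValues);
    }

    public static void main(String[] args) {
        // Made-up attribute values, for illustration only.
        String studyId = "42";
        String studyDescription = "Brain MRI";

        // Same format string as the Study Description example above.
        System.out.println(render("%1$s (%2$s)", studyId, studyDescription)); // prints 42 (Brain MRI)

        // A doubled percent sign yields a single literal percent sign.
        System.out.println(render("a percent sign: %%")); // prints a percent sign: %

        // Referring to a second value when only one is supplied makes the formatter throw.
        try {
            render("%1$s (%2$s)", studyId);
        } catch (MissingFormatArgumentException e) {
            System.out.println("too few attributes for: " + e.getFormatSpecifier());
        }
    }
}
```

Testing a format string this way before putting it in a script is one way to avoid the exception described above.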

Deletion

An attribute can be removed entirely by preceding its tag with a minus sign:

- (0010,0020)

Constraints

A constraint limits the set of objects to which an operation is applied. The constraint is followed by a colon (":"), then the operation to which the constraint applies. For example, the following code sets the Series Description (0008,103E) based on the Series Number (0020,0011):

(0020,0011) = "1" : (0008,103E) := "Series One"
(0020,0011) = "2" : (0008,103E) := "Series Two"

In addition to exact value matches as above, constraints can use a tilde ~ to specify a regular expression (see the Java Pattern class) against which the attribute value is matched:

(0020,0010) ~ "\d" : (0008,1030) := "One digit study %1$s" (0020,0010)
(0020,0010) ~ "\d\d" : (0008,1030) := "Two digit study %1$s" (0020,0010)
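
When writing patterns it helps to know how the Java Pattern class treats them. This page does not say whether DicomBrowser requires the pattern to match the whole attribute value or only part of it, so the sketch below shows both behaviors side by side; the class and method names are hypothetical. Note that a pattern written as \d in a script is written "\\d" inside Java source.

```java
import java.util.regex.Pattern;

final class RegexConstraintExample {
    /** True only if the entire value matches the pattern (Matcher.matches). */
    static boolean matchesWhole(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    /** True if any part of the value matches the pattern (Matcher.find). */
    static boolean matchesAnywhere(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        // Made-up Study ID values.
        for (String studyId : new String[] {"7", "42"}) {
            System.out.println(studyId
                + "  whole \\d: " + matchesWhole("\\d", studyId)
                + "  anywhere \\d: " + matchesAnywhere("\\d", studyId)
                + "  whole \\d\\d: " + matchesWhole("\\d\\d", studyId));
        }
    }
}
```

Under whole-value matching, "42" satisfies only the two-digit pattern. Under partial matching it would satisfy \d as well, so anchoring a pattern explicitly (for example ^\d$) removes the ambiguity in either case.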

Constraints can similarly be applied to deletion operations:

// delete the Series Description for series 1-5
(0020,0011) ~ "[1-5]" : -(0008,103E)

Command precedence

If multiple assignments apply to the same attribute, the first applicable assignment in the script is used, and subsequent assignments are quietly ignored. In other words, operations appearing earlier in the file take precedence. For example, to assign an attribute a default value if no constraint is met, the default should appear last in the script:

(0020,0011) = "1" : (0008,103E) := "Series One"
(0020,0011) = "2" : (0008,103E) := "Series Two"
(0008,103E) := "Some other series"

If the default assignment were before the conditional assignments, the default assignment would always be made and the conditional ones ignored.
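
The first-match rule can be modeled in a few lines of Java. This is only an illustration of the precedence behavior described above, not DicomBrowser's implementation: the Rule class, the map standing in for a DICOM object, and every method name are hypothetical, and only exact-match constraints are modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class PrecedenceExample {
    /** Hypothetical model of one assignment, with an optional exact-match constraint. */
    static final class Rule {
        final String constraintTag;   // null means the assignment is unconditional
        final String constraintValue;
        final String newValue;

        Rule(String constraintTag, String constraintValue, String newValue) {
            this.constraintTag = constraintTag;
            this.constraintValue = constraintValue;
            this.newValue = newValue;
        }

        boolean appliesTo(Map<String, String> object) {
            return constraintTag == null || constraintValue.equals(object.get(constraintTag));
        }
    }

    /** The three Series Description assignments from the example above, in script order. */
    static List<Rule> seriesRules() {
        return Arrays.asList(
            new Rule("(0020,0011)", "1", "Series One"),
            new Rule("(0020,0011)", "2", "Series Two"),
            new Rule(null, null, "Some other series"));
    }

    /** Returns the value of the first applicable rule; later rules are ignored. */
    static Optional<String> firstApplicable(List<Rule> rules, Map<String, String> object) {
        for (Rule rule : rules) {
            if (rule.appliesTo(object)) {
                return Optional.of(rule.newValue);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> seriesTwo = Collections.singletonMap("(0020,0011)", "2");
        Map<String, String> seriesNine = Collections.singletonMap("(0020,0011)", "9");
        System.out.println(firstApplicable(seriesRules(), seriesTwo).get());  // prints Series Two
        System.out.println(firstApplicable(seriesRules(), seriesNine).get()); // prints Some other series
    }
}
```

Moving the unconditional rule to the front of the list makes it win for every object, which mirrors the pitfall described above.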

Other language features

Each command in a script file is normally a single line, but text can be continued to the next line by ending a line with a backslash \. Any text starting with two slashes // is a comment; the rest of that line is ignored.

Handling of syntax errors is primitive: while invalid scripts generally produce error messages, the messages tend to be inscrutable. We firmly recommend making no errors in your scripts.


Download a sample anonymization script that removes all attributes listed in the DICOM Basic Application Level Confidentiality Profile.
