Each year, members submit their students' test results and item responses to Smarter Balanced. At a minimum, members must submit de-identified data; they may elect to send identified data. Smarter Balanced uses these data to maintain and improve the assessments, including calibrating new items and producing technical reports.
When submitting de-identified data, members should include an Alternate State Student ID (AlternateSSID) that uniquely identifies each student and remains the same from year to year. In future years, this alternate ID will be used to compare student performance with previous years' data and to measure growth.
To preserve student privacy, it should not be possible for anyone to derive the State Student ID (SSID) from the AlternateSSID. Smarter Balanced recommends the HMAC-SHA1 keyed cryptographic hash algorithm for this purpose. The algorithm can be applied in several ways that are equally effective at generating consistent AlternateSSIDs and preserving student privacy. However, even a small variation in implementation produces different AlternateSSIDs. This document and the accompanying sample source code describe one effective application of the algorithm. Following it is not a requirement. If you follow it and use the same secret key each year, your AlternateSSIDs will stay consistent from year to year.
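The following Python sketch shows why small implementation differences matter. It uses only the standard library. It computes an HMAC-SHA1 of the same SSID under the same secret key in three ways:

- with the SHA-1 hash of the key as the HMAC key;
- with the raw key bytes as the HMAC key;
- with a UTF-8 byte-order mark in front of the SSID bytes.

All three are equally hard to reverse, yet they disagree with one another. The key, the SSID and the helper name `hmac_sha1_hex` are hypothetical.

```python
import hashlib
import hmac


def hmac_sha1_hex(key: bytes, message: bytes) -> str:
    """HMAC-SHA1 of `message` under `key`, as an uppercase hex string."""
    return hmac.new(key, message, hashlib.sha1).hexdigest().upper()


key_text = "example secret key"   # hypothetical secret key
ssid_text = "1234567890"          # hypothetical SSID

hashed_key = hashlib.sha1(key_text.encode("utf-8")).digest()

variants = {
    # Key hashed with SHA-1 first; SSID as plain UTF-8 bytes.
    "hashed key, no BOM": hmac_sha1_hex(hashed_key, ssid_text.encode("utf-8")),
    # Raw UTF-8 key bytes used directly as the HMAC key.
    "raw key, no BOM": hmac_sha1_hex(key_text.encode("utf-8"), ssid_text.encode("utf-8")),
    # Same as the first, but "utf-8-sig" prepends a byte-order mark (EF BB BF).
    "hashed key, with BOM": hmac_sha1_hex(hashed_key, ssid_text.encode("utf-8-sig")),
}

for name, value in variants.items():
    print(f"{name:22} {value}")
```

Each variant prints a different 40-character value. Two implementations therefore produce matching AlternateSSIDs only if they agree on every detail.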
Keyed Cryptographic Hash
A keyed cryptographic hash function takes two inputs: a string of data (here, the State Student ID) and a secret key. From these inputs it produces a fixed-length hash code, which can serve as the AlternateSSID in de-identified data sets.
The same SSID and secret key always generate the same AlternateSSID. However, it is computationally impractical to reverse the hash and recover the SSID, even if the secret key is known. To match a de-identified record to a student, an entity would therefore need both the secret key and a list of candidate SSIDs, such as the full student roster. It could then hash each candidate and compare the results.
Most contemporary programming environments have a cryptographic library with an implementation of the HMAC-SHA1 algorithm. The HMAC algorithm is described in RFC 2104 and the SHA-1 algorithm is described in RFC 3174.
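The sketch below illustrates the properties just described, using Python's standard `hmac` and `hashlib` modules. It is a generic keyed hash that uses the raw key bytes. It is not the full recommended recipe, which first hashes the key with SHA-1.

It shows two things:

- the same key and SSID always give the same 40-character code;
- re-identifying a record works only by hashing candidate SSIDs with the correct key.

The keys, roster and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def keyed_hash(secret_key: bytes, ssid: str) -> str:
    """Generic keyed hash: HMAC-SHA1 of the UTF-8 encoded SSID, as uppercase hex."""
    return hmac.new(secret_key, ssid.encode("utf-8"), hashlib.sha1).hexdigest().upper()


def reidentify(secret_key: bytes, alternate_ssid: str,
               candidate_ssids: Iterable[str]) -> Optional[str]:
    """Hash each candidate SSID with the key and return the one whose hash matches."""
    for ssid in candidate_ssids:
        if keyed_hash(secret_key, ssid) == alternate_ssid:
            return ssid
    return None


key = b"example secret key"      # hypothetical secret key
other_key = b"a different key"   # hypothetical second key
roster = ["1234567890", "2345678901", "3456789012"]  # hypothetical SSIDs

target = keyed_hash(key, "2345678901")
print(len(target))                              # → 40
print(target == keyed_hash(key, "2345678901"))  # → True (deterministic)
print(reidentify(key, target, roster))          # → 2345678901
print(reidentify(other_key, target, roster))    # → None (wrong key)
```

The function never inverts the hash. It succeeds only because it has the key and a list that happens to contain the right SSID.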
In this application, both the secret key and the SSID are assumed to be Unicode strings of arbitrary length. The recommended steps are:
1. Remove leading and trailing whitespace characters from the secret key.
2. Encode the secret key as a sequence of bytes using UTF-8. A byte-order mark must not be present. (Certain libraries, such as the Microsoft .NET Framework, can include the byte-order mark by default; it must be suppressed.)
3. Hash the encoded secret key using the SHA-1 algorithm, excluding any terminating null character from the data being hashed. The result is a 160-bit binary value in the form of a 20-byte array. This is the binary key.
4. Remove leading and trailing whitespace characters from the State Student ID (SSID).
5. Encode the SSID as a sequence of bytes using UTF-8. Do not include a byte-order mark or a terminating null.
6. Apply the HMAC-SHA1 keyed-hash algorithm to the encoded SSID, using the binary key from step 3. The result is a 160-bit binary value in the form of a 20-byte array.
7. Convert this hash into a 40-character uppercase hexadecimal string. This string is the AlternateSSID.
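The seven steps can be sketched in Python using only the standard library (`hashlib` and `hmac`). This is an illustrative sketch, not the Smarter Balanced C# sample, and the function name `alternate_ssid` is hypothetical.

One caveat: Python's `str.strip()` applies its own Unicode rules for what counts as whitespace. These may not exactly match other platforms' trim functions for unusual characters.

```python
import hashlib
import hmac


def alternate_ssid(secret_key: str, ssid: str) -> str:
    """Derive an AlternateSSID from an SSID following the seven recommended steps.

    Illustrative sketch; the name is hypothetical, not from the Smarter Balanced sample.
    """
    # Steps 1-2: trim whitespace, then UTF-8 encode the key. Python's "utf-8"
    # codec never writes a byte-order mark (unlike "utf-8-sig"), and
    # str.encode does not append a terminating null.
    key_bytes = secret_key.strip().encode("utf-8")

    # Step 3: SHA-1 of the key bytes gives the 20-byte binary key.
    binary_key = hashlib.sha1(key_bytes).digest()

    # Steps 4-5: trim whitespace, then UTF-8 encode the SSID.
    ssid_bytes = ssid.strip().encode("utf-8")

    # Step 6: HMAC-SHA1 of the SSID bytes under the binary key (20-byte digest).
    digest = hmac.new(binary_key, ssid_bytes, hashlib.sha1).digest()

    # Step 7: 40-character uppercase hexadecimal string.
    return digest.hex().upper()
```

Steps 1 and 4 trim whitespace before encoding. For that reason, `alternate_ssid("  my key ", "1234567890 ")` returns the same string as `alternate_ssid("my key", "1234567890")`.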
Smarter Balanced has published an open-source sample implementation in Microsoft C#. It is available here and uses the Microsoft cryptographic library. Other platforms have equivalent libraries; for example, here is a similar algorithm, written in Java, provided by Amazon.
The following test data can be used to verify that an implementation is consistent with this description and the sample code. Other approaches may be equally secure but will not reproduce these results.
| BB-8 | The Force Awakens | 9F5685FB73F7315EA0707202F1B54FAC973875B3 |
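A small harness can check an implementation against the test data. The sketch below re-implements the recommended steps compactly; the function names are hypothetical.

The row above does not label which value is the secret key and which is the SSID. The sketch therefore tries both assignments and reports which one, if either, reproduces the expected AlternateSSID. No particular outcome is claimed here.

```python
import hashlib
import hmac

EXPECTED = "9F5685FB73F7315EA0707202F1B54FAC973875B3"
VALUES = ("BB-8", "The Force Awakens")  # column roles are not labeled; both are tried


def alternate_ssid(secret_key: str, ssid: str) -> str:
    # SHA-1 of the trimmed, UTF-8 encoded key becomes the 20-byte HMAC key.
    binary_key = hashlib.sha1(secret_key.strip().encode("utf-8")).digest()
    # HMAC-SHA1 of the trimmed, UTF-8 encoded SSID, returned as uppercase hex.
    digest = hmac.new(binary_key, ssid.strip().encode("utf-8"), hashlib.sha1).digest()
    return digest.hex().upper()


def matching_assignments(values, expected):
    """Return the (key, ssid) orderings of the test values that reproduce `expected`."""
    a, b = values
    return [(key, ssid) for key, ssid in ((a, b), (b, a))
            if alternate_ssid(key, ssid) == expected]


matches = matching_assignments(VALUES, EXPECTED)
if matches:
    for key, ssid in matches:
        print(f"Consistent: key={key!r}, SSID={ssid!r}")
else:
    print("Not consistent with the test data; recheck trimming, encoding, and key hashing.")
```

If neither ordering matches, first compare the byte-level inputs with the steps above. The most common culprits are a byte-order mark, a terminating null, or an unhashed key.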