Immutable keys: Best practice for data access

When first encountered, the terms mutable and immutable sound like words you learn only to improve your Scrabble game, but behind them hide some important concepts.

  • mutable
    • changing over time
  • immutable
    • unable to be changed

Most data in your program will be mutable or, if it isn’t, that fact won’t affect you. In other cases, though, it is useful to know that the value of a particular field will never change, because then you can use it as part of the identity of the data.

Identity versus Filtering

If you have a large amount of data, you need some way to traverse it: to find the specific information you are interested in, to update records that match strict criteria, and so on.

In some cases, you can use filtering to find a useful subset of your data. If the filter is based on mutable data, there is no guarantee that it will return the same set of data over time. If instead we retrieve data based on immutable fields, we know that we will get the desired result time after time. The minimal set of immutable fields that identifies a specific data instance is often called the index or key.
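To make the difference concrete, here is a minimal Python sketch, assuming a plain in-memory dict stands in for the data store, with an immutable integer key and a mutable "status" field (the names `records` and `active_ids` are hypothetical). The filter's answer changes when the data changes; the key lookup does not.

```python
# Hypothetical in-memory store: immutable integer keys, mutable "status" field.
records = {
    101: {"name": "Alice", "status": "active"},
    102: {"name": "Bob", "status": "active"},
}

def active_ids(store):
    """Filter on a mutable field: the result depends on when you ask."""
    return sorted(k for k, v in store.items() if v["status"] == "active")

before = active_ids(records)           # [101, 102]
records[102]["status"] = "suspended"   # mutable data changes over time
after = active_ids(records)            # [101]

# Lookup by the immutable key still reaches the same record.
bob = records[102]
```

The filter is still the right tool for "everything that is active right now"; it just cannot serve as a stable way to name one particular record.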

Managing Immutable Keys

At its simplest, if you ensure that the fields you use as a key to your data can only be written when the data is created, you have made those fields immutable. But what happens if I delete the row, add a new one, and give it the same key? Is this the same data instance? Probably not.
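A minimal Python sketch of both points, assuming a dict as the data store; the `Record` class is hypothetical. The key is written only in the constructor and exposed through a read-only property, yet deleting a row and reusing its key still lets different data answer to the same identity.

```python
class Record:
    """Hypothetical record whose key is written once, when it is created."""

    def __init__(self, key, name):
        self._key = key    # set only here, in the constructor
        self.name = name   # an ordinary, mutable field

    @property
    def key(self):
        # Read-only property: there is no setter, so `record.key = ...`
        # raises AttributeError. (Convention only; `_key` is not truly private.)
        return self._key


store = {}
first = Record(7, "first widget")
store[first.key] = first

# Delete the row, then create a new one that reuses the same key.
del store[7]
second = Record(7, "second widget")
store[second.key] = second

# Anyone still holding "key 7" now silently reaches different data.
print(store[7].name)  # second widget
```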

So for a key, we want it not only to be immutable but also never to be reused, at least within a usefully long period of time. To achieve this, we can rely on the client that creates the data to do the right thing, or we can make the index field something completely controlled by our program, with little or no input from the client. Note that this can produce a field that is not terribly meaningful to the user, but this can be offset by giving them an additional field for a ‘foreign key’, a ‘name’, or whatever makes sense in a particular project.
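As a rough illustration of a program-controlled key, here is a Python sketch of a hypothetical `KeyedStore` that assigns every key itself and lets the client supply only a human-readable name. It assumes everything lives in memory; a real program would also need to persist the counter so keys are not reissued after a restart.

```python
import itertools


class KeyedStore:
    """Hypothetical store that assigns keys itself; clients supply only a name."""

    def __init__(self):
        self._next_key = itertools.count(1)  # each key is handed out once
        self._rows = {}

    def create(self, name):
        key = next(self._next_key)  # the client has no say in the key
        self._rows[key] = {"name": name}
        return key

    def delete(self, key):
        del self._rows[key]  # the key is retired, never recycled

    def get(self, key):
        return self._rows[key]


store = KeyedStore()
a = store.create("Quarterly report")
store.delete(a)
b = store.create("Quarterly report")  # same user-facing name, new key
```

Names may repeat and may be edited freely; only the key carries identity.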

So, if our program keeps track not only of the keys currently in use but also of those used within a given timeframe, we have some options:

  • literally keep track of what is used
  • have an algorithm based on stored information that can always generate a unique and not recently used index
    • the last used integer, held in a 32- or 64-bit counter that rolls over only after a very large number of keys
    • a creation timestamp with sufficient precision
    • a UUID algorithm such as those specified by the IETF in RFC 4122
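The options above might look roughly like this in Python. This is a sketch under simplifying assumptions, not a definitive implementation: every class and function name is hypothetical, state is kept in memory only, and the tracked set keeps keys forever instead of pruning those older than the chosen timeframe.

```python
import time
import uuid


class TrackedKeys:
    """Literally keep track of every key that has been used."""

    def __init__(self):
        self._used = set()

    def claim(self, key):
        if key in self._used:
            raise ValueError(f"key {key!r} has already been used")
        self._used.add(key)
        return key


class CounterKeys:
    """Store only the last used integer; the counter wraps at 2**bits.

    After wrap-around keys would be reused, so the width must be large
    enough (e.g. 64 bits) that this effectively never happens.
    """

    def __init__(self, bits=64, last_used=0):
        self._limit = 2 ** bits
        self._last = last_used

    def next_key(self):
        self._last = (self._last + 1) % self._limit
        return self._last


class TimestampKeys:
    """Creation time in nanoseconds, kept strictly increasing."""

    def __init__(self):
        self._last = 0

    def next_key(self):
        # If the clock has not advanced (or went backwards), bump by one
        # so that two keys created close together still differ.
        self._last = max(time.time_ns(), self._last + 1)
        return self._last


def uuid_key():
    """A random (version 4) UUID, one of the variants defined in RFC 4122."""
    return uuid.uuid4()
```

The counter and timestamp approaches need only a single stored value; the UUID approach needs no stored state at all, at the cost of keys that are long and opaque to users.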