27/04/2013

A case of mistaken identity


Getting started in software development is really easy, getting it right is really, really hard. The technical decisions we make early in the software development life cycle can pivot the success of a project.


You are tasked to develop e-mail template management (crud) functionality for a project that is to be used for the following purposes:
  • Predefined e-mail templates for use in notifications (ex. forgot password e-mail, welcome e-mail)
  • User defined e-mail templates (used for 'forwarding' specific information with optional attachments or as a base template for newsletters)
You decide to use a sequential identifier based on a seed value and increment as the primary key of your e-mail template table (T-SQL example = Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL). To discriminate between the two types of e-mail templates you add an 'Is Predefined' column with a default value of false (T-SQL example = IsPredefined BIT DEFAULT(0) NOT NULL).

In order to reference the predefined e-mail  templates from your code, you decide to add an 'Access Key' column to the table to represent a persistent 'key' that will be used as an enum key (ex. PredefinedEmailTemplate.UserForgotPassword). The side benefit of doing this is being able to refer to a specific e-mail template even though its content identifiers might change, ex.e-mail template name or the subject.

You then decide to create the PredefinedEmailTemplate enum via code generation with the 'Access Key' as the enum key and the primary key 'Id' as its value.

So far so dandy, during development whenever you need to reference an e-mail template you simply use the enum value to reference the 'Id' primary key for the e-mail templates table.

Later on you deploy this to production and all is going well - users are using this functionality and all seems well. Till you get tasked to add another predefined e-mail template.  You think 'sure, that's no problem' - not taking into consideration that production and development data differs and so will the e-mail templates table primary key.

Now you're sitting with a big problem.

This will cause the regeneration of the PredefinedEmailTemplate enumeration values to be invalid. The simplest workaround would be to insert the new e-mail templates on production and then force it to the use the assigned production identifiers locally for development.

Now you're sitting with an even bigger problem  - this 'workaround' will forever be required and is sure to be forgotten about. 

The correct way to fix this would be to assign a specific (non system-generated) identifier to all predefined e-mail templates and use that as the value of the enumeration locally. Going this route; you would still have to update all usages of the PredefinedEmailTemplate enumeration which could potentially be costly.

The general rule here is that you should never be using sequence based system-generated identifiers as predefined type identifiers.

Guidelines on entity Identifiers

Custom vs. Sequence based vs. UUID/GUID

Custom Identifiers
I'm of the opinion that custom identifiers should always be used in conjunction with system-generated identifiers.

There's essentially two types of custom identifiers:

Unique constraints
Unique constraints can be stand-alone (ex. e-mail address of a user) or unique in the specific context (ex. unique name of a sub-category in context of a category { CategoryId, Name }).

It is strongly advised to create context specific unique constraints as in the category/sub-category name where appropriate as I've found that it diminishes confusion from a user point of view.

Custom identifiers
Custom identifiers should contain implicit contextual references or provide some other value. 
In the case of system-generated custom identifiers, an additional sequence identifier is added to ensure uniqueness (ex. serialized invoice number -  {buyer short code}-{date}-{invoice number of buyer or invoice number of the current month). User generated custom identifiers usually require some form of suggestion to ensure uniqueness in order to limit frustration (ex. gmail suggesting account names).

Sequence based identifiers
In terms of system-generated identifiers, sequence based identifiers should be used most of the time. Some exceptions are valid such as a system-generated hash should be used for once-off-use access tokens as providing a sequential identifier has security concerns unless it's coupled with additional authentication (ex. having the user provide some form of personal information). 

Please note that additional authentication is usually preferred with once-off-use access tokens (ex. user reset password forms) even with system-generated hashes especially if the user already exists. As a guideline, consider liability - if the system-generated hash would allow the user to setup his account from blank then it's optional, if the goal is for the user to verify his personal information and choose credentials then some form of additional authentication is required.

Numbering sequence based identifiers
As a rule of thumb you should start numbering sequence based identifiers from 1 (ex. 1,2,3), due to the fact most programming languages consider zero as a default value for value types including enumerations (ea. if there's no default user, then the id should start at 1).

Predefined type' identifiers should be predefined

UUID/GUID
My opinion is that UUID/GUID's should be used when it's impossible or impractical to ensure a sequence. I would suggest that it be used as a last resort for the following reasons:

  • UUID/GUID is generally not supported in all programming languages/databases as an existing type and would need to be stored as a byte array/slice or as a string
  • UUID/GUID size is 16 bytes when stored via an existing type or as a byte array/slice. If stored as a string - depending on the encoding - it can be between 16 and 32 bytes.
  • UUID/GUID's should be ZBase32 encoded for URL's (to limit the characters for pattern matching purposes and ensure case insensitivity if the user would have to fill it in)
Thanks for reading, I hope that these guidelines will prove useful to you. Be sure to share your opinion via the comments or on the hacker news thread!