27/04/2013

A case of mistaken identity


Getting started in software development is really easy; getting it right is really, really hard. The technical decisions we make early in the software development life cycle can make or break a project.


You are tasked with developing e-mail template management (CRUD) functionality for a project, to be used for the following purposes:
  • Predefined e-mail templates for use in notifications (ex. forgot password e-mail, welcome e-mail)
  • User-defined e-mail templates (used for 'forwarding' specific information with optional attachments or as a base template for newsletters)
You decide to use a sequential identifier based on a seed value and increment as the primary key of your e-mail template table (T-SQL example: Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL). To discriminate between the two types of e-mail templates, you add an 'Is Predefined' column with a default value of false (T-SQL example: IsPredefined BIT DEFAULT(0) NOT NULL).

In order to reference the predefined e-mail templates from your code, you decide to add an 'Access Key' column to the table to represent a persistent 'key' that will be used as an enum key (ex. PredefinedEmailTemplate.UserForgotPassword). The side benefit of doing this is being able to refer to a specific e-mail template even if its descriptive attributes change, ex. the e-mail template name or the subject.

You then decide to create the PredefinedEmailTemplate enum via code generation with the 'Access Key' as the enum key and the primary key 'Id' as its value.
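
For illustration, a minimal sketch of what the generated enum might look like (the member names come from the access keys; the numeric values here are hypothetical - they are simply whatever IDENTITY happened to assign in the development database):

    // Generated from the e-mail template table: 'Access Key' becomes the member name,
    // the identity column 'Id' becomes the value. These values are specific to the
    // database the generator ran against.
    public enum PredefinedEmailTemplate
    {
        UserForgotPassword = 1,
        UserWelcome = 2
    }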

So far so dandy: during development, whenever you need to reference an e-mail template, you simply use the enum value, which maps to the 'Id' primary key of the e-mail templates table.

Later on you deploy this to production and all seems well - users are happily using the functionality. Until you are tasked with adding another predefined e-mail template. You think 'sure, that's no problem' - not taking into consideration that production and development data differ, and so will the identity values in the e-mail templates table.

Now you're sitting with a big problem.

Regenerating the PredefinedEmailTemplate enumeration will now produce invalid values. The simplest workaround would be to insert the new e-mail templates on production first and then force the assigned production identifiers into your local development data.

Now you're sitting with an even bigger problem - this 'workaround' will be required forever and is sure to be forgotten.

The correct way to fix this would be to assign a specific (non system-generated) identifier to all predefined e-mail templates and use that as the value of the enumeration locally. Going this route, you would still have to update all usages of the PredefinedEmailTemplate enumeration, which could potentially be costly.
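
A rough sketch of the fix: the identifiers are chosen by hand and seeded identically into every environment, so the enum never needs to be regenerated from a particular database. The repository interface, the startup check and the third template are invented for illustration only.

    using System;

    public enum PredefinedEmailTemplate
    {
        UserForgotPassword = 1,
        UserWelcome = 2,
        InvoiceReminder = 3   // new template: the value is assigned, not generated
    }

    public interface IEmailTemplateRepository
    {
        bool Exists(int id);
    }

    public static class PredefinedTemplateCheck
    {
        // Cheap guard that fails fast if an environment is missing a predefined template.
        public static void Verify(IEmailTemplateRepository repository)
        {
            foreach (PredefinedEmailTemplate key in Enum.GetValues(typeof(PredefinedEmailTemplate)))
            {
                if (!repository.Exists((int)key))
                    throw new InvalidOperationException("Missing predefined e-mail template: " + key);
            }
        }
    }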

The general rule here is that you should never use sequence-based, system-generated identifiers as predefined type identifiers.

Guidelines on entity identifiers

Custom vs. Sequence-based vs. UUID/GUID

Custom Identifiers
I'm of the opinion that custom identifiers should always be used in conjunction with system-generated identifiers.

There are essentially two types of custom identifiers:

Unique constraints
Unique constraints can be stand-alone (ex. the e-mail address of a user) or unique within a specific context (ex. the unique name of a sub-category in the context of a category { CategoryId, Name }).

Where appropriate, it is strongly advised to create context-specific unique constraints, as in the category/sub-category name example, since I've found that it reduces confusion from a user's point of view.

Custom identifiers
Custom identifiers should contain implicit contextual references or provide some other value. 
In the case of system-generated custom identifiers, an additional sequence identifier is added to ensure uniqueness (ex. a serialized invoice number - {buyer short code}-{date}-{invoice number of the buyer, or invoice number for the current month}). User-generated custom identifiers usually require some form of suggestion to ensure uniqueness in order to limit frustration (ex. Gmail suggesting account names).
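
As a rough illustration of that invoice-number format (the helper and the exact layout below are my own, not prescribed by the post):

    using System;

    public static class InvoiceNumbering
    {
        // Composes e.g. "ACME-20130427-0042": buyer short code, issue date, then a
        // sequence number scoped to the buyer (or to the current month) for uniqueness.
        public static string Compose(string buyerShortCode, DateTime issuedOn, int sequence)
        {
            return string.Format("{0}-{1:yyyyMMdd}-{2:D4}",
                buyerShortCode.ToUpperInvariant(), issuedOn, sequence);
        }
    }

    // InvoiceNumbering.Compose("acme", new DateTime(2013, 4, 27), 42) => "ACME-20130427-0042"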

Sequence-based identifiers
Among system-generated identifiers, sequence-based identifiers should be used most of the time. There are valid exceptions: for example, a system-generated hash should be used for once-off access tokens, because exposing a sequential identifier is a security concern unless it's coupled with additional authentication (ex. having the user provide some form of personal information).

Please note that additional authentication is usually preferred with once-off access tokens (ex. password reset forms) even with system-generated hashes, especially if the user already exists. As a guideline, consider liability: if the system-generated hash would only allow the user to set up his account from blank, then additional authentication is optional; if the goal is for the user to verify his personal information and choose credentials, then some form of additional authentication is required.
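
A minimal sketch of generating such a non-sequential, once-off token - here simply URL-safe Base64 over random bytes rather than a hash of anything in particular; the 32-byte length is an arbitrary choice:

    using System;
    using System.Security.Cryptography;

    public static class AccessTokens
    {
        // Unlike a sequential id, the next token cannot be guessed from the previous one.
        public static string NewToken()
        {
            var bytes = new byte[32];
            using (var rng = RandomNumberGenerator.Create())
            {
                rng.GetBytes(bytes);
            }
            // Make the Base64 output safe to embed in a URL.
            return Convert.ToBase64String(bytes).Replace('+', '-').Replace('/', '_').TrimEnd('=');
        }
    }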

Numbering sequence-based identifiers
As a rule of thumb you should start numbering sequence-based identifiers from 1 (ex. 1, 2, 3), because most programming languages treat zero as the default value for value types, including enumerations (i.e. if there's no default user, then the id should start at 1).
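
A quick illustration of the zero-as-default point (a self-contained variant of the enum from the earlier sketch):

    using System;

    class DefaultValueDemo
    {
        enum PredefinedEmailTemplate { UserForgotPassword = 1, UserWelcome = 2 }

        static void Main()
        {
            // default(T) for any enum is 0. Because no member is assigned 0, an
            // uninitialised value reads as "no template" instead of silently meaning
            // "the first template".
            PredefinedEmailTemplate template = default(PredefinedEmailTemplate);
            Console.WriteLine(template);   // prints "0" - not a valid member
        }
    }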

'Predefined type' identifiers should be predefined

UUID/GUID
My opinion is that UUIDs/GUIDs should be used when it's impossible or impractical to ensure a sequence. I would suggest they be used as a last resort, for the following reasons:

  • UUID/GUID is not supported as a native type in every programming language/database, in which case it has to be stored as a byte array/slice or as a string
  • A UUID/GUID is 16 bytes when stored via a native type or as a byte array/slice. Stored as a string, the size depends on the encoding - roughly 22 characters for unpadded Base64, 26 for ZBase32, or 32 for plain hex (36 with hyphens); see the quick check after this list
  • UUIDs/GUIDs should be ZBase32 encoded for URLs (to limit the characters for pattern-matching purposes and to ensure case insensitivity if the user has to type it in)
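
A quick check of those sizes in C# (note that .NET has no built-in ZBase32 encoder; that part would need a small library or a hand-rolled implementation):

    using System;

    class GuidSizeDemo
    {
        static void Main()
        {
            Guid id = Guid.NewGuid();
            Console.WriteLine(id.ToByteArray().Length);   // 16 - raw/native storage
            Console.WriteLine(id.ToString("N").Length);   // 32 - plain hex string
            Console.WriteLine(id.ToString("D").Length);   // 36 - hex with hyphens
        }
    }
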
Thanks for reading, I hope these guidelines prove useful to you. Be sure to share your opinion via the comments or on the Hacker News thread!


03/07/2012

TDD in practice: ABC's of TDD

Prelude

This is my humble opinion of TDD:
  • PRO: It's super easy to get equipped with a testing framework and learn the basics. 
  • CON: In comparison, it's unbelievably hard to actually start unit testing and applying TDD. 
Hopefully, by now we have a common understanding of what we should be testing. (TDD in practice: Where does it fit in? - We can only test against known results and behaviours).

Let’s reiterate and approach the same conclusion from a different angle:

Method A: void SendEmail()
Analysis: The above method takes no parameters and returns no values. The only clue we have of its purpose and intended behaviour is the method name.
Conclusion: Not testable

Method B: int Sum(params int[] values)
Analysis: The above method takes parameters and returns a value. We can infer that, because the parameters are numeric in nature and the method's intention is to sum these values, the expected result should be the sum of the parameters.
Conclusion: Easily testable
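
A minimal test for Method B, assuming NUnit and a hypothetical Calculator class that hosts Sum:

    using NUnit.Framework;

    public class Calculator
    {
        public int Sum(params int[] values)
        {
            int total = 0;
            foreach (int value in values)
                total += value;
            return total;
        }
    }

    [TestFixture]
    public class CalculatorTests
    {
        [Test]
        public void Sum_ReturnsTheSumOfItsParameters()
        {
            var calculator = new Calculator();

            Assert.AreEqual(6, calculator.Sum(1, 2, 3));   // known input, known result
            Assert.AreEqual(0, calculator.Sum());          // edge case: no parameters
        }
    }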

Although Method A is not testable - be not discouraged - it doesn't imply it can never be.

Test Driven Development revolves around:
  1. Red: Write the test the way you want the implementation to work. 
  2. Green: Implement the functionality required to make the test pass (with a focus on loose coupling and high cohesion). 
  3. Refactor: Refactor your implementation and enforce Separation of Concerns. 
  4. Repeat
Relating to the above-mentioned:
  • (SoC)Separation of Concerns: The process of separating a computer program into distinct features that overlap in functionality as little as possible. 
  • (LC)Loose coupling: In computing and systems design a loosely coupled system is one where each of its components has, or makes use of, little or no knowledge of the definitions of other separate components. 
  • (HC)High cohesion: In computer programming, cohesion is a measure of how strongly-related each piece of functionality expressed by the source code of a software module is. 

I suggest everyone should follow these steps when applying Test Driven Development:

Step 1: Investigate and Plan
Our downfall with regard to TDD is that we've been conditioned by tutorials to think that applying TDD is simple and that we should jump headfirst into coding.

You have to start planning out the feature you want to implement.

I have found that enforcing the Single Responsibility Principle has allowed me to identify different components. The Liskov Substitution Principle makes for a good guideline to determine if an abstraction should be implemented as an interface or using the Template Method Pattern.
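
To make that distinction concrete, here is a small sketch of the two options - a pure contract versus a template method - using types invented purely for illustration:

    using System;

    // Option 1 - interface: implementations share only a contract.
    public interface IMessageSender
    {
        void Send(string recipient, string body);
    }

    // Option 2 - template method: the base class owns the algorithm's skeleton and
    // subclasses override the steps that vary.
    public abstract class MessageSenderBase
    {
        public void Send(string recipient, string body)
        {
            Validate(recipient);        // shared step
            Deliver(recipient, body);   // varying step supplied by the subclass
        }

        protected virtual void Validate(string recipient)
        {
            if (string.IsNullOrWhiteSpace(recipient))
                throw new ArgumentException("A recipient is required.", "recipient");
        }

        protected abstract void Deliver(string recipient, string body);
    }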

Once you have set out some behavioural guidelines (user stories), have a pretty good idea of the functionality you have to implement, and have identified reusable components - then you are ready to go.

Relating to above-mentioned:
  • (SRP)Single responsibility principle: Every class should have a single responsibility, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility. 
  • (LSP)Liskov substitution principle: Concrete implementations of abstractions should be interchangeable without altering any of the desirable properties of that implementation (correctness, task performed, etc.) 
  • Template method pattern: A template method defines the program skeleton of an algorithm. One or more of the algorithm steps can be overridden by subclasses to allow differing behaviours while ensuring that the overarching algorithm is still followed. 
  • User Story: A sentence that captures the context (who, what, where, when) and the expected behaviour (then). In agile development (XP, Scrum etc.) the form is usually As a <user>, I want <goal>, so that <benefit>. In behaviour driven development (BDD) and acceptance test driven development (ATDD) the form is usually Given <context>, When <condition/s>, Then <expected result/s/behaviour/s>. 
Step 2: Red-Green-Refactor
I'd like to revisit Method A from the prelude.

If Method A was fully implemented within a single method, then it would have been in an untestable state.

We would approach this problem by refactoring in-scope functionality into testable methods and abstracting out-of-scope functionality into implementations that can be tested independently.

This allows us to test the method by independently testing the abstracted implementations and the refactored in-scope methods that it uses.

Abstract out-of-scope functionality that can be tested independently
If the responsibility of certain functionality lies outside of the scope of the class (SRP), functionality should be refactored out of the method/class into a concrete implementation. The concrete implementation then needs to be abstracted with an interface or an abstract class using a template method pattern (DI, TMP).

This rule also highlights the Dependency Inversion Principle, which is a specific form of decoupling that's very useful while applying TDD. You can then write unit tests for said low level implementations and use Stubs/Fakes/Mocks within your higher level implementations (that depend on the abstraction) to verify behaviour of said high level components.

The implementation of the abstraction can then be supplied via the constructor or via parameters to the method (SP).
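
Returning to Method A from the prelude, a rough sketch of what this could look like. The notifier, the gateway and the hand-rolled fake are all invented for illustration; a real test suite might use a mocking library instead.

    using System.Collections.Generic;
    using NUnit.Framework;

    // The out-of-scope concern (actual delivery) is abstracted away (DIP).
    public interface IEmailGateway
    {
        void Send(string to, string subject, string body);
    }

    // The high-level class depends on the abstraction, supplied via the constructor.
    public class PasswordResetNotifier
    {
        private readonly IEmailGateway gateway;

        public PasswordResetNotifier(IEmailGateway gateway)
        {
            this.gateway = gateway;
        }

        public void SendEmail(string userEmail)
        {
            gateway.Send(userEmail, "Password reset", "Follow the link to reset your password.");
        }
    }

    // A hand-rolled fake lets the test verify behaviour without sending anything.
    public class FakeEmailGateway : IEmailGateway
    {
        public List<string> Recipients = new List<string>();

        public void Send(string to, string subject, string body)
        {
            Recipients.Add(to);
        }
    }

    [TestFixture]
    public class PasswordResetNotifierTests
    {
        [Test]
        public void SendEmail_DeliversToTheGivenAddress()
        {
            var gateway = new FakeEmailGateway();
            new PasswordResetNotifier(gateway).SendEmail("user@example.com");

            Assert.AreEqual(1, gateway.Recipients.Count);
            Assert.AreEqual("user@example.com", gateway.Recipients[0]);
        }
    }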

Relating to above-mentioned:
  • (DI)Dependency inversion principle: High level implementations should not depend on low level implementations. Both should depend on abstractions. (A high level implementation should depend on the abstraction of a low level implementation) 
  • (SP)Strategy pattern: The strategy pattern allows you to supply a strategy (behaviour) to a high level implementation at run-time. 
To keep this post from getting too long, it will have to be cut short for now. It is evident that a good understanding of SOLID principles, GRASP and design patterns is required to properly apply Test Driven Development.

Test Driven Development is awesome - it promotes good coding practices and quality code. Once you know implementations adhere to desired behaviours, the fear of maintaining (by refactoring) and extending a system almost entirely disappears.

There are two advanced fields within TDD - Behaviour Driven Development (BDD) and Acceptance Test Driven Development (ATDD) - which, albeit outside the scope of this post, carry great benefits. For example, ATDD tests double as confirmation that a specific feature is done - which serves as an asset within project management.

My opinion
I still consider TDD as an implementation detail - it's very useful within the context of its application (implementation of functionality).

There are other fields to explore, such as:
  • Architectural design (like CQRS, layering, distributed systems, client-server, online-offline) 
  • Project management (PRINCE2, agile/Scrum, XP) 
  • Program design (domain driven design, metadata driven design, model driven design, design by contract, AOP) 
Perhaps we are not intended to learn everything - but I believe we should know enough to fend for ourselves.

Unlike the common expression - a chain is only as strong as its weakest link - it's the average skill within a programming team that dominates the quality of implementation within a project.

20/06/2012

TDD in practice: Where does it fit in?

Lately I've been delving deeper into Domain Driven Design and Test Driven Development.

If you had asked me two months ago - do you know and/or use test driven development? - I'd have said: yes, but I only write tests for algorithms or methods with expected results or behaviour.

Since then, my view has changed.

The question I was asking was not how to do test driven development, but rather when?

Let's take a look at the well-known rules for TDD set out by Uncle Bob:
  1. You are not allowed to write any production code unless it is to make a failing unit test pass.
  2. You are not allowed to write any more of a unit test than is sufficient to fail; and compilation failures are failures.
  3. You are not allowed to write any more production code than is sufficient to pass the one failing unit test.
These are great rules. Too bad most developers don't know how to implement them.

It becomes pretty clear why, when we examine what I was implying two months ago.

We can only test against known results and behaviors.

We struggle to incorporate TDD into our coding practices because we are focused on code, instead of context.

Domain driven design gives a neat explanation of a domain: whenever a context is implied, a boundary is formed.

Ubiquitous language implies that by classification, an appropriate name can be given for a domain entity that describes its context and purpose.


As an example, I'll describe a User within different contexts of healthcare.

A person that uses a system is called a user.
A person that needs to be billed is treated as an account. (Accounting)
A person who has clinical data is considered a patient. (Clinical)
A user may be a patient.
A user may be a doctor.

Let's classify the above into User Roles (roles a user may fulfil) and Domain Entities:
User Roles: patient, doctor
Domain entities: account, patient

The reason why we struggle to test domain entities is because we never define them.

Let's take a look at the average programmer's model abstraction evolution:

In the beginning
UI -> Database

After we've learnt about presentation patterns
UI -> View Model/Presenter -> Database

After we've learnt about ORMs
UI-> View Model/Presenter -> Data Model -> Database

At this stage, another level of abstraction comes along if you're doing SOA
UI -> View Model/Presenter -> Data Transfer Object -> Data Model -> Database

Let's look at the last example in terms of context:

  • The UI defines the data of the View Model (or rather, the View Model contains the data for the UI).
  • The DTO contains the data required by the View Model (unlike the DTO, the View Model also has UI-specific properties such as state).
  • The Data Model represents the structure of how the object is stored in the database.
  • In a simple case, the DTO is effectively a partial 'view' of the Data Model.
The reason we struggle to practice TDD is that nothing in this chain represents the domain (meaning we don't know where to apply the boundaries/rules).

We are effectively just mapping the one object to the other along the chain.

Let's add in the Domain Entity.
UI -> View Model/Presenter -> Data Transfer Object -> Domain Entity -> Data Model -> Database

What's interesting here is the fact that by simply adding the domain entity, we have given context to the entire chain. For example, the View Model/Presenter UI validation should mirror/conform to that of the Domain Entity.

The role TDD plays in DDD is that of verifying that the domain boundaries (or rules) are in place.
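
As a small sketch of what that verification could look like for the patient example above (the invariant and the types are invented purely for illustration):

    using System;
    using NUnit.Framework;

    // The boundary (rule) lives in the domain entity, not in the UI or the data model.
    public class Patient
    {
        public string Name { get; private set; }
        public DateTime DateOfBirth { get; private set; }

        public Patient(string name, DateTime dateOfBirth)
        {
            if (string.IsNullOrWhiteSpace(name))
                throw new ArgumentException("A patient must have a name.", "name");
            if (dateOfBirth > DateTime.Today)
                throw new ArgumentException("Date of birth cannot be in the future.", "dateOfBirth");

            Name = name;
            DateOfBirth = dateOfBirth;
        }
    }

    [TestFixture]
    public class PatientTests
    {
        [Test]
        public void CannotCreateAPatientBornInTheFuture()
        {
            Assert.Throws<ArgumentException>(
                () => new Patient("Jane", DateTime.Today.AddDays(1)));
        }
    }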

By defining boundaries, you create context. Context, in turn, allows you to define behaviour, which creates structure. Structure leads to reuse and better code.

Disclaimer: TDD in itself has many benefits and other use cases outside DDD.

In my opinion, the main reason you should adopt TDD is that it takes the fear out of modifying and extending an existing system: you can verify that everything works as it should.

We have learnt that just because a system builds successfully doesn't mean it works the way it should.


17/04/2012

We build boxes

Today I will take a more philosophical approach to software development.

We as developers are loath to admit that our golden goose is building boxes.

Consider for a moment how many authentication and user management systems you have implemented.

Every one of them is similar in many ways - but never entirely reusable.

Our golden goose is building boxes.

We take ideas and build a box for it.

Often clients will request something that the system wasn't designed to do - because the new idea doesn't fit into the old box.

I guess what I'm trying to say here is that constraints (designing the box) are important.

In a sense we are always trying to build boxes for ideas. One for every part of the system that will be neatly stacked into the big box that is the final product.

Modular programming is rarely applied to general scenarios. The responsibility of a box (module) should be defined (even if barely) to keep in mind how it will fit into the big box.

There is no excuse not to plan your boxes and how they will fit together to form your final product.

I think building boxes is a natural reaction to software change requests.

Despite the almost negative character I have portrayed regarding building boxes, they are definitely important.

I have seen (and experienced) many times that businesses suffer because of a lack of constraints in quoting projects for clients.

Defining your box and sticking to it requires commitment - in business this is essential to guard against the never-ending stream of new ideas.

Don't be ashamed, embrace building boxes and you will get better at it.


22/12/2011

How the Internet fried your brain

No army can withstand the strength of an idea whose time has come.
Victor Hugo
Ever imagined a world where original ideas were the norm?

That wasn't too long ago...

Before the Internet

Let's consider a century prior to ours, when everything was local - stores, services, social interaction and worries.

Back then, all you had to do was make it work where you live. Which is still the case today.

To make ideas work, you need persistence, patience and perseverance to bring that idea to realisation.

L&P - Laziness and Procrastination
Everyone wants more time because they waste it browsing or socializing on the Internet for hours - reading up on all the latest news and gossip and sitting for hours on end in front of a screen.

The easiest products/services to sell nowadays are the ones that enable laziness and procrastination. We have proactively dubbed this 'consumerism'.
 
Take care not to believe that by being lazy and procrastinating yourself, you will bring your big idea into the world!

Unique ideas 
To bring your unique idea to fruition, this is what you'll need to know:
To succeed in life, you need two things: ignorance and confidence.
Mark Twain
So here are a couple of rules:
  • Be ignorant but not stupid
  • Be confident but not headstrong
  • Be humble but not a pushover
  • Have faith and do not doubt
Don't go looking for problems to solve - and definitely don't ask people for ideas. I reckon the only good time to ask people for business/idea advice/feedback is when they are potential or existing customers.

The Big Question
Why do a lot of people fail doing business?

Simply because they aren't working on their idea. Pun implied.

11/12/2011

The Challenger: Larger Size Matters

Our biggest challenger is not ourselves - but our perspectives.
Image from ideachampions.com


Think small

We believe worthy challenges are bigger than us.

You may have heard that some people perform better under stress, while others crumble.

The truth is: persistent challenge-seeking causes fear to perpetuate in most people.

We make everything harder than it is. Fear and anxiety are partners; anxiety will woo you over while fear breaks you apart.

We are big, problems are small

We live in a challenging world; we constantly find a need to challenge ourselves because everyone else is doing it.

I believe beauty is in the eye of the beholder - and so is a bad perspective.

We are actually big, problems are really small. It's Occam's razor: simpler explanations are, other things being equal, generally better than more complex ones.

Big problems are made up of little problems

Well that's obvious, we just don't believe it.

So let's try this instead:
Any big change consists of many small changes.
or consider this quote:
"We have not really looked too hard at low-priced stocks over the years. Then we started to look for stocks that could gain hundreds, or even thousands, of percent, and we found ourselves with small-cap penny stocks." ~ William McKinley

The real challenge lies in the preparation of solving these little problems.

And remember, every small change has an effect.

07/12/2011

3 reasons to include developers in strategy meetings

I'll aim to keep this short and sweet:
“When you're building your strategy around the user, it changes the business imperatives. This world of Web services really is not about the technology itself, it's about business. The business issues are so much at play that they are really more important than the technology.” - Herman Baumann
In the end it's all about the business. Software. Strategy. Customer Support. Sales team. Project Managers. Developers. Techies.

Not a lot of software businesses get this and they opt to exclude developers from strategy meetings.

1. You can't code what you don't know
Programmers need to know the industry, because after all - you can't code what you don't know.

2. Know what works and what doesn't
Programmers, UI and UX designers are generally familiar with the needs of their users and are usually the first to know when something doesn't work the way it's supposed to.

3. We create things others use
In general, over a period of three years a developer working at a dev shop will probably gain experience developing up to, or over, 20 applications that consumers or businesses use. In essence, you could almost equate applications developed to businesses started, one to one - or in this case, 20 businesses.

Now, wouldn't it be silly to exclude the developers from strategy meetings?

After all, they are the ones who will be doing the work.