When software collapses, it's usually because it has become unmaintainable due to complexity. Brian Kernighan, a co-creator of the AWK programming language, puts it quite explicitly:
Controlling complexity is the essence of computer programming.
This post is about the Data Transfer Object pattern, which:
is just a way of organizing some pieces of code
is simple
clearly specifies the shapes of objects exchanged with backend/external services
makes code more structured and self-descriptive
provides a way to plug-in validation or parsing without too much effort, if such need arises
does not require any extra libraries
is totally technology agnostic, so it plays nice with plain POJOs, ES classes, TypeScript interfaces or anything else you'd normally use
In the code samples I will use TypeScript, which looks like ECMAScript with added type annotations. If you'd rather use plain JS, just disregard all type annotations (so treat function f(a: number): string as function f(a)). Static typing is, however, extremely beneficial! Refer to this post on static typing for more details.
Example implementation is available in this plunk.
The Problem: Data Model Shape
For the sake of this post, let's assume we're writing a web app that integrates with GitHub. Somewhere in our code we have a function for getting all organization repositories:
function getOrganizationRepos(org: string): Promise<{}> {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
}
Then you can, for example, list all Git URLs:
getOrganizationRepos('npm')
  .then(repos => repos.forEach(repo => console.log(repo.clone_url)))
It happens to work. There is, however, one point that makes me queasy: the repo.clone_url expression depends on the shape of the response from GitHub. It might not sound like a big deal, but let's imagine our app is more complex and it passes data around. Say we first fetch data from GitHub, then cache it somewhere, then pass it along to the business logic, then, depending on the routing, use some components or views, and finally render a list of repos with some kind of template, which might look like this:
<ul class="repos"> <!-- some kind of `for repo in repos` loop --> <li class="repo"> <h4>{ }</h4> <span class="git-url">{ }</span> </li> <!-- end forloop --> </ul>
It doesn't work - the template should use repo.clone_url instead of repo.git_url. To track this bug down, you need to go all the way to the API layer of your app. Not good. Moreover, if the GitHub API changes (and your own API will probably change more often than GitHub's), we will need to update templates - and that's just madness. Everything boils down to a single conclusion:
Avoid things you cannot control.
We cannot control the shape of backend responses.
Conquering the Data Shape by Abstracting
We'll take over the data shape by providing an extra intermediary. Instead of passing around the original parsed JSON, we'll use our own custom data type. This has a few advantages:
The data shape will be explicitly described in our app, so we control it.
The intermediary will be able to perform some conversions, if we need them.
We will decouple the communication layer from the logic layer.
I will describe two flavors of DTOs I use on a daily basis: POJO-like DTOs and class-based DTOs. As no two apps are identical, you might have to adapt one of these or devise your own, but the principles stay the same:
Prepare a type that closely resembles communication protocol objects
POJO-style DTO
So, we still want to use POJOs, but we want to be explicit about their shape.
Let's extend our communication layer:
function parseRepo({ clone_url, commits_url, forks, full_name, html_url, issues_url, language, name, owner }) {
  return {
    commits_url,
    forks,
    full_name,
    git_url: clone_url,
    html_url,
    issues_url,
    language,
    name,
    owner
  }
}

function getOrganizationRepos(org) {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then(repos => repos.map(parseRepo))
}
This way we freeze the data shape to be used in the rest of the application.
We also did a small conversion: our objects will have a git_url attribute instead of clone_url.
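With parseRepo plugged in, the listing from the beginning of the post depends only on the shape we declared, not on GitHub's (using the getOrganizationRepos defined above):

getOrganizationRepos('npm')
  .then(repos => repos.forEach(repo => console.log(repo.git_url)))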
For small apps, we can then use the DTO as our model. For more complex situations, though, we'd rather design the data model so that it does not depend on the communication layer. Therefore we need two different types: a DTO for communication and a regular model for internal use. We also need a conversion from one to the other:
function repoToModel({ clone_url, commits_url, html_url, issues_url, forks, full_name, language, name, owner }) {
  return {
    urls: {
      git: clone_url,
      commits: commits_url,
      html: html_url,
      issues: issues_url
    },
    forks,
    full_name,
    language,
    name,
    owner
  }
}
The last step to do is to describe the types using TypeScript interfaces: the communication object (RepoDTO) and the model (Repo), which in turn depends on RepoUrls:
interface RepoDTO {
  clone_url: string
  commits_url: string
  forks: number
  full_name: string
  html_url: string
  issues_url: string
  language: string
  name: string
  owner: string
}

interface RepoUrls {
  git: string
  commits: string
  html: string
  issues: string
}

interface Repo {
  urls: RepoUrls
  forks: number
  full_name: string
  language: string
  name: string
  owner: string
}
With these interfaces in place, the final fetching and listing might look like this:
function repoToModel(raw: RepoDTO): Repo { /* as above, with types added */ }

function getOrganizationRepos(org: string): Promise<Repo[]> {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then(repos => repos.map(repoToModel))
}
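The same trick works in the opposite direction. A hypothetical modelToRepoDto (the name and shape are my own, derived from the interfaces above) converts our model back into the wire format, which is handy for the PUT/POST/PATCH case mentioned in the pros below:

function modelToRepoDto(repo: Repo): RepoDTO {
  // map internal names back to the snake_case fields the API expects
  return {
    clone_url: repo.urls.git,
    commits_url: repo.urls.commits,
    html_url: repo.urls.html,
    issues_url: repo.urls.issues,
    forks: repo.forks,
    full_name: repo.full_name,
    language: repo.language,
    name: repo.name,
    owner: repo.owner
  }
}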
Pros
The data shape is explicitly described with the Repo interface
If the API changes, we only need to change the repoToModel function and the RepoDTO interface
The rest of the app does not depend on backend API
Issuing PUT/POST/PATCH requests with a specific payload is now a breeze (the modelToRepoDto sketch above shows the conversion)
Cons
If the backend responds with a malformed record (e.g. without the clone_url field), we probably won't catch it and will pass undefineds around
This solution is very often good enough - you're free to assume the backend will play nice.
More Guarantees: Adding Validation
Sometimes you'd rather fail fast when the response has the wrong shape. This can be achieved by modifying the converter function so that it validates the input more carefully:
function nonEmptyText(maybeText: any): string {
  if (typeof maybeText === 'string' && maybeText) {
    return maybeText
  }
  throw new TypeError(`Expected non-empty string, got ${maybeText}`)
}

function repoToModel(raw: RepoDTO): Repo {
  return {
    urls: {
      git: nonEmptyText(raw.clone_url),
      issues: nonEmptyText(raw.issues_url),
      // ...and so on for the remaining URLs
    },
    // ...and the remaining fields
  }
}
This way you can check various constraints on the data, but I'd rather you spent your valuable time doing something else. Why? All this validation will not stop your app from crashing: if the backend decides to cheat, you lose anyway. The only thing you gain is failing fast (and maybe a nicer error message for your user, if you organize your exception handling properly), but it seems like a lot of unnecessary effort. Sometimes it's worth it, but rarely.
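For completeness, here's a minimal sketch of what organizing your exception handling could look like; renderRepoList and showErrorMessage are hypothetical helpers:

getOrganizationRepos('npm')
  .then(repos => renderRepoList(repos))
  .catch(error => {
    if (error instanceof TypeError) {
      // a validator like nonEmptyText rejected the response
      showErrorMessage('GitHub returned unexpected data, please try again later.')
    } else {
      throw error // not a validation problem, rethrow
    }
  })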
Pros
Data is validated as soon as it enters your app, so you know that internal objects are well-formed
Cons
Requires substantial effort
More Convenience: Classy DTOs
There is this never-ending discussion on rich interfaces vs thin interfaces.
For now, our DTOs contain data but are otherwise dumb: they have no methods, no nothing. If you're a fan of rich interfaces, you might want to add extra behavior to the models. This is easily achievable by using classes instead of POJOs:
interface AbstractRepoDTO {
  // same fields as RepoDTO previously
}

class RepoDTO {
  public static parse(raw: AbstractRepoDTO): RepoDTO {
    /* If you need validation, do it here: */
    const cleanedData = validateRepo(raw)
    return new RepoDTO(
      cleanedData.html_url,
      cleanedData.clone_url,
      cleanedData.forks,
      cleanedData.full_name,
      cleanedData.language
    )
  }

  public isHot: boolean

  constructor(
    public html_url: string,
    public clone_url: string,
    public forks: number,
    public full_name: string,
    public language: string
  ) {
    this.isHot = forks > 500
  }

  public toString() {
    return `${this.full_name}, written in ${this.language}`
  }

  // signatures only - implementations omitted:
  public toJson(): string
  public toXml(): string
  public toModel(): Repo
}

function getOrganizationRepos(org: string): Promise<Repo[]> {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then(repos => repos.map(item => RepoDTO.parse(item).toModel()))
}
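To illustrate the extra behavior, here's how the class could be used directly (the values are made up):

const repo = new RepoDTO(
  'https://github.com/npm/cli',     // html_url
  'https://github.com/npm/cli.git', // clone_url
  2900,                             // forks
  'npm/cli',                        // full_name
  'JavaScript'                      // language
)

console.log(repo.isHot)      // true, since forks > 500
console.log(repo.toString()) // "npm/cli, written in JavaScript"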
Pros
You can add any extra behaviors to your models
A class has an explicit list of attributes, so you're type-safe even in plain ECMAScript
Cons
REST APIs are based on plain JSON objects, so you need to be careful during serialization and parsing (see the sketch after this list)
You might be tempted to put lots of custom business logic in the DTO, which in my opinion violates the Single Responsibility Principle - use DTOs for modelling data; complex logic should live in models or services
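To illustrate the serialization caveat: JSON.stringify on a class instance happily includes every public field, including computed ones like isHot, so you may want an explicit mapping back to the wire format. A hypothetical serializeRepo:

function serializeRepo(repo: RepoDTO): string {
  // pick only the fields the API knows about; a bare JSON.stringify(repo)
  // would also serialize computed fields such as isHot
  return JSON.stringify({
    html_url: repo.html_url,
    clone_url: repo.clone_url,
    forks: repo.forks,
    full_name: repo.full_name,
    language: repo.language
  })
}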
Bonus: Abstracting Even Further
You're a good software engineer, so you've probably already figured this out on your own, but for the sake of completeness, here's a function that fetches and parses any array of DTOs:
function fetchAndParseArray<A>(url: string, parser: (raw: {}) => A): Promise<A[]> {
  return fetch(url)
    .then(response => response.json())
    .then((items: {}[]) => items.map(parser))
}

fetchAndParseArray('https://api.github.com/orgs/npm/repos', item => RepoDTO.parse(item).toModel())
  .then((repos: Repo[]) => repos.forEach(r => console.log(r.toString())))
Wrapping Up
The main advantage of using a DTO is isolating your application from backend changes. When choosing the exact shape of your DTO, try to keep it simple. Wrapping API entities in TypeScript interfaces is usually sufficient to make development and maintenance easier.