When software collapses, it's usually because it is no longer maintainable due to complexity. Brian Kernighan, co-creator of the AWK programming language, puts it quite explicitly:

Controlling complexity is the essence of computer programming.

This post is about the Data Transfer Object (DTO) pattern, which:

  • is just a way of organizing some pieces of code

  • is simple

  • clearly specifies shapes of objects when communicating with backend/external services

  • makes code more structured and self-descriptive

  • provides a way to plug-in validation or parsing without too much effort, if such need arises

  • does not require any extra libraries

  • is totally technology agnostic, so it plays nice with plain POJOs, ES classes, TypeScript interfaces or anything else you'd normally use


In the code samples I will use TypeScript, which looks like ECMAScript with added type annotations. If you'd rather use plain JS, just disregard all type annotations (so treat function f(a: number): string as function f(a)). Static typing is, however, extremely beneficial! Refer to this post on static typing for more details.

Example implementation is available in this plunk.

The Problem: Data Model Shape

For the sake of this post, let's assume we're writing a web app that integrates with GitHub. Somewhere in our code we have a function for getting all organization repositories:

function getOrganizationRepos(org: string): Promise<{}> {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
}

Then you can, for example, list all git URLs:

getOrganizationRepos('npm')
  .then(repos => repos.forEach(repo => console.log(repo.clone_url)))

It happens to work. There is however one point that makes me queasy: the repo.clone_url expression depends on the shape of the response from GitHub. It might not sound like a big deal, but let's imagine our app is more complex and it passes data around. So, let's say that we first fetch data from GitHub, then we cache it somewhere, then we pass it along to business logic, then depending on the routing we use some components or views, and we finally render a list of repos with some kind of template, which might look like this:

<ul class="repos"> <!-- some kind of `for repo in repos` loop -->
  <li class="repo">
    <h4>{ repo.name }</h4>
    <span class="git-url">{ repo.git_url }</span>
  </li> <!-- end for loop -->
</ul>

It doesn't work - the template should get repo.clone_url instead of repo.git_url. To track this bug down, you will need to go all the way to the API layer of your app. Not good. Moreover, if the GitHub API changes (and your API will probably change more often than GitHub's), we will need to update templates - and that's just madness. Everything boils down to a single conclusion:

Avoid things you cannot control.

We cannot control the shape of backend responses.


Conquering the Data Shape by Abstracting

We'll take over the data shape by providing an extra intermediary. Instead of passing around the original parsed JSON, we'll use our own custom data type. This has a few advantages:

  1. The data shape will be explicitly described in our app, so we control it.

  2. The intermediary will be able to perform some conversions, if we need them.

  3. We will decouple communication layer from logic layer.

I will describe two flavors of DTOs I use on a daily basis: POJO-like DTOs and class-based DTOs. As no two apps are identical, you might have to adapt one of these or devise your own, but the principles stay the same:

Prepare a type that closely resembles communication protocol objects


POJO-style DTO

So, we still want to use POJOs, but we want to be explicit about their shape.

Let's extend our communication layer:

function parseRepo({ clone_url, commits_url, forks, full_name, html_url, issues_url, language, name, owner }) {
  return {
    commits_url,
    forks,
    full_name,
    git_url: clone_url,
    html_url,
    issues_url,
    language,
    name,
    owner
  }
}

function getOrganizationRepos(org) {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then(repos => repos.map(parseRepo))
}

This way we freeze the data shape to be used in the rest of the application.

We also did a small conversion: our objects will have a git_url attribute instead of clone_url.
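To see the freeze in action, here is a minimal sketch that runs parseRepo over a hand-written sample record (the payload below is invented for illustration, not a real GitHub response):

```typescript
// parseRepo as above: picks the fields we care about and renames clone_url to git_url
function parseRepo({ clone_url, commits_url, forks, full_name, html_url, issues_url, language, name, owner }: any) {
  return { commits_url, forks, full_name, git_url: clone_url, html_url, issues_url, language, name, owner }
}

// A made-up raw record in GitHub's shape, with an extra field we don't care about
const raw = {
  clone_url: 'https://github.com/npm/cli.git',
  commits_url: 'https://api.github.com/repos/npm/cli/commits',
  forks: 3000,
  full_name: 'npm/cli',
  html_url: 'https://github.com/npm/cli',
  issues_url: 'https://api.github.com/repos/npm/cli/issues',
  language: 'JavaScript',
  name: 'cli',
  owner: 'npm',
  watchers: 8000 // not in our DTO, silently dropped
}

const dto = parseRepo(raw)
console.log(dto.git_url) // the renamed field
```

Note that the parse step also acts as a whitelist: fields we didn't list (like watchers above) never make it into the app.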

For small apps, we can then use the DTO as our model. For more complex situations, though, we'd rather design the data model so that it does not depend on the communication layer. Therefore we need two different types: a DTO for communication, and a regular model for internal use. We also need a conversion from one to the other:

function repoToModel({ clone_url, commits_url, forks, full_name, html_url, issues_url, language, name, owner }) {
  return {
    urls: {
      git: clone_url,
      commits: commits_url,
      html: html_url,
      issues: issues_url
    },
    forks,
    full_name,
    language,
    name,
    owner
  }
}

The last step to do is to describe the types using TypeScript interfaces: the communication object (RepoDTO) and the model (Repo), which in turn depends on RepoUrls:

interface RepoDTO {
  clone_url: string
  commits_url: string
  forks: number
  full_name: string
  html_url: string
  issues_url: string
  language: string
  name: string
  owner: string
}
  
interface RepoUrls {
  git: string
  commits: string
  html: string
  issues: string
}

interface Repo {
  urls: RepoUrls
  forks: number
  full_name: string
  language: string
  name: string
  owner: string
}

Having these interfaces, final fetching and listing might look like this:

function repoToModel(raw: RepoDTO): Repo { /* as above, now with types */ }

function getOrganizationRepos(org: string): Promise<Repo[]> {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then((repos: RepoDTO[]) => repos.map(repoToModel))
}
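For completeness, here is a self-contained sketch of the typed converter in action; the interfaces are repeated in condensed form so the snippet runs standalone, and the sample DTO is invented:

```typescript
// Condensed copies of the interfaces from above
interface RepoDTO { clone_url: string; commits_url: string; forks: number; full_name: string; html_url: string; issues_url: string; language: string; name: string; owner: string }
interface RepoUrls { git: string; commits: string; html: string; issues: string }
interface Repo { urls: RepoUrls; forks: number; full_name: string; language: string; name: string; owner: string }

// The converter: DTO in, internal model out
function repoToModel(raw: RepoDTO): Repo {
  return {
    urls: {
      git: raw.clone_url,
      commits: raw.commits_url,
      html: raw.html_url,
      issues: raw.issues_url
    },
    forks: raw.forks,
    full_name: raw.full_name,
    language: raw.language,
    name: raw.name,
    owner: raw.owner
  }
}

// An invented sample DTO, just to exercise the conversion
const sample: RepoDTO = {
  clone_url: 'https://github.com/npm/npm.git',
  commits_url: 'commits',
  forks: 2500,
  full_name: 'npm/npm',
  html_url: 'https://github.com/npm/npm',
  issues_url: 'issues',
  language: 'JavaScript',
  name: 'npm',
  owner: 'npm'
}

const model: Repo = repoToModel(sample)
```

The compiler now guards both sides of the boundary: a typo in a RepoDTO field name or a missing Repo field is a compile-time error, not a runtime surprise.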

Pros

  • The data shape is explicitly described with Repo interface

  • If the API changes, we only need to change the repoToModel function and the RepoDTO interface

  • The rest of the app does not depend on backend API

  • Issuing PUT/POST/PATCH requests with specific payload is now a breeze

Cons

  • If the backend responds with a malformed record (e.g. without the clone_url field), we probably won't catch it and will pass undefineds around

This solution is very often good enough - you're free to assume the backend will play nice.

More Guarantees: Adding Validation

Sometimes you'd rather have the fail-fast approach when response is in the wrong shape. This can be achieved by modifying the converter function so that it validates the input more carefully:

function nonEmptyText(maybeText: any): string {
  if (typeof maybeText === 'string' && maybeText) {
    return maybeText
  }
  throw new TypeError(`Expected non-empty string, got ${maybeText}`)
}

function repoToModel(raw: RepoDTO): Repo {
  return {
    urls: {
      git: nonEmptyText(raw.clone_url),
      issues: nonEmptyText(raw.issues_url),
      // ... and so on for the remaining fields
    },
    // ...
  }
}
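The same micro-validator idea extends to other field types; for example, a hypothetical nonNegativeNumber guard (not from the original example) could protect the forks field the same way:

```typescript
// A tiny fail-fast validator in the style of nonEmptyText,
// for numeric fields such as forks
function nonNegativeNumber(maybeNumber: any): number {
  if (typeof maybeNumber === 'number' && maybeNumber >= 0) {
    return maybeNumber
  }
  throw new TypeError(`Expected non-negative number, got ${maybeNumber}`)
}

// Well-formed input passes through unchanged...
const forks = nonNegativeNumber(42)

// ...while a malformed record fails fast with a TypeError
let failedFast = false
try {
  nonNegativeNumber('not a number')
} catch (e) {
  failedFast = e instanceof TypeError
}
```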

This way you can check various constraints on the data, but I'd rather you spend your valuable time doing something else. Why? All this validation will not stop your app from crashing: if the backend decides to cheat, you lose anyway. The only thing you gain is failing fast (and maybe a nicer error message for your user, if you organize your exception handling properly), but it seems like a lot of unnecessary effort. Sometimes it's worth it, but rarely.

Pros

  • Data is validated as soon as it enters your app, so you know that internal objects are well-formed

Cons

  • Requires substantial effort

More Convenience: Classy DTOs

There is this never-ending discussion on rich interfaces vs thin interfaces.

For now, our DTOs just contain data, but are dumb: they have no methods, no nothing. If you're a fan of rich interfaces, you might want to add extra behavior to the models. This is easily achievable by using classes instead of POJOs:

interface AbstractRepoDTO { // same shape as RepoDTO previously
  // ...
}

class RepoDTO {
  public static parse(raw: AbstractRepoDTO): RepoDTO {
    /* If you need validation, do it here: */
    const cleanedData = validateRepo(raw)
    return new RepoDTO(cleanedData.html_url, cleanedData.clone_url,
      cleanedData.forks, cleanedData.full_name, cleanedData.language /* , ... */)
  }
  public isHot: boolean
  constructor(
    public html_url: string,
    public clone_url: string,
    public forks: number,
    public full_name: string,
    public language: string,
    // ...
  ) {
    this.isHot = forks > 500
  }
  public toString() {
    return `${this.full_name}, written in ${this.language}`
  }
  // implementations elided:
  public toJson(): string
  public toXml(): string
  public toModel(): Repo
}

function getOrganizationRepos(org: string): Promise<Repo[]> {
  return fetch(`https://api.github.com/orgs/${org}/repos`)
    .then(response => response.json())
    .then(repos => repos.map(item => RepoDTO.parse(item).toModel()))
}
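A quick sketch of the class in action, with a trimmed-down field list, a pass-through stub for validateRepo, and an invented sample payload:

```typescript
interface AbstractRepoDTO { html_url: string; clone_url: string; forks: number; full_name: string; language: string }

// Pass-through stub; a real validateRepo would check each field
function validateRepo(raw: AbstractRepoDTO): AbstractRepoDTO { return raw }

class RepoDTO {
  public static parse(raw: AbstractRepoDTO): RepoDTO {
    const clean = validateRepo(raw)
    return new RepoDTO(clean.html_url, clean.clone_url, clean.forks, clean.full_name, clean.language)
  }
  public isHot: boolean // extra behavior: derived at construction time
  constructor(
    public html_url: string,
    public clone_url: string,
    public forks: number,
    public full_name: string,
    public language: string
  ) {
    this.isHot = forks > 500
  }
  public toString() {
    return `${this.full_name}, written in ${this.language}`
  }
}

// Invented sample payload in the wire format
const dto = RepoDTO.parse({
  html_url: 'https://github.com/npm/cli',
  clone_url: 'https://github.com/npm/cli.git',
  forks: 3000,
  full_name: 'npm/cli',
  language: 'JavaScript'
})
```

Note how the derived isHot flag and the toString behavior come for free once the data is wrapped in a class.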

Pros

  • You can add any extra behaviors to your models

  • A class has an explicit list of attributes, so you're type-safe even in plain ECMAScript

Cons

  • REST APIs are based on POJOs, so you need to be careful during serialization and parsing

  • You might be tempted to put lots of custom business logic in the model, which in my opinion violates the Single Responsibility Principle - use DTOs for modelling data; complex logic should live in models or services
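To illustrate the serialization point: JSON.stringify on a class instance serializes every own property, including derived ones like isHot, so a payload-shaping step helps before sending data back to a REST API. A sketch (toPayload is a hypothetical helper name, not from the example above):

```typescript
class RepoDTO {
  public isHot: boolean
  constructor(public clone_url: string, public forks: number) {
    this.isHot = forks > 500 // derived, should never hit the wire
  }
  // Pick only the wire-format fields before serializing
  public toPayload(): { clone_url: string; forks: number } {
    const { clone_url, forks } = this
    return { clone_url, forks }
  }
}

const repo = new RepoDTO('https://github.com/npm/cli.git', 1000)
const naive = JSON.stringify(repo)             // includes the derived isHot field
const clean = JSON.stringify(repo.toPayload()) // wire shape only
```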

Bonus: Abstracting even further

You're a good software engineer, so you probably already figured this out on your own, but just for the sake of completeness here's a method that fetches and parses any array of DTOs:

function fetchAndParseArray<A>(url: string, parser: (raw: {}) => A): Promise<A[]> {
  return fetch(url)
    .then(response => response.json())
    .then((items: {}[]) => items.map(parser))
}

fetchAndParseArray('https://api.github.com/orgs/npm/repos',
    item => RepoDTO.parse(item).toModel())
  .then((repos: Repo[]) => repos.forEach(r => console.log(r.toString())))
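If you want to exercise this helper without hitting the network, one variation (not from the original post) is to split out the parsing step and inject the fetching function; fetchJson below is a hypothetical parameter name:

```typescript
// Generic parse step, separated from fetching so it can be tested in isolation
function parseArray<A>(items: {}[], parser: (raw: {}) => A): A[] {
  return items.map(parser)
}

// Same generic helper, but with the fetching step injected
function fetchAndParseArray<A>(
  url: string,
  parser: (raw: {}) => A,
  fetchJson: (url: string) => Promise<{}[]>
): Promise<A[]> {
  return fetchJson(url).then(items => parseArray(items, parser))
}

// A stubbed fetcher returning canned data, so no network is needed
const stubFetch = (_url: string) => Promise.resolve([{ clone_url: 'a.git' }, { clone_url: 'b.git' }])

const urls = parseArray(
  [{ clone_url: 'a.git' }, { clone_url: 'b.git' }],
  (raw: any) => raw.clone_url as string
)
```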

Wrapping Up

The main advantage of using a DTO is isolating your application from the backend changes. When choosing the exact shape of your DTO, try and keep it simple. Wrapping API entities in TypeScript interfaces is usually sufficient to make the development and maintenance easier.