Forward references for RESTful resource collections

I could use some help with a design problem for a RESTful API.

The APIs we’re trying to design are for a media production process that uses MXF and AAF as interchange formats. Data comes out of a database and goes into complex, long-running processes that slice and dice it, eventually coming back to be merged into the database. That database is itself replicated across half a dozen sites in an eventually consistent pattern, and is connected in various ways to other (enterprise) databases. Because the full complexity of these media formats gets in the way of designing the API basics, I’ve come up with a simpler example. The weirdness of the example comes from it being distilled from the complex use cases, where it does make (some) sense.

Setting the scene

Imagine a library of digital books. For reasons of storage efficiency, among others, the library has ripped all the books apart and stored the individual chapters. When you search through the library or fetch bits of content, you interact with a representation of the books and chapters (like a virtual index card) that does not include their content.

So books consist of zero or more chapters, and chapters are part of one or more books. Chapters really can be part of multiple books: The Collected Works of William Shakespeare, for example, is represented as all the chapters from all of his books stitched together.

Both books and chapters have zero or more titles (usually one title per language, but there are various “also known as” edge cases).

Browsing through books

Imagine we represent a book as

<book xmlns="http://schemas.example.com/library/v1/" id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA" href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
  <title xml:lang="en-GB">The Merchant of Venice</title>
  <title xml:lang="nl">De Koopman van Venetië</title>
  <chapters>
    <chapter id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E" href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
      <title xml:lang="en-GB">FIRST ACT</title>
      <title xml:lang="nl">EERSTE BEDRIJF</title>
    </chapter>
  </chapters>
</book>

and a chapter as

<chapter xmlns="http://schemas.example.com/library/v1/" id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E" href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <book id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA" href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
    <title xml:lang="en-GB">The Merchant of Venice</title>
    <title xml:lang="nl">De Koopman van Venetië</title>
  </book>
</chapter>

It’s hopefully obvious that you can do a GET /library/{book|chapter}/{uuid} to retrieve these representations.
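As a rough illustration of how a client might consume these representations, here is a minimal Python sketch that parses the book document above with the standard library’s ElementTree. The helper names titles and parse_book are hypothetical, and the sketch only handles the XML; it does not make the GET itself.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.example.com/library/v1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

BOOK_XML = """<book xmlns="http://schemas.example.com/library/v1/"
      id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA"
      href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
  <title xml:lang="en-GB">The Merchant of Venice</title>
  <title xml:lang="nl">De Koopman van Venetië</title>
  <chapters>
    <chapter id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E"
             href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
      <title xml:lang="en-GB">FIRST ACT</title>
      <title xml:lang="nl">EERSTE BEDRIJF</title>
    </chapter>
  </chapters>
</book>"""


def titles(elem):
    """Map language tag -> text for the <title> elements directly under elem."""
    return {t.get(XML_LANG): t.text for t in elem.findall(f"{{{NS}}}title")}


def parse_book(xml_text):
    """Turn a book representation into plain Python data (no HTTP involved)."""
    root = ET.fromstring(xml_text)
    chapters = [
        {"id": ch.get("id"), "href": ch.get("href"), "titles": titles(ch)}
        for ch in root.findall(f"{{{NS}}}chapters/{{{NS}}}chapter")
    ]
    return {"id": root.get("id"), "titles": titles(root), "chapters": chapters}
```

Following each chapter’s href and running the same kind of parse over the chapter document gives the client everything it needs to navigate in both directions.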

Changing book metadata

It’s also not difficult to imagine that you can do a PUT to the same URL to update the resource. You just PUT the same kind of document back.

The difficult part is what happens when you do that PUT. The logic I want is that a PUT of a book can change the titles for that book and change which chapters are part of it. A PUT of a chapter, on the other hand, should change the titles for the chapter, but not add the chapter to or remove it from a book (the list of chapters is ordered, and the chapter doesn’t know where it sits in that ordering).

(Again, these rules seem pretty artificial in the example, but in MXF there is a variety of complex constraints that, in many cases, dictate that a new UMID be created when an object in the model changes in a way that matters.)

This sort of breaks the PUT contract, because no matter how often you GET a book document, change the title of a chapter inside it, and PUT that changed representation back, your change will not be picked up. You have to follow the href, GET the representation for the chapter, change the title there, and PUT that back.

This also breaks the common expectation people have of XML documents: if the data is there and you edit it and then save it, normal things happen.
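The PUT rules described above can be sketched on the server side roughly as follows. This is a minimal Python sketch, assuming a hypothetical in-memory store (the books and chapters dicts) in place of the real database, and assuming the uuid can be taken from the last segment of each href. It is meant to show the asymmetry between the two PUTs, not how the real service is built.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.example.com/library/v1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical in-memory store standing in for the real (replicated) database.
books = {}     # book uuid -> {"titles": {lang: text}, "chapters": [chapter uuid, ...]}
chapters = {}  # chapter uuid -> {"titles": {lang: text}}


def _titles(elem):
    # Only direct <title> children, so nested chapter/book titles are never picked up.
    return {t.get(XML_LANG): t.text for t in elem.findall(f"{{{NS}}}title")}


def _uuid_from_href(elem):
    # Assumes href has the form /library/{book|chapter}/{uuid}.
    return elem.get("href").rsplit("/", 1)[-1]


def put_book(book_uuid, xml_text):
    """A book PUT replaces the book's titles and its ordered chapter list.
    Titles inside the nested <chapter> elements are ignored."""
    root = ET.fromstring(xml_text)
    books[book_uuid] = {
        "titles": _titles(root),
        "chapters": [_uuid_from_href(ch)
                     for ch in root.findall(f"{{{NS}}}chapters/{{{NS}}}chapter")],
    }


def put_chapter(chapter_uuid, xml_text):
    """A chapter PUT replaces only the chapter's own titles.
    The embedded <book> reference cannot move the chapter between books."""
    root = ET.fromstring(xml_text)
    chapters[chapter_uuid] = {"titles": _titles(root)}
```

Because both functions only read direct title children, a chapter title edited inside a book document and PUT back to the book’s URL is silently dropped, which is exactly the surprise described above.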

The problem with minimal representations

It’s easy to minimize the representations in use so this problem goes away. For example,

<chapter xmlns="http://schemas.example.com/library/v1/" id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <book href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA" />
</chapter>

It’s clear what you’re dealing with. The PUT does what it is supposed to do, and to learn the book title you just do another GET.

The problem with this approach is that the number of HTTP requests grows much larger when you want to display something in the UI: the visual representation of a chapter shows the book title, so every chapter displayed needs an extra GET for its book. For building snappy UIs that use ajax to communicate with my service, the rich representation that includes the title information is much better.
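To put a rough number on the extra requests, here is a Python sketch that counts GETs for rendering chapter labels (book title plus chapter title) from each kind of representation. The CountingClient class and the dict-shaped representations are hypothetical stand-ins for real HTTP calls and XML documents.

```python
class CountingClient:
    """Hypothetical stand-in for an HTTP client: serves canned resources and counts GETs."""

    def __init__(self, resources):
        self.resources = resources  # href -> representation (modelled as a dict)
        self.requests = 0

    def get(self, href):
        self.requests += 1
        return self.resources[href]


def labels_minimal(client, chapter_hrefs):
    # Minimal chapter documents only carry the book's href, so each label needs a second GET.
    labels = []
    for href in chapter_hrefs:
        chapter = client.get(href)
        book = client.get(chapter["book_href"])
        labels.append(f'{book["title"]}: {chapter["title"]}')
    return labels


def labels_annotated(client, chapter_hrefs):
    # Annotated chapter documents already include the book title: one GET per label.
    labels = []
    for href in chapter_hrefs:
        chapter = client.get(href)
        labels.append(f'{chapter["book_title"]}: {chapter["title"]}')
    return labels
```

For N chapters the minimal version makes 2N requests and the annotated one N. Client-side caching of book lookups would narrow the gap when many chapters share a book, at the cost of more logic in every client.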

Some options

So what should I do?

Use multiple representations

I could have /library/{book|chapter}/{uuid}/annotated as well as /library/{book|chapter}/{uuid}, with the latter serving the minimal representation and supporting PUT. Alternatively, if I had smart ajax clients (I don’t), I could use some kind of content negotiation to get to the rich annotated version.

This is quite a bit of work, and when documents leave the web for some kind of offline processing (the AAF files go into a craft edit suite and come back very different many weeks later, but they will still reference some of my original data), there is a risk that the “wrong” document makes it into that edit suite.
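The routing side of the multiple-representations option might look something like the following Python sketch. The media type name in ANNOTATED is made up for illustration, the Accept parsing is deliberately crude (q-values are ignored), and the status codes only indicate what a real handler might return.

```python
# Hypothetical media type for the rich view; the post does not define one.
ANNOTATED = "application/vnd.example.library.annotated+xml"


def route(method, path, accept=""):
    """Return (HTTP status, canonical resource path, representation name).

    /library/{book|chapter}/{uuid}            minimal view, GET and PUT
    /library/{book|chapter}/{uuid}/annotated  rich view, GET only
    A GET on the plain URL may also ask for the rich view via the Accept header.
    """
    parts = path.strip("/").split("/")
    valid = (len(parts) in (3, 4)
             and parts[0] == "library"
             and parts[1] in ("book", "chapter")
             and (len(parts) == 3 or parts[3] == "annotated"))
    if not valid:
        return 404, None, None
    resource = "/" + "/".join(parts[:3])
    # Crude Accept parsing: media types only, q-values ignored.
    wanted = {p.split(";")[0].strip() for p in accept.split(",") if p.strip()}
    if method == "GET":
        annotated = len(parts) == 4 or ANNOTATED in wanted
        return 200, resource, "annotated" if annotated else "minimal"
    if method == "PUT" and len(parts) == 3:
        return 204, resource, "minimal"
    # PUT on the annotated view, or any other method: not allowed.
    return 405, resource, None
```

Keeping the canonical resource path separate from the representation name means the /annotated URL stays a read-only view of the same resource rather than a second resource.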

Document the situation

I could stick with my original richly annotated XML and simply document which fields are and aren’t processed when you do a PUT. I’d probably change the PUT to a POST to make it a bit clearer.

Document and enforce the situation

I could strictly validate all documents that are PUT to me, to make sure they do not contain any elements (in my namespace) that I do not intend to save, and reject any documents that do.
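A strict check of this kind could be a small rule table rather than a full schema. This is a Python sketch under the assumption that the only storable elements are the ones described earlier (titles for both, plus the ordered chapter list for a book); ALLOWED and validate_put are hypothetical names. A schema language such as XML Schema or RELAX NG would be the more conventional tool for the same job.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.example.com/library/v1/"


def q(name):
    """Qualified ElementTree tag for an element in our namespace."""
    return f"{{{NS}}}{name}"


# Hypothetical rules: for each of our elements, which of our elements may appear
# as its children in a PUT body. Elements from other namespaces are passed
# through, but our own elements may only appear where these rules allow.
ALLOWED = {
    "chapter": {q("chapter"): {q("title")}, q("title"): set()},
    "book": {
        q("book"): {q("title"), q("chapters")},
        q("chapters"): {q("chapter")},
        q("chapter"): set(),  # chapter references only; their titles are not stored
        q("title"): set(),
    },
}


def validate_put(kind, xml_text):
    """Return a list of problems; an empty list means the body may be accepted."""
    root = ET.fromstring(xml_text)
    if root.tag != q(kind):
        return [f"root element must be {q(kind)}"]
    rules = ALLOWED[kind]
    problems = []
    for parent in root.iter():
        for child in parent:
            if not child.tag.startswith("{" + NS + "}"):
                continue  # foreign markup is outside our contract
            if child.tag not in rules.get(parent.tag, set()):
                problems.append(f"{child.tag} is not accepted inside {parent.tag}")
    return problems
```

A handler would then answer with a 4xx response listing the problems instead of silently dropping the extra elements.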

Document the situation inside the XML

I could do something like

<chapter xmlns="http://schemas.example.com/library/v1/" id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E" href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <referencedBy>
    <!-- please note that referencedBy contents
         cannot be changed through PUT -->
    <book id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA" href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
      <title xml:lang="en-GB">The Merchant of Venice</title>
      <title xml:lang="nl">De Koopman van Venetië</title>
    </book>
  </referencedBy>
</chapter>

This way it’s hopefully quite obvious to the API consumer what is going to happen when they PUT a document back. It is still rather unclean REST (so should I use POST?), but it saves me from having to design separate representations for browsing and editing.

One disadvantage is that I have to keep more resource state around when parsing or generating the content. That is not an issue when things are built up in memory, but for large documents and/or pipeline processing I’ve made life a lot harder. There are other ways to alleviate this (like adding an isReference attribute, or inlining referencedBy sections throughout the document rather than putting them all at the bottom), but they’re even less pleasing aesthetically.
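To make the parsing-state point concrete, here is a Python sketch that streams a chapter PUT body with ElementTree’s iterparse and collects only the titles that would be stored. The skip_depth counter is the extra state: the parser has to remember whether it is currently inside a referencedBy section. The function name editable_titles is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

NS = "http://schemas.example.com/library/v1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
REFERENCED_BY = f"{{{NS}}}referencedBy"
TITLE = f"{{{NS}}}title"


def editable_titles(xml_text):
    """Stream a chapter PUT body and collect the titles that would be stored,
    skipping everything inside <referencedBy>."""
    titles = {}
    skip_depth = 0  # the extra parser state: > 0 while inside a referencedBy subtree
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag == REFERENCED_BY:
            skip_depth += 1 if event == "start" else -1
        elif event == "end" and skip_depth == 0 and elem.tag == TITLE:
            # Title text is complete at the "end" event.
            titles[elem.get(XML_LANG)] = elem.text
    return titles
```

Without the referencedBy wrapper (as in the first chapter document above), the same loop would let the book’s titles overwrite the chapter’s own, because they share language tags.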

Something else?

Which approach do you think is best? Is there a better one? What would you do?

Right now, since I’m just doing some quick prototyping, I’ve gone for the “document the situation” approach, but I think that eventually I’d either like to somehow highlight the “this is a forward reference for your convenience but don’t edit it” bits of the XML, or go for the multiple representations approach.

New year, new job

I’m still at the BBC, but instead of building dynamic web applications at scale, I’m going back to another thing I really like doing: digital media. I’ve joined the Digital Media Initiative, which is all about changing the BBC’s TV production workflow to be more digital.

DMI is a huge, huge undertaking that has already been ongoing for a few years. I’ve joined a small team of architects who share responsibility for the architecture of the whole system. It’s great to be back working with and thinking about video all day, every day. I get to peek into TV studios, talk to TV producers, and mess about with MXF and AAF and mind-bogglingly messy data models.

Even if there are perhaps not as many requests per second flowing around DMI as there are heading for the BBC website, the theme of scaling things is still firmly there in my work: we are planning to archive petabyte upon petabyte of 100 Mbit/s video on digital tape, to have hundreds if not thousands of professionals depending on the system for their daily workflow, etc.

Unfortunately DMI cannot currently be quite as open in its outward-facing communication as the web platform side of FM&T tends to be, so my blog is probably going to go rather quiet for a while. Don’t worry, I’m not dead, I’m just having fun elsewhere 🙂

How to dive into online social media marketing

  1. Familiarize yourself with social media:
    Get a blog,
    Get on Twitter,
    Get on Facebook,
    Look at what your competitors are doing,
    Engage in conversations online,
    Read the Cluetrain Manifesto
  2. Decide that you want to do it:
    Social media strategy advice directed at consumer companies

    1. participation
    2. integration
    3. syndication
  3. Decide what you want:
    What do you hope to accomplish?
    Who will execute on your social media strategy?
    How will they do that? What do they need?
  4. Get started with free tools:
    Define a blog policy,
    Encourage employees to blog,
    Start a company blog, link to employee blogs
    Start a company Twitter account, follow your customers
    Start a company Facebook group, befriend your customers
  5. If you want to customize your existing website, consider buying a vendor solution:
    Forrester analysis of social networking platforms, with process explained

    1. “Forrester Wave” defined
    2. primary analyst talking about analysis
    3. start of analysis
    4. aside: criticism of SaaS
    5. aside: SaaS a way to circumvent IT
    6. conclusion:
      go with KickApps or Pluck
    7. register to download the Forrester report PDF
    8. list of white label social networking platforms
  6. Let me know how it works out!

belly button

Joost has a large library of video clips. For a while Joost was my primary source of music during the day, but I tend to use last.fm quite a bit these days, and now I only turn Joost on when I don’t mind being distracted by Nelly Furtado’s belly button.

The last.fm feature I like most is music profile sharing. I’ve learned about so much cool new music just by looking at what my friends and colleagues listen to. Fortunately, Joost is now growing some social networking features of its own: these days, if you click on a video on our website, we can send a notification to your Facebook feed. I wonder what will happen now that everyone on Facebook will know that I actually enjoy watching Rihanna and Fergie.

Scrambling towards a new online media ecosystem

I just watched a recording of a presentation by Larry Lessig at TED (using Miro, which gets better every week) about creative freedom and read-write culture. Over the last few years I’ve read a lot of Lessig’s work, and I’m impressed at how well the same message comes across when he delivers it as a presentation.

I feel like I’m one of the kids he mentions (though at 24, people tend to treat me as an adult!), and the “remixing” story he tells seems so obvious and natural to me. I grew up with the web, and I grew up as a participant in online communities.

Right now, a lot of the attention in this creative freedom arena seems to be going to online video, which is where I also have a bit of a professional interest. It is not easy to figure out how to help shape a digital world where amateur and professional content can coexist, and moreover mesh and intermingle freely. As Lessig explains, we need to find the right balance. Media companies must accept some loss of control over the user experience, and in return users have to accept that while sharing is good, stealing is bad, and that creators have some rights they must respect.

Hopefully we will find a path to a healthy and sustainable ecosystem for digital media. To me it’s obvious we’re not there yet, but I see progress happening by leaps and bounds everywhere I look. These are exciting times!