RavenDB Query Includes and Projections

15 Mar 2012

By default, RavenDB allows only 30 requests per session. This is part of RavenDB’s safe-by-default behavior, which prevents you from creating a performance problem by executing a large number of HTTP requests.

One of the design principles that RavenDB adheres to is the idea that documents are independent, meaning all data required to process a document is stored within the document itself. However, this doesn’t mean there should not be relations between objects.

There are valid scenarios where we need to define relationships between objects. By doing so, we expose ourselves to one major problem: whenever we load the containing entity, we are going to need to load data from the referenced entities too (unless we are not interested in them). While the alternative of storing the whole entity in every object graph it is referenced in seems cheaper at first, this proves to be quite costly in terms of database resources and network traffic.

Let’s say you have an object graph that you are retrieving from RavenDB that contains referenced documents and it looks like this:

"stories/123": {
  "Headline": "New iPad is key to Apple's bottom line",
  "Author": "Jack Smith",
  "LastPublishedAtUtc": "2012-03-14T23:48:00.0000000+00:00",
  "PublishStatus": "Published",
  "StoryReferences": {
    "storyreference/456213": {
      "Headline": "New iPhone coming soon",
      "Author": "John Doe"
    },
    "storyreference/789654": {
      "Headline": "New iPad foils reviewers' attempts to find legitimate faults",
      "Author": "Jane Doe"
    },
    "storyreference/555111": {
      "Headline": "Now on Netflix: Search by TV network",
      "Author": "Jack Smith"
    },
    "storyreference/942342": {
      "Headline": "Apple stores to open at 8am for iPad launch",
      "Author": "John Doe"
    }
  }
}

And let’s say you are interested in getting a small subset of data about the referenced stories for display with the base story. What you DON’T want to do is something like this:

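// The type argument is missing in the source; Story is an assumed entity name.
var story = session.Load<Story>("stories/123");
foreach (var storyReference in story.StoryReferences)
{
  // Each iteration issues its own HTTP request.
  var otherStory = session.Load<Story>(storyReference.Id);
  // ... do something with otherStory ...
}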

That will result in the following HTTP traffic back to Raven:

  1. Make a request for ‘stories/123’
  2. Make a request for ‘story/456213’
  3. Make a request for ‘story/789654’
  4. Make a request for ‘story/555111’
  5. Make a request for ‘story/942342’
  6. …etc…

You’ll consume unnecessary bandwidth and incur the cost of individual HTTP requests. What you really want to do is have the client make a single HTTP request for all of them. Fortunately RavenDB allows you to do that with ‘Includes’. A RavenDB include says “Hey server go get this for me, but before you give it back to me gather up these other things and return them with the request too so I can deal with them in a moment”.
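As a rough illustration of that idea, here is a minimal sketch of loading the story and its referenced documents in one round trip. It assumes the RavenDB .NET client of this post's era, where the session exposes Include(path).Load<T>(id) with a string include path. The Story and StoryReference classes and the StoryLoader helper are hypothetical shapes made up for this example, not the actual model.

```csharp
using System.Collections.Generic;
using Raven.Client;

// Hypothetical entity shapes, assumed for illustration only.
public class StoryReference
{
    public string Id { get; set; }
}

public class Story
{
    public string Headline { get; set; }
    public List<StoryReference> StoryReferences { get; set; }
}

public static class StoryLoader
{
    public static Story LoadWithReferences(IDocumentSession session)
    {
        // One HTTP request: the server returns stories/123 together with
        // the documents whose ids appear in the Id property of each
        // StoryReferences entry.
        var story = session
            .Include("StoryReferences,Id")
            .Load<Story>("stories/123");

        foreach (var reference in story.StoryReferences)
        {
            // The included documents are already tracked by the session,
            // so these loads should not cause additional requests.
            var otherStory = session.Load<Story>(reference.Id);
            // ... do something with otherStory ...
        }

        return story;
    }
}
```

Running this needs the RavenDB client library and a reachable server, so treat it as a sketch rather than a drop-in implementation.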

A few weeks back we had some code that was hitting the 30 requests per session limit. At first we couldn’t understand why since we do a pretty good job of making sure we only make 1 or 2 requests via ‘Includes’. Upon further inspection it turned out we had misunderstood something about the RavenDB client API.

Suppose we have an index that produces projections, that is, a server-side anonymous entity containing flattened “StoryReferenceIds”, like this (contrived example):

// The generic type arguments are missing in the source; Story and string are assumed.
public class StoriesByReferencedStories : AbstractIndexCreationTask<Story>
{
  public class Result
  {
    public string Headline { get; set; }
    public DateTimeOffset? LastPublishedAtUtc { get; set; }
    public IEnumerable<string> StoryReferenceIds { get; set; }
  }

  public StoriesByReferencedStories()
  {
    this.Map =
      stories => from story in stories
                 select new
                 {
                   Headline = story.Headline,
                   LastPublishedAtUtc = story.LastPublishedAtUtc,
                   StoryReferenceIds = story.StoryReferences.Select(x => x.Id)
                 };
  }
}

Then we had previously done something like the following on our Lucene queries against it:

  .WhereStartsWith("Headline", text)

However, it turns out that the last Include line doesn’t do anything at all. The Include() call operates on the documents identified by the index and NOT on the projection. In other words, the stories returned by the query are what the Include() call actually operates against.

So with that in mind what we actually want is something like this:
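The snippet is not preserved in this copy of the post, so what follows is only a minimal sketch of what such a query might look like. It assumes the client API of the time (session.Advanced.LuceneQuery with a string include path), the index class from this post, and a hypothetical Story entity whose StoryReferences entries each carry an Id. The StorySearch helper is invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;

public static class StorySearch
{
    // Story is an assumed entity type; StoriesByReferencedStories is the
    // index class from this post.
    public static List<Story> Search(IDocumentSession session, string text)
    {
        return session.Advanced
            .LuceneQuery<Story, StoriesByReferencedStories>()
            .WhereStartsWith("Headline", text)
            // A path such as "StoryReferenceIds" exists only in the index
            // projection, so including it would have no effect. The path
            // below is resolved against the story documents themselves.
            .Include("StoryReferences,Id")
            .ToList();
    }
}
```

After the query runs, a session.Load call for any of the referenced ids should be answered from the session rather than by another request.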


The syntax with the comma may look a little funny, but what it means is “For the StoryReferences collection, include the document identified by the Id property of each entry”. So if you had a story with 45 referenced stories in it, instead of making 46 requests back to Raven you would make only 1 request. That’s much better.


Tagged: performance ravendb ASP.NET MVC C#

2019 Ben Lakey

The words here do not reflect those of my employer.