Monthly Archives: March 2012

RavenDB: Lessons Learned: Query Includes and Projections

Note: This was originally posted over on the Development @ MSNBC blog.

By default, RavenDB will only allow 30 requests per session. This is part of RavenDB’s “Safe by default” behaviors, to prevent you from making a giant number of RavenDB HTTP requests, which would be a performance quagmire.

Let’s say you have an object graph that you are retrieving from RavenDB that contains referenced documents, and it looks like this:

stories/123
{
  "Headline": "New iPad is key to Apple's bottom line",
  "Author": "Jack Smith",
  "LastPublishedAtUtc": "2012-03-14T23:48:00.0000000+00:00",
  "PublishStatus": "Published",
  "StoryReferences":
  {
    "storyreference/456213":
    {
      "Headline": "New iPhone coming soon",
      "Author": "John Doe"
    },
    "storyreference/789654":
    {
      "Headline": "New iPad foils reviewers' attempts to find legitimate faults",
      "Author": "Jane Doe"
    },
    "storyreference/555111":
    {
      "Headline": "Now on Netflix: Search by TV network",
      "Author": "Jack Smith"
    },
    "storyreference/942342":
    {
      "Headline": "Apple stores to open at 8am for iPad launch",
      "Author": "John Doe"
    },
    ...
  }
}

And let’s say you are interested in getting a small subset of data about the referenced stories for display with the base story. What you DON’T want to do is something like this:

var story = session.Load<Story>("stories/123");

foreach (var storyReference in story.StoryReferences)
{
    var otherStory = session.Load<Story>(storyReference.Id);
    // ... do something with otherStory ...
}

That will result in the following HTTP traffic back to Raven:

  1. Make a request for ‘stories/123’
  2. Make a request for ‘stories/456213’
  3. Make a request for ‘stories/789654’
  4. Make a request for ‘stories/555111’
  5. Make a request for ‘stories/942342’
  6. …etc…

You’ll consume unnecessary bandwidth and incur the cost of individual HTTP requests. What you really want is for the client to make a single HTTP request. Fortunately RavenDB allows you to do that with Includes. A RavenDB include says “Hey server, go get this for me, but before you give it back to me, gather up these other things and return them with the response too so I can deal with them in a moment”.
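To make the difference in round trips concrete, here is a minimal, self-contained sketch. It does not use the RavenDB client at all: `RoundTripServer`, `RememberingSession`, and their methods are hypothetical stand-ins invented for illustration, and the document ids are made up. It only models the idea that an include lets the server return the referenced documents in the same response, so that later loads are answered from the session instead of the network.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory "server" that counts round trips. Not RavenDB code.
public sealed class RoundTripServer
{
    // Each document is reduced to the list of ids it references.
    private readonly Dictionary<string, string[]> _docs = new Dictionary<string, string[]>
    {
        { "stories/123", new[] { "stories/456213", "stories/789654", "stories/555111" } },
        { "stories/456213", new string[0] },
        { "stories/789654", new string[0] },
        { "stories/555111", new string[0] }
    };

    public int Requests { get; private set; }

    // One round trip, one document.
    public string[] Get(string id)
    {
        Requests++;
        return _docs[id];
    }

    // One round trip, the document plus everything it references.
    public Dictionary<string, string[]> GetWithIncludes(string id)
    {
        Requests++;
        var result = new Dictionary<string, string[]> { { id, _docs[id] } };
        foreach (var referencedId in _docs[id])
        {
            result[referencedId] = _docs[referencedId];
        }
        return result;
    }
}

// Hypothetical session that remembers every document it has already received.
public sealed class RememberingSession
{
    private readonly RoundTripServer _server;
    private readonly Dictionary<string, string[]> _loaded = new Dictionary<string, string[]>();

    public RememberingSession(RoundTripServer server) { _server = server; }

    public string[] Load(string id)
    {
        string[] doc;
        if (!_loaded.TryGetValue(id, out doc))
        {
            doc = _server.Get(id);
            _loaded[id] = doc;
        }
        return doc;
    }

    public string[] LoadWithIncludes(string id)
    {
        foreach (var pair in _server.GetWithIncludes(id))
        {
            _loaded[pair.Key] = pair.Value;
        }
        return _loaded[id];
    }
}

public static class IncludeDemo
{
    // Counts round trips for "load the story, then load each referenced story".
    public static int CountRequests(bool useIncludes)
    {
        var server = new RoundTripServer();
        var session = new RememberingSession(server);

        var references = useIncludes
            ? session.LoadWithIncludes("stories/123")
            : session.Load("stories/123");

        foreach (var referencedId in references)
        {
            session.Load(referencedId);
        }
        return server.Requests;
    }

    public static void Main()
    {
        Console.WriteLine(CountRequests(useIncludes: false)); // 4
        Console.WriteLine(CountRequests(useIncludes: true));  // 1
    }
}
```

With three referenced stories the naive loop costs four round trips and the include version costs one; the gap grows linearly with the number of references.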

A few weeks back we had some code that was hitting the 30 requests per session limit. At first we couldn’t understand why, since we do a pretty good job of making sure we only make 1 or 2 requests via Includes. Upon further inspection, it turned out we had misunderstood something about the RavenDB client API.

What’s the problem?

Suppose we have an index that produces projections, that is, a server-side anonymous entity containing flattened “StoryReferenceIds”, like this (this is a contrived example):

public class Stories_ByReferencedStories : AbstractIndexCreationTask<Story>
{
    public class Result
    {
        public string Headline { get; set; }
        public DateTimeOffset? LastPublishedAtUtc { get; set; }
        public IEnumerable<string> StoryReferenceIds { get; set; }
    }

    public Stories_ByReferencedStories()
    {
        this.Map = stories => from story in stories
                              select new
                              {
                                  Headline = story.Headline,
                                  LastPublishedAtUtc = story.LastPublishedAtUtc,
                                  StoryReferenceIds = story.StoryReferences.Select(x => x.Id),
                              };
    }
}

… Then we had previously done something like the following on our Lucene queries against it:

session.Advanced.LuceneQuery<Story, Stories_ByReferencedStories>()
  .WhereStartsWith("Headline", text)
  .OrderBy("-LastPublishedAtUtc")
  .Include("StoryReferenceIds")

However, it turns out that last Include line doesn’t do anything at all. The Include() call actually operates on the entries identified by the index, NOT the projection. In other words, the stories produced from the query are what the Include() call actually operates against.

So, with that in mind, what we actually want is something like this:

.Include("StoryReferences,Id");

The syntax with the comma may look a little funny, but what it means is “For the StoryReferences collection, include the document identified by the Id property of each item in the collection”. So if you had a story with 45 referenced stories in it, instead of making 46 requests back to Raven, you would make only 1 request. That’s much better.

Happy coding.

Raven DB: Lessons Learned: Caching Contexts

Note: This was originally posted over on the Development @ MSNBC blog.

Caching

When you talk about caching in terms of the full web application stack, you’ve typically got the following layers:

  • Browser cache
  • CDN cache
  • Application output cache
  • Data cache

However, in an application leveraging Raven DB, the last layer actually gets split up into two layers.

Some Background

The way Raven DB operates is by having the client generate HTTP requests, which are sent across the wire to the server. Therefore, the same standard caching mechanisms that HTTP provides are present. This means that if a request is made and Raven DB thinks the data hasn’t changed since the last time you requested that same data, the server responds with HTTP 304 Not Modified, instructing the Raven client to continue to use what it got last time.

using (var session = store.OpenSession())
{
    //if the server doesn't have anything different from the last time
    //this was requested, it won't do any processing, and just returns
    //HTTP 304 Not Modified to the client. the client will then use
    //what it got last time.
    var foo = session.Load<Foo>("foos/123");
}

So the first layer of the data cache, you get for free out of the box with Raven. Fortunately the second layer is available as well, if your application needs it.

Aggressive Data Caching

With Raven DB, it’s possible to instruct the client to not even ask the server for data again, thereby skipping the HTTP request, even if it might result in a 304. Here’s what that looks like:

using (var session = store.OpenSession())
{
    //set up an aggressive caching context, instructing the client to not
    //make an http request if it made one within the last 5 minutes
    using (session.Advanced.DocumentStore
        .AggressivelyCacheFor(TimeSpan.FromMinutes(5)))
    {
        //may or may not make a request
        var foo = session.Load<Foo>("foos/123");
    }
    }
}
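The two data cache layers can be told apart by counting what reaches the server. The following is a rough, self-contained model, not RavenDB code: `VersionedServer`, `CachingClient`, and `Response` are hypothetical names, and the ETag comparison stands in for the standard HTTP conditional request that a 304 response implies. The real Raven client handles all of this internally.

```csharp
using System;

// Hypothetical response: Body is null when Status is 304.
public sealed class Response
{
    public int Status;
    public int ETag;
    public string Body;
}

// Hypothetical server holding one document whose ETag changes on every write.
public sealed class VersionedServer
{
    private string _body = "v1";
    private int _etag = 1;

    public int Requests { get; private set; }
    public int NotModifiedResponses { get; private set; }

    public void Write(string body) { _body = body; _etag++; }

    // Answers 304 without a body when the caller's ETag is still current.
    public Response Get(int? ifNoneMatch)
    {
        Requests++;
        if (ifNoneMatch == _etag)
        {
            NotModifiedResponses++;
            return new Response { Status = 304, ETag = _etag };
        }
        return new Response { Status = 200, ETag = _etag, Body = _body };
    }
}

public sealed class CachingClient
{
    private readonly VersionedServer _server;
    private string _cachedBody;
    private int? _cachedETag;
    private DateTime _checkedAt;

    public CachingClient(VersionedServer server) { _server = server; }

    // maxAge == null: always ask the server, which may answer 304 (layer one).
    // maxAge != null: skip the request entirely while the cached copy is
    // younger than maxAge (layer two, the "aggressive" behaviour).
    public string Load(DateTime now, TimeSpan? maxAge = null)
    {
        if (maxAge.HasValue && _cachedETag.HasValue && now - _checkedAt < maxAge.Value)
        {
            return _cachedBody;
        }

        var response = _server.Get(_cachedETag);
        if (response.Status == 200)
        {
            _cachedBody = response.Body;
            _cachedETag = response.ETag;
        }
        _checkedAt = now;
        return _cachedBody;
    }
}

public static class CachingLayersDemo
{
    public static string Run()
    {
        var server = new VersionedServer();
        var client = new CachingClient(server);
        var start = new DateTime(2012, 3, 1);
        var fiveMinutes = TimeSpan.FromMinutes(5);

        client.Load(start);                                          // full response
        client.Load(start.AddSeconds(1));                            // 304 Not Modified
        client.Load(start.AddSeconds(2), fiveMinutes);               // no request at all
        server.Write("v2");
        var stale = client.Load(start.AddSeconds(3), fiveMinutes);   // still no request
        var fresh = client.Load(start.AddMinutes(10), fiveMinutes);  // window expired

        return server.Requests + " requests, " + server.NotModifiedResponses
            + " not modified, stale=" + stale + ", fresh=" + fresh;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 3 requests, 1 not modified, stale=v1, fresh=v2
    }
}
```

The `stale=v1` result is the trade-off to keep in mind: inside the aggressive caching window the client does not notice writes, because it never asks.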

Runtime Configuration?

We mentioned in a previous blog post a runtime configuration setup that we’ve provided our ops team with. Being able to control the TTL of Raven’s aggressive caching through that runtime configuration seemed like a prime candidate for it. We wired it up in much the same way as the output caching runtime configuration from the other blog post.

Clever

Now, to use output caching, it was a simple line to apply the [ConfiguredOutputCache] attribute to our controller actions. However, with the raven data caching, it’s a violation of DRY to have to open an aggressive caching context, and pass in a runtime configuration value everywhere it’s needed. So, with that in mind, we came up with an extension method to encapsulate that behavior. We thought this was very clever, but it actually turned out to be quite stupid. Can you spot the problem?

public static class DataCachingExtensions
{
    public static T LoadAndCache<T>(this IDocumentSession session, string id)
    {
        using (session.Advanced.DocumentStore.AggressivelyCacheFor(
            TimeSpan.FromSeconds(CacheSettings.RavenAggressiveCachingDurationSeconds)))
        {
            return session.Load<T>(id);
        }
    }

    public static IRavenQueryable<T> QueryAndCache<T>(
        this IDocumentSession session)
    {
        using (session.Advanced.DocumentStore.AggressivelyCacheFor(
            TimeSpan.FromSeconds(CacheSettings.RavenAggressiveCachingDurationSeconds)))
        {
            return session.Query<T>();
        }
    }
}

... 

session.LoadAndCache<Foo>("foos/123");

...

session.QueryAndCache<Foo>().Where(f => f.Bar == "Baz").ToList();

The first extension method is fine, but the 2nd one doesn’t do anything at all. Why?

It’s because Raven doesn’t actually execute the HTTP query until the query is evaluated, for example by calling ToList(). Since we return the unevaluated query from inside the aggressive caching context, the context has already been disposed by the time the HTTP query executes, resulting in no caching.
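The same trap can be reproduced without Raven, using nothing but a lazy sequence and a disposable scope. In this sketch `CachingScope` and `DeferredExecutionDemo` are hypothetical names: the scope is a stand-in for the aggressive caching context, and the iterator is a stand-in for a query that only talks to the server when enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an aggressive caching context: IsActive is true
// only between construction and Dispose().
public sealed class CachingScope : IDisposable
{
    public static bool IsActive { get; private set; }

    public CachingScope() { IsActive = true; }

    public void Dispose() { IsActive = false; }
}

public static class DeferredExecutionDemo
{
    // Stand-in for the server round trip. Because this is an iterator, its
    // body runs when the result is enumerated, not when RunQuery() is called.
    private static IEnumerable<string> RunQuery()
    {
        yield return CachingScope.IsActive ? "cached" : "not cached";
    }

    // Same shape as QueryAndCache: hands back the lazy query, so the scope
    // is disposed before anyone enumerates it.
    public static IEnumerable<string> BuildInsideScope()
    {
        using (new CachingScope())
        {
            return RunQuery();
        }
    }

    // The scope stays open until ToList() has forced the query to run.
    public static List<string> ExecuteInsideScope()
    {
        using (new CachingScope())
        {
            return RunQuery().ToList();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildInsideScope().ToList()[0]); // not cached
        Console.WriteLine(ExecuteInsideScope()[0]);        // cached
    }
}
```

Whatever forces the enumeration (ToList(), a foreach, First()) has to happen while the scope is still open.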

So after feeling pretty silly, we restructured the extension method to simply return the aggressive caching context, so that the caller can encapsulate the full query including its execution.

public static class DataCachingExtensions
{
    public class NonCachingContext : IDisposable
    {
        public void Dispose() { }
    }

    public static IDisposable GetCachingContext(
        this IDocumentSession session)
    {
        if (CacheSettings.RavenAggressiveCachingDurationSeconds == 0)
        {
            return new NonCachingContext();
        }

        return session.Advanced.DocumentStore.AggressivelyCacheFor(
            TimeSpan.FromSeconds(CacheSettings.RavenAggressiveCachingDurationSeconds));
    }
}

...

using (var session = store.OpenSession())
{
    using (session.GetCachingContext())
    {
        session.Query<Foo>().Where(f => f.Bar == "Baz").ToList();
    }
}

It’s worth pointing out that the caching context, when used with a Query, does not actually cache the items returned, but rather just caches the query/response pair. Subsequent cache-enabled calls to .Load for items that were returned from a cache-enabled query will therefore still make a request, unless those items were already cached by a .Load call of their own.

Happy coding!