Wednesday, January 28, 2009

Using the Repository Pattern with CSLA.NET, Part 1

In a previous post I showed a way to use dependency injection with CSLA.NET.  In this post, I'll show a practical example of how to use that technique to abstract the data access layer used by your business objects using the Repository pattern.  This post is part 1.  At the end of part 2, I'll include a link to a fully functional demo application that includes a business layer, data access layer, UI layer, and unit tests.

Background

The default approach for data access in CSLA.NET is to write your data access code right in the business class in the DataPortal_* or Child_* methods.  These methods have direct access to the internal state of the business object so they can easily persist that state to your data store without breaking encapsulation.  While this approach obviously works, there are several reasons why one might want to move this code out of the business object and into its own layer.  Here are a few good ones:

  1. Testability: If you want to be able to write tests for your business objects that test isolated units of code (i.e. unit tests), your business object shouldn't be talking to a real database at test-time.  Therefore some kind of abstraction needs to be implemented that allows the real database to be called at run-time and a mocked database to be called at test-time.
  2. Maintainability: It's never good when you cram too much code into one class.  Arguably, data access logic is very different from business logic, and therefore has a different set of concerns and pressures for change.  Moving it out to its own layer allows both the business logic and the data access logic to change independently with minimal impact on each other.  For example, an abstracted data access layer, if designed at a high enough level, might allow you to change the underlying database platform (e.g. from SQL Server to Oracle) or technology (e.g. from ADO.NET to LINQ to SQL) without having to alter and recompile your business layer.  Abstracting data access into its own layer also jibes with SRP (the Single Responsibility Principle), which states that every object should have only one reason to change.  While CSLA.NET business objects may never have just one reason to change, it's always good when you can reduce the number of reasons.
  3. Security: Some application architectures require that the data access code itself cannot exist on any public-facing machine (the client workstation in the case of a Rich UI application like WinForms, WPF, or Silverlight, or the web server in the case of a web app).  The idea is that if a hacker is able to compromise a public-facing machine, and that machine has the data access code on it, they then potentially have the information needed to access the database itself (e.g. the connection string).  Moving the data access code to a separate layer gives you the flexibility to deploy the data access layer only to an application server, which stands in between the public-facing machine(s) and the database, behind its own firewall.

Problem

Unfortunately, abstracting the data access layer is no easy task.  There are a lot of choices and options, so you have to be careful that you don't end up limiting yourself.  Here are a few things we need to consider:

  1. We want a data access layer that is called and consumed by the business layer and not one that populates the business objects for us.  If your data access layer populates your business objects it either has to set public properties on your business objects (which fires validation rules unnecessarily) or it has to have access to the internal state of your business objects, which violates encapsulation. 
  2. We need to determine at what level we want to do our abstraction.  We could create a low-level abstraction that represents a specific data access technology.  For example, you could abstract ADO.NET by only working with the various ADO.NET interfaces (IDbConnection, IDbCommand, IDataReader, etc.).  But this approach prevents you from easily switching data access technologies, since doing so would require altering your business layer code.  Another approach to consider is making your data access layer abstraction general enough that it doesn't pin you to a specific technology.

Solution

The solution I chose to pursue was to follow the relatively well-known data access pattern called the Repository Pattern.  Repository is a high-level abstraction of your data access layer that allows the caller to perform any necessary data access (queries, CRUD, etc) and get back a structured set of objects that represent the data.  I like to call these objects DTO's (data transfer objects) since they are nothing more than strongly-typed data structures (just data, no logic).  And since the Repository is a high-level abstraction, you gain the flexibility of being able to change the underlying data access technology in the future as well as do other fun tricks like data caching.

Given that description of Repository, there still are a lot of ways to implement it.  Just do a Google search on "Repository Pattern" and you'll see what I mean.  I've come up with what I think is a relatively simple approach that seems to work well with CSLA.NET.  Let's take a closer look.

Repository and Context

There are two key components to my Repository design: the repository and the context.

The repository represents the gateway to the data access layer.  The repository will create all the necessary data access objects we need to perform our data access.  Essentially it is a factory object and it can be injected into our business object via dependency injection. 

When you want to actually perform data access operations, you usually want this to be done in a context of some sort that you can dispose of when you're done.  Maybe you need to execute one or more calls atomically within a transaction or maybe you just need to do some querying and you want to ensure that the connection is closed when you're done, even if the query fails.  This is what a context is for. 

The repository's primary job is to create a context when the business object needs to perform data access.  It can also create other data access objects, like DTO's, that may be needed to assist in performing data access.

So, let's take a look at what the repository and context might look like in code:

public interface IRepository<TContext>
    where TContext : IContext
{
    TContext CreateContext(bool isTransactional);
}

public interface IContext
    : IDisposable
{
    void CompleteTransaction();
}

Pretty simple.  An IRepository object can create a specific type of IContext object, and when it does, you have to specify whether or not the IContext is transactional.  If an IContext is transactional, it has a method called CompleteTransaction that is called by the business object to indicate that the transaction has succeeded and should be committed.  And regardless of whether the IContext object is transactional or not, it is disposable, which makes it easy for the calling business object to close things like database connections.
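To make the commit-on-complete idea concrete, here's a minimal in-memory sketch.  It is not the implementation from the sample code: InMemoryContext, InMemoryRepository, the Write method, and ContextDemo are all hypothetical names made up for illustration, and two behaviors are my own assumptions (uncommitted work is discarded on Dispose, and calling CompleteTransaction on a non-transactional context throws).

```csharp
using System;
using System.Collections.Generic;

public interface IContext : IDisposable
{
    void CompleteTransaction();
}

public interface IRepository<TContext> where TContext : IContext
{
    TContext CreateContext(bool isTransactional);
}

// Hypothetical in-memory context: transactional writes are staged and only
// applied to the shared store if CompleteTransaction ran before Dispose.
public sealed class InMemoryContext : IContext
{
    private readonly List<string> _store;
    private readonly List<string> _pending = new List<string>();
    private readonly bool _isTransactional;
    private bool _completed;

    public InMemoryContext(List<string> store, bool isTransactional)
    {
        _store = store;
        _isTransactional = isTransactional;
    }

    // Hypothetical write operation standing in for real insert/update calls.
    public void Write(string value)
    {
        if (_isTransactional) _pending.Add(value);   // staged until commit
        else _store.Add(value);                      // applied immediately
    }

    public void CompleteTransaction()
    {
        // Assumption: completing a non-transactional context is a caller bug.
        if (!_isTransactional)
            throw new InvalidOperationException("Context is not transactional.");
        _completed = true;
    }

    public void Dispose()
    {
        if (_isTransactional && _completed) _store.AddRange(_pending);
        _pending.Clear();   // anything not committed is discarded
    }
}

public sealed class InMemoryRepository : IRepository<InMemoryContext>
{
    public readonly List<string> Store = new List<string>();

    public InMemoryContext CreateContext(bool isTransactional)
    {
        return new InMemoryContext(Store, isTransactional);
    }
}

public static class ContextDemo
{
    public static List<string> Run()
    {
        var repository = new InMemoryRepository();

        // CompleteTransaction is never called, so "a" is discarded on Dispose.
        using (var context = repository.CreateContext(true))
        {
            context.Write("a");
        }

        // CompleteTransaction is called, so "b" is committed on Dispose.
        using (var context = repository.CreateContext(true))
        {
            context.Write("b");
            context.CompleteTransaction();
        }

        return repository.Store;   // contains only "b"
    }
}
```

The point of the sketch is the calling convention: the caller only signals success, and the using block takes care of everything else, including the failure path.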

And by the way, I personally prefer to use interfaces for my abstractions (vs abstract classes) because they're much easier to mock in a testing scenario.

Notice that neither of these interfaces has any methods that perform specific database operations (ex: CRUD).  I prefer to move those down into more specialized interfaces that abstract a specific set of entities in your data access layer.  For example, here's a repository and a context for a simple order-entry system that contains order and line item entities:

public interface IOrderRepository
    : IRepository<IOrderContext>
{
    IOrderDto CreateOrderDto();
    ILineItemDto CreateLineItemDto();
}

public interface IOrderContext
    : IContext
{
    IEnumerable<IOrderInfoDto> FetchInfoList();
    IOrderDto FetchSingleWithLineItems(int id);
    void InsertOrder(IOrderDto newOrder);
    void UpdateOrder(IOrderDto existingOrder);
    void DeleteOrder(int id);
    void InsertLineItem(ILineItemDto newLineItem);
    void UpdateLineItem(ILineItemDto existingLineItem);
    void DeleteLineItem(int id, int orderId);
}

Most of the action is down in the IOrderContext object; it contains all of the data access methods that involve interacting with the database.  There are two methods for querying and retrieving order data.  The remaining methods perform the inserts, updates, and deletes.

Notice that my query methods are explicit: "fetch info list" and "fetch single order with line items".  I've seen some Repository Pattern implementations out there that provide more generic query methods that allow you to harness the power of LINQ by returning an IQueryable<T> object.  While there's nothing wrong with this approach, it does limit your concrete implementation to data access technologies that support LINQ.  Some do, some don't (like ADO.NET).  To be flexible, I chose to not lock myself into LINQ, so my query methods return simple IEnumerable<T> results.  And even if you are confident that any data access technology you will use is LINQ-friendly, not all LINQ implementations are the same.  For example, the LINQ syntax for LINQ to SQL is a little different from the LINQ syntax for LINQ to Entities.
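Returning IEnumerable<T> doesn't stop a caller from using LINQ; it just means the query runs as LINQ to Objects.  Here's a small sketch of that, using the IOrderInfoDto interface defined later in this post.  The OrderInfoDto class, the QueryDemo class, and the canned data are hypothetical, and the static FetchInfoList method is only a stand-in for the context method of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IOrderInfoDto
{
    int Id { get; set; }
    string Customer { get; set; }
    DateTime Date { get; set; }
}

// Hypothetical plain implementation, used only to build sample data.
public class OrderInfoDto : IOrderInfoDto
{
    public int Id { get; set; }
    public string Customer { get; set; }
    public DateTime Date { get; set; }
}

public static class QueryDemo
{
    // Stand-in for IOrderContext.FetchInfoList(): returns canned rows.
    public static IEnumerable<IOrderInfoDto> FetchInfoList()
    {
        return new List<IOrderInfoDto>
        {
            new OrderInfoDto { Id = 1, Customer = "Acme", Date = new DateTime(2009, 1, 5) },
            new OrderInfoDto { Id = 2, Customer = "Globex", Date = new DateTime(2009, 1, 20) },
            new OrderInfoDto { Id = 3, Customer = "Acme", Date = new DateTime(2009, 1, 27) }
        };
    }

    // LINQ to Objects works on any IEnumerable<T>.  The filter runs in memory,
    // after the data access layer has already returned every row.
    public static List<int> IdsForCustomer(string customer)
    {
        return FetchInfoList()
            .Where(o => o.Customer == customer)
            .OrderBy(o => o.Date)
            .Select(o => o.Id)
            .ToList();
    }
}
```

The trade-off is that the filtering happens after the fetch.  If a query needs to be narrowed at the database, the explicit-method style would call for another method on the context rather than a LINQ expression in the business layer.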

The DTO's

Notice that all data is transferred to and from these data access methods via DTO's.  The only issue is that these DTO's are interfaces so if the business object has to pass them to the context (say to the InsertOrder method), it needs a way to create one first.  That's where the DTO factory methods on the IOrderRepository object come into play.  This further cements the concept that the repository is really an object factory.

Before we move on, let's take a look at the definition of the DTO interfaces in my example.

First, the IOrderInfoDto which is returned by the FetchInfoList method:

public interface IOrderInfoDto
{
    int Id { get; set; }
    string Customer { get; set; }
    DateTime Date { get; set; }
}

This gives us only a portion of the fields that may be defined by the order entity in the database.  The purpose of this DTO (and the ReadOnlyBase<T> business object it populates) is to provide a simplified view of order data, perhaps to populate a list in the GUI.

When we need a single entire order, the FetchSingleWithLineItems method returns the IOrderDto object:

public interface IOrderDto
{
    int Id { get; set; }
    string Customer { get; set; }
    DateTime Date { get; set; }
    decimal ShippingCost { get; set; }
    byte[] Timestamp { get; set; }
    IEnumerable<ILineItemDto> LineItems { get; }
}

It contains pretty much the same data as IOrderInfoDto plus the remaining order entity data and also a collection of child line items.  Those line item objects are of type ILineItemDto:

public interface ILineItemDto
{
    int Id { get; set; }
    int OrderId { get; set; }
    string ProductName { get; set; }
    decimal Price { get; set; }
    int Quantity { get; set; }
    byte[] Timestamp { get; set; }
}
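Since the business object only ever sees these interfaces, something in the data access layer has to supply concrete classes behind the repository's factory methods.  Here's one way that might look.  The OrderDto and LineItemDto classes, the AddLineItem method, and the DtoFactorySketch class are hypothetical names of my own; the real classes could just as easily be generated by a data access technology.

```csharp
using System;
using System.Collections.Generic;

public interface ILineItemDto
{
    int Id { get; set; }
    int OrderId { get; set; }
    string ProductName { get; set; }
    decimal Price { get; set; }
    int Quantity { get; set; }
    byte[] Timestamp { get; set; }
}

public interface IOrderDto
{
    int Id { get; set; }
    string Customer { get; set; }
    DateTime Date { get; set; }
    decimal ShippingCost { get; set; }
    byte[] Timestamp { get; set; }
    IEnumerable<ILineItemDto> LineItems { get; }
}

// Hypothetical concrete DTOs: plain property bags with no logic.
public class LineItemDto : ILineItemDto
{
    public int Id { get; set; }
    public int OrderId { get; set; }
    public string ProductName { get; set; }
    public decimal Price { get; set; }
    public int Quantity { get; set; }
    public byte[] Timestamp { get; set; }
}

public class OrderDto : IOrderDto
{
    private readonly List<ILineItemDto> _lineItems = new List<ILineItemDto>();

    public int Id { get; set; }
    public string Customer { get; set; }
    public DateTime Date { get; set; }
    public decimal ShippingCost { get; set; }
    public byte[] Timestamp { get; set; }

    // Read-only through the interface; the data access layer fills the list
    // through the concrete type while building a fetched order.
    public IEnumerable<ILineItemDto> LineItems
    {
        get { return _lineItems; }
    }

    public void AddLineItem(ILineItemDto lineItem)
    {
        _lineItems.Add(lineItem);
    }
}

// Only the two DTO factory methods of IOrderRepository are sketched here.
public class DtoFactorySketch
{
    public IOrderDto CreateOrderDto()
    {
        return new OrderDto();
    }

    public ILineItemDto CreateLineItemDto()
    {
        return new LineItemDto();
    }
}
```

The business layer never references OrderDto or LineItemDto directly, so swapping them for classes produced by another data access technology shouldn't require touching it.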

By the way, both the IOrderDto and ILineItemDto interfaces contain a Timestamp property which is used to manage data concurrency.  This timestamp has to be persisted into the business object itself.  I'll show how this is done in the sample code, but Rocky also does it in his ProjectTracker sample application.
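If you haven't used a timestamp for concurrency before, the idea is that an update only succeeds when the caller's copy of the timestamp still matches the stored one.  The sketch below mimics that with an in-memory table.  Everything in it is hypothetical (VersionedOrderTable, ConcurrencyDemo, the exception type, and the counter-based timestamp); a real implementation would typically let the database generate the value and do the comparison in the UPDATE statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory "table" that mimics a timestamp/rowversion column.
public class VersionedOrderTable
{
    private class Row
    {
        public string Customer;
        public byte[] Timestamp;
    }

    private readonly Dictionary<int, Row> _rows = new Dictionary<int, Row>();
    private long _version;

    // Each write gets a new, larger value, much like a database would issue.
    private byte[] NextTimestamp()
    {
        _version++;
        return BitConverter.GetBytes(_version);
    }

    public byte[] Insert(int id, string customer)
    {
        var row = new Row { Customer = customer, Timestamp = NextTimestamp() };
        _rows.Add(id, row);
        return row.Timestamp;
    }

    // Returns the new timestamp, or throws if the caller's copy is stale.
    public byte[] Update(int id, string customer, byte[] callerTimestamp)
    {
        var row = _rows[id];
        if (!row.Timestamp.SequenceEqual(callerTimestamp))
            throw new InvalidOperationException("Row was changed by another user.");
        row.Customer = customer;
        row.Timestamp = NextTimestamp();
        return row.Timestamp;
    }
}

public static class ConcurrencyDemo
{
    // Returns true when the second update, made with a stale timestamp, is rejected.
    public static bool StaleUpdateIsRejected()
    {
        var table = new VersionedOrderTable();
        byte[] original = table.Insert(1, "Acme");
        table.Update(1, "Acme Corp", original);       // succeeds, timestamp changes
        try
        {
            table.Update(1, "Acme Ltd", original);    // stale copy of the timestamp
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

This is why the business object has to hold on to the timestamp between fetch and save: it is the only evidence of which version of the row the user was editing.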

Calling the Repository from the Business Object

I think to really see how this implementation of the Repository Pattern works, we should take a look at how a business object would call these data access objects.  Let's examine the simplest case of a root-level collection of read-only order business objects (we'll name it OrderInfoCollection) performing its data access while fetching the collection.

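private void DataPortal_Fetch()
{
    // Suppress list-changed events while the collection is being loaded.
    RaiseListChangedEvents = false;
    // false = this context does not need to be transactional.
    using (var context = EnsureDependency(_repository).CreateContext(false))
    {
        // Make the read-only collection temporarily writable.
        IsReadOnly = false;
        foreach (var dto in context.FetchInfoList())
        {
            // Each DTO is handed to a child OrderInfo object to load itself from.
            var child = DataPortal.FetchChild<OrderInfo>(dto);
            this.Add(child);
        }
        IsReadOnly = true;
    }
    RaiseListChangedEvents = true;
}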

The code follows the standard pattern for the implementation of the DataPortal_Fetch method of a business object that inherits from ReadOnlyListBase<T,C>, where we turn off list-changed events and make the collection temporarily writable while we load it.  But instead of opening up an ADO.NET database connection or creating a LINQ to SQL data context, we talk to an abstract repository instance and call CreateContext, which returns an IOrderContext instance.  All context objects implement IDisposable, so we wrap it in a using statement, which guarantees things like database connections and transactions will be closed up at the end.

Within our using block is where we make all the data access method calls against the context.  In this case, we're calling FetchInfoList, enumerating the IOrderInfoDto objects returned, and using each to create an associated child OrderInfo business object for the collection.

Dependency Injection

You may be asking: where does the instance of the _repository field get created and what's this EnsureDependency method?  This goes back to how I implement dependency injection with CSLA.NET which I detail in the previous post.  Here's the definition of that field and the method that Unity (my dependency-injection/IOC Container framework of choice) uses to inject it:

[NonSerialized]
[NotUndoable]
private IOrderRepository _repository;

[InjectionMethod]
public void Inject(IOrderRepository repository)
{
    if (repository == null)
        throw new ArgumentNullException("repository");
    _repository = repository;
}

This is the magic that allows us to easily mock out the data access layer at test-time and use a real concrete implementation at run-time.  Showing how you would mock these objects is a bit more than we have time for now.  However, I will provide unit tests that show it in the downloadable sample code at the end of part 2.
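In the meantime, here's a rough idea of what a hand-rolled fake can look like, with no mocking framework and no CSLA.NET involved.  To stay short, the interfaces are trimmed (IOrderContext keeps only FetchInfoList and IOrderRepository drops the DTO factory methods), OrderInfoLoader is a plain class standing in for a business object, and all of the Fake* classes are hypothetical.  The EnsureDependency method shown here is only my guess at what such a guard might do, not the version from the previous post.

```csharp
using System;
using System.Collections.Generic;

public interface IContext : IDisposable
{
    void CompleteTransaction();
}

public interface IRepository<TContext> where TContext : IContext
{
    TContext CreateContext(bool isTransactional);
}

public interface IOrderInfoDto
{
    int Id { get; set; }
    string Customer { get; set; }
    DateTime Date { get; set; }
}

// Trimmed for brevity: only the one query method this sketch needs.
public interface IOrderContext : IContext
{
    IEnumerable<IOrderInfoDto> FetchInfoList();
}

// Trimmed for brevity: the DTO factory methods are left out.
public interface IOrderRepository : IRepository<IOrderContext>
{
}

public class FakeOrderInfoDto : IOrderInfoDto
{
    public int Id { get; set; }
    public string Customer { get; set; }
    public DateTime Date { get; set; }
}

// Fake context: returns canned rows and records whether it was disposed.
public class FakeOrderContext : IOrderContext
{
    public bool Disposed { get; private set; }

    public IEnumerable<IOrderInfoDto> FetchInfoList()
    {
        return new List<IOrderInfoDto>
        {
            new FakeOrderInfoDto { Id = 1, Customer = "Acme", Date = new DateTime(2009, 1, 5) },
            new FakeOrderInfoDto { Id = 2, Customer = "Globex", Date = new DateTime(2009, 1, 20) }
        };
    }

    public void CompleteTransaction() { }

    public void Dispose() { Disposed = true; }
}

public class FakeOrderRepository : IOrderRepository
{
    public readonly FakeOrderContext Context = new FakeOrderContext();

    public IOrderContext CreateContext(bool isTransactional)
    {
        return Context;
    }
}

// Plain stand-in for a business object; no CSLA.NET base class involved.
public class OrderInfoLoader
{
    private IOrderRepository _repository;

    public void Inject(IOrderRepository repository)
    {
        if (repository == null)
            throw new ArgumentNullException("repository");
        _repository = repository;
    }

    // A guess at an EnsureDependency-style guard: fail fast if nothing was injected.
    private static T EnsureDependency<T>(T dependency) where T : class
    {
        if (dependency == null)
            throw new InvalidOperationException("Dependency was not injected.");
        return dependency;
    }

    public List<string> LoadCustomers()
    {
        var customers = new List<string>();
        using (var context = EnsureDependency(_repository).CreateContext(false))
        {
            foreach (var dto in context.FetchInfoList())
                customers.Add(dto.Customer);
        }
        return customers;
    }
}
```

A test would create a FakeOrderRepository, pass it to Inject, call LoadCustomers, and then assert on both the returned names and FakeOrderContext.Disposed.  No database is involved at any point.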

Transactional Data Access Code

So we showed the simple example of how a read-only collection would call the data access layer to query the database and populate itself.  Let's also show an example of some data access code in a business object that's transactional. 

Let's jump over to a full BusinessBase<T> business object for an order (we'll call it Order) and take a look at what the DataPortal_Insert method implementation might look like for inserting an order:

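protected override void DataPortal_Insert()
{
    // true = this context is transactional.
    using (var context = EnsureDependency(_repository).CreateContext(true))
    {
        // Copy the business object's state into a new DTO.
        var dto = EnsureDependency(_repository).CreateOrderDto();
        dto.Customer = ReadProperty<string>(CustomerProperty);
        dto.Date = ReadProperty<DateTime>(DateProperty);
        dto.ShippingCost = ReadProperty<decimal>(ShippingCostProperty);
        context.InsertOrder(dto);
        // Load the new ID and timestamp back into the business object.
        LoadProperty<int>(IdProperty, dto.Id);
        _timestamp = dto.Timestamp;
        // Cascade to the child line items, passing this order and the open context.
        DataPortal.UpdateChild(ReadProperty<LineItemCollection>(LineItemsProperty), this, context);
        // Signal that everything succeeded and the transaction should be committed.
        context.CompleteTransaction();
    }
}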

A little more complicated, but the pattern is very similar.  We create a context (this time specifying that it's transactional) and we perform our data access operations against it.  In this case we first have to create an empty IOrderDto so we can populate it with the business object state and pass it into the InsertOrder data access method.  When that's done, we load the ID and timestamp of the newly inserted order entity back into the business object.  Finally, we cascade the update call down to any child line item business objects and mark the transaction as complete.
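The reason the order passes both itself and the context down to its children is so that each line item can pick up the new order ID and do its insert inside the same transaction.  Here's a plain-class sketch of that flow.  It deliberately leaves out CSLA.NET and uses concrete classes instead of the DTO interfaces, so OrderSketch, LineItemSketch, RecordingContext, and the simplified DTO classes are all hypothetical; how CSLA.NET actually routes the call to each child is not shown.

```csharp
using System;
using System.Collections.Generic;

// Simplified DTOs (concrete classes with fewer fields than the post's interfaces).
public class OrderDto
{
    public int Id;
    public string Customer;
}

public class LineItemDto
{
    public int Id;
    public int OrderId;
    public string ProductName;
}

// Hypothetical context that assigns IDs and logs each call instead of hitting a database.
public class RecordingContext : IDisposable
{
    public readonly List<string> Log = new List<string>();
    private int _nextId = 1;

    public void InsertOrder(OrderDto dto)
    {
        dto.Id = _nextId++;
        Log.Add("InsertOrder " + dto.Id);
    }

    public void InsertLineItem(LineItemDto dto)
    {
        dto.Id = _nextId++;
        Log.Add("InsertLineItem " + dto.Id + " for order " + dto.OrderId);
    }

    public void CompleteTransaction() { Log.Add("Commit"); }

    public void Dispose() { Log.Add("Dispose"); }
}

public class LineItemSketch
{
    public int Id;
    public string ProductName;

    // The child is handed its parent and the already-open context.
    public void Insert(OrderSketch parent, RecordingContext context)
    {
        var dto = new LineItemDto { OrderId = parent.Id, ProductName = ProductName };
        context.InsertLineItem(dto);
        Id = dto.Id;
    }
}

public class OrderSketch
{
    public int Id;
    public string Customer;
    public readonly List<LineItemSketch> LineItems = new List<LineItemSketch>();

    public void Insert(RecordingContext context)
    {
        using (context)
        {
            var dto = new OrderDto { Customer = Customer };
            context.InsertOrder(dto);
            Id = dto.Id;                       // children need the new order ID
            foreach (var item in LineItems)
                item.Insert(this, context);    // same context, same transaction
            context.CompleteTransaction();     // only reached if nothing threw
        }
    }
}
```

If any child insert throws, CompleteTransaction is never reached, and the using block still disposes the context, so the order and its line items succeed or fail as a unit.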

Conclusion

So this is all great and fun and with just the code we've written so far we could get our business layer up and running with passing unit tests.  But with all of that we still haven't actually hit a real database!  We need to code up a concrete implementation of our repository interfaces to get there.  I'm going to save that for the next post and with that post will come the promised full set of sample code.

10 comments:

mamboer said...

Nice post!
I'm running a CslaRepository project on Google Code that also applies the Repository pattern (it currently supports NHibernate). Have a look and give me some advice.
http://code.google.com/p/cslarepo/

Nermin Dibek said...

Peter, I must say an excellent post! I have been playing with the same idea for a while, together with Frank Mao.

You really need to take a look at his post. He combines StructureMap and the Repository pattern in a truly simple solution that only takes a few lines of code:
http://maonet.wordpress.com/2008/08/08/using-structuremap-to-mock-dataaccess-in-csla-bo/

Pete said...

Thanks! I'll take a look.
~pete

Sean said...

Peter,

I am truly grateful for such an excellent example of key features of .NET development. From dependency injection, CSLA.NET, MVP, Unity. I refer to this example constantly.

Thank you very much.

Regards,
Sean.

under_the_hood said...

I was looking for a good article that explains the Repository pattern. I think this is the ONE I was looking for. Good job.

Richard Collette said...

First, thanks for posting an informative article. I do have a few questions though...

You mention security as one reason for doing this, but isn't the design of CSLA.NET to allow your application to be simply deployed in an N-Tier scenario and thus the data access logic would not reside on the client?

You are using DI for some things but then revert back to using a factory pattern in the repository for creating the DTOs. Shouldn't the DTO's be created using DI as well?

I believe in the second article you mentioned that using CSLA.NET object factory required writing a bunch of factories that would not be required when using DI but I am not seeing how using the repository pattern is any better. You are still writing a bunch of factories.

Pete said...

Richard,

Thank you for your thoughtful comments. My apologies for taking so long to reply.

Regarding your question about security. If you design your business objects in the classic CSLA.NET fashion (i.e. not using a Repository pattern like this one or Object Factory), then your business object classes will contain both the business logic AND the data access logic - the data access logic will reside in the DataPortal_xyz methods. And since these classes are deployed to all tiers, that means your data access logic will be deployed to all tiers as well. CSLA.NET uses a concept called "mobile objects" (which I talk about a bit in the previous post), where your BO's move from one tier to the next and even behave differently on each tier. However, the code in those classes is still the same and must be deployed to each tier. Repository (as well as Object Factory) essentially pulls the raw data access code out of the business object classes into a separate layer, which can be configured to only be deployed on the tiers that really need it (ex: an application server that has direct access to the database).

Regarding your question about the DTO's being created using the factory pattern. Yes, the DTO's are being created by a factory (what I call the "Repository") but that factory is itself being injected into the CSLA.NET business object via DI. So those two patterns are being used together. In essence, everything that the business object needs to access the DB, whether it's creating DTO's to send data or calling the DB itself, is given to it via Dependency Injection, which is important for proper isolation of unit tests.

Finally, regarding your comment about the number of factories necessary for Repository/DI vs. Object Factory. You do have a good point here. Both approaches require you to create factories. However, Repository could result in fewer. In Object Factory you essentially need to create one "object factory" class for each root-level CSLA.NET class. With Repository, you create a Repository (which is the factory) for each database entity. If your CSLA.NET class structure is more granular or complex than the database structure (oftentimes it is), then you could end up with fewer Repository factories than root-level business object classes, since many of the classes would be sharing some of the Repositories.

Thanks!
~pete

Simon said...

Really enjoyed your series of posts, looking forward to the next one. Nice job!

Also read : Using StructureMap to mock DataAccess in CSLA BO
http://maonet.wordpress.com/2008/08/08/using-structuremap-to-mock-dataaccess-in-csla-bo/

I'm currently testing CSLA.NET 4.0 but I'd also like to stay with ADO.NET. I would also like to return DataSets because it's just easier for reporting, so I'll put a thin layer over CSLA to do this.

Buddy James said...

Great post! I've been struggling with DI/Ioc and abstracting the data layer when implementing CSLA. Thanks a lot for taking the time to write this!

Buddy James
feel free to check out my .NET development blog www.refactorthis.net
