The Repository Pattern
[UPDATE: 01/30/2014] I have been informed by a friend that common terms for these two types of repositories are ‘generic’ and ‘specific’. As a result, I have updated the post to use this terminology instead of what I’d chosen (monolithic and disparate).
Having finally been introduced to the finer points of computer science at my new job, I am enjoying the foray into a touch of philosophical programming discussion. This will (hopefully) be the first in a haphazard series of posts outlining my own thoughts on those discussions. Full disclaimer: I am still pretty new to the correct terminology and thus likely to fumble with its use…I’m not sorry, but I also don’t mind being corrected.
Repository (As I Learned)
When I first learned of the Repository pattern, I was shown an approach to encapsulating access to data entities that simply freed the business layer from having to touch queries. You provided an object that wrapped around your entity, so there was only one place to go for the add/edit/delete methods used by the rest of your code. A simplified example might look like this:
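As a minimal sketch only: the `User` entity, the `AppDb` class, and the method bodies below are assumptions made for illustration. `AppDb` is a hypothetical in-memory stand-in for a real data context, so its `SaveChanges()` merely counts calls instead of writing to a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, used for illustration only.
public class User
{
    public int Id { get; set; }
    public string Email { get; set; } = "";
}

// Hypothetical stand-in for a data context. A real context would
// persist changes; this one only records that a save was requested.
public class AppDb
{
    public List<User> Users { get; } = new List<User>();
    public int SaveCount { get; private set; }
    public void SaveChanges() => SaveCount++;
}

// Specific repository: the single place the rest of the code goes
// for User queries and changes.
public class UserRepository
{
    private readonly AppDb Db;

    public UserRepository(AppDb db)
    {
        Db = db;
    }

    // Case-insensitive lookup by email address.
    public IEnumerable<User> FindUsersByEmail(string email)
    {
        return Db.Users
            .Where(u => string.Equals(u.Email, email, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    // Each change is saved immediately inside the repository method.
    public void Add(User user)
    {
        Db.Users.Add(user);
        Db.SaveChanges();
    }

    public void Remove(User user)
    {
        Db.Users.Remove(user);
        Db.SaveChanges();
    }
}
```

Business code would then call something like `new UserRepository(db).FindUsersByEmail("ann@example.com")` and never see how the lookup is performed.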
It’s simple, and it tends to significantly clean up your business logic if you’re the kind of person who just tosses their queries hither and thither.
One coworker seems to have coined the term ‘monolithic repository’ when trying to describe this approach, and I originally used the word ‘monolithic’ for it. Apologies to anyone who thinks they came up with it first. The common name for this approach seems to be ‘specific repository’, so we will refer to it hereafter as ‘specific’.
IRepository and IUnitOfWork
At work we are beginning to implement what I have been told is a “pure” version of the repository pattern. As it was explained to me, there are really two patterns, the first being the Repository and the second being the UnitOfWork. To simplify what I was told: a Repository provides access to a set of data, and a UnitOfWork writes any changes made to that set of data back to its source.
In order to accomplish this we’re using a bit of dependency injection (DI) along with two interfaces:
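A rough sketch of what two such interfaces could look like follows. The member names (`Find`, `Add`, `Remove`, `SaveChanges`) are assumptions, and `InMemorySet<T>` is a hypothetical class added only to make the split observable: changes are staged through the repository and reach the "source" list only when the unit of work saves them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape: the repository gives access to a set of entities
// and stages changes to that set.
public interface IRepository<T>
{
    IEnumerable<T> Find(Func<T, bool> predicate);
    void Add(T entity);
    void Remove(T entity);
}

// Assumed shape: the unit of work writes staged changes to the source.
public interface IUnitOfWork<T>
{
    void SaveChanges();
}

// Hypothetical in-memory implementation of both interfaces.
// Source plays the role of the database; _working holds staged changes.
public class InMemorySet<T> : IRepository<T>, IUnitOfWork<T>
{
    public List<T> Source { get; } = new List<T>();
    private readonly List<T> _working = new List<T>();

    public IEnumerable<T> Find(Func<T, bool> predicate)
    {
        return _working.Where(predicate).ToList();
    }

    public void Add(T entity)
    {
        _working.Add(entity);
    }

    public void Remove(T entity)
    {
        _working.Remove(entity);
    }

    // Copy the working set over the source in one step.
    public void SaveChanges()
    {
        Source.Clear();
        Source.AddRange(_working);
    }
}
```

A class that only reads and stages changes can depend on `IRepository<T>` alone, while whichever class decides when the work is complete depends on `IUnitOfWork<T>`.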
We also have a Repository<T> class and a class implementing IUnitOfWork<T>, which the DI container uses to satisfy the requirements a class may have. Implementations can be found pretty easily with a web search, so I hope you don’t mind me skipping them for now.
I originally referred to this approach as ‘disparate’. The common term for it is ‘generic repository’, so we’ll use ‘generic’ from here on out.
Bridging the Gap
The specific approach provides repository objects which are fairly customized for entity interactions. Initially the generic approach does not, but there is a fairly easy way to overcome that in C# using extension methods:
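As a hedged sketch, assuming `IRepository<T>` exposes a `Find` method that takes a predicate (an assumption, as are the `User` entity and the `ListRepository<T>` helper, which exists only so the example can run), such an extension could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, used for illustration only.
public class User
{
    public int Id { get; set; }
    public string Email { get; set; } = "";
}

// Assumed shape of the generic repository interface.
public interface IRepository<T>
{
    IEnumerable<T> Find(Func<T, bool> predicate);
    void Add(T entity);
    void Remove(T entity);
}

// Entity-specific queries live in extension methods, so every
// IRepository<User> implementation picks them up automatically.
public static class UserRepositoryExtensions
{
    public static IEnumerable<User> FindUsersByEmail(
        this IRepository<User> repository, string email)
    {
        return repository.Find(
            u => string.Equals(u.Email, email, StringComparison.OrdinalIgnoreCase));
    }
}

// Hypothetical list-backed implementation so the sketch is runnable.
public class ListRepository<T> : IRepository<T>
{
    private readonly List<T> _items = new List<T>();

    public IEnumerable<T> Find(Func<T, bool> predicate)
    {
        return _items.Where(predicate).ToList();
    }

    public void Add(T entity)
    {
        _items.Add(entity);
    }

    public void Remove(T entity)
    {
        _items.Remove(entity);
    }
}
```

Because the method extends `IRepository<User>` rather than a concrete class, a call such as `repo.FindUsersByEmail("ann@example.com")` compiles against the interface type that DI hands out.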
Since we are using DI in our system, we know that we will always initially reference the repository dependencies by their interface, IRepository<T>. With this in mind, we can easily create an extension method on the interface, giving every implementation access to the FindUsersByEmail() method, just as we had implemented with the specific approach.
The specific approach can easily be rearranged to reflect the basic UnitOfWork separation from the Find/Add/Remove methods, so I won’t waste the space showing the simple movement of the Db.SaveChanges() call into its own method.
Fight! (And Conclusion)
OK, so there isn’t really a fight here, at least not from my perspective. Both approaches let you swap out the real implementations for ones backed by mock data storage (probably a List<T> or similar) when building unit tests. Both approaches encourage users to keep their business logic devoid of direct queries to the data layer (assuming you use something similar to the extension methods above to provide custom entity-specific methods for the generic repositories). It almost feels as if the choice between the two comes down to personal preference, since with either I can easily implement the behavior of its counterpart.
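To make the unit-testing point concrete, here is a small sketch for the specific approach. Every name in it (`IUserRepository`, `FakeUserRepository`, `SignupService`, `Register`) is hypothetical; the idea is only that the business class sees an interface, so a List<T>-backed fake can stand in for the real data layer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, used for illustration only.
public class User
{
    public string Email { get; set; } = "";
}

// Hypothetical contract for a specific repository.
public interface IUserRepository
{
    IEnumerable<User> FindUsersByEmail(string email);
    void Add(User user);
}

// Test double backed by a plain List<User>; no database involved.
public class FakeUserRepository : IUserRepository
{
    public List<User> Items { get; } = new List<User>();

    public IEnumerable<User> FindUsersByEmail(string email)
    {
        return Items
            .Where(u => string.Equals(u.Email, email, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    public void Add(User user)
    {
        Items.Add(user);
    }
}

// Hypothetical business class; it contains no queries of its own.
public class SignupService
{
    private readonly IUserRepository _users;

    public SignupService(IUserRepository users)
    {
        _users = users;
    }

    // Returns false when the email is already registered.
    public bool Register(string email)
    {
        if (_users.FindUsersByEmail(email).Any())
        {
            return false;
        }
        _users.Add(new User { Email = email });
        return true;
    }
}
```

A test can construct `SignupService` with a `FakeUserRepository`, call `Register`, and inspect `Items` directly; the same arrangement works for the generic approach with a list-backed `IRepository<T>`.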
Still, the times that I have discussed this with my coworkers I got the sense that there was a natural desire to side one way or the other. Purists seem to prefer generic while pragmatists mostly side with specific.
One of the reasons I wanted to write this post is to see if anyone could show reasons one approach is truly better than the other. Do they perform differently to an extent worth citing? Is one easier to implement than the other? Does either save significant effort? I certainly can’t say. My only thought is that it would be much more difficult to adapt the generic approach to a data access system which doesn’t rely on in-memory storage of change sets (perhaps direct SQL).
Regardless of the result of any argument, it’s at least been fun grilling coworkers. Here’s hoping this is just the beginning of the study of the ‘finer points’ in programming.