Sunday, September 27, 2009

4 Equals, Reference Type, Value Type

The most fundamental design decision in the .NET CLR is that its type system is divided into two kinds of types: reference types and value types. This decision has profound implications throughout .NET. One example is testing objects for equality.

Basically, there are two kinds of comparison: identity comparison (whether two objects have the same identity) and semantic comparison (whether two objects mean the same thing). Most people call the latter value equality; I use "semantic" because the value of a reference-typed variable is a reference, and even when two such variables hold different references, they may still mean the same thing. Having two kinds of types makes things complicated. For example, can we compare the "value" of a reference type, or the reference of a value type? If there had been only reference types and no value types, the .NET world would be simpler. So why do we need both? This is a deep question, and much of it is covered in the book "CLR via C#". Basically, it is a matter of memory efficiency and performance. What we need to remember is that the value of a reference type is a reference, and the value of a value type is the value itself.
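To make "the value of a reference type is a reference" concrete, here is a small sketch. RefPoint, ValPoint and CopySemantics are hypothetical names made up for illustration. Assigning a class variable copies the reference, so both variables see a change; assigning a struct variable copies the value itself.

```csharp
// Hypothetical types with the same shape: one reference type, one value type.
public class RefPoint { public int X; }
public struct ValPoint { public int X; }

public static class CopySemantics
{
    // Copying a class variable copies the reference: a and b point to the same object.
    public static int MutateCopyOfClass()
    {
        RefPoint a = new RefPoint { X = 1 };
        RefPoint b = a;
        b.X = 2;
        return a.X; // sees the change made through b
    }

    // Copying a struct variable copies the value: b is an independent copy.
    public static int MutateCopyOfStruct()
    {
        ValPoint a = new ValPoint { X = 1 };
        ValPoint b = a;
        b.X = 2;
        return a.X; // unaffected by the change to b
    }
}
```

MutateCopyOfClass() returns 2, while MutateCopyOfStruct() returns 1.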

Reference type identity comparison

To do an identity comparison for a reference type, call Object.ReferenceEquals(objA, objB), or use the shortcut operator "==" as in "objA == objB" (as long as the type does not overload ==). The following source code shows that ReferenceEquals is implemented with the == operator.

public class Object
{
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    public static bool ReferenceEquals(Object objA, Object objB)
    {
        return objA == objB;
    }
}

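If they are the same, why do we still need ReferenceEquals? It turns out that "==" means different things for different types. What exactly does "==" do? For reference types that do not overload it, and for primitive value types like int, double and enums, it is compiled to the MSIL "ceq" instruction. What does "ceq" do? That is a CLR implementation question, but in effect it compares identity (the references) for reference types and compares values for primitive value types. A type can, however, overload ==: System.String does, so "s1 == s2" compares the characters rather than the references. Inside ReferenceEquals both parameters are typed as Object, so the == there is always a reference comparison; that is why we still need ReferenceEquals when we want identity and nothing else. A custom value type such as a struct has no default == operator at all; using == on it is a compile error until the struct defines one.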
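A small sketch of the difference, using System.String because it is a reference type that overloads ==. StringEqualityDemo is a hypothetical helper class; in the usage below, new string('a', 3) is used so that the two strings are distinct objects rather than interned literals.

```csharp
using System;

public static class StringEqualityDemo
{
    // string overloads ==, so this calls String's operator (a character comparison).
    public static bool OperatorEquals(string a, string b)
    {
        return a == b;
    }

    // ReferenceEquals takes two Object parameters, so it always compares identity.
    public static bool SameIdentity(string a, string b)
    {
        return Object.ReferenceEquals(a, b);
    }

    // With both operands typed as object, the overload is not used: this is a reference comparison.
    public static bool OperatorEqualsAsObject(string a, string b)
    {
        return (object)a == (object)b;
    }
}
```

For two separately created "aaa" strings, OperatorEquals returns True while SameIdentity and OperatorEqualsAsObject return False.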

Reference type semantic comparison

The default semantic comparison of a reference type is identity comparison, because the value of a reference-typed variable is a reference. The default implementation is as follows.

// Returns a boolean indicating if the passed in object obj is
// Equal to this.  Equality is defined as object equality for reference
// types and bitwise equality for value types using a loader trick to
// replace Equals with EqualsValue for value types).
//
public virtual bool Equals(Object obj)
{
    return InternalEquals(this, obj);
}

[MethodImplAttribute(MethodImplOptions.InternalCall)]
internal static extern bool InternalEquals(Object objA, Object objB);

According to the comments, for a reference-type object InternalEquals just compares the references; it does not compare the referenced content. The following code shows this behavior.

using System;

class Program
{
    static void Main(string[] args)
    {
        Customer c1 = new Customer { Name = "fred" };
        Customer c2 = new Customer { Name = "fred" };
        Customer c3 = c1;
        Console.WriteLine(object.ReferenceEquals(c1, c2)); // False
        Console.WriteLine(object.ReferenceEquals(c1, c3)); // True
        Console.WriteLine(c1 == c2);                       // False
        Console.WriteLine(c1 == c3);                       // True
        Console.WriteLine(c1.Equals(c2));                  // False, even though the referenced content is the same
        Console.WriteLine(c1.Equals(c3));                  // True
    }
}

public class Customer
{
    public string Name { get; set; }
}

But sometimes we want to change these semantics. In our case, we may decide that two customers are equal if their names are the same, regardless of their identity. So we can override the instance Equals method as follows.

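public class Customer
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var c = obj as Customer;
        if (c == null)
        {
            return false;
        }
        return this.Name == c.Name;
    }

    // Objects that are Equal must return the same hash code (Dictionary and HashSet rely on it).
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}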
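Note that overriding Equals does not change ==. A minimal sketch, repeating the Customer class with Equals and GetHashCode overridden by name, to check which comparisons now treat two "fred" customers as equal:

```csharp
// Same Customer as above: equality is defined by Name, identity is not involved.
public class Customer
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var c = obj as Customer;
        return c != null && Name == c.Name;
    }

    // Equal customers must produce equal hash codes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}
```

With this class, c1.Equals(c2) and object.Equals(c1, c2) return True for two customers named "fred", while c1 == c2 still returns False, because Customer does not overload ==.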

Value type identity comparison

Can you compare the identity of value-type variables? Yes. Should you? No. The result will always be False, because each value is boxed into a separate object before the comparison.

Console.WriteLine(object.ReferenceEquals(1, 1)); // False
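A sketch of why the result is False: each argument is boxed into its own object. BoxingDemo is a hypothetical helper; comparing one box with itself is the only way to get True, and that tells us nothing about the int value.

```csharp
using System;

public static class BoxingDemo
{
    // Each int argument is boxed separately, so the two references always differ.
    public static bool CompareTwoBoxes(int value)
    {
        return Object.ReferenceEquals(value, value);
    }

    // Box once, then compare that single box with itself.
    public static bool CompareOneBox(int value)
    {
        object boxed = value;
        return Object.ReferenceEquals(boxed, boxed);
    }
}
```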

Value type semantic comparison

Although you can use the "==" operator with primitive value types like System.Int32, you cannot use it with a custom value type such as a struct until you implement the operator yourself. You can, however, use the instance Equals method to do a semantic comparison. Every struct inherits it from System.ValueType, which overrides Object.Equals to compare field contents, using reflection unless a fast bitwise comparison is possible, as below.

public abstract class ValueType
{
    public override bool Equals(Object obj)
    {
        BCLDebug.Perf(false, "ValueType::Equals is not fast.  " + this.GetType().FullName + " should override Equals(Object)");
        if (null == obj)
        {
            return false;
        }
        RuntimeType thisType = (RuntimeType)this.GetType();
        RuntimeType thatType = (RuntimeType)obj.GetType();

        if (thatType != thisType)
        {
            return false;
        }

        Object thisObj = (Object)this;
        Object thisResult, thatResult;

        // if there are no GC references in this object we can avoid reflection
        // and do a fast memcmp
        if (CanCompareBits(this))
            return FastEqualsCheck(thisObj, obj);

        FieldInfo[] thisFields = thisType.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

        for (int i = 0; i < thisFields.Length; i++)
        {
            thisResult = ((RtFieldInfo)thisFields[i]).InternalGetValue(thisObj, false);
            thatResult = ((RtFieldInfo)thisFields[i]).InternalGetValue(obj, false);

            if (thisResult == null)
            {
                if (thatResult != null)
                    return false;
            }
            else if (!thisResult.Equals(thatResult))
            {
                return false;
            }
        }

        return true;
    }
}

So you should override the instance Equals() method on your custom structs to improve performance, since the inherited version may fall back to reflection.
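A minimal sketch of such an override for a hypothetical Point struct. It implements System.IEquatable&lt;T&gt; (available since .NET 2.0) so callers can compare without boxing, overrides Equals(object) and GetHashCode, and defines == and != so the operator becomes usable. The hash formula is just one common choice, not a requirement.

```csharp
using System;

// Hypothetical struct used only for illustration.
public struct Point : IEquatable<Point>
{
    private readonly int x;
    private readonly int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    public int X { get { return x; } }
    public int Y { get { return y; } }

    // Strongly typed comparison: no boxing, no reflection.
    public bool Equals(Point other)
    {
        return x == other.x && y == other.y;
    }

    // Replaces ValueType.Equals(object); used by object.Equals and non-generic code.
    public override bool Equals(object obj)
    {
        return obj is Point && Equals((Point)obj);
    }

    // Equal points must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (x * 397) ^ y;
        }
    }

    // Without these, "p1 == p2" does not compile for a struct.
    public static bool operator ==(Point left, Point right)
    {
        return left.Equals(right);
    }

    public static bool operator !=(Point left, Point right)
    {
        return !left.Equals(right);
    }
}
```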

Comparing objects of unknown type

If we don't know the types of the two objects, the best bet is the static method object.Equals(objA, objB). This method first checks identity equality, then handles null, and finally checks semantic equality by calling the instance Equals. It is implemented as follows.

public static bool Equals(Object objA, Object objB)
{
    if (objA == objB)
    {
        return true;
    }
    if (objA == null || objB == null)
    {
        return false;
    }
    return objA.Equals(objB);
}
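A quick sketch of why the static method is safer when the types (or nullness) are unknown. UnknownTypeEquality is a hypothetical helper: SafeEquals goes through object.Equals, while UnsafeEquals calls the instance method directly and throws NullReferenceException when the first argument is null.

```csharp
using System;

public static class UnknownTypeEquality
{
    // Identity check, then null check, then the instance Equals.
    public static bool SafeEquals(object a, object b)
    {
        return Object.Equals(a, b);
    }

    // Calling the instance method directly fails when a is null.
    public static bool UnsafeEquals(object a, object b)
    {
        return a.Equals(b);
    }
}
```

Note that SafeEquals(1, 1L) returns False: the boxed Int32's Equals only accepts another boxed Int32, so an Int64 with the same numeric value is not equal.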

To wrap it up, what does this mean for me? We can follow this pseudocode:

if (we compare two objects of the same type)
{
    if (type is reference type)
    {
        if (we want semantic comparison && we have overridden objA.Equals)
        {
            objA.Equals(objB);
        }
        else // we just want identity comparison
        {
            always use "objA == objB" (or object.ReferenceEquals(objA, objB) if the type overloads ==);
            object.ReferenceEquals(objA, objB) and objA.Equals(objB) do the same thing in this case
        }
    }
    else // type is value type
    {
        if (we want identity comparison)
        {
            forget about it; although we can call object.ReferenceEquals(objA, objB),
            it will always return false because of boxing
        }
        else // we should always use semantic comparison
        {
            if (type is primitive value type like int)
            {
                x == y // compiled to the ceq IL instruction
            }
            else
            {
                if (you have implemented the == operator for this type)
                {
                    use objA == objB
                }
                else
                {
                    use objA.Equals(objB) // for a more efficient comparison, override the instance Equals method
                }
            }
        }
    }
}
else // we compare two objects of unknown type
{
    Object.Equals(objA, objB);
}

For reference types, "==" is enough in most situations, unless you want to change the default semantic comparison. For primitive value types, "==" is enough in most situations. For structs, you are encouraged (though it is not mandatory) to override the default semantic comparison, obj.Equals(), for performance, and to use obj.Equals for comparison.

Saturday, September 26, 2009

IEnumerable, IQueryable, Lambda expression - part2

I once saw the following piece of code, written by a developer at a client.

interface IContactRepository
{
    IEnumerable<Contact> GetSomeContacts();
}

class ContactRepository : IContactRepository
{
    public IEnumerable<Contact> GetSomeContacts()
    {
        // query is a LINQ to SQL query object
        IQueryable<Contact> query = ...
        return query;
    }
}

Is it a better choice to use IEnumerable<T> instead of IQueryable<T>? I guess his concern was that if the interface is too specific, first, it may give the client more functionality than required, and second, it may limit the server's choice of implementation. In many cases this concern is right: we should give the client exactly the functionality it needs, nothing less and nothing more, and the server should have more freedom in its implementation.

interface IPerson
{
    void Eat();
    void Sleep();
}

interface ISales : IPerson
{
    void Sell();
}

interface ITeacher : IPerson
{
    void Teach();
}

class Service
{
    // Inappropriate
    // public ISales GetPerson()
    // {
    //     return ...
    // }

    // Better
    public IPerson GetPerson()
    {
        return ...
    }
}
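To make the second point concrete, here is a runnable sketch of the same contract with hypothetical Teacher and SalesPerson implementations (the original elides them). Because GetPerson returns IPerson, the server is free to hand out either one.

```csharp
public interface IPerson { void Eat(); void Sleep(); }
public interface ISales : IPerson { void Sell(); }
public interface ITeacher : IPerson { void Teach(); }

// Hypothetical implementations, only for illustration.
public class Teacher : ITeacher
{
    public void Eat() { }
    public void Sleep() { }
    public void Teach() { }
}

public class SalesPerson : ISales
{
    public void Eat() { }
    public void Sleep() { }
    public void Sell() { }
}

public class Service
{
    private readonly bool preferTeacher;

    public Service(bool preferTeacher)
    {
        this.preferTeacher = preferTeacher;
    }

    // The narrow IPerson contract lets the server choose any implementation.
    public IPerson GetPerson()
    {
        if (preferTeacher)
            return new Teacher();
        return new SalesPerson();
    }
}
```

Had GetPerson been declared to return ISales, the Teacher branch would not compile.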

First, if the method returns an ISales, the client gets an extra, unnecessary method, Sell. Second, if the client only needs an IPerson but the contract says the client needs an ISales, this limits the server's ability to serve the client; for example, the server cannot return an ITeacher.

Is this design guideline also applicable to the case of IContactRepository?

public interface IQueryable<T> : IEnumerable<T>, IQueryable, IEnumerable
{
}

public interface IQueryable : IEnumerable
{
    Type ElementType { get; }
    Expression Expression { get; }
    IQueryProvider Provider { get; }
}

First, the IQueryable<T> interface does give the user more members than IEnumerable<T>, but these members are read-only, and the client cannot use them directly for querying. Because the query functionality comes from the static extension methods in Enumerable and Queryable, not from IQueryable<T> or IEnumerable<T> themselves, the two interfaces offer the same query API from the client's perspective. Second, the interface does limit the server's implementation choice, because the server can no longer simply return an IEnumerable<T>. Initially, I thought I could easily implement a thin IQueryable<T> that wraps an IEnumerable<T>. It turns out to be even easier: the Queryable class already implements a static extension method AsQueryable() for you, so the LINQ team at Microsoft apparently expected this to be a common use case. All you need to do is call that method, and your IEnumerable<T> becomes an IQueryable<T>, like the following.

int[] intEnumerable = { 1, 2, 3, 5 };
IQueryable intQuery = intEnumerable.AsQueryable().Where(number => number > 2);
foreach (var item in intQuery)
{
    Console.WriteLine(item);
}
Console.WriteLine(intQuery.GetType().ToString()); // System.Linq.EnumerableQuery`1[System.Int32]

// code decompiled by Reflector
ParameterExpression CS$0$0000;
IQueryable intQuery = new int[] { 1, 2, 3, 5 }.AsQueryable<int>().Where<int>(
    Expression.Lambda<Func<int, bool>>(
        Expression.GreaterThan(
            CS$0$0000 = Expression.Parameter(typeof(int), "number"),
            Expression.Constant(2, typeof(int))),
        new ParameterExpression[] { CS$0$0000 }));
foreach (object item in intQuery)
{
    Console.WriteLine(item);
}
Console.WriteLine(intQuery.GetType().ToString());

So it seems better to replace IEnumerable<T> with IQueryable<T>. As far as the interface is concerned, the replacement gives the client exactly the same query experience, and it is not more difficult to implement. A great benefit of the replacement is performance: with a remote provider such as LINQ to SQL, using IEnumerable<T> can be much slower than IQueryable<T>. The Where method for IQueryable<T> treats the lambda expression as an expression tree, so the query is translated and executed at the database server and only the matching rows come back. The Where method for IEnumerable<T> treats the lambda expression as a delegate, so all the contacts are fetched and then filtered on the client, which is slower. Consider the following code.

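// GetSomeContacts() is declared to return IEnumerable<Contact>, so Enumerable.Where is chosen at compile time
// and the filter runs in memory, even though the object actually returned is an IQueryable<Contact>.
var thisContact = contactRepository.GetSomeContacts().Where(ctc => ctc.Id == 1).First();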
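The static type is what matters here. A sketch with LINQ to Objects standing in for LINQ to SQL (an assumption: AsQueryable produces an in-memory IQueryable<int>, so no database is involved). The same object goes through Enumerable.Where when the parameter is typed IEnumerable<int>, and through Queryable.Where when it is typed IQueryable<int>. StaticTypeBinding is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StaticTypeBinding
{
    // Parameter typed IEnumerable<int>: the compiler binds Enumerable.Where,
    // so the lambda becomes a delegate and filtering happens in memory.
    public static IEnumerable<int> FilterAsEnumerable(IEnumerable<int> source)
    {
        return source.Where(n => n > 2);
    }

    // Parameter typed IQueryable<int>: the compiler binds Queryable.Where,
    // so the lambda becomes an expression tree handed to source.Provider.
    public static IQueryable<int> FilterAsQueryable(IQueryable<int> source)
    {
        return source.Where(n => n > 2);
    }
}
```

Called with new[] { 1, 2, 3, 5 }.AsQueryable(), both return 3 and 5. The difference is that the first result is a plain in-memory iterator, while the second is still an IQueryable<int> whose Expression a real provider could translate into SQL.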

LINQ provides us a new way to design our domain model. In the post "Extending the World", the author says:

Typically for a given problem, a programmer is accustomed to building up a solution until it finally meets the requirements. Now, it is possible to extend the world to meet the solution instead of solely just building up until we get to it. That library doesn't provide what you need, just extend the library to meet your needs.

It is very important to build an extensible domain model by taking advantage of the IQueryable<T> interface. Using only IEnumerable<T> can hurt performance seriously. The only pitfall of using IQueryable<T> is that the user may send unnecessarily complex queries to the server, but this can be addressed by designing methods so that only an appropriate IQueryable<T> is returned, for example returning GetSome instead of GetAll. Another solution is to add a view model that returns an IEnumerable<T>.

IEnumerable, IQueryable, Lambda expression - part1

When we type the following code

IEnumerable<int> intEnumerable = null; // null is enough to look at the compiled call; at run time Where throws ArgumentNullException
var q1 = intEnumerable.Where(x => x > 10);

we know that the Where method is not part of the IEnumerable<T> or IEnumerable interface; it is an extension method from Enumerable, which is a static class with no inheritance relationship with IEnumerable or IEnumerable<T>. The power of LINQ to Objects does not come from IEnumerable, IEnumerable<T> or their implementations; it comes from the extension methods. Let's take a look at what the extension method does. Using Reflector, we get the following source code.

public static class Enumerable
{
    public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        if (source == null)
        {
            throw Error.ArgumentNull("source");
        }
        if (predicate == null)
        {
            throw Error.ArgumentNull("predicate");
        }
        if (source is Iterator<TSource>)
        {
            return ((Iterator<TSource>) source).Where(predicate);
        }
        if (source is TSource[])
        {
            return new WhereArrayIterator<TSource>((TSource[]) source, predicate);
        }
        if (source is List<TSource>)
        {
            return new WhereListIterator<TSource>((List<TSource>) source, predicate);
        }
        return new WhereEnumerableIterator<TSource>(source, predicate);
    }
}

We can see that the delegate passed into the method is the code that does the filtering.
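To see that the delegate really does all the work, here is a hand-written equivalent. MyWhere is a hypothetical name, and this sketch skips the array and List<T> fast paths that Enumerable.Where has: it validates arguments eagerly and then filters lazily with an iterator (yield return).

```csharp
using System;
using System.Collections.Generic;

public static class MyEnumerable
{
    // Hypothetical re-implementation of Where for illustration.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        // Validate immediately, before the lazy iterator starts running.
        if (source == null) throw new ArgumentNullException("source");
        if (predicate == null) throw new ArgumentNullException("predicate");
        return MyWhereIterator(source, predicate);
    }

    // The iterator body runs only when the caller enumerates the result.
    private static IEnumerable<T> MyWhereIterator<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            // The delegate decides which items survive.
            if (predicate(item))
                yield return item;
        }
    }
}
```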

IQueryable<T> inherits from IEnumerable<T>. But what extra value does IQueryable bring? Let's take a look at its definition.

public interface IQueryable<T> : IEnumerable<T>, IQueryable, IEnumerable
{
}

public interface IQueryable : IEnumerable
{
    Type ElementType { get; }
    Expression Expression { get; }
    IQueryProvider Provider { get; }
}

That does not tell us much. Let's move on to a queryable example and decompile it to see what it does.

IQueryable<int> intQueryable = null; // as above, only for inspecting the compiled call
var q2 = intQueryable.Where(x => x > 10);

// decompiled by Reflector
ParameterExpression CS$0$0000;
IQueryable<int> q2 = intQueryable.Where<int>(
    Expression.Lambda<Func<int, bool>>(
        Expression.GreaterThan(
            CS$0$0000 = Expression.Parameter(typeof(int), "x"),
            Expression.Constant(10, typeof(int))),
        new ParameterExpression[] { CS$0$0000 }));

From this example, we can see that the lambda expression is not converted to a delegate but to an expression tree. But why isn't the extension method Enumerable.Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) used? It turns out that the C# compiler picks a more suitable extension method from Queryable: IQueryable<T> is a better match for the receiver than IEnumerable<T>, and because that method's predicate parameter is typed Expression<Func<TSource, bool>>, the lambda is compiled into an expression tree. Here is the code from Reflector.
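The same lambda text compiles to two different things depending on its target type. A sketch (LambdaForms is a hypothetical helper): assigned to Func<int, bool> it becomes a delegate, while assigned to Expression<Func<int, bool>> it becomes a data structure like the one built by the decompiled code above, whose Body can be inspected and which can still be turned into a delegate with Compile().

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaForms
{
    // Target type Func<int, bool>: the compiler emits a delegate.
    public static Func<int, bool> AsDelegate()
    {
        return x => x > 10;
    }

    // Target type Expression<Func<int, bool>>: the compiler emits Expression.* factory calls.
    public static Expression<Func<int, bool>> AsExpressionTree()
    {
        return x => x > 10;
    }
}
```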

public static IQueryable<TSource> Where<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    if (predicate == null)
    {
        throw Error.ArgumentNull("predicate");
    }
    return source.Provider.CreateQuery<TSource>(
        Expression.Call(
            null,
            ((MethodInfo) MethodBase.GetCurrentMethod()).MakeGenericMethod(new Type[] { typeof(TSource) }),
            new Expression[] { source.Expression, Expression.Quote(predicate) }));
}

Unlike the Enumerable.Where method, this method does not have a delegate to do the filtering, and the expression cannot do the filtering either; it is IQueryable.Provider that does the filtering. The provider takes the expression tree and does the filtering later, by translating the expression tree into a provider-specific form such as T-SQL.

IEnumerable<T> is very easy to implement; in fact, all the collections are IEnumerable<T>, and iterators make it even easier. So there is no such thing as an IEnumerableProvider, because the delegate does the query. Implementing IQueryable<T> is more difficult, because the expression does not query anything; it is the IQueryProvider that does the job. You need to implement IQueryProvider:

public interface IQueryProvider
{
    IQueryable CreateQuery(Expression expression);
    IQueryable<TElement> CreateQuery<TElement>(Expression expression);
    object Execute(Expression expression);
    TResult Execute<TResult>(Expression expression);
}
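A full LINQ provider is a big job, but a minimal sketch shows where IQueryProvider sits. LoggingQueryable&lt;T&gt; and LoggingQueryProvider are hypothetical names. The provider records each expression tree it is asked to execute and then forwards it to an inner provider, here the in-memory one obtained from AsQueryable() (an assumption; a real provider would translate the tree itself, e.g. into T-SQL). The non-generic CreateQuery is simply forwarded and not logged.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical queryable whose Provider is the logging provider below.
public class LoggingQueryable<T> : IQueryable<T>
{
    private readonly LoggingQueryProvider provider;
    private readonly Expression expression;

    public LoggingQueryable(LoggingQueryProvider provider, Expression expression)
    {
        this.provider = provider;
        this.expression = expression;
    }

    public Type ElementType { get { return typeof(T); } }
    public Expression Expression { get { return expression; } }
    public IQueryProvider Provider { get { return provider; } }

    // Enumerating hands the whole expression tree to the provider for execution.
    public IEnumerator<T> GetEnumerator()
    {
        return provider.Execute<IEnumerable<T>>(expression).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

// Hypothetical provider: logs each executed tree, then forwards it to an inner provider.
public class LoggingQueryProvider : IQueryProvider
{
    private readonly IQueryProvider inner;
    private readonly List<string> log = new List<string>();

    public LoggingQueryProvider(IQueryProvider inner)
    {
        this.inner = inner;
    }

    public IList<string> Log { get { return log; } }

    // Wraps an existing IQueryable<T> so that further queries go through this provider.
    public static IQueryable<T> Wrap<T>(IQueryable<T> source)
    {
        var provider = new LoggingQueryProvider(source.Provider);
        return new LoggingQueryable<T>(provider, source.Expression);
    }

    // Called by Queryable.Where, Select, ...: only wraps the bigger tree, nothing runs yet.
    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        return new LoggingQueryable<TElement>(this, expression);
    }

    // Non-generic path: forwarded to the inner provider and not logged, to keep the sketch short.
    public IQueryable CreateQuery(Expression expression)
    {
        return inner.CreateQuery(expression);
    }

    public TResult Execute<TResult>(Expression expression)
    {
        log.Add(expression.ToString());
        return inner.Execute<TResult>(expression);
    }

    public object Execute(Expression expression)
    {
        log.Add(expression.ToString());
        return inner.Execute(expression);
    }
}
```

Building query.Where(n => n > 2) only calls CreateQuery<T> and logs nothing; enumerating the result calls Execute once, and the log then holds the whole tree as text, something like "System.Int32[].Where(n => (n > 2))".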