Sunday, September 27, 2009

4 Equals, Reference Type, Value Type

A fundamental design decision in the .NET CLR is that the type system is divided into two kinds of types: reference types and value types. This decision has profound implications throughout .NET. One example is testing equality between objects.

Basically there are two kinds of comparison: identity comparison (whether two objects have the same identity) and semantic comparison (whether two objects mean the same thing). Most people call the latter value equality; I use "semantic" because the value of a reference type variable is a reference, and even when the values of two reference-typed variables differ, the objects they refer to may still mean the same thing. Having two kinds of types complicates matters. For example, can we compare the "value" of a reference type, or the reference of a value type? If there were only reference types and no value types, the .NET world would be simpler. Why do we need both? This is a deep question, and much of it is covered in the book "CLR via C#". Basically, it is a matter of memory efficiency and performance. What we need to know is that the value of a reference type variable is a reference, and the value of a value type variable is the value itself.

Reference type identity comparison

To do an identity comparison for reference types, call Object.ReferenceEquals(objA, objB), or use the shortcut operator "==", as in "objA == objB". The following source code shows that ReferenceEquals is implemented with the == operator on operands typed as Object, so the two do the same thing in that case.

    public class Object
    {
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
        public static bool ReferenceEquals(Object objA, Object objB)
        {
            return objA == objB;
        }
    }

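If they are the same, why do we still need ReferenceEquals? It turns out that "==" means different things for different types. What exactly does "==" do? For reference types that do not overload the operator, and for primitive value types such as int, double, and enums, it becomes a "ceq" instruction once compiled to MSIL. What does "ceq" do? It compares the two values on the evaluation stack: for reference types those are the references themselves (identity), and for primitive value types those are the values. A type can also overload "==", as System.String does, and then "==" calls that overload instead of comparing identity, whereas ReferenceEquals always compares identity because its parameters are typed as Object. A custom value type such as a struct has no default "==" implementation at all.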
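
As a small illustration of why ReferenceEquals is still useful, the sketch below uses System.String, which overloads "==" to compare characters. The class name EqualityDemo and the helper MakeCopy are only illustrative and do not come from the framework.

```csharp
using System;

public static class EqualityDemo
{
    // Builds a new string object at run time with the same characters
    public static string MakeCopy(string s)
    {
        return new string(s.ToCharArray());
    }

    public static void Main()
    {
        string a = "fred";
        string b = MakeCopy(a);

        Console.WriteLine(a == b);                        // True, String overloads ==
        Console.WriteLine(object.ReferenceEquals(a, b));  // False, two separate objects
        Console.WriteLine((object)a == (object)b);        // False, == on Object compares references
    }
}
```

Casting both operands to object makes the compiler pick the reference comparison again, which is what ReferenceEquals does internally.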

Reference type semantic comparison

The default semantic comparison for a reference type is identity comparison, because the value of a reference type variable is a reference. The default implementation is as follows.

    // Returns a boolean indicating if the passed in object obj is
    // Equal to this. Equality is defined as object equality for reference
    // types and bitwise equality for value types using a loader trick to
    // replace Equals with EqualsValue for value types).
    //
    public virtual bool Equals(Object obj)
    {
        return InternalEquals(this, obj);
    }

    [MethodImplAttribute(MethodImplOptions.InternalCall)]
    internal static extern bool InternalEquals(Object objA, Object objB);

According to the comments, for a reference type object InternalEquals just compares the references; it does not compare the referenced content. The following code shows this behavior.

    static void Main(string[] args)
    {
        Customer c1 = new Customer { Name = "fred" };
        Customer c2 = new Customer { Name = "fred" };
        Customer c3 = c1;

        Console.WriteLine(object.ReferenceEquals(c1, c2)); // False
        Console.WriteLine(object.ReferenceEquals(c1, c3)); // True
        Console.WriteLine(c1 == c2);                       // False
        Console.WriteLine(c1 == c3);                       // True
        Console.WriteLine(c1.Equals(c2));                  // False, even though the referenced content is the same
        Console.WriteLine(c1.Equals(c3));                  // True
    }

    public class Customer
    {
        public string Name { get; set; }
    }

But sometimes we want to change these semantics. In our case, we can say that two customers are equal if their names are the same, regardless of their identity. So we can override the instance Equals method as follows.

    public class Customer
    {
        public string Name { get; set; }

        public override bool Equals(object obj)
        {
            var c = obj as Customer;
            if (c == null)
            {
                return false;
            }
            else
            {
                return this.Name == c.Name;
            }
        }
    }
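
A type that overrides Equals should normally override GetHashCode as well, so that equal objects behave consistently in hash-based collections. The sketch below is one possible way to do that for Customer; the hash code choice and the CustomerDemo class are assumptions made for illustration. It also shows that "==" still compares identity, because Customer does not overload the operator.

```csharp
using System;

public class Customer
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var c = obj as Customer;
        return c != null && this.Name == c.Name;
    }

    // Objects that are equal must return the same hash code
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

public static class CustomerDemo
{
    public static void Main()
    {
        Customer c1 = new Customer { Name = "fred" };
        Customer c2 = new Customer { Name = "fred" };

        Console.WriteLine(c1.Equals(c2));                   // True, names match
        Console.WriteLine(c1 == c2);                        // False, still identity comparison
        Console.WriteLine(object.ReferenceEquals(c1, c2));  // False
    }
}
```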

Value type identity comparison

Can you compare the identity of value type variables? Yes. Should you? No. The result is always False, because the two values are put in different boxes before the comparison.

Console.WriteLine(object.ReferenceEquals(1, 1)); // False
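
To make the boxing visible, here is a minimal sketch (the BoxingDemo class is only illustrative). Passing the same int variable twice still produces two boxes, while boxing it once by hand and reusing that box gives a single object.

```csharp
using System;

public static class BoxingDemo
{
    public static void Main()
    {
        int i = 1;
        Console.WriteLine(object.ReferenceEquals(i, i));          // False, i is boxed twice

        object boxed = i;  // box once
        Console.WriteLine(object.ReferenceEquals(boxed, boxed));  // True, same box
        Console.WriteLine(boxed.Equals(i));                       // True, Int32.Equals compares values
    }
}
```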

Value type semantic comparison

Although you can use the "==" operator with primitive value types such as System.Int32, you cannot use it with a custom value type such as a struct until you implement the operator yourself. You can, however, use the instance Equals method, which System.ValueType overrides, for semantic comparison. It uses reflection to check content equality, as shown below.

    public abstract class ValueType
    {
        public override bool Equals(Object obj)
        {
            BCLDebug.Perf(false, "ValueType::Equals is not fast. " + this.GetType().FullName + " should override Equals(Object)");
            if (null == obj)
            {
                return false;
            }
            RuntimeType thisType = (RuntimeType)this.GetType();
            RuntimeType thatType = (RuntimeType)obj.GetType();

            if (thatType != thisType)
            {
                return false;
            }

            Object thisObj = (Object)this;
            Object thisResult, thatResult;

            // if there are no GC references in this object we can avoid reflection
            // and do a fast memcmp
            if (CanCompareBits(this))
                return FastEqualsCheck(thisObj, obj);

            FieldInfo[] thisFields = thisType.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

            for (int i = 0; i < thisFields.Length; i++)
            {
                thisResult = ((RtFieldInfo)thisFields[i]).InternalGetValue(thisObj, false);
                thatResult = ((RtFieldInfo)thisFields[i]).InternalGetValue(obj, false);

                if (thisResult == null)
                {
                    if (thatResult != null)
                        return false;
                }
                else if (!thisResult.Equals(thatResult))
                {
                    return false;
                }
            }

            return true;
        }
    }

So you should always override the instance Equals() method for a custom struct to improve performance.
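
As a sketch of what such an override might look like, the hypothetical Point struct below overrides Equals, adds a strongly typed Equals(Point) through IEquatable<Point>, and implements "==" and "!=" so the operators become usable. The type, its fields, and the hash code formula are assumptions chosen for illustration.

```csharp
using System;

public struct Point : IEquatable<Point>
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y) { X = x; Y = y; }

    // Strongly typed comparison: no boxing, no reflection
    public bool Equals(Point other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is Point && Equals((Point)obj);
    }

    public override int GetHashCode()
    {
        return (X * 397) ^ Y;
    }

    public static bool operator ==(Point a, Point b) { return a.Equals(b); }
    public static bool operator !=(Point a, Point b) { return !a.Equals(b); }
}

public static class PointDemo
{
    public static void Main()
    {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);

        Console.WriteLine(p1.Equals(p2));          // True
        Console.WriteLine(p1 == p2);               // True
        Console.WriteLine(p1 != new Point(3, 4));  // True
    }
}
```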

Comparing objects of unknown type

If we don't know the types of the two objects, the best bet is to use the static method object.Equals(objA, objB). This method checks identity first, then handles nulls, and finally falls back to semantic equality through objA.Equals(objB). The method is as follows.

    public static bool Equals(Object objA, Object objB)
    {
        if (objA == objB)
        {
            return true;
        }
        if (objA == null || objB == null)
        {
            return false;
        }
        return objA.Equals(objB);
    }
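
A practical benefit of the static method is that it is safe when either argument is null. The following minimal sketch (StaticEqualsDemo is an illustrative name) exercises each branch of the method.

```csharp
using System;

public static class StaticEqualsDemo
{
    public static void Main()
    {
        object a = null;
        object b = "fred";

        Console.WriteLine(object.Equals(a, a));   // True, both null
        Console.WriteLine(object.Equals(a, b));   // False, only one is null
        Console.WriteLine(object.Equals(1, 1));   // True, boxed ints compared by value
        Console.WriteLine(object.Equals(1, 1L));  // False, Int32 and Int64 are different types
        // a.Equals(b) would throw a NullReferenceException here
    }
}
```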

To wrap up, what does this mean in practice? We can follow this pseudocode.

    if (we compare two objects of the same type)
    {
        if (type is a reference type)
        {
            if (we want semantic comparison && we have overridden the objA.Equals method)
            {
                objA.Equals(objB);
            }
            else // we just want identity comparison
            {
                always use "objA == objB";
                // object.ReferenceEquals(objA, objB) and objA.Equals(objB) do the same thing in this case
            }
        }
        else // type is a value type
        {
            if (we want identity comparison)
            {
                forget about it;
                // although we can call object.ReferenceEquals(objA, objB),
                // it will always return false because of boxing
            }
            else // we should always use semantic comparison
            {
                if (type is a primitive value type like int)
                {
                    x == y // it is compiled to the ceq IL instruction
                }
                else
                {
                    if (you have implemented the == operator for this type)
                    {
                        use objA == objB
                    }
                    else
                    {
                        use objA.Equals(objB) // for a more efficient comparison, override the instance Equals method
                    }
                }
            }
        }
    }
    else // we compare two objects of unknown type
    {
        Object.Equals(objA, objB);
    }

For reference types, "==" is enough for most situations, unless you want to change the default semantic comparison. For primitive value types, "==" is enough for most situations. For structs, you are encouraged, though not required, to override the default semantic comparison obj.Equals() for performance, and to use obj.Equals for comparison.
