February 2010

LINQ with .NET 4 – Zip

.NET 4.0 adds the new extension method Zip to the Enumerable, ParallelEnumerable, and Queryable classes. Zip merges two sequences. Union, Intersect, Join, and GroupJoin are other LINQ operators that merge two sequences; these have been available since .NET 3.5.

The following sections look at the LINQ operators that merge two sequences and show how the new Zip operator can be used.

Union produces the set union of two sequences. With Union, the element types of the two sequences must be the same. The sequence types can differ; they just need to implement IEnumerable<TSource>. Union returns a merged sequence of the same element type with all duplicates removed. To control how elements are compared, you can pass an IEqualityComparer<TSource>; the overload without this parameter uses the default equality comparer.

public static IEnumerable<TSource> Union<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer)
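A minimal sketch of both Union overloads, assuming a console application that targets .NET 4. The class name UnionDemo and the sample data are made up for illustration. The first call uses the default equality comparer; the second combines an array with a List<string> (only IEnumerable<TSource> is required) and passes StringComparer.OrdinalIgnoreCase, so "McLaren" and "mclaren" count as duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the framework
static class UnionDemo
{
    public static int[] UnionNumbers()
    {
        int[] first = { 1, 2, 3, 3 };
        int[] second = { 3, 4, 1, 5 };
        // Default comparer: duplicates within and across both sequences are removed;
        // elements keep the order in which they are first encountered
        return first.Union(second).ToArray();
    }

    public static string[] UnionNames()
    {
        string[] first = { "Ferrari", "McLaren" };
        // Different sequence type, same element type
        List<string> second = new List<string> { "mclaren", "Lotus" };
        // StringComparer implements IEqualityComparer<string>
        return first.Union(second, StringComparer.OrdinalIgnoreCase).ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", UnionNumbers())); // 1, 2, 3, 4, 5
        Console.WriteLine(string.Join(", ", UnionNames()));   // Ferrari, McLaren, Lotus
    }
}
```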

Like Union, Intersect requires both sequences to have the same element type, but it returns the set intersection. As documented, Intersect first collects the distinct elements of the first sequence, then enumerates the second sequence to mark the elements that occur in both sequences, and finally yields the marked elements in the order in which they were collected.

public static IEnumerable<TSource> Intersect<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer)
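The following sketch, again assuming a .NET 4 console application with a hypothetical IntersectDemo class and made-up data, shows that the result contains each common element only once, in the order of the first sequence. With a case-insensitive comparer, the returned strings are the ones from the first sequence.

```csharp
using System;
using System.Linq;

// Hypothetical demo class, not part of the framework
static class IntersectDemo
{
    public static int[] IntersectNumbers()
    {
        int[] first = { 1, 2, 2, 3, 4 };
        int[] second = { 4, 2, 5, 2 };
        // Distinct elements that occur in both sequences, in the order of the first sequence
        return first.Intersect(second).ToArray();
    }

    public static string[] IntersectNames()
    {
        string[] first = { "Ferrari", "Williams", "Lotus" };
        string[] second = { "LOTUS", "ferrari" };
        // Case-insensitive comparison; the yielded elements come from the first sequence
        return first.Intersect(second, StringComparer.OrdinalIgnoreCase).ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", IntersectNumbers())); // 2, 4
        Console.WriteLine(string.Join(", ", IntersectNames()));   // Ferrari, Lotus
    }
}
```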

Join combines two sequences whose elements can be of different types. For every outer element, Join looks for the inner elements with a matching key, and for each match the result selector returns a result based on the outer and the inner element. The keys used for matching are defined by key selectors. Because each sequence has its own key selector, the keys can be taken from differently typed data (e.g. one sequence stores the key as a string, the other as an integer), but both key selectors must return the same type, TKey. The result selector receives the matching elements and returns the result, which can be of a different type as well.

public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector,
    IEqualityComparer<TKey> comparer)
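Here is a sketch of the string-versus-integer key case described above. The Team and Racer types, the JoinDemo class, and the data are hypothetical and only serve as an illustration. The racers store the team id as a string, the teams as an int; the outer key selector converts the string so both selectors return int. A team without matching racers does not show up in the Join result.

```csharp
using System;
using System.Linq;

// Hypothetical element types with keys of different types
class Team
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Racer
{
    public string Name { get; set; }
    public string TeamCode { get; set; } // team id stored as a string
}

static class JoinDemo
{
    public static string[] RacersWithTeams()
    {
        Racer[] racers =
        {
            new Racer { Name = "Alonso", TeamCode = "1" },
            new Racer { Name = "Button", TeamCode = "2" },
            new Racer { Name = "Hamilton", TeamCode = "2" },
            new Racer { Name = "Vettel", TeamCode = "3" }
        };
        Team[] teams =
        {
            new Team { Id = 1, Name = "Ferrari" },
            new Team { Id = 2, Name = "McLaren" },
            new Team { Id = 3, Name = "Red Bull" },
            new Team { Id = 4, Name = "Lotus" } // no matching racer: not in the result
        };

        return racers.Join(
            teams,
            racer => int.Parse(racer.TeamCode), // outer key: string converted to int
            team => team.Id,                    // inner key: already an int
            (racer, team) => racer.Name + " (" + team.Name + ")")
            .ToArray();
    }

    static void Main()
    {
        foreach (var line in RacersWithTeams())
        {
            Console.WriteLine(line);
        }
        // Alonso (Ferrari)
        // Button (McLaren)
        // Hamilton (McLaren)
        // Vettel (Red Bull)
    }
}
```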

GroupJoin combines two sequences like Join, but it additionally groups the matches, as the IEnumerable<TInner> argument of the result selector shows. The result selector is invoked once for every outer element and receives all matching inner elements with it; if there is no match, this sequence is empty.

public static IEnumerable<TResult> GroupJoin<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, IEnumerable<TInner>, TResult> resultSelector,
    IEqualityComparer<TKey> comparer)
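A sketch of GroupJoin with the same kind of hypothetical Team and Racer types and made-up data as in a Join scenario, this time with the teams as the outer sequence. Each team is passed to the result selector together with its racers, and a team without racers still appears, with an empty group.

```csharp
using System;
using System.Linq;

// Hypothetical element types with keys of different types
class Team { public int Id { get; set; } public string Name { get; set; } }
class Racer { public string Name { get; set; } public string TeamCode { get; set; } }

static class GroupJoinDemo
{
    public static string[] TeamsWithRacers()
    {
        Team[] teams =
        {
            new Team { Id = 1, Name = "Ferrari" },
            new Team { Id = 2, Name = "McLaren" },
            new Team { Id = 4, Name = "Lotus" }
        };
        Racer[] racers =
        {
            new Racer { Name = "Alonso", TeamCode = "1" },
            new Racer { Name = "Button", TeamCode = "2" },
            new Racer { Name = "Hamilton", TeamCode = "2" }
        };

        // The result selector is called once per team with all of its racers;
        // a team without racers receives an empty sequence
        return teams.GroupJoin(
            racers,
            team => team.Id,
            racer => int.Parse(racer.TeamCode),
            (team, teamRacers) =>
                team.Name + " [" + string.Join(", ", teamRacers.Select(r => r.Name)) + "]")
            .ToArray();
    }

    static void Main()
    {
        foreach (var line in TeamsWithRacers())
        {
            Console.WriteLine(line);
        }
        // Ferrari [Alonso]
        // McLaren [Button, Hamilton]
        // Lotus []
    }
}
```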

Now, what about Zip? Zip, new with .NET 4, can be compared to the Join operator. Instead of matching elements by key, Zip uses only the order of the elements: it combines the first element of one sequence with the first element of the other, the second with the second, and so on. Each result element is created from one element of each input sequence. How the two elements are merged depends on the result selector function, and the two sequences can have different element types. Because no keys are involved, the declaration of the Zip method is simpler than that of Join.

public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)

With Zip it is possible, for example, to add the elements of two integer sequences pairwise by passing the lambda expression (first, second) => first + second as the result selector. In the following code snippet this creates the result values 8, 15, 22, and 26.

using System;
using System.Linq;

int[] one = { 3, 7, 11, 14 };
int[] two = { 5, 8, 11, 12 };

var result1 = one.Zip(two, (first, second) => first + second);
foreach (var x in result1)
{
    Console.WriteLine(x);
}

Of course the selector method can do anything. The next code snippet returns the smaller value of each pair of elements: 3, 7, 11, 12.

var result2 = one.Zip(two, (first, second) => first < second ? first : second);
foreach (var x in result2)
{
    Console.WriteLine(x);
}

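You can also combine sequences of different types and return yet another type; that just depends on the selector method. The two sequences can also differ in length. In that case Zip stops at the end of the shorter sequence, so only as many elements are combined as the shorter sequence contains. In the following snippet, cars has seven elements but titles only six, so "Mercedes" is not part of the result.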

string[] cars = { "Ferrari", "Williams", "McLaren", "Lotus", "Renault", "Brawn", "Mercedes" };
int[] titles = { 16, 9, 8, 7, 2, 1 };
var result3 = cars.Zip(titles, (car, title) => Tuple.Create(car, title));
foreach (var item in result3)
{
    Console.WriteLine("{0} {1}", item.Item1, item.Item2);
}

With .NET 3.5, where Zip is not available, you can implement the Zip extension method yourself with a simple while loop that iterates the enumerators of both collections:

using System;
using System.Collections.Generic;

static class SequenceExtension
{
    // Combines the elements of two sequences by position;
    // stops as soon as either sequence has no more elements
    public static IEnumerable<TResult> Zip<T1, T2, TResult>(
        this IEnumerable<T1> source1,
        IEnumerable<T2> source2,
        Func<T1, T2, TResult> func)
    {
        using (var iter1 = source1.GetEnumerator())
        using (var iter2 = source2.GetEnumerator())
        {
            while (iter1.MoveNext() && iter2.MoveNext())
            {
                yield return func(iter1.Current, iter2.Current);
            }
        }
    }
}
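One detail of the implementation above: because Zip itself is an iterator, nothing in it runs until the result is enumerated, so a null argument goes unnoticed at the call. A common idiom, shown here as a sketch rather than the framework's code, splits the method into a public wrapper that checks its arguments immediately and a private iterator (the name ZipIterator is an assumption) that does the actual work. The Program class only demonstrates the usage.

```csharp
using System;
using System.Collections.Generic;

static class SequenceExtension
{
    public static IEnumerable<TResult> Zip<T1, T2, TResult>(
        this IEnumerable<T1> source1,
        IEnumerable<T2> source2,
        Func<T1, T2, TResult> func)
    {
        // Checked when Zip is called, not deferred until enumeration
        if (source1 == null) throw new ArgumentNullException("source1");
        if (source2 == null) throw new ArgumentNullException("source2");
        if (func == null) throw new ArgumentNullException("func");
        return ZipIterator(source1, source2, func);
    }

    // Hypothetical helper name; same loop as the original implementation
    private static IEnumerable<TResult> ZipIterator<T1, T2, TResult>(
        IEnumerable<T1> source1, IEnumerable<T2> source2, Func<T1, T2, TResult> func)
    {
        using (var iter1 = source1.GetEnumerator())
        using (var iter2 = source2.GetEnumerator())
        {
            while (iter1.MoveNext() && iter2.MoveNext())
            {
                yield return func(iter1.Current, iter2.Current);
            }
        }
    }
}

class Program
{
    static void Main()
    {
        int[] one = { 3, 7, 11, 14 };
        int[] two = { 5, 8, 11, 12 };
        foreach (var x in one.Zip(two, (first, second) => first + second))
        {
            Console.WriteLine(x); // 8, 15, 22, 26
        }

        try
        {
            int[] missing = null;
            one.Zip(missing, (a, b) => a + b); // throws here, before any enumeration
        }
        catch (ArgumentNullException ex)
        {
            Console.WriteLine("Caught: " + ex.ParamName); // Caught: source2
        }
    }
}
```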

More information on LINQ is available in my upcoming book Professional C# 4 with .NET 4 and in my C# and ADO.NET workshops.

Christian