
April 2016

Entity Framework: Prevent redundant JOINs - watch your LINQ!

Fetching one record from a collection using navigational properties in Entity Framework may lead to unnecessary JOINs. To show the problem we need two tables, Products and Prices.

[Image: EF Blog - Redundant Joins - DB Schema]
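
The examples below assume entity classes along these lines (a sketch; the exact types and keys are assumptions, not taken from the original schema):

public class Product
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Price> Prices { get; set; }
}

public class Price
{
    public Guid Id { get; set; }
    public Guid ProductId { get; set; }
    public DateTime Startdate { get; set; }
    public decimal Value { get; set; }
}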

The query shown below fetches products along with their first price.

var products = ctx.Products
      .Select(p => new
      {
          p.Name,
          FirstPriceStartdate = p.Prices.OrderBy(price => price.Startdate).FirstOrDefault().Startdate,
          FirstPriceValue = p.Prices.OrderBy(price => price.Startdate).FirstOrDefault().Value,
      })
      .ToList();

Looks simple.
Let's look at the SQL statement, or rather at the execution plan.

[Image: EF Blog - Redundant Joins - Before Subselect]

The table Prices is JOINed twice because of the two occurrences of the expression "p.Prices.OrderBy(...).FirstOrDefault()". Entity Framework doesn't recognize that these expressions are identical - but we can help it. Just use a sub-select.

var products = ctx.Products
       .Select(p => new
       {
           Product = p,
           FirstPrice = p.Prices.OrderBy(price => price.Startdate).FirstOrDefault()
       })
      .Select(p => new
      {
          p.Product.Name,
          FirstPriceStartdate = p.FirstPrice.Startdate,
          FirstPriceValue = p.FirstPrice.Value,
      })
      .ToList();

That's it: the table Prices is JOINed only once now.

[Image: EF Blog - Redundant Joins - After Subselect]

With a complex query you may need multiple sub-selects to reach a navigational property of another navigational property, as sketched below. In that case, please write an email to your colleagues or leave a comment so the other developers understand what's going on - otherwise your funny-looking query will be refactored pretty soon :)
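
For illustration, a hedged sketch of such chained sub-selects; the navigation property Currency and its property Code are made up for this example:

var products = ctx.Products
    .Select(p => new
    {
        Product = p,
        FirstPrice = p.Prices.OrderBy(price => price.Startdate).FirstOrDefault()
    })
    .Select(p => new
    {
        p.Product,
        p.FirstPrice,
        // hypothetical navigation property of Price
        FirstPriceCurrency = p.FirstPrice.Currency
    })
    .Select(p => new
    {
        p.Product.Name,
        FirstPriceValue = p.FirstPrice.Value,
        // hypothetical property of Currency
        FirstPriceCurrencyCode = p.FirstPriceCurrency.Code
    })
    .ToList();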


Entity Framework: High performance querying trick using SqlBulkCopy and temp tables

Implementing database access with Entity Framework is pretty convenient, but sometimes the query performance can be very poor. In particular, using navigational properties to load collections leads to significantly longer execution times and more I/O. To see the impact of loading a collection we have to look into profiling tools like SQL Server Profiler.

Let's look at the following use case, which is distilled from a customer project. We have three tables: Products, Suppliers and Prices, the latter containing an entire price history.

[Image: Blog - EF - Using SqlBulkCopy and temp tables - DB]

We want to select all products with their suppliers and future prices according to a filter criterion. The easiest approach is to use the navigational properties.

using(var ctx = new Entities())
{
    var products = ctx.Products
        .Where(p => p.Name.Contains("chocolate"))
        .Select(p => new FoundProduct()
        {
            Id = p.Id,
            Name = p.Name,
            FuturePrices = p.Prices
                .Where(price => price.Startdate > DateTime.Today),
            Suppliers = p.Suppliers
                .Where(s => s.IsDeliverable)
        })
        .ToList();
}
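
For reference, a minimal sketch of the class FoundProduct the examples project into (the property types are assumptions):

public class FoundProduct
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public IEnumerable<Price> FuturePrices { get; set; }
    public IEnumerable<Supplier> Suppliers { get; set; }
}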

This simple-looking query can take a while to execute, depending on the complexity of the data model and the amount of data in the database. There are multiple reasons the database won't like this query. Entity Framework has to perform huge JOINs, concatenations and sort operations to fetch the products, prices and suppliers at once, so the result set is much bigger than when fetching the collections separately. Furthermore, the JOINs, the confusing execution plan and the suboptimal SQL statements Entity Framework has to generate to fulfill our demands make it harder to find optimal indexes.

If you have been using EF for a while, you may wonder why you haven't noticed this problem before. Most likely you didn't notice it because the tables or the result sets were small. Assume an unoptimized query takes 200 ms and an optimized one 20 ms. Although one query is 10 times faster than the other, both response times are considered 'fast' - and this often leads to the assumption that the query is fine. In reality, though, the database needs far more resources to execute the unoptimized query. That doesn't mean we have to change all our EF queries that use navigational properties - be selective. Use profiling tools to decide which queries should be tuned and which not.
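
If you don't have a profiler at hand, EF6 can print the generated SQL itself; a minimal sketch using the Database.Log hook:

using(var ctx = new Entities())
{
    // EF6 passes every generated SQL statement to this action
    ctx.Database.Log = sql => Console.Write(sql);

    var products = ctx.Products
        .Where(p => p.Name.Contains("chocolate"))
        .ToList();
}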

Let's look at the execution plan of the query from above to get an idea of which operator consumes the most resources. Half of the resources are needed for sorting the data, although we don't have any order-by clause in our query! The problem is that the data must be in a special sort order so that Entity Framework is able to process (materialize) the SQL result correctly.

[Image: Blog - EF - Using SqlBulkCopy and temp tables - Execution Plan]

So, let's assume the result set is pretty big, the query takes too long and the profiling tool shows hundreds of thousands of reads needed to fetch our products.
The first approach is to split the query: first we load the products, then the suppliers and the prices.

using(var ctx = new Entities())
{
    var products = ctx.Products
        .Where(p => p.Name.Contains("chocolate"))
        .Select(p => new FoundProduct()
        {
            Id = p.Id,
            Name = p.Name
        })
        .ToList();

    var productIds = products.Select(p => p.Id);

    var futurePricesLookup = ctx.Prices
        .Where(p => productIds.Contains(p.ProductId))
        .Where(p => p.Startdate > DateTime.Today)
        .ToLookup(p => p.ProductId);

    var suppliersLookup = ctx.Suppliers
        .Where(s => productIds.Contains(s.ProductId))
        .Where(s => s.IsDeliverable)
        .ToLookup(p => p.ProductId);

    foreach(var product in products)
    {
        product.FuturePrices = futurePricesLookup[product.Id];
        product.Suppliers = suppliersLookup[product.Id];
    }   
}

Now we go to the database three times, but the result sets are much smaller, easier to profile and easier to find optimal indexes for. In one customer project, the reads went from 500k down to 2k and the duration from 3 seconds down to 200 ms just by splitting the query.

For comparison, using our simplified example with 100 products and 10k prices:

  • The original query needs 300 ms and 240 reads
  • The split queries need (1 + 14 + 1) = 16 ms and (2 + 115 + 4) = 121 reads

 

This approach performs very well as long as the number of product IDs used in the Where statement stays small, say fewer than 50 - but that isn't always the case. Especially when implementing a data exporter, we have to be able to handle thousands of IDs. Passing that many values slows the query down significantly because every single ID has to be shipped to the database with each query. But what if we inserted all product IDs into a temporary table using SqlBulkCopy? With bulk copy it makes almost no difference whether we insert 100 IDs or 10k. First, let's create a few classes and methods so we can bulk insert IDs of type Guid with just a few lines of code. The usage will look like this:

private const string TempTableName = "#TempTable";

using(var ctx = new Entities())
{
    // fetch products and the productIds

    RecreateTempTable(ctx);
    BulkInsert(ctx, null, TempTableName, () => new TempTableDataReader(productIds));

    // here come the queries for prices and suppliers
}

Before copying the IDs we need to create a temp table.

private void RecreateTempTable(Entities ctx)
{
    ctx.Database.ExecuteSqlCommand($@"
        IF(OBJECT_ID('tempdb..{TempTableName}') IS NOT NULL)
            DROP TABLE {TempTableName};

        CREATE TABLE {TempTableName}
        (
            Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY CLUSTERED
        );
    ");
}
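
One caveat (a common pitfall, assuming default connection handling): a local temp table lives only as long as the database session. If Entity Framework opens and closes the connection for each command, the temp table is gone before the bulk insert runs, so we open the connection explicitly and keep it open:

using(var ctx = new Entities())
{
    // keep the session - and with it the temp table - alive
    // across all subsequent commands
    ctx.Database.Connection.Open();

    RecreateTempTable(ctx);
    // ... the bulk insert and the queries below run on the same open connection
}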

The bulk insert is encapsulated in a generic method so it can be used with all kinds of data. The class BulkInsertDataReader<T> is a base class of mine that makes implementing the interface IDataReader easy. The class can be found on GitHub: BulkInsertDataReader.cs

private void BulkInsert<T>(Entities ctx, DbContextTransaction tx, 
    string tableName, Func<BulkInsertDataReader<T>> getDatareader)
{
    SqlConnection sqlCon = (SqlConnection)ctx.Database.Connection;
    SqlTransaction sqlTx = (SqlTransaction)tx?.UnderlyingTransaction;

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlCon, 
        SqlBulkCopyOptions.Default, sqlTx))
    {
        bulkCopy.DestinationTableName = tableName;
        bulkCopy.BulkCopyTimeout = (int)TimeSpan.FromMinutes(10).TotalSeconds;

        using (var reader = getDatareader())
        {
            foreach (var mapping in reader.GetColumnMappings())
            {
                bulkCopy.ColumnMappings.Add(mapping);
            }

            bulkCopy.WriteToServer(reader);
        }
    }
}

Using the generic BulkInsertDataReader we implement a data reader for inserting Guids.

public class TempTableDataReader : BulkInsertDataReader<Guid>
{
    private static readonly IReadOnlyCollection<SqlBulkCopyColumnMapping> _columnMappings;

    static TempTableDataReader()
    {
        _columnMappings = new List<SqlBulkCopyColumnMapping>()
        {
            new SqlBulkCopyColumnMapping(1, "Id"),
        };
    }

    public TempTableDataReader(IEnumerable<Guid> guids)
        : base(_columnMappings, guids)
    {
    }

    public override object GetValue(int i)
    {
        switch (i)
        {
            case 1:
                return Current;
            default:
                throw new ArgumentOutOfRangeException(nameof(i), "Unknown index: " + i);
        }
    }
}

Now we have all IDs in a temporary table. Let's rewrite the queries from above to use JOINs instead of the method Contains.

using(var ctx = new Entities())
{
    // fetch products and the productIds
    // create temp table and insert the ids into it

    var futurePricesLookup = ctx.Prices
        .Join(ctx.TempTable, p => p.ProductId, t => t.Id, (p, t) => p)
        .Where(p => p.Startdate > DateTime.Today)
        .ToLookup(p => p.ProductId);

    var suppliersLookup = ctx.Suppliers
        .Join(ctx.TempTable, s => s.ProductId, t => t.Id, (s, t) => s)
        .Where(s => s.IsDeliverable)
        .ToLookup(p => p.ProductId);

    // set properties FuturePrices and Suppliers like before
}

This raises a question: where does the entity set TempTable come from when using the database-first approach? The answer is that we need to edit the edmx file manually to introduce the temp table to Entity Framework. Open the edmx file in an XML editor and add the EntityContainer, EntityType and EntityContainerMapping entries at the right places, as shown below.

Remark: Entity Framework supports a so-called DefiningQuery, which we use here to define the temp table, but the EF designer of Visual Studio doesn't support this feature. The consequence is that the manually defined sections will be deleted after an update of the EF model; in that case we need to revert these changes.

<edmx:Edmx Version="3.0">
    <edmx:Runtime>
        <!-- SSDL content -->
        <edmx:StorageModels>
            <Schema Namespace="Model.Store" Provider="System.Data.SqlClient">
                <EntityContainer Name="ModelStoreContainer">
                    <EntitySet Name="TempTable" EntityType="Self.TempTable">
                        <DefiningQuery>
                            SELECT #TempTable.Id
                            FROM #TempTable
                        </DefiningQuery>
                    </EntitySet>
                </EntityContainer>
                <EntityType Name="TempTable">
                    <Key>
                        <PropertyRef Name="Id" />
                    </Key>
                    <Property Name="Id" Type="uniqueidentifier" Nullable="false" />
                </EntityType>
            </Schema>
        </edmx:StorageModels>
        <!-- CSDL content -->
        <edmx:ConceptualModels>
            <Schema Namespace="Model" Alias="Self">
                <EntityContainer Name="Entities" annotation:LazyLoadingEnabled="true">
                    <EntitySet Name="TempTable" EntityType="Model.TempTable" />
                </EntityContainer>
                <EntityType Name="TempTable">
                    <Key>
                        <PropertyRef Name="Id" />
                    </Key>
                    <Property Name="Id" Type="Guid" Nullable="false" />
                </EntityType>
            </Schema>
        </edmx:ConceptualModels>
        <!-- C-S mapping content -->
        <edmx:Mappings>
            <Mapping Space="C-S">
                <EntityContainerMapping StorageEntityContainer="ModelStoreContainer" CdmEntityContainer="Entities">
                    <EntitySetMapping Name="TempTable">
                        <EntityTypeMapping TypeName="Model.TempTable">
                            <MappingFragment StoreEntitySet="TempTable">
                                <ScalarProperty Name="Id" ColumnName="Id" />
                            </MappingFragment>
                        </EntityTypeMapping>
                    </EntitySetMapping>
                </EntityContainerMapping>
            </Mapping>
        </edmx:Mappings>
    </edmx:Runtime>
</edmx:Edmx>

That's it. Now we can copy thousands of records into a temp table very quickly and use them for JOINs.
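
If you want to clean up eagerly, the temp table can be dropped explicitly once the queries are done; otherwise it disappears when the connection is closed (a trivial sketch):

ctx.Database.ExecuteSqlCommand($"DROP TABLE {TempTableName};");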


Mimicking $interpolate: An Angular 2 interpolation service

In an Angular 1 application we created for one of our customers, we used the $interpolate service to build a simple templating engine. Users were able to create snippets with placeholders within the web application and use these message fragments to compose reply emails for support requests.

In Angular 2 there is no service like $interpolate - but that is not a problem, because we have an abstract syntax tree (AST) parser at our disposal to build our own interpolation service. Let's build a service that takes a format string (with placeholders) and an object whose properties are used to replace the placeholders. The usage looks like this:

// returns 'Hello World!'
interpolation.interpolate('Hello {{place.holder}}!', { place: { holder: 'World' } });

First, we inject the parser from Angular 2 and create a lookup to cache our interpolations.

constructor(parser: Parser) {
    this._parser = parser;
    this._textInterpolations = new Map<string, TextInterpolation>();
}

The class TextInterpolation is just a container for the parts of a format string. To get the interpolated string we call the method interpolate. The example from above has two parts:

  • String 'Hello '
  • Property getter for {{place.holder}}

 

class TextInterpolation {
    private _interpolationFunctions: ((ctx: any)=>any)[];

    constructor(parts: ((ctx: any) => any)[]) {
        this._interpolationFunctions = parts;
    }

    public interpolate(ctx: any): string {
        return this._interpolationFunctions.map(f => f(ctx)).join('');
    }
}

Before we can create our TextInterpolation we need to parse the format string to get an AST.

let ast = this._parser.parseInterpolation(text, null);

if (!ast) {
    return null;
}

if (ast.ast instanceof Interpolation) {
    textInterpolation = this.buildTextInterpolation(ast.ast);
} else {
    throw new Error(`The provided text is not a valid interpolation. Provided type ${ast.ast.constructor && ast.ast.constructor['name']}`);
}

The AST of type Interpolation has two collections: one with strings and the other with expressions. Our interpolation service supports property accessors only, i.e. no method calls or other operators.

private buildTextInterpolation(interpolation: Interpolation): TextInterpolation {
    let parts: ((ctx: any) => any)[] = [];

    for (let i = 0; i < interpolation.strings.length; i++) {
        let string = interpolation.strings[i];

        if (string.length > 0) {
            parts.push(ctx => string);
        }

        if (i < interpolation.expressions.length) {
            let exp = interpolation.expressions[i];

            if (exp instanceof PropertyRead) {
                var getter = this.buildPropertyGetter(exp);
                parts.push(this.addValueFormatter(getter));
            } else {
                throw new Error(`Expression of type ${exp.constructor && exp.constructor.name} is not supported.`);
            }
        }
    }

    return new TextInterpolation(parts);
};

The strings don't need any special handling but the property getters do. The first part of the special handling happens in the method buildPropertyGetter, which fetches the value of a property (and of its sub-properties) of an object.

private buildPropertyGetter(exp: PropertyRead): ((ctx: any) => any) {
    var getter: ((ctx: any) => any);

    if (exp.receiver instanceof PropertyRead) {
        getter = this.buildPropertyGetter(exp.receiver);
    } else if (!(exp.receiver instanceof ImplicitReceiver)) {
        throw new Error(`Expression of type ${exp.receiver.constructor && (exp.receiver).constructor.name} is not supported.`);
    }

    if (getter) {
        let innerGetter = getter;
        getter = ctx => {
            ctx = innerGetter(ctx);
            return ctx && exp.getter(ctx);
        };
    } else {
        getter = <(ctx: any)=>any>exp.getter;
    }

    return ctx => ctx && getter(ctx);
}

The second part of the special handling is done in addValueFormatter. It returns an empty string when the value produced by the property getter is null or undefined, because those values would otherwise be formatted as the strings 'null' and 'undefined' instead of as an empty string.

private addValueFormatter(getter: ((ctx: any) => any)): ((ctx: any) => any) {
    return ctx => {
        var value = getter(ctx);

        if (value === null || _.isUndefined(value)) {
            value = '';
        }

        return value;
    }
}

The interpolation service including unit tests can be found on GitHub: angular2-interpolation


.NET Core: Lowering the log level of 3rd party components

With the new .NET Core framework and libraries we got an interface called Microsoft.Extensions.Logging.ILogger that is used for writing log messages. Various 3rd party and built-in components make very good use of it. To see how much is being logged, just create a simple Web API using Entity Framework (EF) and the Kestrel server, and within a few minutes you will get thousands of log messages.

The downside of such a well-known interface is that the log level chosen by the 3rd party developers may not fit the software using it. For example, Entity Framework uses the log level Information for logging generated SQL queries. For the EF developers this is a good choice because the SQL query is important information to them - but for our customers using EF this information is for debugging purposes only.

Luckily it is very easy to change the log level of a specific logging source (EF, Kestrel etc.). All we need is a simple proxy implementing the interface ILogger. The proxy lowers the log level to Debug in the methods Log and IsEnabled and calls the corresponding method of the real logger with the adjusted arguments.

public class LoggerProxy : ILogger
{
	private readonly ILogger _logger;

	public LoggerProxy(ILogger logger)
	{
		if (logger == null)
			throw new ArgumentNullException(nameof(logger));

		_logger = logger;
	}

	public void Log(LogLevel logLevel, int eventId, object state, 
		Exception exception, Func<object, Exception, string> formatter)
	{
		if (logLevel > LogLevel.Debug)
			logLevel = LogLevel.Debug;

		_logger.Log(logLevel, eventId, state, exception, formatter);
	}

	public bool IsEnabled(LogLevel logLevel)
	{
		if (logLevel > LogLevel.Debug)
			logLevel = LogLevel.Debug;

		return _logger.IsEnabled(logLevel);
	}

	public IDisposable BeginScopeImpl(object state)
	{
		return _logger.BeginScopeImpl(state);
	}
}

To inject the LoggerProxy we create another proxy, this time implementing the interface Microsoft.Extensions.Logging.ILoggerFactory. The method we are interested in is CreateLogger, which gets the category name as a parameter. The category name may be the name of the class requesting the logger or the name of the assembly. In this method we let the real logger factory create a logger, and if the logger is for Entity Framework we return our LoggerProxy wrapping the real one.

public class LoggerFactoryProxy : ILoggerFactory
{
	private readonly ILoggerFactory _loggerFactory;
	
	public LogLevel MinimumLevel
	{
		get { return _loggerFactory.MinimumLevel; }
		set { _loggerFactory.MinimumLevel = value; }
	}

	public LoggerFactoryProxy(ILoggerFactory loggerFactory)
	{
		if (loggerFactory == null)
			throw new ArgumentNullException(nameof(loggerFactory));

		_loggerFactory = loggerFactory;
	}

	public ILogger CreateLogger(string categoryName)
	{
		var logger = _loggerFactory.CreateLogger(categoryName);

		if (categoryName.StartsWith("Microsoft.Data.Entity.", StringComparison.OrdinalIgnoreCase))
			logger = new LoggerProxy(logger);

		return logger;
	}

	public void AddProvider(ILoggerProvider provider)
	{
		_loggerFactory.AddProvider(provider);
	}

	public void Dispose()
	{
		_loggerFactory.Dispose();
	}
}

Finally, we need to register the factory proxy with the dependency injection container.

public void ConfigureServices(IServiceCollection services)
{
	var factory = new LoggerFactoryProxy(new LoggerFactory());
	services.AddInstance<ILoggerFactory>(factory);
}

From now on, log messages coming from Entity Framework will be logged with the log level Debug.
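
A quick illustration (the category name and the message are made up; LogInformation is one of the standard logging extension methods):

var factory = new LoggerFactoryProxy(new LoggerFactory());
var logger = factory.CreateLogger("Microsoft.Data.Entity.Query");

// Called at Information level, but the proxy forwards the message
// to the real logger with LogLevel.Debug.
logger.LogInformation("Executing query...");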