Entity Framework: High performance querying trick using SqlBulkCopy and temp tables

Implementing database access with Entity Framework is pretty convenient, but sometimes the query performance can be very poor. In particular, using navigational properties to load collections leads to significantly longer execution times and more I/O. To see the impact of loading a collection, we have to look at profiling tools like SQL Server Profiler.
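
If no profiler is at hand, EF6's built-in logging hook is a quick way to inspect the SQL a query generates. A minimal example using the Database.Log property available since EF 6:

using(var ctx = new Entities())
{
    // Writes every SQL statement EF sends to the server to the debug output.
    ctx.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);

    // ... execute the queries to inspect ...
}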

Let’s look at the following use case, which was extrapolated from a customer project. We have three tables: Products, Suppliers and Prices, the last of which contains the entire price history.

[Figure: database schema showing the Products, Suppliers and Prices tables and their relationships]
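
The examples that follow assume entities and a result DTO along these lines (a hypothetical sketch reconstructed from the queries below; the real classes would be generated from the database model):

public class Product
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Price> Prices { get; set; }
    public virtual ICollection<Supplier> Suppliers { get; set; }
}

public class Price
{
    public Guid ProductId { get; set; }
    public DateTime Startdate { get; set; }
}

public class Supplier
{
    public Guid ProductId { get; set; }
    public bool IsDeliverable { get; set; }
}

// DTO returned by the queries below.
public class FoundProduct
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public IEnumerable<Price> FuturePrices { get; set; }
    public IEnumerable<Supplier> Suppliers { get; set; }
}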

We want to select all products with their suppliers and future prices according to a filter criterion. The easiest approach is to use the navigational properties.

using(var ctx = new Entities())
{
    var products = ctx.Products
        .Where(p => p.Name.Contains("chocolate"))
        .Select(p => new FoundProduct()
        {
            Id = p.Id,
            Name = p.Name,
            FuturePrices = p.Prices
                .Where(price => price.Startdate > DateTime.Today),
            Suppliers = p.Suppliers
                .Where(s => s.IsDeliverable)
        })
        .ToList();
}

For such a simple-looking query, the execution can take a while, depending on the complexity of the model and the amount of data in the database. There are multiple reasons the database won’t like this query. Entity Framework has to generate SQL with big JOINs, concatenations and sorting to fetch the products, prices and suppliers at once, so the result set is much bigger than when fetching the collections separately. Furthermore, the JOINs, the confusing execution plan and the suboptimal SQL statements Entity Framework has to generate make it much harder to find optimal indexes.

If you have been using EF for a while, you may wonder why you haven't run into this problem before. Most likely you just didn't notice it because the tables or the result sets were small. Assume an unoptimized query takes 200 ms and an optimized one 20 ms. Although one query is 10 times faster than the other, both response times feel 'fast', which often leads to the assumption that the query is fine. In reality, the database needs far more resources to execute the unoptimized query. That doesn't mean we have to rewrite every EF query that uses navigational properties; be selective, and use profiling tools to decide which queries are worth tuning.

Let's look at the execution plan of the query above to get an idea which operator consumes the most resources. Half of the resources are spent sorting the data, although our query doesn't contain any order-by clause! The reason is that the data must arrive in a specific sort order so that Entity Framework can process (materialize) the SQL result correctly.

[Figure: execution plan of the query above; the sort operator accounts for about half of the cost]

So, let's assume the result set is pretty big, the query takes too long and the profiling tool shows hundreds of thousands of reads needed to fetch our products.
The first approach is to split the query: first we load the products, then the prices and the suppliers.

using(var ctx = new Entities())
{
    var products = ctx.Products
        .Where(p => p.Name.Contains("chocolate"))
        .Select(p => new FoundProduct()
        {
            Id = p.Id,
            Name = p.Name
        })
        .ToList();

    var productIds = products.Select(p => p.Id);

    var futurePricesLookup = ctx.Prices
        .Where(p => productIds.Contains(p.ProductId))
        .Where(p => p.Startdate > DateTime.Today)
        .ToLookup(p => p.ProductId);

    var suppliersLookup = ctx.Suppliers
        .Where(s => productIds.Contains(s.ProductId))
        .Where(s => s.IsDeliverable)
        .ToLookup(p => p.ProductId);

    foreach(var product in products)
    {
        product.FuturePrices = futurePricesLookup[product.Id];
        product.Suppliers = suppliersLookup[product.Id];
    }   
}

Now we hit the database three times, but the result sets are much smaller, easier to profile and easier to find optimal indexes for. In one of our customer projects, the reads went from 500k down to 2k and the duration from 3 seconds to 200 ms just by splitting the query.

For comparison, using our simplified example with 100 products and 10k prices:

  • The original query needs 300 ms and causes 240 reads
  • The split queries need (1 + 14 + 1) = 16 ms and cause (2 + 115 + 4) = 121 reads

This approach performs very well as long as the number of product IDs in the Where statement stays small, say fewer than 50. But that isn't always the case.
Especially when implementing a data exporter, we have to be able to handle thousands of IDs, and using that many parameters slows the query down significantly. But what if we could insert all product IDs into a temporary table using SqlBulkCopy? With bulk copy it makes almost no difference whether we insert 100 IDs or 10k. First we create a few classes and methods so that we can bulk insert IDs of type Guid with just a few lines of code. The usage will look like this:

private const string TempTableName = "#TempTable";

using(var ctx = new Entities())
{
    // fetch products and the productIds

    RecreateTempTable(ctx);
    BulkInsert(ctx, null, TempTableName, () => new TempTableDataReader(productIds));

    // here come the queries for prices and suppliers
}
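
One caveat: a local temp table lives only as long as the database session, and EF opens and closes the underlying connection for each command if it wasn't opened explicitly. To keep #TempTable alive across the create, the bulk insert and the subsequent queries, open the connection manually at the start. A minimal sketch:

using(var ctx = new Entities())
{
    // Opening the connection ourselves prevents EF from closing it after
    // every command; otherwise the session-scoped #TempTable would be
    // dropped between the CREATE TABLE, the bulk insert and the queries.
    ctx.Database.Connection.Open();

    // ... create the temp table, bulk insert the IDs, run the queries ...
}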

Before copying the IDs we need to create a temp table.

private void RecreateTempTable(Entities ctx)
{
    ctx.Database.ExecuteSqlCommand($@"
        IF(OBJECT_ID('tempdb..{TempTableName}') IS NOT NULL)
            DROP TABLE {TempTableName};

        CREATE TABLE {TempTableName}
        (
            Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY CLUSTERED
        );
    ");
}

The bulk insert is encapsulated in a generic method so it can be used with all kinds of data. The class BulkInsertDataReader<T> is a base class of mine that makes implementing the interface IDataReader easy. The class can be found on GitHub: BulkInsertDataReader.cs
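
For context, here is a minimal sketch of what such a base class might look like. This is not the actual implementation from GitHub; it assumes that SqlBulkCopy with ordinal column mappings only ever calls Read, FieldCount and GetValue, so everything else can be a stub. It requires the namespaces System, System.Collections.Generic, System.Data and System.Data.SqlClient.

public abstract class BulkInsertDataReader<T> : IDataReader
{
    private readonly IReadOnlyCollection<SqlBulkCopyColumnMapping> _columnMappings;
    private readonly IEnumerator<T> _enumerator;

    protected BulkInsertDataReader(
        IReadOnlyCollection<SqlBulkCopyColumnMapping> columnMappings,
        IEnumerable<T> items)
    {
        _columnMappings = columnMappings;
        _enumerator = items.GetEnumerator();
    }

    // Row currently being copied; derived classes read it in GetValue.
    protected T Current => _enumerator.Current;

    public IEnumerable<SqlBulkCopyColumnMapping> GetColumnMappings() => _columnMappings;

    // SqlBulkCopy advances the reader row by row.
    public bool Read() => _enumerator.MoveNext();

    // Derived classes map a (1-based) source column ordinal to a value.
    public abstract object GetValue(int i);

    // One extra slot because ordinals start at 1 in this sketch.
    public int FieldCount => _columnMappings.Count + 1;

    public void Dispose() => _enumerator.Dispose();

    // The remaining members are not used by SqlBulkCopy in this scenario.
    public int Depth => 0;
    public bool IsClosed => false;
    public int RecordsAffected => -1;
    public void Close() { }
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => throw new NotSupportedException();
    public object this[int i] => GetValue(i);
    public object this[string name] => throw new NotSupportedException();
    public string GetName(int i) => throw new NotSupportedException();
    public int GetOrdinal(string name) => throw new NotSupportedException();
    public string GetDataTypeName(int i) => throw new NotSupportedException();
    public Type GetFieldType(int i) => throw new NotSupportedException();
    public int GetValues(object[] values) => throw new NotSupportedException();
    public bool IsDBNull(int i) => GetValue(i) == null;
    public bool GetBoolean(int i) => (bool)GetValue(i);
    public byte GetByte(int i) => (byte)GetValue(i);
    public char GetChar(int i) => (char)GetValue(i);
    public Guid GetGuid(int i) => (Guid)GetValue(i);
    public short GetInt16(int i) => (short)GetValue(i);
    public int GetInt32(int i) => (int)GetValue(i);
    public long GetInt64(int i) => (long)GetValue(i);
    public float GetFloat(int i) => (float)GetValue(i);
    public double GetDouble(int i) => (double)GetValue(i);
    public decimal GetDecimal(int i) => (decimal)GetValue(i);
    public string GetString(int i) => (string)GetValue(i);
    public DateTime GetDateTime(int i) => (DateTime)GetValue(i);
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length)
        => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length)
        => throw new NotSupportedException();
}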

private void BulkInsert<T>(Entities ctx, DbContextTransaction tx,
    string tableName, Func<BulkInsertDataReader<T>> getDatareader)
{
    // Reuse the connection (and transaction, if any) of the EF context so
    // the bulk copy runs in the same session that owns the temp table.
    SqlConnection sqlCon = (SqlConnection)ctx.Database.Connection;
    SqlTransaction sqlTx = (SqlTransaction)tx?.UnderlyingTransaction;

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlCon, 
        SqlBulkCopyOptions.Default, sqlTx))
    {
        bulkCopy.DestinationTableName = tableName;
        bulkCopy.BulkCopyTimeout = (int)TimeSpan.FromMinutes(10).TotalSeconds;

        using (var reader = getDatareader())
        {
            foreach (var mapping in reader.GetColumnMappings())
            {
                bulkCopy.ColumnMappings.Add(mapping);
            }

            bulkCopy.WriteToServer(reader);
        }
    }
}

Using the generic BulkInsertDataReader, we implement a data reader for inserting Guids.

public class TempTableDataReader : BulkInsertDataReader<Guid>
{
    private static readonly IReadOnlyCollection<SqlBulkCopyColumnMapping> _columnMappings;

    static TempTableDataReader()
    {
        _columnMappings = new List<SqlBulkCopyColumnMapping>()
        {
            new SqlBulkCopyColumnMapping(1, "Id"),
        };
    }

    public TempTableDataReader(IEnumerable<Guid> guids)
        : base(_columnMappings, guids)
    {
    }

    public override object GetValue(int i)
    {
        switch (i)
        {
            case 1:
                return Current;
            default:
                throw new ArgumentOutOfRangeException(nameof(i), "Unknown index: " + i);
        }
    }
}

Now we have all IDs in a temporary table. Let’s rewrite the query from above to use JOINs instead of the method Contains.

using(var ctx = new Entities())
{
    // fetch products and the productIds
    // create temp table and insert the ids into it

    var futurePricesLookup = ctx.Prices
        .Join(ctx.TempTable, p => p.ProductId, t => t.Id, (p, t) => p)
        .Where(p => p.Startdate > DateTime.Today)
        .ToLookup(p => p.ProductId);

    var suppliersLookup = ctx.Suppliers
        .Join(ctx.TempTable, s => s.ProductId, t => t.Id, (s, t) => s)
        .Where(s => s.IsDeliverable)
        .ToLookup(p => p.ProductId);

    // set properties FuturePrices and Suppliers like before
}

Here the question comes up: where does the entity set TempTable come from when using the database-first approach? The answer is that we have to edit the edmx file manually to introduce the temp table to Entity Framework. Open the edmx file in an XML editor and copy the EntityContainer, EntityType and EntityContainerMapping content into the right places, as shown below.

Remark: Entity Framework supports a so-called DefiningQuery, which we use here to define the temp table, but the EF designer of Visual Studio doesn't support this feature. As a consequence, the manually defined sections get deleted whenever the EF model is updated, and we have to reapply these changes afterwards.

<edmx:Edmx Version="3.0">
    <edmx:Runtime>
        <!-- SSDL content -->
        <edmx:StorageModels>
            <Schema Namespace="Model.Store" Provider="System.Data.SqlClient">
                <EntityContainer Name="ModelStoreContainer">
                    <EntitySet Name="TempTable" EntityType="Self.TempTable">
                        <DefiningQuery>
                            SELECT #TempTable.Id
                            FROM #TempTable
                        </DefiningQuery>
                    </EntitySet>
                </EntityContainer>
                <EntityType Name="TempTable">
                    <Key>
                        <PropertyRef Name="Id" />
                    </Key>
                    <Property Name="Id" Type="uniqueidentifier" Nullable="false" />
                </EntityType>
            </Schema>
        </edmx:StorageModels>
        <!-- CSDL content -->
        <edmx:ConceptualModels>
            <Schema Namespace="Model" Alias="Self">
                <EntityContainer Name="Entities" annotation:LazyLoadingEnabled="true">
                    <EntitySet Name="TempTable" EntityType="Model.TempTable" />
                </EntityContainer>
                <EntityType Name="TempTabled">
                    <Key>
                        <PropertyRef Name="Id" />
                    </Key>
                    <Property Name="Id" Type="Guid" Nullable="false" />
                </EntityType>
            </Schema>
        </edmx:ConceptualModels>
        <!-- C-S mapping content -->
        <edmx:Mappings>
            <Mapping Space="C-S">
                <EntityContainerMapping StorageEntityContainer="ModelStoreContainer" CdmEntityContainer="Entities">
                    <EntitySetMapping Name="TempTable">
                        <EntityTypeMapping TypeName="Model.TempTable">
                            <MappingFragment StoreEntitySet="TempTable">
                                <ScalarProperty Name="Id" ColumnName="Id" />
                            </MappingFragment>
                        </EntityTypeMapping>
                    </EntitySetMapping>
                </EntityContainerMapping>
            </Mapping>
        </edmx:Mappings>
    </edmx:Runtime>
</edmx:Edmx>

That’s it. Now we are able to copy thousands of records into a temp table very fast and use this data for JOINs.
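
Putting the pieces together, the whole flow might look like this (a sketch based on the snippets above; FetchProducts is a hypothetical helper standing in for the product query from the beginning):

using(var ctx = new Entities())
{
    // Keep the connection open so the session-scoped temp table survives
    // across the create, the bulk insert and the queries.
    ctx.Database.Connection.Open();

    // FetchProducts: hypothetical helper returning List<FoundProduct>.
    var products = FetchProducts(ctx, "chocolate");
    var productIds = products.Select(p => p.Id).ToList();

    RecreateTempTable(ctx);
    BulkInsert(ctx, null, TempTableName, () => new TempTableDataReader(productIds));

    var futurePricesLookup = ctx.Prices
        .Join(ctx.TempTable, p => p.ProductId, t => t.Id, (p, t) => p)
        .Where(p => p.Startdate > DateTime.Today)
        .ToLookup(p => p.ProductId);

    var suppliersLookup = ctx.Suppliers
        .Join(ctx.TempTable, s => s.ProductId, t => t.Id, (s, t) => s)
        .Where(s => s.IsDeliverable)
        .ToLookup(p => p.ProductId);

    foreach(var product in products)
    {
        product.FuturePrices = futurePricesLookup[product.Id];
        product.Suppliers = suppliersLookup[product.Id];
    }
}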
