Previous month:
September 2017
Next month:
May 2018

April 2018

Entity Framework Core Performance: Beware of N+1 Queries

After working with Entity Framework 6 (EF 6) for several years, a software developer can predict the SQL statements being generated by EF just by looking at the LINQ queries. With Entity Framework Core (EF Core) the SQL statement generation has changed - in some cases for the better, in others for the worse.

In this blog post we will check a few LINQ queries and see which of them are executing N+1 SQL statements where N is the number of selected records.

Given is a DbContext with 2 entities Product and ProductGroup. (Repo with sample code: github.com/PawelGerr/Presentation-EntityFrameworkCore)

public class DemoDbContext : DbContext
{
public DbSet<Product> Products { get; set; }
public DbSet<ProductGroup> ProductGroups { get; set; }
}
public class Product
{
public Guid Id { get; set; }
public string Name { get; set; }

public Guid GroupId { get; set; }
public ProductGroup Group { get; set; }
}
public class ProductGroup
{
public Guid Id { get; set; }
public string Name { get; set; }

public ICollection<Product> Products { get; set; }
}

Let's print out all product groups having the word "Group" in their names with corresponding products via Include() first and using Select() second.

// Using Include()
var groups = Context.ProductGroups
.Include(g => g.Products)
.Where(g => g.Name.Contains("Group"))
.ToList();

Print(groups);
// Using Select() var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
g.Products
})
.ToList();

Print(groups);

In both cases 2 SQL statements are executed by EF Core: 1 for the product groups and 1 for the products. On the contrary, EF  6 executes just 1 statement. This may imply that the performance of EF 6 is better than the of EF Core, but in practice it is worse because the queries are getting huge and produce more load on the database.

-- Fetching product groups
SELECT [g].[Id], [g].[Name]
FROM [ProductGroups] AS [g]
WHERE CHARINDEX(N'Group', [g].[Name]) > 0
ORDER BY [g].[Id]
-- Fetching products
SELECT [g.Products].[Id], [g.Products].[GroupId], [g.Products].[Name]
FROM [Products] AS [g.Products]
INNER JOIN
(
    SELECT [g0].[Id]
    FROM [ProductGroups] AS [g0]
    WHERE CHARINDEX(N'Group', [g0].[Name]) > 0
) AS [t] ON [ g.Products].[GroupId] = [t].[Id]
ORDER BY [t].[Id]

Now we don't take all products but only those with the term "1"  in their names and print them out twice(!). 

var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
Products = g.Products.Where(p => p.Name.Contains("1"))
})
.ToList();

Print(groups); // 1st iteration over product groups
Print(groups); // 2nd iteration over product groups

The result is disappointing. Having 5 product groups matching the condition we get 11 SQL statement executions: 1 query for fetching 5 product groups and (2 * 5=10) for fetching the products. Let's put a ToList() at the end of the products query.

var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
Products = g.Products.Where(p => p.Name.Contains("1")).ToList()
})
.ToList();

Now we have 6 (=1+5)  queries being sent to the database, it is getting better but still not satisfying.

-- 1 query for fetching product groups
SELECT [g].[Id], [g].[Name]
FROM [ProductGroups] AS [g]
WHERE CHARINDEX(N'Group', [g].[Name]) > 0
-- 5 queries for fetching products (i.e. 1 query per fetched product group)
SELECT [p].[Id], [p].[GroupId], [p].[Name]
FROM [Products] AS [p]
WHERE (CHARINDEX(N''1'', [p].[Name]) > 0) AND
(@_outer_Id = [p].[GroupId])

Obviously, EF Core has some difficulties translating queries if Select() contains a filtered collection. Let's select just the first product.

var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.Select(g => new
{
ProductGroup = g,
Product = g.Products.FirstOrDefault()
})
.ToList();

Print(groups);

We still getting 6 queries meaning that the "problem" doesn't lie in the cardinality of the response type (Product vs ICollection<Product>) but in collections in general.

Solutions

We can reduce the number of queries by not using the navigational property Products but doing the "JOIN" by ourselves, for example via GroupJoin.

var productsQuery = Context.Products.Where(i => i.Name.Contains("1"));

var groups = Context.ProductGroups
.Where(g => g.Name.Contains("Group"))
.GroupJoin(productsQuery, g => g.Id, p => p.GroupId, (g, p) => new
{
ProductGroup = g,
Products = p
})
.ToList();

Print(groups);

The previous LINQ query produces just 1 query.

SELECT [g].[Id], [g].[Name], [t].[Id], [t].[GroupId], [t].[Name]
FROM [ProductGroups] AS [g]
LEFT JOIN
(
    SELECT [i].[Id], [i].[GroupId], [i].[Name]
    FROM [Products] AS [i]
    WHERE CHARINDEX(N'1', [i].[Name]) > 0
) AS [t] ON [g].[Id] = [t].[GroupId]
WHERE CHARINDEX(N'Group', [g].[Name]) > 0
ORDER BY [g].[Id]

An alternative solution is to fetch the data separately and doing lookups in .NET.

var groupsQuery = Context.ProductGroups
.Where(g => g.Name.Contains("Group"));

var productsByGroupId = groupsQuery
                           .SelectMany(g => g.Products.Where(i => i.Name.Contains("1")))
.ToLookup(p => p.GroupId);

var groups = groupsQuery
.Select(g => new
{
ProductGroup = g,
Products = productsByGroupId[g.Id]
})
.ToList();

The generated SQL statements are easier to handle by the database but there are 2 of them. 

-- For product groups
SELECT [g].[Id], [g].[Name]
FROM [ProductGroups] AS [g]
WHERE CHARINDEX(N'Group', [g].[Name]) > 0
-- For products
SELECT [g.Products].[Id], [g.Products].[GroupId], [g.Products].[Name]
FROM [ProductGroups] AS [g]
INNER JOIN [Products] AS [g.Products]
    ON [g].[Id] = [g.Products].[GroupId]
WHERE (CHARINDEX(N'Group', [g].[Name]) > 0) AND
     (CHARINDEX(N'1', [g.Products].[Name]) > 0)

Depending on the database model, amount of data, indexes, number of collections and columns being fetched the one or the other solution may perform better.

Closing Words

The query generation of EF Core is not optimal yet but the Entity Framework Team is currently working on the "N+1 queries" problem so we will re-check all queries with EF Core 2.1 very soon.

In general, whether it is EF 6, EF Core or other O/R mapper it is recommended to use a database profiling tool, so we get good understanding of the technology we use yet again.