Entity Framework Include Statements

I have been recently working on a performance issue within an app. The app was written in .NET Framework with C# and EF 6 connected to a Sql Server. The issue was that the app CPU usage and the database CPU usage was high (around 60%) even though the load was not high enough to justify it. Also, the response to some of the API calls were very long, 1.5 seconds sometimes. I had a colleague of mine look with me and analyze the code to see we can find the problem and the solution. The issue became clear quickly that it was a specific endpoint in the REST API that the app exposes that was the problem, and this REST endpoint issues a search query to the database.

So naturally, I enabled tracing on SQL Server to see the exact query being run. To my suprised, the query was very big. Very very big. To be honest, the search query has many many parameters and includes quite many tables, but the query this time was very large compared to the previous release of the app. This time, it was around 4000 lines of SQL query. How did it get this big? It turned out that it is the .Include() fault.

The C# code looked like this

var query = context.MyTable.AsNoTracking()
	.Include(m => m.Table1)
	.Include(m => m.Table2)
	.Include(m => m.Table3)
	// ... many includes
	.Include(m => m.Table7);
//.. do something with the query

Usually these .Include() functions translate into JOIN statement and it brings the data back from the database. Hence, usually Include() statements are safe. However, it is safe only if the relationship between MyTable and Tablex is 1-to-1. If the relationship is 1-to-many, then EF, in order to bring all the data it needs in one query, issues a complex UNION call. This way, EF thinks that it is optimizing the performance of the query. But in my case, I had 2 or 3 Include() statements that have 1-to-many relationship with MyTable, hence the very long query.

But that was not all. When we ran this very large query on SQL Server, it ran in 300 ms. The next question was, why is the response time reaches seconds if the database can return the result very quickly. It cannot be the call roundtrip because I am running the query and the API locally. This is where an EF Profiler was helpful in showing the lifetime of the context and the query parameters built by EF. We used an EF profiler, and indeed the response from the database was quick, but the context life was very long. This means EF is doing something. When we looked at the query EF generated and the parameters it creates for the query, we noticed that almost all the parameters we put in the code at hard-coded in the query. Hence EF has to generate a new query every time the API is hit, hence the low performance. Generally EF caches the query, but this cannot be done for this query.

At this point, we removed the 1-to-many Include() statements, and replaced them with extra JOIN calls to the database. And that was much faster.

Indeed, EF Include() is very dangerous sometimes, and this is a well-documented issue and you can find some questions on StackOverflow regarding this problem. Next time you use EF and you have a performance issue, remember to intercept each SQL query and check with any EF profiler to see where the issue is.