
Problem with Bulk insert using Entity Framework


Recently, I worked on an ASP.NET (.NET Framework 3.5) project that uses Entity Framework for database (SQL Server 2008) interactions. For this project, I got a requirement to let a user bulk insert data into some of the tables in the database. In my case, all the data comes from an Excel file, so I created a page with an Upload control for the admin to upload this Excel file. To read the Excel file I used an open source Excel library called EPPlus. After retrieving the records from the Excel file, I inserted each record into the database using the Entity Framework classes I already had in my code. Then I tried uploading data from an Excel file with over 2500 records. And guess what, it took over 10 minutes to complete the process. Initially I thought it was the Excel library that was running slow and delaying the entire process, but when I started digging into the problem, I realized that Entity Framework was the main culprit.

Ok, that’s it. I will end my long story now; I don’t want to bore anyone.

I would say Entity Framework is an excellent framework that lets us interact with the database in a more object-oriented way, and that makes a developer’s life easy. But when it comes to bulk inserts into the database, Entity Framework is the wrong tool to choose (it issues a separate INSERT statement per entity, so performance is very slow in such cases); just stay away from it there. Try SqlBulkCopy instead, which was introduced in .NET 2.0.

I will illustrate this case with an example, without the Excel part. Here goes…

For this example, I used an existing database named “Northwind” (you can use any database) and created a new table called “Players” with four columns: PlayerID (int), PlayerName (nvarchar), Created (datetime) and CreatedBy (uniqueidentifier).

Now in my Visual Studio project, I added a new “ADO.NET Entity Data Model” based on the “Northwind” database and selected the “Players” table. Entity Framework then generated the corresponding classes for me to interact with the database.

My code starts now…..

private void InsertEntities()
{
    // Dispose the context when done; ObjectContext implements IDisposable.
    using (NorthwindEntities1 context = new NorthwindEntities1())
    {
        List<Player> players = BuildPlayers();

        foreach (Player player in players)
        {
            context.AddToPlayers(player);
        }

        context.SaveChanges();
    }
}

The BuildPlayers() function is as follows:

private List<Player> BuildPlayers()
{
    List<Player> players = new List<Player>();

    for (int i = 0; i < 10000; i++)
    {
        players.Add(new Player
                        {
                            PlayerID = i + 1,
                            PlayerName = "Player " + (i + 1),
                            Created = DateTime.Now,
                            CreatedBy = Guid.NewGuid()
                        });
    }

    return players;
}

Now I measured the time it took to execute the InsertEntities() method as follows:

System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch();

stopWatch.Start();
InsertEntities();
stopWatch.Stop();

Console.WriteLine("Time elapsed = " + stopWatch.ElapsedMilliseconds);

This method took 5544 milliseconds (5.544 seconds) to insert 10,000 records into the “Players” table.

Now I will try a different approach using SqlBulkCopy. I am going to use the WriteToServer() method of the SqlBulkCopy class (you can read more about this class here). Since this method takes an IDataReader/DataTable as input, I need to convert my List of Player entities; for that I am going to use the LINQ Entity Data Reader class that can be found at this link. Download and add this class to your project. In my case I used LINQ Entity Data Reader, but if you want you can also find other converters on the internet that convert a List to a DataTable, and use one of them in your project.
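For completeness, here is a minimal sketch of the DataTable route. The `ToDataTable` helper below is my own, not from the LINQ Entity Data Reader library; it uses reflection to turn the public properties of each item into columns, and the demo `Player` class stands in for the generated entity:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

// Demo stand-in for the Entity Framework "Player" entity.
public class Player
{
    public int PlayerID { get; set; }
    public string PlayerName { get; set; }
    public DateTime Created { get; set; }
    public Guid CreatedBy { get; set; }
}

public static class DataTableExtensions
{
    // Builds a DataTable whose columns match the public properties of T
    // and fills one row per list item.
    public static DataTable ToDataTable<T>(this IList<T> items)
    {
        DataTable table = new DataTable(typeof(T).Name);
        PropertyInfo[] props = typeof(T).GetProperties();

        foreach (PropertyInfo prop in props)
        {
            // Nullable<T> columns must be declared with the underlying type.
            Type columnType = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;
            table.Columns.Add(prop.Name, columnType);
        }

        foreach (T item in items)
        {
            object[] values = new object[props.Length];
            for (int i = 0; i < props.Length; i++)
            {
                values[i] = props[i].GetValue(item, null) ?? DBNull.Value;
            }
            table.Rows.Add(values);
        }

        return table;
    }
}
```

The resulting DataTable can then be passed straight to the WriteToServer() overload that accepts a DataTable.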

The new code will be as follows:

private void InsertEntitiesNew()
{
    List<Player> players = BuildPlayers();
    NorthwindEntities1 context = new NorthwindEntities1();
    var entityConn = context.Connection as System.Data.EntityClient.EntityConnection;
    var dbConn = entityConn.StoreConnection as SqlConnection;

    // SqlBulkCopy requires an already-open connection when one is passed in.
    dbConn.Open();

    using (SqlBulkCopy bulkInsert = new SqlBulkCopy(dbConn))
    {
        bulkInsert.DestinationTableName = "Players";
        bulkInsert.WriteToServer(players.AsDataReader());
    }

    dbConn.Close();
}

To measure the execution time of this new method:

System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch();

stopWatch.Start();
InsertEntitiesNew();
stopWatch.Stop();

Console.WriteLine("Time elapsed = " + stopWatch.ElapsedMilliseconds);

And this method took 324 milliseconds (0.324 seconds) to insert the same 10,000 records into the “Players” table.

Notice the time difference between the two cases: 5.544 seconds using Entity Framework (this was measured on my dev machine; it could easily be double or triple that on a staging/production server) versus 0.324 seconds using the SqlBulkCopy class. So it’s better not to use Entity Framework for bulk inserts if you want good performance.

Here are some things you need to be aware of when using the SqlBulkCopy class:

1. By default, the order of the properties in your entity class should be the same as the order of the columns in your database table. For the Players table above, your entity class representing “Player” should have its properties in this order:

public class Player
{
    public int PlayerID { get; set; }
    public string PlayerName { get; set; }
    public DateTime Created { get; set; }
    public Guid CreatedBy { get; set; }
}

(Note: This is not the exact Entity class as it is generated by Entity Framework)
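If you would rather not depend on property order lining up with column order, SqlBulkCopy also lets you map source properties to destination columns by name via its ColumnMappings collection. A sketch, using the table and column names from this example (the connection string is a placeholder for your own):

```csharp
using System.Data.SqlClient;

// Explicit column mappings remove the dependency on ordinal order:
// each source property is tied to its destination column by name.
// Constructing SqlBulkCopy does not open a connection yet; that happens
// on WriteToServer().
SqlBulkCopy bulkInsert = new SqlBulkCopy(
    "Data Source=.;Initial Catalog=Northwind;Integrated Security=True");
bulkInsert.DestinationTableName = "Players";
bulkInsert.ColumnMappings.Add("PlayerID", "PlayerID");
bulkInsert.ColumnMappings.Add("PlayerName", "PlayerName");
bulkInsert.ColumnMappings.Add("Created", "Created");
bulkInsert.ColumnMappings.Add("CreatedBy", "CreatedBy");
```

Once any mapping is added, every column you want copied must be mapped; unmapped columns are ignored.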

2. The SqlBulkCopy class can insert only scalar values (int, string, DateTime, Guid, etc.), like the columns in the Players table.

3. If the table has a foreign key relationship, Entity Framework generates a property of the related entity type for that column instead of a scalar property. For example, say I have a Users table whose primary key is the “UserId” (uniqueidentifier) column, and the “CreatedBy” column in the Players table is a foreign key reference to “UserId”. Entity Framework then generates the entity class representing the “Players” table as follows:

public class Player
{
    public int PlayerID { get; set; }
    public string PlayerName { get; set; }
    public DateTime Created { get; set; }
    public User CreatedBy { get; set; }
}

(Note: This is not the exact Entity class as it is generated by Entity Framework)

Now, if you try to use the SqlBulkCopy class with the above “Player” entity class, building the “Player” list accordingly, only the “PlayerID”, “PlayerName” and “Created” columns in the “Players” table are filled with data; the “CreatedBy” column will be empty. The reason is that SqlBulkCopy cannot handle properties of custom types (like “User”); it can handle only scalar properties.
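One way around this is to project the entities into a flat shape (scalar properties only) before the bulk copy. The `PlayerRow` and `Flatten` names below are hypothetical helpers of mine, not generated by Entity Framework, and the `Player`/`User` classes are minimal stand-ins for the generated entities:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the generated entity classes.
public class User
{
    public Guid UserId { get; set; }
}

public class Player
{
    public int PlayerID { get; set; }
    public string PlayerName { get; set; }
    public DateTime Created { get; set; }
    public User CreatedBy { get; set; }
}

// Flat row type holding only scalar values that SqlBulkCopy understands.
public class PlayerRow
{
    public int PlayerID { get; set; }
    public string PlayerName { get; set; }
    public DateTime Created { get; set; }
    public Guid CreatedBy { get; set; }   // the User's key, not the User object
}

public static class PlayerProjection
{
    // Replaces the User navigation property with its UserId key so that
    // every column, including CreatedBy, gets a value during the bulk copy.
    public static List<PlayerRow> Flatten(IEnumerable<Player> players)
    {
        return players.Select(p => new PlayerRow
        {
            PlayerID = p.PlayerID,
            PlayerName = p.PlayerName,
            Created = p.Created,
            CreatedBy = p.CreatedBy.UserId
        }).ToList();
    }
}
```

The flattened list can then go through the same IDataReader/DataTable conversion as before and be handed to WriteToServer().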

That’s it guys, hope this helps someone.


Posted on February 22, 2012 in .NET


5 responses to “Problem with Bulk insert using Entity Framework”

  1. Vadim K.

    August 20, 2012 at 7:16 pm

    I had same ideas and solving similar problems and your post helped me a lot. Thanks!

     
  2. Praneeth Moka

    August 20, 2012 at 10:45 pm

    Good to know that….. :)

     
  3. andy

    November 29, 2012 at 10:11 am

    you could switch off the change-detection: .Configuration.AutoDetectChangesEnabled = false and just before saving, call DetectChanges – should be faster, but never as fast as bulk of course ;-). greetz, andy

     
  4. Praneeth Moka

    December 8, 2012 at 1:43 am

    Thanks for the tip Andy……..next time when I go to Entity Framework, I will definitely take a bite on it…..

     
