Quick Tip: Make SQL scripts generate SQL scripts for you.

Quick Tip

So the synopsis is this: sometimes you have a TON of rows you need to insert into a database from a spreadsheet, but you need to do some custom massaging of the data to generate the statements. While a few dozen rows is okay, try doing this with 10,000+ rows! The nice thing is that you can make the database do the work for you.

Recently I had the following dilemma. I was given a spreadsheet to insert into the database with the following layout:

ProductID   CategoryCode  CategoryName
1A1231A     SWT           Sweets
3C1111B     PM            Processed Meats
1Z33325     PM            Processed Meats
1X3A33C     PM            Processed Meats
777A3X1     AC-01         Accessories (Group 1)
9WXX323     AC-01         Accessories (Group 1)

Now on it’s face, this looks like a straight forward data import. Let the import tools do the work for you. However, there’s some redundant data in there, namely the Categories. Any programmer worth their salt will at least take a moment and evaluate the concern about this. What if Processed Meats changes it’s name or category? Well, we’d have to change those two records to accomodate for that. That might not be too bad, but when you’re dealing with 10,000+ records (as I was) then that update becomes potentially unmanageable, or dangerous, if a stray user only updates one field.

To make this work better, we setup two tables. A category table and a product_mapping table. That way we can identify each category once and if a change to the category occurs, we only need to update the category in one place:

Category Table

id   code    name
1    SWT     Sweets
2    PM      Processed Meats
3    AC-01   Accessories (Group 1)

Product_Mapping Table

id   product_id   category_id
1    1A1231A      1
2    3C1111B      2
3    1Z33325      2
4    1X3A33C      2
5    777A3X1      3
6    9WXX323      4

The tricky part is how to convert our original spreadsheet to use our new categories and mappings, especially when we’re dealing with a huge volume of records. The first step is to import our raw data into our database using whatever tool we have handy. This will create our table with our three columns. I imported mine with the name RawMappingData. From there, we need to extract our unique categories with a SELECT DISTINCT statement

SELECT DISTINCT CategoryCode, CategoryName FROM RawMappingData

 

This gives us a basic result set:

code   name
SWT    Sweets
PM     Process Meats
AC-01  Accesssories (Group 01)

This result set is pretty friendly to work with. We can dump this result set to the clipboard and put some “INSERT INTO category (code, name) VALUES” type text in  front of it and voila, our SQL script is done. However, the dataset I’m working with has 178 categories. Doing even some fancy regular expression search and replace magic (which I’ve done before) can become time consuming.

All is not lost though! With our SQL statements, we can append text in front of our variables when generating our result sets. Try this query on for size:

SELECT DISTINCT 'Woot: ' +  CategoryCode, CategoryName FROM RawMappingData

This will gives us the following result set:

code          name
Woot:  SWT    Sweets
Woot:  PM     Process Meats
Woot:  AC-01  Accesssories (Group 01)

Using this trick, we can hack up our SQL query to generate our category insert queries for us!

SELECT DISTINCT 'INSERT INTO category (code, name) VALUES (''' + CategoryCode + ''', ''' + CategoryName + ''')' FROM RawMappingData

Take a peek at these results:

(No column Name)
INSERT INTO category (code, name) VALUES ('SWT', 'Sweets')
INSERT INTO category (code, name) VALUES ('PM', 'Processed Meats')
INSERT INTO category (code, name) VALUES ('AC-01', 'Accessories (Group 01)')

We can dump this result set into a text editor, or query editor, save and run. We’ll get uniquely generated Ids for our categories and execution time on the script will be really fast.

But what about the mapping records themselves. There’s a good 10,000+ records to insert. Can we use this approach? Yes we can, but we’ll take a slightly different angle to it, which will require a little bit of manual text editing, but far from anything that is time consuming. For our mapping records, we’re going to setup a transactional approach, and insert 1000 records at a time in the database in batches. If we did them one by one, it would take a long amount of time to process, since each INSERT statement would have its own log record (for restore purposes) and that takes extra overhead.

The transactional approach uses the basic format for its queries:

SET NOCOUNT ON;
SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
INSERT INTO [dbo].[product_mapping]([product_id], [category_id])
SELECT N'1A1231A', 1 UNION ALL
SELECT N'3C1111B', 2 UNION ALL
SELECT N'77A3X1', 3
COMMIT;

RAISERROR (N'[dbo].[submarket_mapping]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
GO

What this will do is process all of the statements within the transaction block and then commit them in one batch, which saves us time. Typically this can be done in 1000 or 5000 record transactions. If an error occurs within the block, none of the records are stored in the database. In our case, this isn’t a big concern. So we can setup a similar query as we did above, generating all our SELECT statements that fit between our “INSERT INTO” and “COMMIT” block. We’ll then take this text, dump it into a text editor, and insert our “BEGIN TRANSACTION” and “COMMIT” statements that wrap around every 1000 lines or so of code.

We still have one last hurdle to get past. Our raw data still contains the category code and name, but not the ID that we need for our product_mapping table. Fortunately, we can nest SELECT statements to pull out the data we need. Our “generator” query then looks something like this:

SELECT 'SELECT N''' + [ProdoctId] + ''', (SELECT [id] FROM [dbo].[category] WHERE 1 = ''' + [CategoryCode] + ''' AND [name] = ''' + [CategoryName] + ''') UNION ALL' FROM RawMappingData

And this will help us generate our final SQL Statement we need:

SET NOCOUNT ON;
SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
INSERT INTO [dbo].[product_mapping]([product_id], [category_id])
SELECT N'1A1231A', (SELECT [id] FROM [dbo].[submarket] WHERE 1 = 'AT01' AND [name] = 'Region 01') UNION ALL
SELECT N'3C1111B', (SELECT [id] FROM [dbo].[submarket] WHERE 1 = 'AT01' AND [name] = 'Region 01') UNION ALL
SELECT N'77A3X1', (SELECT [id] FROM [dbo].[submarket] WHERE 1 = 'AT01' AND [name] = 'Region 01')
COMMIT;

RAISERROR (N'[dbo].[submarket_mapping]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
GO

So there you have it. What took me about half a day with a text editor and some old regular expression replacement tricks (which still included some errors to fix) I was able to recreate in about 5 minutes by piecing these things together. This saved time and work.

Have you done anything similar using different means? I'd love to know. I'm always looking for new tricks for my toolbelt!

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s