Become a Data Migration Ninja - Part 1. Introduction

Published in Articles

How often do you have a problem with the data migration? Today I want to introduce you the Power Query tool. I have only recently started using it and found it very useful. Let's imagine that we have a powerful tool that allows us to write own complex functions and apply it to the data, merge multiple source files, etc. Interesting? Let's start from installation this powerful tool :)

If you use Microsoft Office 2016 or newer, then the Power Query is already integrated into your Excel. Otherwise, you can download it here. If it’s possible, I recommend using the 64-bit office, because the 32-bit office has some limitations. The 32-bit version of Power Query has a data processing limitation of approximately 1GB. The 64-bit version will only be limited by available virtual memory.

By default, new queries will automatically be loaded to a new worksheet, or directly to the Data Model when loading multiple queries at a time. This can utilize a large amount of memory, especially with queries you are not ready to work with yet, and can also slow down the query editing process. You may want to disable loading to either a worksheet or to the Data Model by default, effectively loading only the connection and preview rows to edit the query. To manage it, please navigate to the Data tab and click on the "New Query" button, after that go to the "Query Options".

For the Office 2013, you can find "Options" under the "Power Query" tab:

Now, when we completed the Power Query configuration, let's complete one small task. I already prepared set of files, that you can download here. It contains the Accounts.xlsx file with exported accounts from external system.

Let's start with creating a query using our source files. Please note that I will provide only Office 2016 example to reduce the article size. First, go to the Data tab, click the "New Query" button and choose the “From File”. In our case, you need to choose the "From Workbook" option because our files have the XLSX file type.

In the open window, you should open the Accounts.xlsx file and then click the Import button.

Accounts.xlsx has two sheets: "Accounts Part 1" and "Accounts Part 2". Load them as two seprate queries.

As you loaded it, let's open the query editor and merge these two files in one. For it double click on one of our queries to open the query editor.

I would recommend to group all queries. These two queries we can group as "Data Sources". I guess you do not like mess, use groupings to avoid it in Power Query.

And when you do that, I recommend to do not modify source files. Let's create a reference to a source file. You can also use "duplicate" option, but it will copy a query with all actions, while reference will allow you to use original query as a source. As you done it, append the second part to the "All Accounts" query. Click on the Home tab and "Append Queries" button. I would recommend to use informative names like "All Accounts".

Now it's time to prepare our CSV for exporting to salesforce org.


As you can see, it's pretty easy to start using Power Query. Now, let's imagine that client sends us a "delta" while we working on the accounts migration. Fortunately, we used the Power Query tool. So, we can just replace the source file to a new one. After that, all actions that we did with the original source file will be applied to the new file.

This example does not demonstrate the serious benefits of the Power Query but let's imagine that the number of actions will be more than 50 and the number of records will be more than 100 000. Excel, in this case, will be crashed or froze for the long period. And new "delta" will means that you need to do all actions again, what can be painful in some cases.

Next example is a good example of bad practice. I strongly recommend using informative names for the actions. Below you can see an example with unhelpful naming, you can only guess what is happening there

Another good example of bad practice. On the screenshot below you can see a total mess. It’s hard to understand what each query do, even for a person who did it a few days ago. I strongly recommend to use folder grouping and use understandable naming.

On the screenshot below you can see the example of grouping that I use now. I do not force you to use this structure, but you can use it as start point. Do not hesitate to leave comments under the article if you have some corrections or propositions.

Here is the example from real life:

And for the dessert some recommendations:

I hope you like it as much as I like. In the next article I will write about common actions and tips.

Useful links:

Comments powered by CComment