Converting HTML Rows to Excel Columns

This will not be my most scintillating post of the month…or even the day.  If you're not interested in transferring data from webpages to spreadsheets, just keep moving along, nothing to see here, folks.  But today I found an easy way to take rows of HTML data and convert it to columns in an Excel spreadsheet–I'd expected this to be a gruesomely time-consuming task, so I thought I'd document the process for future reference.

To illustrate, let's say you have a some data in alternating rows on a web page, like this:

Name: John Doe
Site: http://www.johndoe.com
Email: john@johndoe.com

Name: Jane Roe
Site: http://www.janeroe.com
Email: jane@janeroe.com

Name: Ron Mexico
Site: http://www.ronmexico.com
Email: ron@ronmexico.com

And you want it all in columnar format in an Excel spreadsheet, sort of like this:

Name Site Email
John Doe http://www.johndoe.com john@johndoe.com
Jane Roe http://www.janeroe.com jane@janeroe.com
Ron Mexico http://www.ronmexico.com  ron@ronmexico.com

Here's what you need to do:

Step One: Transfer the HTML Data To Excel

  • Click-and-drag to select the HTML data.
  • Copy and paste it all into a single column on a sheet–easy.

Step Two: Remove Extraneous Data

  • Every row in the spreadsheet will now have extraneous data, such as "Name:" or "Site:"
  • Select the entire range, and use the Replace function to delete any extraneous data.  (Replacing "Name:", for example, with nothing will delete it.)  Repeat as necessary.
  • You should be left with only the data you want to display in columnar form, but trapped in alternating rows.

Step Three: Filter and Copy the Data by Row Type

  • OK, now it gets tricky.  We need to configure the data in the column so that only the rows of a certain type are visible, and all the other cells are hidden.  Then we'll copy and paste the visible cells to a separate column on a new sheet, repeating the process for each separate type of row.
  • In the column adjoining the one with the data, insert the following formula in the first cell:

    =MOD(ROW(1:1),x)=(x-(x-1))

    where x = the number of different types of rows in your original data set, i.e. the number of columns your final Excel spreadsheet should have.  (In the example above, x =3: Name, URL and Email.)  The MOD function returns the remainder after a number is divided by a divisor.  Here's the syntax: MOD(number, divisor).  The formula above essentially says, "Divide this cell's row number by x; if the remainder is equal to (x-(x-1)), return TRUE; if not, return FALSE."

  • Having entered the formula in the first cell, grab the handle and extend it down the entire column (or just copy and paste, whatever method you prefer.)
  • You should now have two columns: one with your data in alternating rows, and a second with cells reading TRUE or FALSE.  If you have three types of rows, every third row, beginning with the first, should read TRUE and the others should read FALSE.
  • Insert an Autofilter so you can sort the data by the TRUE and FALSE values.  Go to Data > Filter, and select "Autofilter".  A little rounded box with a downward-pointing arrow will appear in the top cell of each column.
  • Click the rounded box at the top of the second column, and select "TRUE".  Only the rows with TRUE as the value of the second column will remain visible; all others will disappear.
  • Select the visible cells.  If you have a large amount of data–and you should; otherwise, why the hell are you doing this?–hit F5 > Special > Visible Cells Only.
  • Copy and paste the selected cells to a new sheet.  You should have two columns, the first one consisting only of data from rows with TRUE in the second column from your original data set, and the second one a series of TRUE values.  Delete the second column and return to the original sheet.
  • Click the rounded box at the top of the second column (the TRUE/FALSE column, which now shows only TRUE values), and select (All).  All of your original data should now be visible again.
  • You now need to alter the formula in the second column, so that it reads as follows:

    =MOD(ROW(1:1),x)=(x-(x-2))

    where x = the number of different types of rows in your original data set, i.e. the number of columns your final Excel spreadsheet should have.  As you can see, we've added 1 to the last number in the final variable, increasing it from 1 to 2.

  • Repeat the steps above x times, increasing the last number in the final variable by 1 each time until it equals x.  At that point, substitute 0 for the entire final variable before proceeding with the final series.  Each time you complete these series of steps, you'll have a selection of data to paste to your second sheet consisting only of data from one type of row in the original data set (along with some TRUE values, which can be deleted.)
  • The final result should be an Excel sheet with data in columnar form, which will allow you to easily select all the data in a single column, such as names or email addresses.

You can thank me later.

Hulk HoganEinstein

tag:

5 Responses

  1. After weeks of breaking our heads, and asking everyone we know, how to convert over 1,000 rows of names and addresses into columns, it occurred to me to search the net on how to easily accomplish this and I found your blog. With your easy to follow instructions it took seconds to convert the information from rows to columns. THANKS SO VERY MUCH FOR SHARING KNOWLEDGE…YOU DON’T KNOW HOW MUCH WE APPRECIATE!!!

  2. Marlene, I’m so glad this was helpful–it’s a great example of the blogosphere at work! Thanks for taking the time to let me know–you made my day.

  3. Ed,
    I stumbled across this site while searching for other Excel tips. Wow, I have needed to do this in the past and will again in the future. What a time saver! Thank you so much for sharing

Leave a Reply

Discover more from Ed Batista

Subscribe now to keep reading and get access to the full archive.

Continue reading