PDA

View Full Version : An excel formula that skips duplicate data

Ninjabear
10-08-2011, 07:52 PM
I was wondering if there's a formula that allows me to find out the number of accounts in total from a excel spreadsheet

There are columns going down that lists the account numbers like

2001080
2001082
2001082

What I'm after is there are about 1000 cells going down and some are duplicates.
Is there a formula that can tell me the number of accounts in total excluding duplicates?

Thanks

Jen
10-08-2011, 07:56 PM
What about using Data Filter > Advanced filter > Unique records only?

Ninjabear
10-08-2011, 08:43 PM
Hmm.. I will try that tomorrow. Thanks

Phil B
10-08-2011, 08:54 PM
A couple of IF's & a SUM should do that

Sort the account numbers into numerical order. This will group duplicates.
Assuming your data starts in A1, if not adjust the cell reference numbers to suit.
In another cell, Z1 if you like, enter this formula
IF(A1="","",IF(A1=A2,"",1)) & drag it down to the last row of your data.
Then in another cell enter the formula SUM(Z1:Z1000) & it will give you the number of accounts with the duplicates removed.
There's probably a much more complicated formula, but that means I'd have to think http://pressf1.co.nz/images/icons/icon12.gif

I presume in Excel you drag the formula down by the handle in the lower right hand corner of a cell, the same as OOO Calc.

Ninjabear
11-08-2011, 11:21 AM
A couple of IF's & a SUM should do that

Sort the account numbers into numerical order. This will group duplicates.
Assuming your data starts in A1, if not adjust the cell reference numbers to suit.
In another cell, Z1 if you like, enter this formula
IF(A1="","",IF(A1=A2,"",1)) & drag it down to the last row of your data.
Then in another cell enter the formula SUM(Z1:Z1000) & it will give you the number of accounts with the duplicates removed.
There's probably a much more complicated formula, but that means I'd have to think http://pressf1.co.nz/images/icons/icon12.gif

I presume in Excel you drag the formula down by the handle in the lower right hand corner of a cell, the same as OOO Calc.

I'm trying this formula

So i went into a random empty and ploted this formula
=IF+A1:A420(A1="","",IF(A1=A2,"",1))

Not sure if this is right?It goes down to 420
but it comes up with "#Name?"

Phil B
11-08-2011, 03:38 PM
=IF+A1:A420(A1="","",IF(A1=A2,"",1)) Will never work

Download the file from http://www.4shared.com/document/iHay3plZ/Duplicates.html It shows how it works.. No macros are used.

Before you do anything & as a backup, save the file you're playing with to somewhere else , just in case.

Column A is the account numbers
Column K is a helper column for sorting
Select the whole sheet, then unselect rows 1-8. You do this so when you sort, it moves all the cells in the rows at the same time & you don't end up with a jumbled up mess, whilst leaving the totals at the top, where you can see them.
Go to "Sort" wherever that is in Excel & sort column A into ascending order. That will group all the duplicated account numbers together. Column E shows the total of all the account number entries in column A, minus the duplicates. Column G shows all of the account number entries in Column A including the duplicates. Cell I7 shows the total number of duplicates. Click E9,G9,E7,G7 or I7 to see the formulas in the formula bar. I also conditional formatted column A to show duplicate cells with a red background after they've been sorted into numerical order. Makes it easier to find the duplicates when scrolling down. I'm not sure if that part will carry through from Calc to Excel. If you want to put it all back in the original order, do the same selection process again & sort column K into ascending order.
You'll have to change the cell references (A1,B2,C3 etc) to suit your sheet. You can use any spare columns for the formulas. I used E G & I. Just make sure the column you reference for the account numbers, is the column that has the account numbers. In my case it was column "A" IF(A9="","",IF(A9=A10,"",1))
Don't forget when you try it on your sheet ,to drag the handle on the bottom right hand corner of the cell you have entered the formula in, right down to the last row that has an account number in it.
Hope that helps

andrew93
11-08-2011, 05:47 PM
Here's a formula solution that doesn't require any sorting. Assuming your list starts at A2, enter this formula in cell B2 and copy down to the bottom of the list.

=IF(COUNTIF(A\$2:A2,A2)=1, MAX(B\$1:B1)+1, "")

The number of unique entries will be at the large number towards the bottom of column B. You can have a text value in cell B1 and the formula will work a-ok - just don't have a numerical value in cell B1. If you put this formula into another column (other then B) then make sure you use the column appropriate column reference after the MAX part e.g. if you put this into column C, then it should be MAX(C\$1:C1).

Phil B
11-08-2011, 07:39 PM
Good one Andrew. That one sifts out the duplicates & puts them at the end. Nice & simple
My solution, although more messing around, gives the total he was asking for.
As mine pairs them up, he can use it to check to make sure all the entries are the same/correct for the duplicate accounts. Then use yours afterwards to shift them all into one place & delete them if he/she wants

Phil B
12-08-2011, 11:37 AM
http://www.4shared.com/document/FNxrsq45/Duplicates.html

It's a cleaned up version of the previous sheet. It's got Andrews formula as well so you can see how they each work.

Andrews solution shows all the duplicated entries from where the formula column (column i) goes blank (rows 100-118).
Select the whole spreadsheet, un-select rows 1 to 8, sort column A into ascending order & it groups all the duplicates together. It also gives you the total number of duplicates in E7. If the conditional formatting carries through to Excel, each red row is a duplicate account number entry of the row below.
To go back to how it was, do the select process again & sort column G into ascending order.

andrew93
13-08-2011, 10:55 AM
I designed the formula to show the unique entries, not the duplicates. I interpreted the original post as wanting a count of the number of unique accounts, so I'm curious as to how it showed the duplicates grouped. I think that file you linked to has been removed.

Phil B
13-08-2011, 05:53 PM
http://www.4shared.com/document/KeCikA0x/Duplicates.html
I managed to get the file onto a computer with Excel on it. The conditional formatting doesn't work & it's to much of a learning curve to get it to work in a couple of minutes ;) Otherwise it's ok.
Select the lot except the column titles etc & sort column A, all the duplicates are paired next to each other in numerical order & the total number of duplicate entries is in E7. I thought of doing it that way, as it is easier to compare the two entries, so one can be deleted. If the CF worked, it would show the top row of each duplicate pair as red. Do the select again & sort by the helper column to put it all back to how it was. Even the select & unselecting process is different. It appears that in Excel you cant select the whole sheet & unselect the top rows. More playing needed I think.

andrew93
13-08-2011, 10:28 PM
Hi Phil

I had a look at your sheet and I think you mis-applied my formula. Per my post I recommended the formula was entered into row 2. Given you have entered the first formula into row 9, the formula needs to be entered like this:

=IF(COUNTIF(A\$9:A9,A9)=1,MAX(I\$8:I8)+1,"")

Sorry I didn't spell this out in my post that in the event it was entered into a row other than 2 that the row number needed to change for the first formula. Once amended, you should see it gives an incremental number to each unique item, and it doesn't require the data to be pre-sorted.

Cheers, Andrew

Phil B
14-08-2011, 03:08 PM
Hi Phil

I had a look at your sheet and I think you mis-applied my formula. Per my post I recommended the formula was entered into row 2. Given you have entered the first formula into row 9, the formula needs to be entered like this:

=IF(COUNTIF(A\$9:A9,A9)=1,MAX(I\$8:I8)+1,"")

Sorry I didn't spell this out in my post that in the event it was entered into a row other than 2 that the row number needed to change for the first formula. Once amended, you should see it gives an incremental number to each unique item, and it doesn't require the data to be pre-sorted.

Cheers, Andrew

Aha. I see that now.. He/She's now got 2 different solutions to play with
Cheers