PDA

View Full Version : spreadsheet sorting



Nigel Thomson
08-09-2004, 06:29 PM
Greetings all

I am currently trying to sort a spreadsheet, the problem for me is that I want to sort by "occurrence" ie how often a specific model occurs, and then display it

I know the formatting on here is going to look screwy but here goes

unsorted Data

MODEL.....PARTA......PARTB
a1.............g456.......324b
a3.............h356.......321n
a2.............g456.......324c
a3.............z766.......324c
a3.............g456.......324b
a6.............g346.......324c
b2.............g434.......334b
b2.............z564.......334b

This is what i want it to look like

Sorted Data

MODEL.....PARTA......PARTB
a3.............z766.......324c
a3.............g456.......324b
a3.............h356.......321n
b2.............g434.......334b
b2.............z564.......334b
a1.............g456.......324b
a2.............g456.......324c
a6.............g346.......324c

Make any sense?

I have both open office and MS office 2003 (slightly more familiar with this one)

andrew93
08-09-2004, 06:52 PM
Hi Nigel, I see you have resorted to Excel rather than Access for your multi-part model problem.

If I understand you correctly, you want to count the number of parts by model number and then sort the list of models (and their associated parts) by the number of times a model number "occurs" in descending order - is this correct? (And where there are multiple models with the same number of parts then you sort the list based on the model number in alphabetical order) This would be quite easy in Access but I'm struggling to think of an easy way in Excel.

The first way that comes to mind would be to import your Excel list into Access and do it in there (import spreadsheet, new query with subtotals, sorted on the descending count of the parts and sorted secondly on the model number, export results back to Excel) - very easy, would take about 2-5 minutes.

But you are using Excel, and in the absence of a macro (which isn't my forte) or a standard function of which I'm not aware (guys, please correct me if I am wrong) and if I were forced to use Excel, I would firstly sort the list based on the model number and do a count on the parts by model number in a new column using an if statement and a counter, then recalculate the counts in a new column to force the values for each model to be the same, copy and paste those values and then you have a field that you can use in the standard Excel sort function. Very very messy but it will work with a bit of time and thought regarding the counters.

HTH

Nigel Thomson
08-09-2004, 07:16 PM
Hey Andrew

actually I am still using Access

and splitting the table up as below

MODEL.....PART.......Description
a1.............g456.......PARTA
a3. ...........h356.......PARTA
a2.............g456.......PARTA
a3.............z766.......PARTA
a3.............g456.......PARTA
a6.............g346.......PARTA
b2.............g434.......PARTA
b2.............z564.......PARTA
[/b]
a1.............324b.......PARTB
a3. ...........321n.......PARTB
a2.............324c.......PARTB
a3.............324c.......PARTB
a3.............324b.......PARTB
a6.............324c.......PARTB
b2.............334b.......PARTB
b2.............334b.......PARTB --duplicate values cause problemos

but....

this leads to some duplication or non unique values
and the model number and part number are a joint primary key

this caused me a major problem with Access I kept getting errors and append queries kept stuffing out, so I thought i would do this sorting out of duplicates in excel before the table got split into the above format.

it made perfect sense to me at the time, (now it seems clear as mud)

Hmmm it looks difficult so i may go into Access and
-try it in a seperate table
-with no keys
-sort out the duplication probs
-re-export it to the table with the primary keys

--did that sound like it will work?

Thanks Nigel

andrew93
08-09-2004, 09:02 PM
From what you have said, I presume you are using the intermediary table we discussed in your other thread. It appears the intermediary table has the following data :

> MODEL.....PART.......Description
> a1.............g456.......PARTA
> a3. ...........h356.......PARTA
> a2.............g456.......PARTA
> a3.............z766.......PARTA
> a3.............g456.......PARTA
> a6.............g346.......PARTA
> b2.............g434.......PARTA
> b2.............z564.......PARTA
> a1.............324b.......PARTB
> a3. ...........321n.......PARTB
> a2.............324c.......PARTB
> a3.............324c.......PARTB
> a3.............324b.......PARTB
> a6.............324c.......PARTB
> b2.............334b.......PARTB
> b2.............334b.......PARTB --duplicate
>
> values cause problemos
>
> but....
>
> this leads to some duplication or non unique values
> and the model number and part number are a joint
> primary key

I'm not sure how the duplicate entry managed to get into the table if the model number and part number are joint primary keys in the table - this might be worth looking at (e.g. check the joint primary keys been set up correctly in the intermediary table plus check to see if one of the entries doesn't have a space or another "hidden" character such as a full stop etc.) - until this is resolved then you are pushing the proverbial up a hill.

> this caused me a major problem with Access I kept
> getting errors and append queries kept stuffing out,

not sure why you are using an append query - I would have thought a simple select query would have been enough but fill me in with more details if you like

> so I thought i would do this sorting out of
> duplicates in excel before the table got split into
> the above format.

Oh - hence the Excel question

> it made perfect sense to me at the time, (now it
> seems clear as mud)
>
> Hmmm it looks difficult so i may go into Access and
> -try it in a seperate table

Use a query instead of a table - the query will source it's data from the intermediary table - create a new query, add the intermediary table (model parts?), add the three fields, view totals, group by model, group by part, count description, set criteria for description to >1 and you should find the duplicates. You can use a similarly designed query for your original Excel problem and others.

> -with no keys

If the keys have been set up correctly then the data will be right, no need for specific keys in a query because the query gets its attributes from the table

> -sort out the duplication probs

I recommend you look at your data, or the table design to sort out the duplicates or use the query I recommended above

> -re-export it to the table with the primary keys

Rather then re-export the results to a table, you can view the data from the query. Safer that way - if changes are made to the data then the query reflects that plus you don't then have mulitple versions of your data in your database

> --did that sound like it will work?
>
> Thanks Nigel

This might take a little bit of to-ing and fro-ing but if you have got duplicate data problems then there will be problems with append queries etc. This needs to be sorted out and I suspect the problem will be with the keys in the intermediary table, although I did recommend a couple of other possibles above. (I trust I am reading the problem correctly - let me know if I'm not).

I've gotta go out for a few hours (driving range :D) so will be back in about 2-3 hours, let me know how you get on.

Andrew

parry
08-09-2004, 09:52 PM
Hi Nigel, in order to do the sort you will need to sort by another column. Assuming your data starts at A2 (with A1:C1 being your column headers Model, PartA, PartB) then enter this formula in D2 and copy down...

=COUNTIF($A$2:$A$9,A2)

This will count the number of times the Model number will appear.

Select Data|Sort then do a sort first by Column D (choose Descending sort option), then by Column A (choose ascending option to keep Models that have the same frequency in alphabetical order).

hth

Nigel Thomson
09-09-2004, 11:32 AM
Ok I think I must have explained this a tad poorly
actually I think it's due to me calling them tables when they are still .csv files

I am scanning in datasheets (160ish pages worth)in the form of tables with 40 rows and 26 columns, (split over two pages so two 40 x 13 tables), these are then being stored as tab deliminated text files (.csv)

So they are laid out similar to that below

TABLE 1 this is in .csv form
MODEL.....PARTA......PARTB --->24 more parts
a1.............g456.......324b
a2.............g456.......324c
a3. ............h356.......321n
a3............. z766.......324c
a3.............g456.......324b
a6.............g346.......3 24c
b2.............g434.......334b
b2.............z564.......334b

now what I need to do is to get each part to link to a price in another table
TABLE 2
Part........Price
g456........5.60

I have found that to get it work I need to convert the first table to this format below
TABLE 3 this is also in .csv form
MODEL.....PART.......Description
a1.............g456.......PART A
a3. ...........h356.......PARTA
a2.............g456.......PARTA
a3........... ..z766.......PARTA
a3.............g456.......PARTA
a6.............g346.... ...PARTA
b2.............g434.......PARTA
b2.............z564.......PARTA

a1.............324b.......PARTB
a3. ...........321n.......PARTB
a2.............324c.......PARTB
a3........... ..324c.......PARTB
a3.............324b.......PARTB
a6.............324c.... ...PARTB
b2.............334b.......PARTB
b2.............334b.......PART B

and it's very simple to connect to the price table

but in converting from table1 layout to table3 layout, I get some duplication (check out the last 2 entries on table 3).
The table I wish to create is exactly like table3, but obviously in Access, and to make it work logically the model number and the part number are used to form a joint primary key


I know i can set it up so it only includes unique indexes, but I first need to find the non-unique indexes, double check them and forward them to the boss, hence the fact I would like it to display the duplicates.

Russell D
09-09-2004, 12:25 PM
Using Excel..

If you load your .csv file into Excel and the MODEL, PART, DESCRIPTION data is in separate columns - if you use Data/Filter/AdvancedFilter - and select - filter in place - unique entries only - and the MODEL column entries as the Criteria. Copy the result to another worksheet, then delete the result and select Data/Filter/ShowAll, and the remaining entries will be non-unique ie duplicate entries. Note: These entries will still be in their original positions in the table, but a Sort will bring them to the top.

parry
09-09-2004, 12:27 PM
OK, so its duplicate checking your after. See Chip Pearsons's site (http://www.cpearson.com/excel/duplicat.htm#HighlightingDuplicates) for various techniques on finding duplicates.

A simple alternative is to concatenate cells that together make up your unique indicator (ie your Model & Part #) then do a Countif on that column with any result >1 being a duplicate.

eg pretending your model and part numbers are in columns A&B respectively then in C2 enter this formula and copy down
=A2&B2

Now in cell D2 enter this formula where your data range goes from row 2 to 1000...
=COUNTIF($C$2:$C$1000,C2)

You can then do a Filter (or sort descending) on the Countif column.

hth

Nigel Thomson
09-09-2004, 12:31 PM
Thanks parry

i got that function working (found a lot of scanning errors too) so once i have fixed it all up, I'll be back

Nigel Thomson
09-09-2004, 12:34 PM
heck

I was replying to the first post from parry, and by the time I had finished typing there are two new posts, must learn to type faster

andrew93
09-09-2004, 09:13 PM
Hi Nigel

I had a go at importing your sample file and finding the duplicates and think I have the answer. Apologies in advance for the long reply but I needed to do some explaining along the way.

Assuming you are (sort of) happy with table 3 then I will work with that (plus it's in the right layout to support the many-to-many relationship we discussed some time ago). Firstly, I'm going to assume that where there are duplicate entries that you actually want them stripped out as opposed to using introducing another variable such as quantity. (Food for thought : if model b2 has part number 334b twice, does this mean that part is used twice in the model? Or is it a scan / import problem? If it is used twice then would you want that quantity recorded somewhere in your model / parts table?)

I imported your file into access into a new table (I used a txt version and stripped out the dots using a find and replace function), on importing the file I gave appropriate names to the various fields in the process (i.e. "model number", "part number" & "desc" - I have used these variable names in my example below). The trick here is to not have your indexes set up, in other words I imported it into a new (temporary) table with no indexes. If you try to import that data into a table with the indexes already defined then you will get errors for the duplicates (as you did with your append query) - you can either accept the import / append errors and let the duplicates drop out (which is not what you want), or you can identify the errors beforehand and maybe do something about them.

From your post it appears you would rather identify the duplicates and then do something about them so it would be better to import the data into a temporary table (e.g. table = "import_test_xx" or named something else)

First up, it is much safer to do your importing etc into temporary tables, that way if you stuff it up then you won't muck up your real tables that are the foundation of your database. Once you are happy with the data in the temporary tables then you append the data to the real tables using an append query (possibly as part of a macro for a higher level process?).

Anyway, once your data is in the temporary table (duplicates and all given there are no indexes), to create a new query to identify the duplicates, do the following (pardon the completeness of the instructions) :

Query, New, Design View, select table "import_test_xx" (or whatever you have called it), Add, Close, first field select "model number" (or whatever you have called the model variable), second field select "part number", third field select "desc" (the part a, part b bit), View, Totals, change the "group by" for desc to "Count", in the criteria field under "desc" enter ">1" (without the quotes), View, Datasheet View - this will give you a list of the duplicate entries and the number of times they have been duplicated. Save the query.

I think this has answered your question, but the next thing I think you should look at is why the duplicates are appearing.

Were the duplicates in the original data that you scanned? Or did they appear only after you converted from table 1 to table 3? If that is the case, then you might want to have another look at your query that was used to create table 3.

If they were in the original data that's cool but I would be concerned about stripping data out that might be needed (in other words, there was no duplicate, the append query inadvertently changed the data and now we strip out the duplicate and the original record may be lost) - so, you might want to have a count of some sort between the various steps - count the number of part records before and after the various stages of converting from one table to the next. This way you will know your data is incomplete.

HTH, Andrew