TOSECdev Forum

TOSEC Project => Database / Datfiles => Topic started by: Crashdisk on September 25, 2012, 04:55:43 PM

Title: Special DAT for bad dump
Post by: Crashdisk on September 25, 2012, 04:55:43 PM
Hi,

I need to create a special DAT file, with its own specifications, to clean up the Amiga database (though not only that one). Why?
 - Underdump
 - Overdump (unless track 80, 81,...)
 - Duplicate track (merged DMS. ex: DEMOa.dms with track 00-40 + DEMOb.dms with track 40-79)
 - Self-modifying boot viruses (counters, mutations, as with Lamer Exterminator)
 - Reversible mutilation caused by viruses (Saddam disk validator)
 - Bad dumping
 - Read errors when transferring to ADF (bad sectors)
 - Read errors during DMS dumping (the classic errdms tag...)
 ...
This is real pollution: unnecessary and unusable files.
Many of these errors can be corrected, and sometimes an identical file without the errors already exists.
I've created a dat file, but complying with the TNC is complicated because we must not waste time on maintenance.
I mean that we should not have to rename a bad file just because the good one has been renamed.
At the moment I use this model:
3F76E4F50150DCBA5B57ED18101D1EA2 [73D36888][b saddam damage].adf
3F76E4F50150DCBA5B57ED18101D1EA2 [CAEA6070][b saddam damage].adf
...

the MD5 hash of the good version => 3F76E4F50150DCBA5B57ED18101D1EA2
the CRC32 of the bad version => [73D36888]
the cause of the bad version => [b saddam damage]

Or, in a longer form, with a new "ZZZ-BAD-" tag:
ZZZ-BAD-3F76E4F50150DCBA5B57ED18101D1EA2 [73D36888][b saddam damage].adf
ZZZ-BAD-3F76E4F50150DCBA5B57ED18101D1EA2 [CAEA6070][b saddam damage].adf
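
The naming model above could be generated roughly as follows. This is only a sketch in Python, not part of the original post: the function name `bad_dump_name` and its parameters are hypothetical, and it assumes both images are available as bytes.

```python
import hashlib
import zlib


def bad_dump_name(good_image: bytes, bad_image: bytes, cause: str,
                  prefix: str = "", ext: str = ".adf") -> str:
    """Build '<MD5 of good version> [<CRC32 of bad version>][b <cause>]<ext>'."""
    # The MD5 of the good image identifies the parent and never changes on rename.
    good_md5 = hashlib.md5(good_image).hexdigest().upper()
    # The CRC32 of the bad image tells different bad copies of one parent apart.
    bad_crc32 = format(zlib.crc32(bad_image) & 0xFFFFFFFF, "08X")
    return f"{prefix}{good_md5} [{bad_crc32}][b {cause}]{ext}"


if __name__ == "__main__":
    good = bytes(901120)        # placeholder data standing in for a good image
    bad = b"\xff" + good[1:]    # placeholder data standing in for a damaged copy
    print(bad_dump_name(good, bad, "saddam damage"))
    print(bad_dump_name(good, bad, "saddam damage", prefix="ZZZ-BAD-"))
```

Because the name is derived only from hashes and the cause, it stays the same when the good set is renamed.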

Suggestions?
Title: Re: Special DAT for bad dump
Post by: TKaos on September 26, 2012, 05:30:47 PM
I don't really see the need to create special datfiles for all the other dumps; it only increases the number of DATs we have.
It just creates extra work on DATs, which I don't see the point of. Anyway, you didn't say whether you need the DAT for yourself or want it as a project standard; I guess the second.
But in the end this has been discussed lots of times: if you want perfect files only, then you simply don't use TOSEC DATs and instead get a GameBase or the DATs of projects that collect only good dumps.
Title: Re: Special DAT for bad dump
Post by: Crashdisk on September 26, 2012, 06:32:12 PM
GameBase? I'm not talking about games, but about dumps in general. We sometimes integrate damaged files by mistake or for want of anything better. When an error-free version is later integrated, the former becomes a nuisance because you have to maintain it like any other set. Again, we waste time churning garbage. I repeat: I do not want to exclude hacks, cracks, programming errors or even viruses, only post-production damage. The alternative would be to simply remove them from the TOSEC db, but that is the worst solution...
Title: Re: Special DAT for bad dump
Post by: PandMonium on September 26, 2012, 11:24:03 PM
Hey guys,

The aim of the project is to identify all these sets. Without them in the dats, renamers will not be able to identify them as something already checked/datted and will repeat the work forever. That's why they are kept and renamed in the dats, and there is no point in throwing all this information away.

Still, i do think we should provide some way / tool to filter an existing dat, removing the sets a user might not want included in the datfile (based on our flags). Hopefully i will manage to do that someday :P
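
A flag-based filter of this kind might look like the following Python sketch. It is not an existing TOSEC tool: the function names are hypothetical, and the default set of unwanted flag codes (b, o, u, v) is an assumption that a user would adjust.

```python
import re

# Flag codes treated as unwanted by default; this selection is an assumption.
DEFAULT_UNWANTED = ("b", "o", "u", "v")


def has_flag(set_name: str, code: str) -> bool:
    """True if the name carries a dump flag such as [b], [b2], [b errdms] or [o ]."""
    # The code must be followed by an optional number, then "]" or a space,
    # so the code "b" matches [b dump] but a flag like [cr PDX] never matches "c".
    pattern = r"\[" + re.escape(code) + r"\d*(?: [^\]]*)?\]"
    return re.search(pattern, set_name) is not None


def filter_sets(set_names, unwanted=DEFAULT_UNWANTED):
    """Return only the set names that carry none of the unwanted flags."""
    return [name for name in set_names
            if not any(has_flag(name, code) for code in unwanted)]


if __name__ == "__main__":
    names = [
        "Sound of Silents (1990-08-31)(Silents)",
        "Sound of Silents (1990-08-31)(Silents)[o ]",
        "Sound of Silents (1990-08-31)(Silents)[a]",
        "Acoustic Revolution 3, The (19xx)(-)(Disk 1 of 3)[b dump]",
    ]
    for kept in filter_sets(names):
        print(kept)
```

The full dat keeps every hash; only the view handed to the user changes.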
Title: Re: Special DAT for bad dump
Post by: Crashdisk on September 27, 2012, 01:11:05 PM
Could we arrange a private space on the server to share dat files for internal use? Early WiP dats, a DB of very bad files, ...
Title: Re: Special DAT for bad dump
Post by: PandMonium on September 28, 2012, 02:51:49 PM
Sure, you can put them anywhere you like. Will pm you for details. :P
Title: Re: Special DAT for bad dump
Post by: mai on October 02, 2012, 07:33:54 PM
I've said it many times before: i don't like those overdumps and underdumps, nor this unique [b errdms] flag. They should not be included in TOSEC; all these bad dumps are the result of an incorrect dumping procedure.
In most cases, we have a good dump of the same image.
Title: Re: Special DAT for bad dump
Post by: PandMonium on October 02, 2012, 08:10:06 PM
Well, i understand the situation regarding Amiga, but i don't see any optimal solution; they all have different problems.
Title: Re: Special DAT for bad dump
Post by: mai on October 02, 2012, 08:40:23 PM
Quote from: PandMonium on October 02, 2012, 08:10:06 PM
Well, i understand the situation regarding Amiga, but i don't see any optimal solution; they all have different problems.

I am aware that my opinion will never change anything; in any case, my job is preserving (scene) software rather than collecting and including any garbage.
Title: Re: Special DAT for bad dump
Post by: PandMonium on October 02, 2012, 08:49:32 PM
Well, since this is a collaborative project, changes start with brainstorming and proposing solutions. We shouldn't change things easily and without care, because that was one of the causes of the TNC's complexity. Still, i do consider that in this case the problem is relevant / interesting, so it will indeed be considered.
Title: Re: Special DAT for bad dump
Post by: Maddog on October 02, 2012, 09:39:49 PM
I support the idea of a "clean-up" of TOSEC dats.
I think we could add one dat per system (no need for separate dats for Demos, Games etc., since we already have tons of folders in a complete TOSEC tree) that includes all the bad dumps for that system.

This way you get the best of both worlds.
-Remove junk files from regular dats.
-Reduce size of complete downloads for anyone that only wants a complete usable collection.
-Keep hashes of known bad dumps in a specific place and still be able to recognize them easily.
-Obsessive collectors of roms that want every piece of crap out there can still download the bad files if they wish. All others will just ignore the "bad dump" dat.
Title: Re: Special DAT for bad dump
Post by: PandMonium on October 02, 2012, 11:14:36 PM
I got the idea.

Based on what Crashdisk and mai explained, there are some systems (namely Amiga and C64) where a lot of dumps float around. Many (most?) of those dumps are bad dumps, but there are also a lot of variations of original sets created by dumping modified disks (unfortunately not write-protected). Our goal is to catalog images, not to create a purely original, best-dumps collection; however, the situation on these systems is indeed ridiculous.

By adding so many of these sets in recent years, the project has in part even helped to preserve them. I recall that idoru was already avoiding the addition of new alts and bad dumps for Amiga for the same reasons, and i do support the idea, since following this path will end in 50000 new repeated or unused sets.


Now the other side... for better or worse, those unrenamed sets will not go away, since they are datted and collected by some, as we know. If these sets don't get cataloged, the current renamers will keep receiving the same sets to verify again and again. This probably already happens today, with mai and others testing sets (which don't get renamed) that were tested by idoru and others before (and also not renamed).

I do support the addition of (part of) these unrenamed sets; still, if we add all of them to the current dats, the number of sets/dupes will skyrocket, with many being unusable. The creation of separate dats might be the best (or least bad?) solution and should be done to help the renamers, even if they take some time to be introduced into the catalog.

Still, there are many questions. What really is a bad dump? What should be done with alternates and the many different types that exist (i imagine this may be a problem of the same scale as the bad dumps)? What about sets for which only bad dumps exist? Over and under dumps?
On top of that, many of the current bad dumps and alts might after all be badly renamed (as mai is discovering :)). Plus, using a single dat per system might create a huge, unusable dat with "various" types of software, but creating several is in many cases plain stupid. Here, the best solution would probably depend on the system and its size, just as happens with the other dats.

Finally, i think (although i don't know every other system) that these problems are common in only a few systems.
Does anyone have better ideas, or objections to this solution?


... and now as recreation, some random stats:
83,26% of the alternate images are in 5 systems (C64, Amiga, Spectrum, Atari ST and Atari 8bit), with 60.737 of a total of 72.494 alts.
69,51% of the bad dumps are in 5 systems (NES, Amiga, N64, Megadrive, C64), with 7.649 of 11.004.
If we look at percentages, we have Robotron Z1013 with 43% alts and NEC SuperGrafx, where 83,3% of the sets are bad dumps (however, the dat is just 6 dumps of the same software, 5 of them bad). Game Boy comes second, with 54,65% of the sets being bad dumps, but that dat is really small too.
All these numbers would actually change a lot if we added the dumps left out until now. :P
Title: Re: Special DAT for bad dump
Post by: mai on October 03, 2012, 09:39:52 AM
I like to use examples to show my opinion:
Charlie J Cool (1996)(NRC)(Disk 1 of 2)
Charlie J Cool (1996)(NRC)(Disk 1 of 2)bad
Charlie J Cool (1996)(NRC)(Disk 2 of 2)
Charlie J Cool (1996)(NRC)(Disk 2 of 2)bad
These are unnecessary overdumps.
The first 880 KB (the standard Amiga image file size) of data are byte for byte exactly the same, so where is the reason to collect and catalog such stuff?
All those images <880 KB and >880 KB are the result of faulty dumping.
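
The check described here can be sketched in Python as below. The function names are hypothetical; the 901120-byte size is the standard image size that also appears in the dat extracts later in this thread. Note that Crashdisk's first post excepts images that really use tracks 80, 81, ..., so a size above 880 KB alone is only a hint; the byte comparison is what shows the extra data adds nothing to the first 880 KB.

```python
ADF_SIZE = 901120  # 880 KB: 80 cylinders x 2 heads x 11 sectors x 512 bytes


def classify_size(size: int) -> str:
    """Classify an image purely by its size relative to a standard ADF."""
    if size < ADF_SIZE:
        return "underdump"
    if size > ADF_SIZE:
        return "overdump"
    return "standard"


def is_redundant_overdump(reference: bytes, candidate: bytes) -> bool:
    """True if candidate is larger than a standard ADF and its first 880 KB
    are byte for byte identical to the standard-size reference image."""
    return (len(reference) == ADF_SIZE
            and len(candidate) > ADF_SIZE
            and candidate[:ADF_SIZE] == reference)


if __name__ == "__main__":
    good = bytes(ADF_SIZE)                # placeholder for a good image
    over = good + bytes(2 * 11 * 512)     # same data plus one extra cylinder
    print(classify_size(len(over)), is_redundant_overdump(good, over))
```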
Title: Re: Special DAT for bad dump
Post by: Maddog on October 03, 2012, 10:19:24 AM
Quote from: PandMonium on October 02, 2012, 11:14:36 PM
Still, there are many questions. What really is a bad dump? What should be done with alternates and the many different types that exist (i imagine this may be a problem of the same scale as the bad dumps)? What about sets for which only bad dumps exist? Over and under dumps?
On top of that, many of the current bad dumps and alts might after all be badly renamed (as mai is discovering :)). Plus, using a single dat per system might create a huge, unusable dat with "various" types of software, but creating several is in many cases plain stupid. Here, the best solution would probably depend on the system and its size, just as happens with the other dats.

Finally, i think (although i don't know every other system) that these problems are common in only a few systems.
Does anyone have better ideas, or objections to this solution?

I'm not looking to go the No-Intro way of 1 dump per game. Sometimes it might be hard to say which (alt) is best, and it's even possible that spending energy on that isn't really needed. So, alts that are not CLEARLY bad should just stay in their existing dats. On the other hand, if something is absolutely, positively bad/over/under (like what mai illustrates above...), then it should be removed from the normal dat and go to a clearly marked "Bad" dat.
This helps all ways:
-Less clutter for normal people
-One more dat to complete for the Pokemon "wanna have everything" guys
-Availability of the file in case it's misidentified as bad (nothing prevents moving something back from the "Bad" to the "Regular" dat if required)
-Fewer unidentified files to be checked and re-checked ad infinitum for the renamers

On the other hand, if something only exists as a "bad" dump, then it should stay in the regular dat, since that is the file people will want to collect (at least until a better option becomes available one day). These cases should be relatively few. They will need hand-picking, i.e. no automatic creation of the "Bad" dat just by looking at the [b] flag, unless we go the MAME way and add a new flag along the lines of "Best available/No good dump known" for those few files, in which case the "Bad" dat can still be created automatically.

I support the idea of a single "Bad" dat per system, regarding it as a pool of shit. Most people would want to avoid it completely, but some might still want to have it in their back yard, and some others might even want to dive in to search for a single missing gem. But there's no truly compelling reason to have multiple shitpools around. :P
If the renamers actually feel it's better to have multiple "Bad" dats for Demos, Games etc., that's fine by me. I suppose each of them knows the needs of the system they are working with best.
Hope you get my idea...  ;)
Title: Re: Special DAT for bad dump
Post by: Crashdisk on October 03, 2012, 01:18:34 PM
I updated my program to detect a "new" type of corruption and the result is distressing: 19 new bad files (mai's confirmation needed) for "Commodore Amiga - Games - [ADF]", from 0 to A alone. This is just the beginning of a flood of [b useless] ...
http://eab.abime.net/showpost.php?p=843000&postcount=526
Title: Re: Special DAT for bad dump
Post by: PandMonium on October 03, 2012, 01:38:59 PM
I think i already got the idea.

We can simply see it as just a normal new dat category. When there are a lot of sets we already divide things further, for instance from using just a "Games" dat, to various "Games - [EXT]" or even further with "Games - Public Domain - [ext]", collections, compilations and so on. Years ago we even had several "Various" dats with a mixture of everything.

Some of my concerns are with the possible loss of information from creating those new single "Various" bad dump dats per system, which may contain similarly named files from different dats (because the roms have different extensions, are different types of software and so on). Again, this really depends on the system and on the renamers' common sense :)

At least we agree on:
- Documenting the existing sets, not only the single best ones
- There are renaming mistakes; existing alts and bads might actually be something different that was not properly checked
- Some software might only exist with dumping errors or already edited, creating some kind of alts / modifications

Things differ across systems: you can have Amiga with 2313 bad dumps already (which could increase a lot, i guess) or C128 with only 19. In some cases it might be logical / helpful to have only a "Bad Dumps" dat, while others might need "Bad Dumps - Games" (...).


Finally, you said alts should remain as they are, but i suspect (mai will know better, at least for Amiga) that alternates are a far bigger problem, going by the current statistics. The definition of the (alternate) flag is not clear to many, or at least it was never followed as intended, plus there is a lot of confusion with hack/modified and others. I view the flag as something used to identify different versions of existing (mostly) original sets, for instance a copy with harder enemies, a different background in a game and so on. Original sets that were modified due to unprotected media should be marked with a modification flag, such as [m highscore] or [m savegame].

Nowadays we already have *tons* of alts which are probably wrongly renamed sets, marked alternate out of laziness or lack of knowledge/information on the renamers' part, and used as a super speedy way of adding tons of sets. The numbers of alternates are of a different magnitude than bad dumps, but if renamed correctly they will mostly be bad dumps or modifications (and we will start having tons of those, which in the future will probably also be discussed for the same reorganization :)).

As an example, in Commodore C64 alone we have 30.088 alternate flags; the Commodore Amiga - Games - [ADF] dat alone is 26,6% alts (7261).


My current position is in favor of moving those bad dumps, especially in cases where the number is high, based on renamers' feedback (mai/Crashdisk). New sets can be renamed to save future work, things will still be cataloged, and we will improve organization.
Still, it would be interesting to have more opinions from others too (members or not).
Title: Re: Special DAT for bad dump
Post by: PandMonium on October 03, 2012, 01:41:41 PM
Quote from: Crashdisk on October 03, 2012, 01:18:34 PM
I updated my program to detect a "new" type of corruption and the result is distressing: 19 new bad files (mai's confirmation needed) for "Commodore Amiga - Games - [ADF]", from 0 to A alone. This is just the beginning of a flood of [b useless] ...
http://eab.abime.net/showpost.php?p=843000&postcount=526

Nope, it is a flood of renaming improvements, probably changing a lot of the unclear "[a something]" (or any other flag) to proper naming with [b something] / information. I fear the problem occurs in other systems too, but we don't have the manpower and capabilities there. Hopefully in the future you guys may take an interest in other systems too :D
Title: Re: Special DAT for bad dump
Post by: Crashdisk on October 03, 2012, 01:51:37 PM
The [b doscopy] flag denotes a copy of a disk in which data was overwritten at a strategic point of the disk (the rootblock). Even though we are now better informed about the problem, such a dump is still useless.
Title: Re: Special DAT for bad dump
Post by: Crashdisk on October 03, 2012, 02:14:12 PM
Separating out the bad DATs requires more maintenance, because sets move from one DAT to another (PD Game => Game / Demo - Various => musicdisk ....), on top of name changes in the good dat AND in the [EXT] dat file:
Acoustic Revolution 3, The (19xx)(-)(Disk 1 of 3)
=> Tune Show III (1990)(The Acoustic Revolution)(Disk 1 of 3)

We must also change the name in the [EXT] file
Acoustic Revolution 3, The (19xx)(-)(Disk 1 of 3)[b dump]
=> Tune Show III (1990)(The Acoustic Revolution)(Disk 1 of 3)[b dump]

My idea of changing the name of the bad file to the MD5 hash of the correct version has two advantages:
  - The name does not change if the name of the correct version changes
  - It keeps track of paternity

Sound of Silents (1990-08-31)(Silents)[o ].adf
Which version is this an overdump of?
Sound of Silents (1990-08-31)(Silents).adf
Sound of Silents (1990-08-31)(Silents)[a].adf

Remember that the names change, not hashes...
Title: Re: Special DAT for bad dump
Post by: PandMonium on October 03, 2012, 03:15:49 PM
I've to leave now but will post again later :P

Renaming things to a hash is something that i do not agree with, because it makes the files much harder to search. Even among the renamers, it is harder to look for a hash than for the simple name of the title you're testing. More importantly, it goes in the opposite direction to the goal we pursue. If i have a set whose name changes from something to a hash, it won't help me a lot; at least with a real file name i know what it is and that i should search for the non-bad version.

Even if the files are bad, knowing why and what they are might be useful for people trying to recover other versions, and for other ends. I understand that it might mean more work to rename various images too, but if renamed right, further renames don't tend to happen that much. Also, the idea (IMO) should be to move them there carefully, once tested, and not to batch-move every bad dump found (or worse, everything currently marked [b ]) to another datfile.

As for the origin of each dump and their relationship, they all come from physical media (disks), not from other roms. In the case you described, the alt is probably badly named.

Imagine
"Sound of Silents (1990-08-31)(Silents)(DE)"
"Sound of Silents (1990-08-31)(Silents)(PT)"
"Sound of Silents (1990-08-31)(Silents)[cr PDX]"
"Sound of Silents (1990-08-31)(Silents)(DE)[m savegame]"

These 4 dumps would have been created from 4 different disks (or at least 3, at different times for the 1st and last). If i somehow get an Amiga disk, dump it and create a bad/over dump, the overdump originates from one easily identifiable set, given the differences. If the 4 were just called [a] to [a4], it would be hard. I understand that there are a lot of cases: many sets are probably dupes due to popular games being dumped a lot and containing different savegames or highscores, dumping errors and all that. In addition, there may even be dumps edited now just to create more garbage.

This means that the paternity thing is often hard to establish, especially based on these dumps. If you look at MAME or other projects (GoodMerge?), sometimes the idea is used to group versions of the same game that share a lot in common, not to express a direct relation to one single dump. In your example, one set (probably one of the originals, untouched and chosen by location world/older/etc) would be seen as the parent, or they could all just be viewed as "Sound of Silents" versions.


Your points are issues in the renaming process / information management that must indeed be solved, but with decent solutions. We cannot save every bit of information in the file name or expect to manage relations and multiple name changes based on it. These issues (some previously discussed) of relationships and ease of renaming will hopefully be solved some day, but it is hard to please all renamers :|
Title: Re: Special DAT for bad dump
Post by: Cassiel on October 11, 2012, 04:01:41 PM
Yeah, this issue bubbles back up every couple of years.

I agree having bad (b,o,u,v) images perpetually shared and collected is far from ideal, but having the same bad 'new' images constantly submitted/reviewed is even less so.

For the record, I'm not a fan of having separate 'bad' DATs at all. Never have been actually.

When we used to have real-time DAT generation through the website, you used to be able to toggle whether you included bad images or not. I always thought that was a very elegant solution, putting the choice in the hands of end users without losing any information/hashes. Sadly no one else agreed (with the whole TOSEC DAT Generator thing, I mean).

For a long time I've had "Investigate ClrMamePro's <baddump> flag" on the unofficial TOSEC to-do list, since I believe this can achieve the same thing: put the choice/control in the hands of the end user whilst still maintaining a full catalogue of images.

I have zero free time atm… any volunteers to look into this? Anyone have any similar ideas?
Title: Re: Special DAT for bad dump
Post by: Crashdisk on October 11, 2012, 09:32:56 PM
I'm probably repeating myself, but no matter. The legacy of the past leaves us dealing with many alternates that are sometimes more or less visibly bad dumps. However, when the damage is clearly identified and the file is a bad copy of a good one already cataloged, why continue to maintain it? For whom, and why? I regularly dump disks and, fortunately, I do not integrate all my waste. The problem is that some people do! Yori's dats are also filled with waste that I (like mai) do not want to integrate into TOSEC. Who wants to add so many crappy dumps? But the problem is that these disks are then not identified / not renamed for the average person, and our work is not shared among TOSEC members.

Here's what I suggest (with some changes compared to my previous messages).
Extract from the current file "Commodore Amiga - Diskmags (TOSEC-v2011-11-01_CM).dat":
Code: [Select]
game (
name "Stolen Data - Issue 10 (1992-12-26)(Anarchy)(Disk 1 of 2)"
description "Stolen Data - Issue 10 (1992-12-26)(Anarchy)(Disk 1 of 2)"
rom ( name "Stolen Data - Issue 10 (1992-12-26)(Anarchy)(Disk 1 of 2).adf" size 901120 crc e1d31603 md5 62279dc9a6bb9a9daa4df19eb3a20bee sha1 049617a48250330a719c7d28e9018037b0f15e4e )
)

Extract from my working dat file "Commodore Amiga - Garbage (PRIVATE-V2012-10-09_CM).dat" (note: in my dat file, I included the reference file):
Code: [Select]
game (
name "Diskmags - E1D31603"
description "Diskmags - E1D31603"
rom ( name "Diskmags - E1D31603 [39DA2221][b doscopy].adf" size 901120 crc 39da2221 md5 92fae334465a2beecc08f248160d1f1b sha1 62d67a4819730b47aab7833094a42e8057d9e484 )
rom ( name "Diskmags - E1D31603 [F634A4C6][b sector].adf" size 901120 crc f634a4c6 md5 7a7064d91288920e5aaaea7c7f18fe88 sha1 dfd0c26fbc659bb6f1a73808b8576c43490691f1 )
rom ( name "Diskmags - E1D31603 [FB12BC3B][b doscopy].adf" size 901120 crc fb12bc3b md5 b349fd2bcfce0c588e8bac0759cf275a sha1 ee02690e24174c565e7bea650ba4d7cd590264c1 )
rom ( name "Diskmags - E1D31603 [FB12BC3B][b doscopy][part 1].adf" size 450560 crc f61b09a7 md5 3cdd1852ba75fdae60a08031b7328703 sha1 8168e40c0719daa7f79d6b79b6d68b6dd5cd6bb5 )
rom ( name "Diskmags - E1D31603 [FB12BC3B][b doscopy][part 2].adf" size 450560 crc d8f36a2a md5 0d92bac55c6494c352baccff740a5426 sha1 f17f0a67e0509abab1c561e77a2e3056fa9f3790 )
rom ( name "Diskmags - E1D31603.adf" size 901120 crc e1d31603 md5 62279dc9a6bb9a9daa4df19eb3a20bee sha1 049617a48250330a719c7d28e9018037b0f15e4e )
)

Using the CRC32 as the reference name avoids losing the identification work, without having to propagate changes made in the current file.
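
To illustrate how such entries could be read back, here is a rough Python sketch that parses one `rom ( ... )` line of the extract above and splits the proposed name into its parts. It is not an existing tool: the regular expressions and function names are assumptions based only on the lines shown in this post.

```python
import re

# One rom line in the layout shown above; md5 and sha1 are treated as optional.
ROM_LINE = re.compile(
    r'rom \( name "(?P<name>[^"]+)" size (?P<size>\d+) crc (?P<crc>[0-9a-fA-F]{8})'
    r'(?: md5 (?P<md5>[0-9a-fA-F]{32}))?(?: sha1 (?P<sha1>[0-9a-fA-F]{40}))? \)'
)

# "<category> - <CRC32 of good file> [<CRC32 of bad file>][flags...].adf"
GARBAGE_NAME = re.compile(
    r'^(?P<category>.+?) - (?P<parent>[0-9A-F]{8})'
    r'(?: \[(?P<own>[0-9A-F]{8})\])?(?P<flags>(?:\[[^\]]*\])*)\.adf$'
)


def parse_rom_line(line: str) -> dict:
    """Return the fields of a rom line as a dict (size converted to int)."""
    match = ROM_LINE.search(line)
    if match is None:
        raise ValueError("not a rom line: " + line)
    fields = match.groupdict()
    fields["size"] = int(fields["size"])
    return fields


def split_garbage_name(name: str):
    """Return (category, parent CRC, own CRC or None, flags) for a garbage-dat name."""
    match = GARBAGE_NAME.match(name)
    if match is None:
        raise ValueError("unexpected name: " + name)
    return match["category"], match["parent"], match["own"], match["flags"]


if __name__ == "__main__":
    line = ('rom ( name "Diskmags - E1D31603 [39DA2221][b doscopy].adf" size 901120 '
            'crc 39da2221 md5 92fae334465a2beecc08f248160d1f1b '
            'sha1 62d67a4819730b47aab7833094a42e8057d9e484 )')
    rom = parse_rom_line(line)
    print(rom["crc"], split_garbage_name(rom["name"]))
```

The parent CRC recovered this way (E1D31603) can be looked up in the regular Diskmags dat to find the current name of the good file.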
Title: Re: Special DAT for bad dump
Post by: TKaos on October 12, 2012, 07:55:25 PM
I don't really like the idea of renaming the bad files with CRCs.
It's not even close to the TNC rules. If you really want separate ones, do it with their current names, and if you wish to add the CRC of the good version, do it with a [more info] flag, because that's what the info is.
In the end it'll be a lot easier to separate them then, and I don't believe that everyone wants to split their DATs with that new naming.
Anyway, I have to agree with Cassiel: I'm not a big fan of it, and we should rather provide the ability to generate DATs the way people want, because that seems to be the better way.
Title: Re: Special DAT for bad dump
Post by: PandMonium on October 13, 2012, 01:03:52 PM
Kinda agree, but since we currently do not provide that, i understand that they want to rename / catalog all the bad dumps they are receiving.

So, imho you can make any dat you like for personal use, but to be included one day in TOSEC all the sets need to follow the TNC or it will not make sense. As i see it, you can still do an independent dat (bad dumps) if the number of sets is too high, which will probably be true with your new findings. Just like what happens with file formats: if we have a Games - ADF dat with 20k sets where half of them are bad dumps and only 1000 different titles exist, i kinda understand using different dats. Still, they should have TNC names, and you could use more info to hold the crc if you find that really valuable (questionable).

Meanwhile, since i know you don't like the idea of adding new dumps, just add the new ones to your private dats. In the future, when we have this sorted and/or a system to help users filter out bad dump sets / dats, they might be included.

Makes sense? :p
Title: Re: Special DAT for bad dump
Post by: Zandro on October 30, 2012, 03:21:31 AM
<rom name="Corruption (19xx)(-)[ b ].adf" size="814080" crc="2f8ba19c" status="baddump"/>
or
rom ( name "Corruption (19xx)(-)[ b ].adf" size 814080 crc 2f8ba19c status baddump )

(https://sites.google.com/site/zandro/baddumps.png)

(https://sites.google.com/site/zandro/corruption.png)

CMPro doesn't seem to mind it being there: whether "Show 'baddump' ROMs & inverted CRCs" is enabled or disabled, it treats the set as incomplete rather than unneeded. This means that anyone who wants to automatically remove all such bad dumps must right-click > Delete > Currently Selected Set (or "All Listed Incomplete Sets" if the set is otherwise fully complete).

Has any thought been given to explicitly providing parent-clone relationships? Using 7z to eliminate redundancy would take the trivial concern of space out of the equation while tightening organization, as related disks with foreign titles would be united. I've always considered TOSEC a best attempt at a complete catalog, the polar opposite of No-Intro and highly parallel to GoodSets. If you don't mind this typecasting, then let the bads stick around. Otherwise, the garbage will simply have to be ignored or, worse, re-identified as such time and time again when each shipment comes in.

I am willing to help create such merge info if it isn't already available behind the scenes; I have sufficient experience, as you already know.  ;)
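
Since the manager itself does not drop these entries, one option would be to strip them from the XML dat before loading it. The following Python sketch assumes an XML dat with a `datafile` root and `game`/`rom` elements, using the `status="baddump"` attribute shown at the top of this post; the function name and the second sample entry are hypothetical.

```python
import xml.etree.ElementTree as ET


def strip_baddumps(dat_xml: str) -> str:
    """Return the dat with every rom marked status="baddump" removed.
    Games left without any rom are removed as well."""
    root = ET.fromstring(dat_xml)
    for game in list(root.findall("game")):
        for rom in list(game.findall("rom")):
            if rom.get("status") == "baddump":
                game.remove(rom)
        if game.find("rom") is None:
            root.remove(game)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = """<datafile>
  <game name="Corruption (19xx)(-)[ b ]">
    <rom name="Corruption (19xx)(-)[ b ].adf" size="814080" crc="2f8ba19c" status="baddump"/>
  </game>
  <game name="Hypothetical Good Set (19xx)(-)">
    <rom name="Hypothetical Good Set (19xx)(-).adf" size="901120" crc="00000000"/>
  </game>
</datafile>"""
    print(strip_baddumps(sample))
```

The master dat would still carry the bad dumps and their hashes; only the filtered copy handed to the manager leaves them out.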
Title: Re: Special DAT for bad dump
Post by: PandMonium on October 30, 2012, 05:04:02 PM
Hey Zandro, i'm in a hurry right now, but afaik providing proper parent-clone relationships is kinda hard. First we need to address what that relation really means; we have examples of that in MAME, or another simple example could be the GoodMerge sets for 7z (are they still used?).

A first approach could be to group sets based on the title+year+publisher combination, the mandatory fields. That will give you some kind of relation between the grouped sets inside a dat. Still, we have several dats for the same software in different formats. Even if this and other ideas may help, there must be some manual part in the process to sort everything out: there are similar titles with different names, such as localizations, hacks and so on.
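
That first approach could be sketched in Python like this. The regular expression is an assumption about how the mandatory fields appear at the start of a name ("Title (year)(publisher)"), and the function names are hypothetical; as noted, localized or renamed titles would still need manual matching.

```python
import re
from collections import defaultdict

# Mandatory fields at the start of a name: Title (year)(publisher)
TOSEC_KEY = re.compile(r'^(?P<title>.+?) \((?P<year>[^)]*)\)\((?P<publisher>[^)]*)\)')


def group_key(set_name: str):
    """Return (title, year, publisher), or None if the name does not match."""
    match = TOSEC_KEY.match(set_name)
    if match is None:
        return None
    return match["title"], match["year"], match["publisher"]


def group_sets(set_names):
    """Group set names that share the same title+year+publisher combination."""
    groups = defaultdict(list)
    for name in set_names:
        groups[group_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = [
        "Sound of Silents (1990-08-31)(Silents)(DE)",
        "Sound of Silents (1990-08-31)(Silents)(PT)",
        "Sound of Silents (1990-08-31)(Silents)[cr PDX]",
        "Sound of Silents (1990-08-31)(Silents)(DE)[m savegame]",
        "Tune Show III (1990)(The Acoustic Revolution)(Disk 1 of 3)",
    ]
    for key, members in group_sets(names).items():
        print(key, len(members))
```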
Title: Re: Special DAT for bad dump
Post by: Zandro on October 30, 2012, 10:05:02 PM
Preliminary sorting with klutzy MSBatch. Might want to add a clause for multi-disk software later.... Warning: this WILL introduce problems with long paths, so work on a backup close to the root. Even that won't save Leisure Larry, unfortunately.

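@echo off
rem Work in the folder that contains this script.
cd /d "%~dp0\"
rem Phase 1: for each name containing "[", take the part before the first "["
rem as a folder name and move every file starting with it into that folder.
for /f "tokens=1 delims=[" %%a in ('dir /b *[*') do (
  if exist "%%a*" (
    if not exist "%%a\" (
      md "%%a\"
      move "%%a*" "%%a\">nul
    )
  )
)
rem Phase 2: give every remaining file (except this script) its own folder.
for %%a in (*) do (
  if "%%a" neq "%~nx0" (
    md "%%~na\"
    move "%%a" "%%~na\">nul
  )
)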


It works in two phases because of limitations in parsing what counts as the extension (hint: not the last dot). If the idea passes, someone else had better handle the code.  :)

*Edit: This version is a little better...

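rem Use "(" as the delimiter, so files are grouped by the text before the first "(".
@set d=(
@echo off
cd /d "%~dp0\"
rem Create one folder per group and move every matching file into it.
for /f "tokens=1 delims=%d%" %%a in ('dir /b *%d%*') do (
  if exist "%%a%d%*" (
    if not exist "%%a" md "%%a"
    move "%%a%d%*" "%%a">nul
  )
)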
Title: Re: Special DAT for bad dump
Post by: idrougge on March 14, 2015, 01:04:23 AM
I'm with the Amiga people here that the afflicted systems need a "bad" DAT to keep the main categories sane. However, I don't think renaming them according to parent CRC (if known) is a good idea.