Author Topic: Special DAT for bad dump  (Read 9081 times)

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1323
Re: Special DAT for bad dump
« Reply #15 on: October 03, 2012, 01:38:59 PM »
I think i already got the idea.

We can simply see it as just a normal new dat category. When there are a lot of sets we already divide things further, for instance from using just a "Games" dat, to various "Games - [EXT]" or even further with "Games - Public Domain - [ext]", collections, compilations and so on. Years ago we even had several "Various" dats with a mixture of everything.

Some of my concerns are with the possible loss of information from creating a single new "Various" bad dump dat per system, which may contain similarly named files from different dats (because the roms have different extensions, are different types of software and so on). Again, this really depends on the system and on common sense from renamers :)

At least we agree on:
- Documenting the existent sets, and not the single best ones
- There are renaming mistakes, existent alts and bads might be actually something different but not properly checked
- Some software might only exist with dumping errors or already edited, creating some kind of alts / modifications

Things are different across systems, you can have Amiga with 2313 bad dumps already (which could increase a lot i guess) or C128 with only 19. In some cases it might be logical / helpful to have only a "Bad Dumps" and others might need "Bad Dumps - Games" (...).


Finally, you said alts should remain as they are but i suspect (mai will know better, at least in Amiga) that alternates are a far bigger problem going by the current statistics. The definition of the flag (alternate) is not clear to many, or at least it was never followed as it was supposed to be, plus there is a lot of confusion with the hack/modified flags and others. I view the flag as something used to identify different versions of existent (mostly) original sets - for instance a copy with harder enemies, a different background in a game and so on. Original sets that were modified due to unprotected media should be marked with a modification flag, such as [m highscore], [m savegame].

Nowadays we already have *tons* of alts which are probably wrongly renamed sets, marked alternate out of laziness or lack of knowledge/information from the renamers and used as a super speedy way of adding tons of sets. The numbers for alternates are of a different magnitude than bad dumps but if renamed correctly they will mostly be bad dumps or modifications (and we will start having tons of those, which in the future will probably also be discussed for the same reorganization :)).

As an example, in Commodore C64 alone we have 30,088 alternate flags, and the Commodore Amiga - Games - [ADF] dat alone is 26.6% alts (7,261).


My current position is in favor of the idea of moving those bad dumps, especially in cases where the number is high, based on renamers feedback (mai/Crashdisk). New sets can be renamed to save future work and things will still be cataloged and we will improve in organization.
Still, it would be interesting to have more opinions from others too (members or not).

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1323
Re: Special DAT for bad dump
« Reply #16 on: October 03, 2012, 01:41:41 PM »
I updated my program to detect a "new" type of corruption and the result is afflicting. 19 new bad files (need mai confirmation) for "Commodore Amiga - Games - [ADF]" starting from 0 to A. This is just the beginning of a flood of [b useless] ...
http://eab.abime.net/showpost.php?p=843000&postcount=526
Nope, it is a flood of renaming improvements. Probably changing a lot of the unclear "[a something]" (or any other flag) to proper naming with [b something] / information. I fear the problem also occurs in other systems but we don't have the manpower and capabilities there. Hopefully in the future you guys may take an interest in other systems too :D

Offline Crashdisk

  • TOSEC Member
  • Full Member
  • ***
  • Posts: 248
Re: Special DAT for bad dump
« Reply #17 on: October 03, 2012, 01:51:37 PM »
The [b doscopy] flag denotes a copy of a disk in which data at a strategic point of the disk (the rootblock) has been overwritten. Even if we are now better informed about the problem, the file is still useless.

Offline Crashdisk

  • TOSEC Member
  • Full Member
  • ***
  • Posts: 248
Re: Special DAT for bad dump
« Reply #18 on: October 03, 2012, 02:14:12 PM »
Separating the bad dumps into their own DAT requires more maintenance, because sets move from one DAT to another (PD Game => Game / Demo - Various => musicdisk ....), on top of changing names in the good dat AND in the [EXT] dat file:
Acoustic Revolution 3, The (19xx)(-)(Disk 1 of 3)
=> Tune Show III (1990)(The Acoustic Revolution)(Disk 1 of 3)

We must also change the name in the [EXT] file
Acoustic Revolution 3, The (19xx)(-)(Disk 1 of 3)[b dump]
=> Tune Show III (1990)(The Acoustic Revolution)(Disk 1 of 3)[b dump]

My idea of renaming the bad file using the MD5 hash of the correct version has two advantages:
  - The name does not change if the name of the correct version changes
  - It keeps track of paternity

Sound of Silents (1990-08-31)(Silents)[o ].adf
Which version is this an overdump of?
Sound of Silents (1990-08-31)(Silents).adf
Sound of Silents (1990-08-31)(Silents)[a].adf

Remember that the names change, not hashes...
« Last Edit: October 03, 2012, 02:21:24 PM by Crashdisk »

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1323
Re: Special DAT for bad dump
« Reply #19 on: October 03, 2012, 03:15:49 PM »
I've to leave now but will post again later :P

Renaming things to a hash is something that i do not agree with because it makes the files much harder to search. Even between the renamers, it is harder to look for a hash than for the simple name of the title you're testing. More importantly it goes in the opposite direction of the goal we pursue. If i have a set that changes from something to a hash, it won't help me a lot; at least with a real file name i know what it is and that i should search for the non bad version.

Even if the files are bad, knowing why and what they are might be useful for people trying to recover other versions, and other ends. I understand that it might give more work to rename various images too but if renamed right, further renames don't tend to happen that much. Also, the idea (IMO) should be to move them carefully there, once tested and not to batch move every found bad dump (or worse, every currently marked [b ]) to another datfile.

As for the origin of each dump and their relationship, they all come from physical media (disks), not from other roms. In the case you described the alt is probably badly named.

Imagine
"Sound of Silents (1990-08-31)(Silents)(DE)"
"Sound of Silents (1990-08-31)(Silents)(PT)"
"Sound of Silents (1990-08-31)(Silents)[cr PDX]"
"Sound of Silents (1990-08-31)(Silents)(DE)[m savegame]"

These 4 dumps would have been created from 4 different disks (or at least 3, at different times for the 1st and last). If i somehow get an Amiga disk, dump it and create a bad/over dump, the overdump originates from one easily identifiable set given the differences. If the 4 were just called [a] to [a4] it would be hard. I understand that there are a lot of cases, many sets are probably dupes due to popular games being dumped a lot and containing different savegames or highscores, dumping errors and all that. In addition there can even be dumps edited now just to create more garbage.

This means that the paternity thing is often hard to establish, especially based on these dumps. If you look at MAME or other projects (goodmerge?), sometimes the idea is used to group versions of the same game that share a lot in common, not to express a direct relation to one single dump. In your example, one set (probably one of the originals, untouched and based on location world/older/etc) would be seen as parent or they could just all be viewed as "Sound of Silents" versions.


Your points are issues in the renaming process / information managing that must indeed be solved, but with decent solutions. We can not save every bit of information in the file name or expect to manage relations and multiple name changes based on it. These issues (some previously discussed) of relationships and ease of renaming will hopefully be solved some day but it is hard to please all renamers :|

Offline Cassiel

  • Administrator
  • Hero Member
  • *****
  • Posts: 1470
Re: Special DAT for bad dump
« Reply #20 on: October 11, 2012, 04:01:41 PM »
Yeah, this issue bubbles back up every couple of years.

I agree having bad (b,o,u,v) images perpetually shared and collected is far from ideal, but having the same bad 'new' images constantly submitted/reviewed is even less so.

For the record, I'm not a fan of having separate 'bad' DATs at all. Never have been actually.

When we used to have real time DAT generation through the website, you used to be able to toggle whether you included bad images or not. I always thought that was a very elegant solution, putting the choice in the hands of end users without losing any information/hashes. Sadly no one else agreed (with the whole TOSEC DAT Generator thing I mean).

For a long time I've had "Investigate ClrMamePro's <baddump> flag" on the unofficial TOSEC to do list, since I believe this can achieve the same thing - put the choice/control in hands of end user whilst still maintaining full catalogue of images.

I have zero free time atm… any volunteers to look into this? Anyone have any similar ideas?

Offline Crashdisk

  • TOSEC Member
  • Full Member
  • ***
  • Posts: 248
Re: Special DAT for bad dump
« Reply #21 on: October 11, 2012, 09:32:56 PM »
I'm probably repeating myself, but no matter. The legacy of the past leaves us dealing with many alternates that are sometimes bad dumps, more or less visibly so. However, when the damage is clearly identified and the file is a bad copy of a good dump that is already cataloged, why continue to maintain it? For whom, and why? I regularly dump disks and fortunately I do not integrate all my waste. The problem is that some people do that! Yori's dats are also filled with waste that I (like mai) do not want to integrate into TOSEC. Who wants to add lots of crappy dumps? But the problem is that these disks are not identified / not renamed for the average person, and that work is not shared between TOSEC members.

Here's what I suggest (with some changes compared to previous messages)
Extract from the current file "Commodore Amiga - Diskmags (TOSEC-v2011-11-01_CM).dat":
Code:
game (
name "Stolen Data - Issue 10 (1992-12-26)(Anarchy)(Disk 1 of 2)"
description "Stolen Data - Issue 10 (1992-12-26)(Anarchy)(Disk 1 of 2)"
rom ( name "Stolen Data - Issue 10 (1992-12-26)(Anarchy)(Disk 1 of 2).adf" size 901120 crc e1d31603 md5 62279dc9a6bb9a9daa4df19eb3a20bee sha1 049617a48250330a719c7d28e9018037b0f15e4e )
)

Extract from my working dat file "Commodore Amiga - Garbage (PRIVATE-V2012-10-09_CM).dat" (note: in my dat file, I included the reference file):
Code:
game (
name "Diskmags - E1D31603"
description "Diskmags - E1D31603"
rom ( name "Diskmags - E1D31603 [39DA2221][b doscopy].adf" size 901120 crc 39da2221 md5 92fae334465a2beecc08f248160d1f1b sha1 62d67a4819730b47aab7833094a42e8057d9e484 )
rom ( name "Diskmags - E1D31603 [F634A4C6][b sector].adf" size 901120 crc f634a4c6 md5 7a7064d91288920e5aaaea7c7f18fe88 sha1 dfd0c26fbc659bb6f1a73808b8576c43490691f1 )
rom ( name "Diskmags - E1D31603 [FB12BC3B][b doscopy].adf" size 901120 crc fb12bc3b md5 b349fd2bcfce0c588e8bac0759cf275a sha1 ee02690e24174c565e7bea650ba4d7cd590264c1 )
rom ( name "Diskmags - E1D31603 [FB12BC3B][b doscopy][part 1].adf" size 450560 crc f61b09a7 md5 3cdd1852ba75fdae60a08031b7328703 sha1 8168e40c0719daa7f79d6b79b6d68b6dd5cd6bb5 )
rom ( name "Diskmags - E1D31603 [FB12BC3B][b doscopy][part 2].adf" size 450560 crc d8f36a2a md5 0d92bac55c6494c352baccff740a5426 sha1 f17f0a67e0509abab1c561e77a2e3056fa9f3790 )
rom ( name "Diskmags - E1D31603.adf" size 901120 crc e1d31603 md5 62279dc9a6bb9a9daa4df19eb3a20bee sha1 049617a48250330a719c7d28e9018037b0f15e4e )
)

Using the CRC32 of the good file as the reference name avoids losing the identification work, without having to carry over every change made in the current file.
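For illustration only, names in that style could be generated along these lines. This is a rough Python sketch and not the program actually used here: the helper names crc32_hex and garbage_name are made up for the example, and the layout "<category> - <good CRC> [<bad CRC>][<flag>].adf" is simply inferred from the extract above.

```python
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC32 of data as 8 upper-case hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


def garbage_name(category: str, good_crc: str, bad_data: bytes,
                 flag: str, ext: str = ".adf") -> str:
    """Build a name for a bad dump keyed on the CRC32 of its good version.

    Hypothetical helper: the layout is an assumption read off the dat
    extract in this post, not an agreed naming rule.
    """
    return f"{category} - {good_crc.upper()} [{crc32_hex(bad_data)}][{flag}]{ext}"
```

The point of keying on the good file's CRC is that the generated name never has to change when the good set is renamed; only the hash links the two.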
« Last Edit: October 11, 2012, 09:37:52 PM by Crashdisk »

Offline TKaos

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 533
Re: Special DAT for bad dump
« Reply #22 on: October 12, 2012, 07:55:25 PM »
I don't really like the idea of renaming the bad files with CRCs.
It's not even close to the TNC rules. If you really want to have separate ones, do it with their current names, and if you wish to add the CRC of the good version, do it with a [more info] flag, because that's what that info is.
In the end it'll be a lot easier to separate them that way, and I don't believe that everyone wants to split their DATs with that new naming.
Anyway, I have to agree with Cassiel: not a big fan of it, and we should rather provide the ability to generate DATs the way people want, because that seems to be the better way.

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1323
Re: Special DAT for bad dump
« Reply #23 on: October 13, 2012, 01:03:52 PM »
Kinda agree but since we currently do not provide that, i understand that they want to rename / catalog all the bad dumps they are receiving.

So, imho you can do any dat you like as personal but to be included one day in TOSEC all the sets need to follow TNCs or it will not make sense. As i see it, you can still do an independent dat (bad dumps) if the number of sets is too much - which will probably be true with your new findings. Just like what happens with file formats, if we have a Games - ADF dat with 20k sets and half of them are bad dumps and where only 1000 different titles exist i kinda understand using different dats. Still, they should have TNC names and you could use more info to put the crc if you find that really valuable (questionable).

Meanwhile and since i know you don't like the idea of adding new dumps, just add the new ones to your private dats. In the future when we have this sorted or / and a system to help users filtering bad dump sets / dats they might be included.

Makes sense? :p

Offline Zandro

  • Newbie
  • *
  • Posts: 19
Re: Special DAT for bad dump
« Reply #24 on: October 30, 2012, 03:21:31 AM »
<rom name="Corruption (19xx)(-)[ b ].adf" size="814080" crc="2f8ba19c" status="baddump"/>
or
rom ( name "Corruption (19xx)(-)[ b ].adf" size 814080 crc 2f8ba19c status baddump )

CMPro doesn't seem to mind it being there whether "Show 'baddump' ROMs & inverted CRCs" is enabled or disabled, it treats it as incomplete rather than unneeded. Meaning, if one chooses to automatically remove all such bad dumps, he must right-click > Delete > Currently Selected Set (or "All Listed Incomplete Sets" if the set is otherwise fully complete).
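If the status attribute were adopted, an end user could also split the entries outside of CMPro. A minimal Python sketch, assuming the XML style of dat shown above with a datafile root element; the function name split_baddumps and the second rom entry in SAMPLE are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first rom is the entry quoted above, the second
# (name and CRC) is invented purely so the split has something to keep.
SAMPLE = """<datafile>
  <game name="Corruption (19xx)(-)">
    <rom name="Corruption (19xx)(-)[ b ].adf" size="814080" crc="2f8ba19c" status="baddump"/>
    <rom name="Corruption (19xx)(-).adf" size="814080" crc="00000000"/>
  </game>
</datafile>"""


def split_baddumps(dat_xml: str):
    """Return (good_names, bad_names) according to status="baddump"."""
    root = ET.fromstring(dat_xml)
    good, bad = [], []
    for rom in root.iter("rom"):
        target = bad if rom.get("status") == "baddump" else good
        target.append(rom.get("name"))
    return good, bad
```

This keeps every hash in the one catalog and leaves the include/exclude decision to whoever consumes the dat.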

Any thought given to explicitly providing parent-clone relationships? Using 7z to eliminate redundancy would take the trivial concern of space out of the equation while tightening organization, as related disks with foreign titles would be united. I've always considered TOSEC a best attempt at a complete catalog, a polar opposite of No-Intro, highly parallel to GoodSets. If you don't mind this typecasting, then let the bads stick around. Otherwise, the garbage will simply have to be ignored, or worse, re-identified as such time and time again when each shipment comes in.

I am willing to help create such merge info if it isn't already available behind the scenes, I have sufficient experience as you already know.  ;)
« Last Edit: October 30, 2012, 04:09:56 AM by Zandro »

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1323
Re: Special DAT for bad dump
« Reply #25 on: October 30, 2012, 05:04:02 PM »
Hey Zandro, i'm in a hurry right now but afaik providing proper parent-clone relationships is kinda hard. First we need to address what that relation really means, we have examples of that in mame, or another simple example could be the GoodMerge sets for 7z (are they still used?).

A first approach could be to group sets based on the title+year+publisher combination, the mandatory fields. That will give you some kind of relation between the grouped sets inside a dat. Still we have several dats for the same software in different formats. Even if this and other ideas may help, there must be some manual part in the process to sort all things out - there are similar titles with different names such as localizations, hacks and so on.
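That first approach could look roughly like the Python sketch below. It assumes a simplified name layout of "Title (date)(publisher)..." and uses made-up helper names (group_key, group_sets); localized titles, hacks and the same software spread across several dats would still need the manual pass.

```python
import re
from collections import defaultdict

# Assumed simplification of a TOSEC name: title, then (date), then
# (publisher), followed by any other fields and flags.
_NAME = re.compile(r"^(?P<title>.+?) \((?P<date>[^)]*)\)\((?P<publisher>[^)]*)\)")


def group_key(name: str):
    """Return (title, year, publisher), or None if the name does not parse."""
    m = _NAME.match(name)
    if m is None:
        return None
    return (m["title"], m["date"][:4], m["publisher"])


def group_sets(names):
    """Group set names that share the same title+year+publisher key."""
    groups = defaultdict(list)
    for name in names:
        groups[group_key(name)].append(name)
    return dict(groups)
```

Everything after the publisher field (language, cracks, [a], [b ...] and so on) is ignored by the key, so good, alternate and bad versions of one title end up in the same group.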

Offline Zandro

  • Newbie
  • *
  • Posts: 19
Re: Special DAT for bad dump
« Reply #26 on: October 30, 2012, 10:05:02 PM »
Preliminary sorting with klutzy MSBatch.  Might want to add a clause for multi-disk software later.... Warning, this WILL introduce problems with long paths, work on a backup close to root. Even that won't save Leisure Larry unfortunately.

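@echo off
rem Work in the folder that contains this script.
cd /d "%~dp0\"
rem Phase 1: for every file with a [flag], make a folder named after the text
rem before the first "[" and move all files sharing that prefix into it.
for /f "tokens=1 delims=[" %%a in ('dir /b *[*') do (
  if exist "%%a*" (
    if not exist "%%a\" (
      md "%%a\"
      move "%%a*" "%%a\">nul
    )
  )
)
rem Phase 2: give each remaining file its own folder, skipping this script.
for %%a in (*) do (
  if "%%a" neq "%~nx0" (
    md "%%~na\"
    move "%%a" "%%~na\">nul
  )
)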


It works in two phases because of limitations in parsing what counts as the extension (hint: not the last dot). If the idea passes, someone else had better handle the code.  :)

*Edit: This version is a little better...

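@set d=(
@echo off
rem The opening parenthesis is stored in the variable d so it can serve as
rem the delimiter: files are grouped by the title before the first one.
cd /d "%~dp0\"
for /f "tokens=1 delims=%d%" %%a in ('dir /b *%d%*') do (
  if exist "%%a%d%*" (
    if not exist "%%a" md "%%a"
    move "%%a%d%*" "%%a">nul
  )
)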
« Last Edit: November 10, 2012, 10:24:30 PM by Zandro »

Offline idrougge

  • Newbie
  • *
  • Posts: 25
Re: Special DAT for bad dump
« Reply #27 on: March 14, 2015, 01:04:23 AM »
I'm with the Amiga people here that the afflicted systems need a "bad" DAT to keep the main categories sane. However, I don't think renaming them according to parent CRC (if known) is a good idea.