TOSECdev Forum

TOSEC Project => Database / Datfiles => Errors & Contributions => Topic started by: Vaxalon on April 03, 2011, 11:30:01 PM

Title: A 99% similarity check with dupechecker...has me worried
Post by: Vaxalon on April 03, 2011, 11:30:01 PM
ok ..well...99% similarity check....19000 files....2000 further ones found as duped. Knowing what i do of that particular file type....and the fact that people dicked with the CRCs just so they could call the image theirs....it does make me wonder if it's worth taking the buggers out. Still...dats are supposed to be EVERYTHING..innit.
Damn annoying knowing how many time wasters there are out there though :)
Title: Re: A 99% similarity check with dupechecker...has me worried
Post by: Symmo on April 04, 2011, 12:51:46 PM
lol
Yes, it's the same with C64: in most cases just the directory is different (track 18).
I think a disk image scanner is needed to find alts and stuff, or at least semi-sort them.
Guess that's why Cowering made the GoodTools.
Title: Re: A 99% similarity check with dupechecker...has me worried
Post by: PandMonium on April 04, 2011, 02:38:08 PM
At least in C64 Duncan made some sort of tool that will open the roms (in various formats) and compare them file by file (i think!).

Also, 99% is not that good :P
It all depends on the total size, system and so on. You can't easily or clearly know whether they are all the same software, wrongly dumped or hacked to create more dumps, or genuinely different original versions. On the other hand, it might be a good way to group software titles based on that similarity.

Sets with a similarity of 99+% are probably the same software title, but they can still be a different version, a modification or just bad dumps.

In a full disc of 700MB, a 1% difference is 7MB, which could just be the executable file (an update, cracked version or something), while all the data was the same. On the other hand, older and smaller sets didn't have much data to begin with, so a difference of 1% could just represent a few changes in a savegame or executable, right?
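
To put rough numbers on that, here is a minimal sketch. It assumes "700MB" means decimal megabytes and uses a standard 35-track D64 (174,848 bytes) as the small example. The position-by-position comparison in `byte_similarity` is only an illustration, it is not claimed to be how dupechecker scores files, and both function names are made up for this example.

```python
def bytes_for_percent(total_bytes: int, percent: float) -> int:
    """How many bytes a given percentage of an image represents."""
    return round(total_bytes * percent / 100)


def byte_similarity(a: bytes, b: bytes) -> float:
    """Percentage of positions that hold the same byte in both inputs.

    Assumption: a naive positional comparison; any length mismatch
    counts as difference. Real tools may score similarity differently.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / longest


CD_SIZE = 700 * 1000 * 1000  # "700MB" disc, decimal megabytes assumed
D64_SIZE = 174_848           # 35-track D64 image without error bytes

print(bytes_for_percent(CD_SIZE, 1))   # 7000000
print(bytes_for_percent(D64_SIZE, 1))  # 1748
```

So 1% of a D64 is under 2KB, or about seven 256-byte sectors, which is easily covered by a changed directory or a short patch.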
Title: Re: A 99% similarity check with dupechecker...has me worried
Post by: Duncan Twain on April 04, 2011, 09:11:27 PM
Quote from: PandMonium on April 04, 2011, 02:38:08 PM
At least in C64 Duncan made some sort of tool that will open the roms (in various formats) and compare them file by file (i think!).

That's right, my tool supports prg, t64, d64 and p00 (and soon lnx). Matching is done per entry inside the rom. Good results found so far. Especially in the C64 area there is lots to be gained by matching. Reconstructing missing info for instance or simply adding new files.
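
A rough idea of what matching per entry could look like, as a sketch only. This is not Duncan's tool: it assumes the entries have already been extracted from the container (prg, t64, d64, p00) into a name-to-payload mapping by some other code, and `entry_crcs` and `match_entries` are hypothetical names.

```python
import zlib


def entry_crcs(entries: dict[str, bytes]) -> dict[int, str]:
    """Map the CRC32 of each entry's payload to that entry's name.

    If two entries in one container share a payload, only one name
    is kept for that CRC.
    """
    return {zlib.crc32(data): name for name, data in entries.items()}


def match_entries(left: dict[str, bytes], right: dict[str, bytes]):
    """Compare two containers by entry content, ignoring entry names."""
    left_crcs = entry_crcs(left)
    right_crcs = entry_crcs(right)
    shared = sorted(
        (left_crcs[crc], right_crcs[crc])
        for crc in left_crcs.keys() & right_crcs.keys()
    )
    only_left = sorted(left_crcs[crc] for crc in left_crcs.keys() - right_crcs.keys())
    only_right = sorted(right_crcs[crc] for crc in right_crcs.keys() - left_crcs.keys())
    return shared, only_left, only_right


# Made-up example data: same program under two names, plus one extra file each.
image_a = {"GAME": b"\x01\x08payload", "INTRO": b"intro"}
image_b = {"GAME /CRK": b"\x01\x08payload", "NOTE": b"hello"}

shared, only_a, only_b = match_entries(image_a, image_b)
print(shared)  # pairs of names whose payloads are identical
print(only_a)  # entries found only in the first image
print(only_b)  # entries found only in the second image
```

Matching on payload rather than on the whole image is what lets a renamed or repackaged entry still be recognised, and the "only in" lists point at files that could be added or info that could be reconstructed.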
Title: Re: A 99% similarity check with dupechecker...has me worried
Post by: Symmo on April 05, 2011, 11:19:43 AM
Yeah, you are right PandMonium.
Most things are images and the diff will be in a small place.
Like with C64 D64 images: track 18 stores the dir listing, so in a hex editor you see a bit more.
Alts suck but there's not much you can do about them unless you use a header skipper or similar.
It's just like an ISO: you can put the same files into the same ISO layout and still get a different hash, so you need to scan the internal data.
Personally I think an image scanner that works by track, for example, would work. At least when scanning new files you could see whether they're 100% the same internally.
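
A minimal sketch of the by-track idea for D64, assuming the standard 35-track layout (21, 19, 18 and 17 sectors per track depending on the zone, 256 bytes per sector, 174,848 bytes in total). The function names are made up for illustration. Track 18 holds the BAM as well as the directory, so a difference limited to that track suggests the file data itself is untouched.

```python
SECTOR_SIZE = 256
D64_SIZE = 683 * SECTOR_SIZE  # 174,848 bytes for 35 tracks


def sectors_in_track(track: int) -> int:
    """Sectors per track in a standard 35-track 1541 disk image."""
    if 1 <= track <= 17:
        return 21
    if 18 <= track <= 24:
        return 19
    if 25 <= track <= 30:
        return 18
    if 31 <= track <= 35:
        return 17
    raise ValueError(f"track {track} is outside the 35-track layout")


def track_slices():
    """Yield (track number, slice into the image) for tracks 1 to 35."""
    offset = 0
    for track in range(1, 36):
        size = sectors_in_track(track) * SECTOR_SIZE
        yield track, slice(offset, offset + size)
        offset += size


def differing_tracks(a: bytes, b: bytes) -> list[int]:
    """Track numbers whose raw contents differ between two images.

    Only the first 174,848 bytes are compared, so trailing error
    bytes in longer images are ignored.
    """
    if len(a) < D64_SIZE or len(b) < D64_SIZE:
        raise ValueError("image is shorter than a 35-track D64")
    return [track for track, span in track_slices() if a[span] != b[span]]


def same_apart_from_directory(a: bytes, b: bytes) -> bool:
    """True when the images are identical or differ only on track 18."""
    return set(differing_tracks(a, b)) <= {18}


# Made-up example: a blank image and a copy with one byte changed on track 18.
original = bytes(D64_SIZE)
retitled = bytearray(original)
retitled[0x16500 + 2] = 0x41  # 0x16500 is where track 18 starts
print(differing_tracks(original, bytes(retitled)))  # [18]
```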



Title: Re: A 99% similarity check with dupechecker...has me worried
Post by: Vaxalon on April 06, 2011, 10:14:59 PM
well, after some thought on this, i finally realised where the difference is.

for spectrum tzx files, it's quite simple really

same game, same game, just re-converted through a different version of maketzx, OR it had its archive header info appended.
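
For the header part of that, a "header skipper" for TZX could be sketched like this. It hashes a file while ignoring the two version bytes in the 10-byte file header and any Archive Info block (ID 0x32). It is a sketch under assumptions: only a small subset of TZX block types is handled and anything else raises an error, and `block_length` and `content_digest` are made-up names. It will not help where a different maketzx version actually encodes the tape data into different blocks.

```python
import hashlib
import struct

TZX_SIGNATURE = b"ZXTape!\x1a"
ARCHIVE_INFO = 0x32


def block_length(data: bytes, pos: int) -> int:
    """Length of the block body that follows the ID byte at data[pos].

    Only a few block types are covered; this is not a full TZX parser.
    """
    block_id = data[pos]
    body = pos + 1
    if block_id == 0x10:  # standard speed data: pause WORD, length WORD, data
        (length,) = struct.unpack_from("<H", data, body + 2)
        return 4 + length
    if block_id == 0x20:  # pause / stop the tape: one WORD
        return 2
    if block_id == 0x21:  # group start: length BYTE, then the name
        return 1 + data[body]
    if block_id == 0x22:  # group end: no body
        return 0
    if block_id == 0x30:  # text description: length BYTE, then the text
        return 1 + data[body]
    if block_id == ARCHIVE_INFO:  # archive info: length WORD, then the strings
        (length,) = struct.unpack_from("<H", data, body)
        return 2 + length
    raise ValueError(f"unsupported TZX block 0x{block_id:02X} at offset {pos}")


def content_digest(data: bytes) -> str:
    """SHA-1 over all blocks except Archive Info, skipping the file header."""
    if data[:8] != TZX_SIGNATURE:
        raise ValueError("not a TZX file")
    digest = hashlib.sha1()
    pos = 10  # 8 signature bytes plus major and minor version
    while pos < len(data):
        end = pos + 1 + block_length(data, pos)
        if end > len(data):
            raise ValueError("truncated block")
        if data[pos] != ARCHIVE_INFO:
            digest.update(data[pos:end])
        pos = end
    return digest.hexdigest()


# Made-up example: same data block, different version bytes, one with archive info.
tape_block = b"\x10" + struct.pack("<HH", 1000, 3) + b"\xff\x01\x02"
info_body = b"\x01\x00\x04Game"  # one string, type 0x00 (title), 4 characters
info_block = bytes([ARCHIVE_INFO]) + struct.pack("<H", len(info_body)) + info_body

plain = TZX_SIGNATURE + b"\x01\x0a" + tape_block
tagged = TZX_SIGNATURE + b"\x01\x14" + tape_block + info_block

print(hashlib.sha1(plain).hexdigest() == hashlib.sha1(tagged).hexdigest())  # False
print(content_digest(plain) == content_digest(tagged))                      # True
```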