TOSECdev Forum

TOSEC Project => Database / Datfiles => Topic started by: doomwarrior on April 16, 2022, 11:21:26 AM

Title: duplicate SHA1 in DAT files
Post by: doomwarrior on April 16, 2022, 11:21:26 AM
Hi,

I am currently writing a little TOSEC mover and am a bit confused about the number of duplicate SHA1 checksums (with matching CRC and MD5). The worst systems are Ti-xx, C64 and Radio Shack. I'm not sure what's up with that.
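For anyone who wants to reproduce the check, here is a minimal sketch of how such duplicates could be collected. It is not the actual mover code. It assumes the DAT files use the usual XML layout with `<game>` elements containing `<rom ... sha1="...">` entries; the function names `index_dat` and `duplicates` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_dat(dat_name, xml_text, index):
    """Record every rom SHA1 of one DAT in `index` (sha1 -> list of hits).

    Assumes <game name="..."> elements with <rom sha1="..."> children.
    """
    root = ET.fromstring(xml_text)
    for game in root.iter("game"):
        for rom in game.iter("rom"):
            sha1 = rom.get("sha1")
            if sha1:
                # Normalise case so "ABCD" and "abcd" count as the same hash.
                index[sha1.lower()].append((dat_name, game.get("name")))


def duplicates(index):
    """Return only the SHA1 values that were seen more than once."""
    return {sha1: hits for sha1, hits in index.items() if len(hits) > 1}
```

Call `index_dat` once per DAT with a shared `defaultdict(list)`, then `duplicates(index)` gives the hashes that appear in more than one entry, together with the DAT and entry names involved.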

I can group them into two categories. The first is obvious duplicates entered in two TOSEC files. I'm not sure if this is desired. Example:
Code:
WARNING:root:TOSEC file {MGT Sam Coupe - Magazines - [DSK]/FRED Issue 04 (1990)(-)} with same sha1 {7ad3fa837dbfa9c92d0a039c5a0eb2c8948ef138} and matching {md5=True file=True} already found in other TOSEC file {Sinclair ZX Spectrum - Magazines - [DSK]/Fred issue 04 (19xx)(-)(+3)}
WARNING:root:TOSEC file {Commodore C64 - GEOS - [D64]/Geos (19xx)(-)(Disk 1 of 4 Side B)} with same sha1 {c5a34c8830ceb2aea41548a3f91fd7ae5ebdeac9} and matching {md5=True file=True} already found in other TOSEC file {Commodore C16, C116 & Plus-4 - Utilities - [D64]/GEOS v3.5 (1985)(CBM264 Software)(DE)(Disk 1 of 4 Side B)}
Still, it is a bit odd to have different attributes for those entries.

But other entries are a bit strange:

Code:
WARNING:root:TOSEC file {Atari ST - Collections - Floppyshop/Floppyshop Demos 6116 (19xx)(Floppyshop)} with same sha1 {655cc46b10664057b4a8aa158d5806113da755a3} and matching {md5=True file=True} already found in other TOSEC file {Atari ST - Diskmags - [ST]/ST+ Issue 21 (1997-10)(ST+ Incorporating)[a2]}
WARNING:root:TOSEC file {Enterprise 64 & 128 - Games - [BAS]/Lander (19xx)(-)(PD)[basic]} with same sha1 {4a4c1572e14fbaadfddb9143a9738544fa7ae11a} and matching {md5=True file=True} already found in other TOSEC file {Enterprise 64 & 128 - Games - [Multipart]/Lander (198x)(-)(PD)[m zzzip][basic]}
WARNING:root:TOSEC file {MSX MSX - Games - [ROM]/Puzzle Panic (1986)(System Soft)(JP)} with same sha1 {a36c00f32cad6b603bc5525a741cb09064670f34} and matching {md5=True file=True} already found in other TOSEC file {Coleco ColecoVision - Games/Puzzle Panic (2001)(Bienvenu, Daniel)(PD)}
WARNING:root:TOSEC file {Sinclair ZX Spectrum - Games - [Z80]/Inca Gold, The (2001)(Nyitrai, Laszlo)(48K-128K)(HU)(en)[aka Hunt the Hurkle]} with same sha1 {a5fd1e3db6412ca7b06caf63eca14dc4e85e95a0} and matching {md5=True file=True} already found in other TOSEC file {Sinclair ZX Spectrum - Unknown - [Z80]/INCAGOLD (2005)(Jatekgyaros)(HU)}
WARNING:root:TOSEC file {Apple II - Diskmags/Hacker, The #1 (1985)(Boot-Legger Enterprises)(US)(Side 1)[boot]} with same sha1 {48960b057a33f8107beec97cbd7d2664f9d1473d} and matching {md5=True file=True} already found in other TOSEC file {Apple II - Magazines - [DSK]/Hacker, The #1 (1985)(Boot-Legger Enterprises)(US)(Side 1)[boot]}
WARNING:root:TOSEC file {Atari 5200 - Games/Miner 2049er (1983)(Big Five Software)(US)[BF1912]} with same sha1 {0564b1867a0b570d66dfcbc11adc3e51a2c6f28c} and matching {md5=True file=True} already found in other TOSEC file {Atari 8bit - Games - [BIN]/Miner 2049er (1982)(Big Five Software)[!]}

The naming convention does not explain what to expect. Personally, I don't want to have duplicate files in different folders. What is the ruling here?
Title: Re: duplicate SHA1 in DAT files
Post by: mictlantecuhtle on April 17, 2022, 03:54:33 PM
Thanks for this, we'll take a look and see how best to resolve these duplicates.

In my opinion there is potentially an argument for having dupes in some limited circumstances, e.g. where we want to represent a file both as part of a collection and in its respective category. That is just a bit of a limitation of the system we use at the moment.

That said, I definitely want to minimise duplication so will look carefully at each of these and try to resolve appropriately.
Title: Re: duplicate SHA1 in DAT files
Post by: doomwarrior on April 17, 2022, 10:10:04 PM
OK, thanks. I will keep that in mind and very likely change my algorithm to handle this. Not sure how yet, maybe with links.

But thanks for the explanation.
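
One way the link idea could look, as a rough sketch only: the file is moved to the first matching location and every further match becomes a symlink to it. The helper name `place_with_links` is hypothetical, and symlinks are just one option (hard links would work similarly on a single filesystem).

```python
import os
import shutil
from pathlib import Path


def place_with_links(source, destinations):
    """Move `source` to the first destination and symlink the others to it."""
    first, *others = [Path(d) for d in destinations]
    first.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(first))
    for other in others:
        other.parent.mkdir(parents=True, exist_ok=True)
        # A relative target keeps the link valid if the whole tree is moved.
        target = os.path.relpath(first, start=other.parent)
        os.symlink(target, other)
    return first
```

This keeps a single copy of the data on disk while the file still shows up in every folder a DAT expects it in. Creating symlinks on Windows may need extra privileges.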
Title: Re: duplicate SHA1 in DAT files
Post by: Duncan Twain on April 24, 2022, 08:33:15 AM
Duplication across systems is to be expected, as some disks can contain releases for multiple systems. Within the DATs of a single system there should be no duplication!
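
To separate the expected cross-system duplicates from the ones that should not exist, a report could be split roughly like this. This is only a sketch: it assumes the system name is everything before the first " - " in the DAT name, and `system_of` and `classify` are made-up helper names.

```python
def system_of(dat_name):
    """Return the system part of a DAT name, e.g. 'Atari ST' for
    'Atari ST - Diskmags - [ST]' (assumed: text before the first ' - ')."""
    return dat_name.split(" - ", 1)[0]


def classify(dat_a, dat_b):
    """Label a duplicate pair by whether both DATs belong to one system."""
    same = system_of(dat_a) == system_of(dat_b)
    return "within-system" if same else "cross-system"


# DAT names taken from the warnings in the first post.
print(classify("Apple II - Diskmags", "Apple II - Magazines - [DSK]"))
# prints "within-system"
print(classify("MSX MSX - Games - [ROM]", "Coleco ColecoVision - Games"))
# prints "cross-system"
```

With that split, only the "within-system" pairs would need to be looked at as possible errors.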