Author Topic: Duncan's WIP  (Read 64758 times)

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1323
Re: Duncan's WIP
« Reply #30 on: February 22, 2011, 03:08:12 PM »
Does it match sets based on content, comparing each file by hash or something like that? If so, it is indeed valuable. Not that it will fix any of the existing errors in the sets (e.g. if a date is wrong, it will stay wrong), but it may well help to at least get rid of the many variations across DATs for similar sets (e.g. titles with a caption or other small differences, sets with an exact date where others have 19xx, and so on).

In any case, use it carefully to avoid introducing new errors too!

Offline Cassiel

  • Administrator
  • Hero Member
  • *****
  • Posts: 1561
Re: Duncan's WIP
« Reply #31 on: February 22, 2011, 04:14:17 PM »
Duncan - did you see this btw?

http://www.tosecdev.org/index.php/forum/index.php?topic=264.msg2948#msg2948

Can you take a look at this when you're checking your other DATs? I noticed it ages ago too, but completely forgot to mention it... damn my Swiss cheese brain!

Offline Cassiel

  • Administrator
  • Hero Member
  • *****
  • Posts: 1561
Re: Duncan's WIP
« Reply #32 on: February 22, 2011, 05:18:42 PM »
Oops... same issue as this, I just didn't connect the dots

http://www.tosecdev.org/index.php/forum/index.php?topic=261.0

Offline Duncan Twain

  • TOSEC Member
  • Hero Member
  • *****
  • Posts: 514
Re: Duncan's WIP
« Reply #33 on: February 22, 2011, 06:26:45 PM »
Oops... same issue as this, I just didn't connect the dots

http://www.tosecdev.org/index.php/forum/index.php?topic=261.0


That was fixed last week.

Offline Duncan Twain

  • TOSEC Member
  • Hero Member
  • *****
  • Posts: 514
Re: Duncan's WIP
« Reply #34 on: February 22, 2011, 06:32:29 PM »
It will eventually improve the overall quality of the DATs. Correlating files means that filenames become consistent and 'correct each other' by carrying date or publisher information from one file over to the other.
Correlation is done by an exact (byte-by-byte) comparison of the file contents (i.e. D64 entries). I've already seen some staggering results, and will share them on the forum soon.
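
A minimal sketch of that correlation idea, not the actual tool: hash each file's content to nominate candidates, then confirm with an exact byte comparison. The function name `correlate` and the sample set names are hypothetical, and the contents are assumed to have been extracted from their containers already.

```python
import hashlib
from collections import defaultdict


def correlate(files):
    """Group names whose contents are byte-for-byte identical.

    `files` maps a set name to its raw content (bytes). Returns a sorted
    list of sorted name lists, one per group with at least two names.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha1(data).hexdigest()].append(name)

    groups = []
    for names in by_digest.values():
        # The hash only nominates candidates; the exact comparison decides.
        clusters = []
        for name in names:
            for cluster in clusters:
                if files[cluster[0]] == files[name]:
                    cluster.append(name)
                    break
            else:
                clusters.append([name])
        groups.extend(sorted(c) for c in clusters if len(c) > 1)
    return sorted(groups)


sample = {
    "Game (1984)(Publisher)": b"\x01\x08data",
    "Game (19xx)(-)": b"\x01\x08data",
    "Other (1985)(Publisher)": b"\x01\x08other",
}
matches = correlate(sample)
```

Each group in `matches` is a list of names that share identical content, so any group whose names disagree is a candidate for a closer look.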

Adding new files plays a small part in this. The main goal is to find errors in the current DATs.

Does it match sets based on content, comparing each file by hash or something like that? If so, it is indeed valuable. Not that it will fix any of the existing errors in the sets (e.g. if a date is wrong, it will stay wrong), but it may well help to at least get rid of the many variations across DATs for similar sets (e.g. titles with a caption or other small differences, sets with an exact date where others have 19xx, and so on).

In any case, use it carefully to avoid introducing new errors too!
« Last Edit: February 22, 2011, 06:35:24 PM by Duncan Twain »

Offline TKaos

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 538
Re: Duncan's WIP
« Reply #35 on: February 22, 2011, 06:46:32 PM »
Keep in mind that sometimes a game has two publishers with different years, or the same publisher but a different year... or was even published by three different companies, so there can be a lot of variations.
Just saying, so you don't auto-merge one game with, let's say, two different TNC names into one and then lose the information that it was actually another release by a different publisher.

I know from the Atari 8-bit DATs that those cases happen often; a lot of games were, for example, published on cassette by company X and on disk by company Y.

Offline Duncan Twain

  • TOSEC Member
  • Hero Member
  • *****
  • Posts: 514
Re: Duncan's WIP
« Reply #36 on: February 22, 2011, 06:52:32 PM »
For now it will be used to find the errors. If a release in, say, P00 format is byte-exact to a PRG file, one can safely assume that it's the same release. If the file names differ, that would be the starting point for an investigation.
There's no auto-merge, just finding the hard-to-spot errors.
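
To illustrate the P00-versus-PRG case, here is a rough sketch that assumes the commonly documented P00 layout: a 26-byte header (8-byte `C64File` magic, 17-byte original filename, 1-byte REL record size) followed by the file content. The helper names are hypothetical and this is not the tool discussed here.

```python
P00_MAGIC = b"C64File\x00"
P00_HEADER_SIZE = 26  # 8-byte magic + 17-byte filename + 1-byte REL record size


def p00_payload(data):
    """Return the embedded file content of a P00 container."""
    if len(data) < P00_HEADER_SIZE or not data.startswith(P00_MAGIC):
        raise ValueError("not a P00 file")
    return data[P00_HEADER_SIZE:]


def same_release(p00_data, prg_data):
    """True when the P00 payload is byte-for-byte identical to the PRG."""
    return p00_payload(p00_data) == prg_data


# Synthetic example: a tiny PRG (load address $0801 plus a few bytes)
# wrapped in a hand-built P00 header.
prg = b"\x01\x08" + b"\x0b\x08\x0a\x00\x9e2061\x00\x00\x00"
p00 = P00_MAGIC + b"GAME".ljust(17, b"\x00") + b"\x00" + prg
```

If `same_release(p00, prg)` is true but the two set names differ, that pair is one to investigate by hand.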

Offline Duncan Twain

  • TOSEC Member
  • Hero Member
  • *****
  • Posts: 514
Re: Duncan's WIP
« Reply #37 on: February 22, 2011, 06:55:18 PM »
Tool created to match T64 files against PRG files; works like a charm. It enables me to get exact matches on file content and rename files accordingly. This is the first step towards correcting filenames. Once file relationships are eventually created, it's 'just' a matter of verifying and renaming if needed.

Tool modified to match (single file entries only) combinations of:
- PRG
- P00
- D64
- T64
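
For the T64 side, a sketch of how a single-entry tape image could be turned into PRG-style bytes for comparison. It assumes the commonly documented T64 layout (64-byte header, 32-byte directory entries with start address, end address and content offset) and trusts the end address in the directory, which real-world T64 files are known to get wrong at times. The function name is hypothetical and this is not the tool described above.

```python
import struct

T64_HEADER_SIZE = 64
T64_ENTRY_SIZE = 32


def t64_single_file_as_prg(data):
    """Return the only file in a T64 image as load address + content."""
    (max_entries,) = struct.unpack_from("<H", data, 0x22)
    found = []
    for index in range(max_entries):
        offset = T64_HEADER_SIZE + index * T64_ENTRY_SIZE
        entry_type, _ftype, start, end, _unused, content_offset = (
            struct.unpack_from("<BBHHHI", data, offset)
        )
        if entry_type == 1:  # 1 = normal tape file, 0 = free slot
            found.append((start, end, content_offset))
    if len(found) != 1:
        raise ValueError("expected exactly one file entry")
    start, end, content_offset = found[0]
    body = data[content_offset:content_offset + (end - start)]
    # A PRG file is the two-byte little-endian load address plus the content.
    return struct.pack("<H", start) + body


# Synthetic single-entry T64 built by hand for the example.
body = b"\x0b\x08\x0a\x00\x9e2061\x00\x00\x00"
header = (
    b"C64S tape image file".ljust(32, b"\x00")
    + struct.pack("<HHHH", 0x0100, 1, 1, 0)
    + b"TEST".ljust(24, b" ")
)
entry = (
    struct.pack("<BBHHHI", 1, 0x82, 0x0801, 0x0801 + len(body), 0, 96)
    + b"\x00" * 4
    + b"GAME".ljust(16, b" ")
)
t64 = header + entry + body
prg = b"\x01\x08" + body
```

With the T64 entry normalised this way, it can be compared directly against a PRG file or a P00 payload.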

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1323
Re: Duncan's WIP
« Reply #38 on: February 22, 2011, 06:58:58 PM »
Yes, I understood the process and I find it really useful. Even more so if you can do it for other file types on other systems too, but that would take extra time to understand the existing image formats, and I'm not sure all of them are well documented.
My last sentence just meant that even if the process dramatically improves the consistency of the DATs, it won't mean the information itself was correct from the start. It is much needed, but it should be combined with manual verification of many of these sets in the future.

An example from C64:
With this, instead of having "Barmy Bills Flight of Fun (198x)(Publisher)" and "Barmy Bill's Flight of Fun (1984-10-20)(Publisher)", you could fix both and have all the sets as "Barmy Bill's Flight of Fun (1984-10-20)(Publisher)". Still, that does not mean it is correct, and in the end it could be: "Barmy Bill's Flight of Fun, The (1994-10-20)(Publisher)[cr Oracle]"
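
A small sketch of how two set names for the same content could be compared field by field. The regular expression covers only the simplified "Title (date)(publisher)[flags]" shape used in the example above, not the full naming convention, and the function names are hypothetical.

```python
import re

# Simplified pattern: title, then (date)(publisher), then any trailing flags.
TOSEC_NAME = re.compile(
    r"^(?P<title>.+?) \((?P<date>[^)]*)\)\((?P<publisher>[^)]*)\)(?P<flags>.*)$"
)
FIELDS = ("title", "date", "publisher", "flags")


def parse_name(name):
    """Split a set name into title, date, publisher and trailing flags."""
    match = TOSEC_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised set name: {name!r}")
    return match.groupdict()


def differing_fields(name_a, name_b):
    """List the fields in which two set names disagree."""
    a, b = parse_name(name_a), parse_name(name_b)
    return [field for field in FIELDS if a[field] != b[field]]


diff = differing_fields(
    "Barmy Bills Flight of Fun (198x)(Publisher)",
    "Barmy Bill's Flight of Fun (1984-10-20)(Publisher)",
)
```

Here `diff` names the title and the date as the fields to reconcile; which value is actually right still needs manual verification.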

Anyway, I really like the idea. TOSEC is full of title (and other field) variations across different DATs; any step forward in improving quality is good, especially on these easily noticeable errors.


TKaos also makes a valid point, one more reason to be careful with it. Still, since you say you're comparing the content of each image file by file, it will only match exactly identical software in different formats, so you will catch a lot of inconsistencies with it. I can easily find a lot of these in C64, even within the same DAT; these two seem very similar (at least the set names):

Commodore C64 - Games - [T64]
Code:
Duomato - Wheel of Fortune (1995)(Assassin Software)[cr F4CG]
Duomato Wheel of Fortune (1995)(Assasin)[cr F4CG][tr en]
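
Near-duplicate set names like the two above could also be flagged by plain string similarity, independent of file content. This is only a sketch using Python's standard `difflib`; the 0.75 threshold and the third, unrelated set name are arbitrary assumptions for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(name_a, name_b):
    """Similarity ratio between two set names, from 0.0 to 1.0."""
    return SequenceMatcher(None, name_a, name_b).ratio()


def near_duplicates(names, threshold=0.75):
    """Return pairs of distinct names whose similarity reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if similarity(a, b) >= threshold
    ]


names = [
    "Duomato - Wheel of Fortune (1995)(Assassin Software)[cr F4CG]",
    "Duomato Wheel of Fortune (1995)(Assasin)[cr F4CG][tr en]",
    "Some Other Game (1986)(Publisher)",  # hypothetical unrelated entry
]
pairs = near_duplicates(names)
```

Pairs flagged this way are only suggestions; a content comparison or a manual check decides whether they really are the same release.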

Offline TKaos

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 538
Re: Duncan's WIP
« Reply #39 on: February 22, 2011, 07:05:30 PM »
Well, if you want to make it perfect, then take the biggest games DAT in C64, check the images one by one, and then compare those images with the other games DATs. That way you can be sure that the info you put into the other ones is at least correct, because you checked everything.

Offline Duncan Twain

  • TOSEC Member
  • Hero Member
  • *****
  • Posts: 514
Re: Duncan's WIP
« Reply #40 on: February 22, 2011, 07:24:32 PM »
Games is on the list. I haven't worked on it at all.

As you guys know, there's lots and lots of work to do on the C64 DATs. I intend to first throw (home-made) tools at them and later, once things are a bit tidier, start doing manual corrections.
 

Offline Symmo

  • TOSEC Contributor
  • Jr. Member
  • **
  • Posts: 55
Re: Duncan's WIP
« Reply #41 on: February 22, 2011, 07:29:46 PM »
Hi
Well, it's a good way to sort D64 images, because a good chunk of them contain just a single PRG. Often the extracted PRG is named correctly, whereas the D64 DAT entry is named wrongly, with missing data like dates etc.
Think of putting a Windows EXE into an ISO image: if the internal data matches the single EXE 100%, at least you know it's that program and can name it properly.
The same goes for tape images where there's a single PRG inside: it gets around the image differences, and naming by the data saves time.
You might also have altered images where people have redone or changed something like the label, so you get a new file. At least this way you know what the image actually is and can mark it as an alt.
It should help improve naming across formats and keep it all consistent.
A lot of C64 disks were later level-packed into a single file, with trainers and the lot, like the stuff from Anubis.
So you will have a lot of PRGs that match the ones in the D64 images and tapes.

Good work, Duncan
« Last Edit: February 22, 2011, 07:40:23 PM by Symmo »
Try this as your wallpaper if you are new :-) http://symmo.net/tosec/tosectnc.png

Offline Duncan Twain

  • TOSEC Member
  • Hero Member
  • *****
  • Posts: 514
Re: Duncan's WIP
« Reply #42 on: February 22, 2011, 07:37:23 PM »
Oh, and it gets better. The next improvement will be the use of iAN CooG's UNP64. This allows me to 'unpeel' packed images by removing compression, giving an even better comparison for single-file images.

Offline TKaos

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 538
Re: Duncan's WIP
« Reply #43 on: February 22, 2011, 09:06:46 PM »
This might be of interest to you: the tape images you have of this software should all get (Side B), and it would be nice if you could put [side A ATR8bit] or something similar into more info.
You can also copy & paste the other part of the filename, I guess (without the [BASIC]). :P

Dampfmaschine (1985)(Europa Computer Club)(DE)(Side A)[side B C64][BASIC]
Seeschlacht (1984)(Europa Computer Club)(DE)(Side A)[side B C64]
TKKG - Das Leere Grab im Moor (1985)(Europa Computer Club)(DE)(Side A)[side B C64][BASIC]


If I find more of them I'll pass you a message.

[EDIT]
Found next one but from educational software:
Deutsch-Stunde 3, Die (19xx)(Europa Computer Club)(DE)(Side A)[side B C64][BASIC].cas
Maybe you can find out the date. :)
« Last Edit: February 22, 2011, 09:16:10 PM by TKaos »

Offline Cassiel

  • Administrator
  • Hero Member
  • *****
  • Posts: 1561
Re: Duncan's WIP
« Reply #44 on: February 22, 2011, 09:47:46 PM »
That was fixed last week.

Nice one...   :)