TOSECdev Forum

TOSEC Project => Database / Datfiles => Topic started by: tolie on January 07, 2020, 12:34:28 AM

Title: How to deal with many files and dirs for a ROM?
Post by: tolie on January 07, 2020, 12:34:28 AM
So, I found a popular dump of Nintendo Wii U ROMs. Some titles come as a full disc image (.wud or .wux), but others, like the "eShop" ones, are just directories with no limit on file or directory count: thousands per title. So *if* I were to make .dat files for them, I'd have to hash thousands of files per game? That makes sense until it doesn't. Is there some established approach, like queuing the files in alphabetical order and then building an .iso from that queue? Is there a standard way at all?
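For comparison, the per-file approach described above (hash every file in a title's directory) is easy to automate. Below is a minimal Python sketch, assuming the title has been extracted to an ordinary directory on disk; `hash_file` and `hash_title_dir` are hypothetical helper names, and writing the result out as an actual DAT file is left out. It records size, CRC32, MD5 and SHA-1 per file, keyed by the relative path with `/` separators, and sorts the entries by the UTF-8 bytes of that path so the listing does not depend on the order the filesystem happens to return files in.

```python
import hashlib
import os
import zlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Return size, CRC32, MD5 and SHA-1 of one file, reading it in chunks."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    size = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
            md5.update(chunk)
            sha1.update(chunk)
            size += len(chunk)
    return {
        "size": size,
        "crc": f"{crc & 0xFFFFFFFF:08x}",
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }


def hash_title_dir(root: str) -> list[dict]:
    """Hash every regular file under root: one entry per file, sorted by path."""
    root_path = Path(root)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            full = Path(dirpath) / name
            rel = full.relative_to(root_path).as_posix()  # '/' on every OS
            entries.append({"name": rel, **hash_file(full)})
    # Locale-independent order: compare the raw UTF-8 bytes of the path.
    entries.sort(key=lambda e: e["name"].encode("utf-8"))
    return entries
```

Usage would be something like `for e in hash_title_dir("some_title"): print(e["name"], e["size"], e["crc"], e["sha1"])`, where `some_title` is a placeholder path. This is exactly the "thousands of entries per game" outcome, just without doing it by hand.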

Since they are "eShop" titles, I almost think the URL of the official download could be used in some way, but I'm not sure how to find it, and even if I did, there's no guarantee that every title would have a link (or at least a known/public one). The .wud and .wux files are easy; I just wish they were all that easy :-/. Also, slightly off topic, but there don't seem to be many Wii U games... is that correct? There seem to be very few titles, even counting the regional duplicates.
Title: Re: How to deal with many files and dirs for a ROM?
Post by: Maddog on January 07, 2020, 09:14:46 AM
There's a way, but it would bend the established rules badly: zip all the files together as provided, then TorrentZip the resulting .zip, and finally hash the .zip itself (as a single file) rather than the files it contains. TorrentZip produces zips in a consistent way, so hashing the resulting zip would be the same as hashing any other "container" format. But as I said, this is not the standard method. The standard would require you to hash the individual files, the same way MAME, for example, hashes several different ROMs for every game.
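The idea behind this suggestion, packing everything into one zip whose bytes do not depend on who made it, can be sketched with Python's standard `zipfile` module. This is not TorrentZip and will not reproduce TorrentZip's output; it is a minimal illustration, assuming that pinning the entry order, timestamps, compression method and level, and file attributes is enough on machines using the same zlib. `deterministic_zip` and `FIXED_DATE` are hypothetical names, and the fixed date is an arbitrary choice.

```python
import hashlib
import os
import zipfile
from pathlib import Path

# Arbitrary fixed timestamp (assumption); zip cannot store dates before 1980.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def deterministic_zip(root: str, out_zip: str) -> str:
    """Zip every file under root in a fixed way and return the zip's SHA-1.

    Not TorrentZip, only an illustration of the same idea. Empty directories
    are not stored, and each file is read fully into memory for brevity.
    """
    root_path = Path(root)
    files = []
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            full = Path(dirpath) / name
            files.append((full.relative_to(root_path).as_posix(), full))
    # Fixed entry order: byte order of the UTF-8 relative path.
    files.sort(key=lambda item: item[0].encode("utf-8"))

    with zipfile.ZipFile(out_zip, "w") as zf:
        for rel, full in files:
            info = zipfile.ZipInfo(rel, date_time=FIXED_DATE)  # ignore real mtimes
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 0  # same "made by" system field on every OS
            info.external_attr = 0  # drop permission bits
            zf.writestr(info, full.read_bytes(), compresslevel=9)

    h = hashlib.sha1()
    with open(out_zip, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

One caveat: deflate output can differ between compressor builds (for example, different zlib versions), so even this only yields matching hashes where the compressor behaves identically. A real tool has to pin that down as well.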

The Wii U was a badly failed console in terms of sales, so it's not strange that companies didn't exactly rush to publish games for it.
Wikipedia has a list of 767 games, but it includes many download-only titles: https://en.wikipedia.org/wiki/List_of_Wii_U_games
As for physical releases, a quick Google revealed 157 US releases shortly before the full death of the console, but I am definitely too lazy to check further. :)
Title: Re: How to deal with many files and dirs for a ROM?
Post by: tolie on January 07, 2020, 01:23:50 PM
In that case, why not just stop at the first zip and hash that? TorrentZip does state that it uses "standard values", but I'm not sure how it works (the method itself doesn't seem to be described anywhere).

With downloaded "ROMs" now clearly making up the majority of ROMs, a suggested standard has to emerge eventually. The days of dumping physical storage media are slowly coming to an end (maybe we're halfway there already?).

The reasoning below is flawed and ends up at the obvious answer, but it's all I have for now :-/....

Without knowing how compression works, the problem seems to be the directories. For the files, one could just compute the SHA (or whatever hash) of every file, sort those hashes in ascending order, and add the files to a container in that order, right? But for the directories, unless they are only added when a file requires them, I don't see a more unique way. Even then there would be the problem of identical files, which leads me to the question of how to insert identical files using only the directory tree as a reference. I could get the entire branched list of paths, sort it with some method, and then add them... which brings me back to the obvious, but which sort method? Some POSIX-compliant, UTF-aware sort would do, but which one?
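One way to settle the "which sort method?" question without building a container at all is to hash a manifest: one line per file holding its relative path, size and SHA-1, sorted by the raw UTF-8 bytes of the path (the ordering `LC_ALL=C sort` gives for the path strings, independent of locale collation). Because every line is keyed by its path, identical files in different places remain distinct entries, and directories are implied by the paths, with empty directories recorded explicitly. This is a hypothetical scheme sketched in Python, not a TOSEC or other recognized standard; `file_sha1` and `manifest_digest` are illustrative names. It also does not Unicode-normalize names, so the same name stored as NFC on one system and NFD on another would hash differently.

```python
import hashlib
import os
from pathlib import Path


def file_sha1(path: Path) -> str:
    """SHA-1 of one file, read in 1 MiB chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest_digest(root: str) -> str:
    """Hash a directory tree into one digest (hypothetical scheme).

    Each file becomes the line b"<relative/path>\\0<size>\\0<sha1>", so two
    identical files at different paths are still two distinct entries.
    Empty directories are recorded as b"<relative/path>/\\0dir"; all other
    directories are implied by the paths of the files inside them.
    """
    root_path = Path(root)
    lines = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        rel_dir = Path(dirpath).relative_to(root_path).as_posix()
        if not dirnames and not filenames and rel_dir != ".":
            lines.append(f"{rel_dir}/\0dir".encode("utf-8"))
        for name in filenames:
            full = Path(dirpath) / name
            rel = full.relative_to(root_path).as_posix()
            size = full.stat().st_size
            lines.append(f"{rel}\0{size}\0{file_sha1(full)}".encode("utf-8"))
    # Plain byte order of the UTF-8 lines: the walk order and locale don't matter.
    lines.sort()
    return hashlib.sha1(b"\n".join(lines)).hexdigest()
```

The separator `\0` cannot appear in a path, and it sorts below every printable character, so sorting whole lines gives the same order as sorting the paths alone (a path that is a prefix of another comes first).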
Title: Re: How to deal with many files and dirs for a ROM?
Post by: Maddog on January 07, 2020, 02:09:46 PM
If you hash the first zip file, nobody will ever be able to rebuild that zip in the same way and get the same hash.
TorrentZip is the only known way to get consistent hashes for a zip file across computers.
There are too many factors in play, so zipping the same bunch of files will not give identical results on different computers. Among them are compression levels, file timestamps and directory structures. TorrentZip was engineered precisely to solve this problem, but I don't know (or care much, either...) about its exact technical workings. It works as intended and that's enough for me. ;)
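The point about the first zip is easy to demonstrate with Python's `zipfile` module: the same two files, zipped with a different timestamp, compression level or entry order, produce different archive bytes and therefore different hashes, even though the extracted contents are identical. The file names and contents below are made-up sample data, and `zip_sha1` is an illustrative helper.

```python
import hashlib
import io
import zipfile


def zip_sha1(files: dict[str, bytes], date_time: tuple, level: int) -> str:
    """Zip `files` in memory with one timestamp and deflate level; return SHA-1."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():  # entries are written in dict order
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data, compresslevel=level)
    return hashlib.sha1(buf.getvalue()).hexdigest()


# Made-up sample data: the same two files in every case.
content = {"dir/a.bin": b"sample " * 200, "dir/b.bin": bytes(range(256)) * 8}
when = (2020, 1, 7, 0, 34, 28)

base = zip_sha1(content, when, 9)
other_time = zip_sha1(content, (2020, 1, 7, 13, 23, 50), 9)  # only the timestamp differs
other_level = zip_sha1(content, when, 0)  # deflate level 0: stored blocks, no real compression
other_order = zip_sha1(dict(reversed(list(content.items()))), when, 9)  # entries reversed

print(base == other_time, base == other_level, base == other_order)  # False False False
```

Zipping with exactly the same settings does give the same hash again within one setup, which is the property a TorrentZip-style tool has to guarantee across machines.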