TOSECdev Forum

TOSEC Project => Database / Datfiles => Topic started by: khsater on February 02, 2014, 11:16:25 PM

Title: Information about "Database"
Post by: khsater on February 02, 2014, 11:16:25 PM
Hi all,
I'm currently working on a touch-screen based emulation front-end and I'm looking for a good way to incorporate information about ROMs into the interface.  The TOSEC data is probably the most comprehensive data source for many systems, so I'd like to pull from it.  Is there a publicly available database of TOSEC data that I can use as a resource, or is everything based on dat files?  I'd mostly like a copy of the database in some form that I can easily transfer to a MySQL database (which I am using to populate entries in my interface).  I'd rather not have to parse all the dat files myself.  If I have to, I'll share the result and probably write a file writer, too.
Title: Re: Information about "Database"
Post by: PandMonium on February 03, 2014, 07:14:21 PM
Hi khsater,

That's a complex question. Our data is indeed contained mostly in our datafiles and updated in each new release with a new pack of them. Furthermore, the dats contain names in TNC format, which is complex and hard to parse (especially some old/bad designed flags). That said, we have a few tricks (a bit outdated) to extract info from dats (separating some flags from a set in TOSEC format), but they are incomplete and would need extra work to get there. I've played a bit with a similar idea, but it is now outdated too. Can you elaborate on exactly what you want to extract? (PM if you want)

IMHO one good solution would be to explore different sources of data. TOSEC is comprehensive and will be useful for old, exotic systems. Still, our data catalogues tons of stuff (bad dumps for instance) that might not be interesting to you. In addition, some systems are very preliminary. Have you also considered exploring things like gamebases, the MESS software lists or other sources? That will give you a cleaner set of data, at least for the main systems (considering playability/best existing copies of each software).
Title: Re: Information about "Database"
Post by: khsater on February 04, 2014, 03:59:38 AM
Thanks for your quick explanation.  I'll answer your question about my interest twice:

First, I'm interested in categorizing things using the following hierarchy: System Manufacturer -> System -> Date/Genre/Developer/Publisher/Etc? -> Game.  Obviously some of that is outside of the scope of TOSEC, but that brings me to my second answer:

I'm also interested in contributing to the preservation of video game history through TOSEC and through maintaining/generating my own database and contributing data I have to TOSEC/others.  I've got a lot of good info on the history of Enix, for example.  The PC-8801 was a major system for Enix, and many of their games are cataloged in the PC-8801 dats. I plan to somehow incorporate info from multiple sources, relying on the most comprehensive of resources for a given system.  MESS is good for a lot of things, GoodTools might be useful, too.  Of course TOSEC is the only place to find information on some systems.

Anyways, the point of my project is to make exploring video game history easy and fun through automation of emulation (especially for the old stuff like the obtuse and untranslated Japanese system emulators).  This is obviously a massive undertaking, but I hope the payoff will be the foundation for a comprehensive resource on basic video game information (and associated rom dumps).  Back to the point - I'm starting with the basics and mostly I'm going to pull name, developer, and year and potentially md5 and size data.  It'll be a challenge to sort through the bad dumps and the bad flags, especially coming at this as a hobbyist without formal training in these types of things.  I'm definitely up for a long-term challenge, though.  Ideally I'd be able to help out with a parser/generator.  Can you give me an example of these "old/bad designed flags" and some insight into these tricks of yours?
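As a rough illustration of the extraction described above (name, year, md5 and size), here is a minimal Python sketch. It assumes the dats use the ClrMamePro-style text layout; the sample entry, its hash values and the names `SAMPLE_DAT`, `ROM_RE`, `SET_RE` and `extract_roms` are all hypothetical. Note that the second parenthesised field in a TNC name is the publisher, which is not necessarily the developer.

```python
import re

# Hypothetical dat entry in ClrMamePro-style text layout (an assumption);
# the title, crc and md5 values are placeholders, not real TOSEC data.
SAMPLE_DAT = '''
game (
    name "Example Game (1985)(Example Soft)"
    description "Example Game (1985)(Example Soft)"
    rom ( name "Example Game (1985)(Example Soft).d88" size 348848 crc 0123abcd md5 00112233445566778899aabbccddeeff )
)
'''

# Matches one "rom ( ... )" line and captures file name, size, crc and md5.
ROM_RE = re.compile(
    r'rom\s*\(\s*name\s+"(?P<name>[^"]*)"\s+size\s+(?P<size>\d+)'
    r'\s+crc\s+(?P<crc>[0-9a-fA-F]{8})\s+md5\s+(?P<md5>[0-9a-fA-F]{32})'
)

# Captures only the leading "Title (date)(publisher)" part of a set name.
SET_RE = re.compile(r'^(?P<title>.+?) \((?P<year>[^)]*)\)\((?P<publisher>[^)]*)\)')


def extract_roms(dat_text):
    """Return one dict per rom entry, ready to insert into a database table."""
    rows = []
    for m in ROM_RE.finditer(dat_text):
        row = {
            "file": m["name"],
            "size": int(m["size"]),
            "md5": m["md5"].lower(),
            "title": None,
            "year": None,
            "publisher": None,
        }
        s = SET_RE.match(m["name"])
        if s:  # leave the fields as None when the name does not fit the pattern
            row.update(title=s["title"], year=s["year"], publisher=s["publisher"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_roms(SAMPLE_DAT):
        print(row)
```

Each returned dict maps directly onto a row in a table, so loading the result into MySQL would be a matter of one parameterised INSERT per dict.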

It's too bad TOSEC wasn't started using XML as the format for DATs.  The information is well suited for it and there are parsers for every language out there.
Title: Re: Information about "Database"
Post by: PandMonium on February 09, 2014, 05:59:08 PM

I've been quite busy so my reply might be a bit short. I see you have a lot of work ahead of you, but if you have the time, go ahead. :) There are many information sources indeed. Some might be less credible; here we also have sets that are badly named, though they are always improving (see all the updates to Commodore lately).

Some time ago we tried to implement some of the things you talk about there, but the common problem for all of us is always the lack of time, so the ideas stopped (and some are only paused). The issues I've mentioned are related to our naming convention, and have been there since the creation / introduction of such flags (dinosaur Cassiel or others might know the reasons). For instance, TOSEC sets are all renamed according to those rules and there is a flag named "Media Label", used to input the name / text on the label of the disk/disc/tape/whatever when needed (e.g. "Installation Disk"). The problem with such a flag is that it must accept any text inside, and as a result any typo in other flags will (normally) end up being parsed as a media label.

[Example: "Title (1999-10-10)(Publisher)(US)" is a correct TNC name; "Title (1999-10-10)(Publisher)(Us)" is still correct, but Us now represents a media label. This is a limitation of TNC and, more generally, of using strings/set names to store the information. We can't easily save everything there. We can parse that automatically, but some of the flags may end up in the wrong field. Another example is the dump flags: many support flag info (e.g. modification info) and also the author of the modification. Still, if you have [f PAL] you cannot automatically, or even manually, know 100% what it means. You can have a set of rules and use common sense. For instance, in this case most will say it means a fix to work on PAL systems, but in rare cases it could also mean a fix, without description, done by some group or guy named PAL]. We have tools that can parse the setnames, check and generate them. They suffer from the limitations and issues explained before, caused by the complexity of our naming scheme. We still hope to find time to solve such issues and bring some nice/new things to the project ;)
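The media label problem from the example above can be reproduced with a very small parser. This is only a sketch under simplifying assumptions, not one of the TOSEC tools: it handles just the title, date, publisher, country, media label and dump flags, and `KNOWN_COUNTRIES` is a tiny illustrative subset of codes. All names in it are hypothetical.

```python
import re

# Illustrative subset only (an assumption); the real TNC defines many more
# country codes and many more flag types than this sketch handles.
KNOWN_COUNTRIES = {"US", "JP", "GB", "DE", "FR"}

NAME_RE = re.compile(
    r"^(?P<title>.+?) \((?P<date>[^)]*)\)\((?P<publisher>[^)]*)\)(?P<rest>.*)$"
)
PAREN_RE = re.compile(r"\(([^)]*)\)")      # (flag) groups after the publisher
BRACKET_RE = re.compile(r"\[([^\]]*)\]")   # [dump flag] groups


def parse_setname(name):
    """Split a simplified TNC-style set name into fields."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError("not a recognisable set name: %r" % name)
    info = {
        "title": m["title"],
        "date": m["date"],
        "publisher": m["publisher"],
        "country": None,
        "media_label": None,
        "dump_flags": BRACKET_RE.findall(m["rest"]),
    }
    for flag in PAREN_RE.findall(m["rest"]):
        if flag in KNOWN_COUNTRIES:
            info["country"] = flag
        else:
            # Free text is valid for a media label, so anything the parser
            # does not recognise (including a typo such as "Us") lands here.
            info["media_label"] = flag
    return info


if __name__ == "__main__":
    print(parse_setname("Title (1999-10-10)(Publisher)(US)"))
    print(parse_setname("Title (1999-10-10)(Publisher)(Us)"))
    print(parse_setname("Title (1999-10-10)(Publisher)(US)[f PAL]"))
```

With "(US)" the country field is filled; with "(Us)" the same text silently becomes the media label. The dump flag comes back as the raw string "f PAL", because the name alone does not say whether PAL is a description of the fix or its author.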

Using XML (or another such format) is an old goal but it hasn't been done yet. One of the major issues with such a thing is the lack of time to update/create tools for it on our side, but especially the way TOSEC works. For renamers, it is way more practical to pick sets, play with the files (emulators, hex editors, disassemblers or other tools) and then rename them accordingly. In previous discussions they always hated the idea of playing with an extra tool or form.
Title: Re: Information about "Database"
Post by: khsater on February 15, 2014, 01:51:19 AM
Thanks for the rundown.
It seems like interpreting the semantics is the issue, to oversimplify your explanation.
I'll dig in with my determined problem-solving brain and see what comes out.
If I come up with any revolutions or handy tricks along the way, I'll be sure to share!