Author Topic: A proposal to TOSEC to solve the metadata problem in emulation  (Read 2121 times)

Offline ReaperX

  • Newbie
  • Posts: 1
A proposal to TOSEC to solve the metadata problem in emulation
« on: February 01, 2012, 04:12:09 AM »
In this post, I'm proposing an idea to the emulation community in general, and to TOSEC in particular.

The emulation community has done a fantastic job preserving and cataloging ROM and media files associated with classic computer systems, creating emulators to run them, and building tools to verify and audit them. It has also collected or digitized a wealth of secondary media such as books, magazines, manuals, interviews, reviews, boxes, screenshots, music, video, game walkthroughs and solutions, preserving our understanding of old systems, the software that exists for them, and the human experience surrounding both.

Enormous progress has been made in creating a systematic accounting of the primary media files through projects such as TOSEC and the same approach is now being applied to secondary media collections through TOSEC-PIX.

Unfortunately, no unified approach to metadata collection exists yet. Metadata includes not just information about the software itself such as publisher, release year, programmers, graphic artists, musicians, game language, genre, but also a complete list of references to where files related to this software are located in standardized collections such as TOSEC. References to books and magazines would include page numbers; references to audio or video files might include time codes.
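To make this a bit more concrete, here is a rough sketch of what a single record in such a format might hold. It is only an illustration: every class and field name is hypothetical, no such TOSEC format exists, and the values in the usage example are placeholders rather than real catalog entries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MediaReference:
    """Pointer to one file in a standardized collection (hypothetical layout)."""
    collection: str                  # e.g. "TOSEC" or "TOSEC-PIX"
    filename: str                    # name of the file inside that collection
    page: Optional[int] = None       # page number, for books and magazines
    timecode: Optional[str] = None   # position, for audio or video files


@dataclass
class SoftwareRecord:
    """Metadata about one piece of software plus references to related files."""
    title: str
    publisher: Optional[str] = None
    release_year: Optional[int] = None
    programmers: List[str] = field(default_factory=list)
    graphic_artists: List[str] = field(default_factory=list)
    musicians: List[str] = field(default_factory=list)
    language: Optional[str] = None
    genre: Optional[str] = None
    references: List[MediaReference] = field(default_factory=list)


# Placeholder data only, not taken from any real collection.
record = SoftwareRecord(
    title="Example Game",
    publisher="Example Soft",
    release_year=1986,
    genre="Platform",
    references=[
        MediaReference("TOSEC", "Example Game (1986)(Example Soft).zip"),
        MediaReference("TOSEC-PIX", "Example Magazine Issue 01.pdf", page=12),
    ],
)
```

The point of keeping the references as a plain list of (collection, filename, position) entries is that a frontend only has to know where the collections live on disk to resolve every one of them.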

It's not that hyperlinked databases for software haven't been created. You can find many of them on the web, such as MobyGames, Lemon and HOL, and others in freely distributable, downloadable form, such as the GameBase projects.

However, frontend projects and developers usually cannot make use of such pre-existing metadata collections and have to painstakingly build their own databases from scratch. The reasons are the lack of a unified data standard, the fragmentation of the data over many different collections, the collection-specific nature of the referenced secondary media, and, in the case of online databases, the technical and social difficulties associated with scraping.

No frontend exists yet that unites the totality of existing emulation files and databases into a comprehensive, fully hyperlinked whole. Imagine a program that basically just needs the path(s) to your (complete) collection of emulation files. It spends a few hours scanning them and then you have the perfect frontend. It can show you lists of games by any conceivable category, and for each title, it can launch any version for any system with the correct emulation settings with a single click and no individual configuration needed. For each title, it offers a comprehensive library of secondary media such as manuals, reviews, screenshots or in-game video available for instant viewing.

Right now, this type of frontend is unrealistic. A universal, flexible metadata format for emulation-related data would make this a practical vision. Once such a format has been defined, existing partial collections of metadata would be converted and merged into a comprehensive database, and references to existing media collections such as the TOSEC ones would be added. The latter would not be as time-consuming as it might at first appear, since it could be performed semi-automatically by matching database entries against the metadata already embedded in filenames.
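As a rough illustration of that semi-automatic matching, here is a minimal sketch. It assumes filenames that start with "Title (year)(publisher)", which is a simplified reading of the TOSEC naming convention; the real convention has more fields and flags that this pattern ignores. The function names, the dictionary layout of the records and the sample data are all made up for the example.

```python
import re
from typing import Dict, Iterable, List, Optional

# Simplified assumption: "Title (year)(publisher)..." with year like 1986, 19xx or 1986-06-21.
_NAME = re.compile(
    r"^(?P<title>.+?) \((?P<year>[0-9x]{4}(?:-[0-9x]{2}){0,2})\)\((?P<publisher>[^)]*)\)"
)


def parse_name(filename: str) -> Optional[Dict[str, str]]:
    """Extract title, year and publisher from a filename, or None if it does not fit."""
    m = _NAME.match(filename)
    return m.groupdict() if m else None


def normalize(title: str) -> str:
    """Lowercase, move a trailing ', The' style article to the front, drop punctuation."""
    t = title.lower()
    m = re.match(r"^(.*), (the|a|an)$", t)
    if m:
        t = f"{m.group(2)} {m.group(1)}"
    return re.sub(r"[^a-z0-9]+", " ", t).strip()


def match_files(filenames: Iterable[str], records: List[dict]) -> Dict[str, dict]:
    """Map each filename to the record with the same normalized title and year."""
    index = {(normalize(r["title"]), str(r["year"])): r for r in records}
    matched = {}
    for name in filenames:
        parsed = parse_name(name)
        if parsed is None:
            continue  # does not follow the assumed pattern, leave for manual review
        key = (normalize(parsed["title"]), parsed["year"])
        if key in index:
            matched[name] = index[key]
    return matched


# Placeholder data only.
records = [{"title": "The Example Game", "year": 1986}]
files = ["Example Game, The (1986)(Example Soft)(US)[a].zip", "readme.txt"]
result = match_files(files, records)
```

Anything that fails to parse or finds no record is simply left out of the result, which is the "semi" part: those leftovers would still need a human to look at them.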

The result would be a giant step forward for emulation. No more re-inventing the wheel for frontend developers, no more slow and painful manual community efforts to build data files of supported games. All work done in gathering metadata would be permanent and cumulative. As a side effect, information that now exists only precariously on websites that could go offline at any time due to lack of funds or legal challenges could be backed up and preserved in standardized form to ensure its long-term survival.

I think that TOSEC is uniquely positioned and qualified to create such a metadata format and that this would be the logical next step after expanding the project's coverage from primary to secondary media.
« Last Edit: February 01, 2012, 04:15:33 AM by ReaperX »



Offline PandMonium

  • Administrator
  • Hero Member
  • Posts: 1303
Re: A proposal to TOSEC to solve the metadata problem in emulation
« Reply #1 on: February 02, 2012, 04:15:25 PM »
Hi ReaperX,

Although I agree (at least in part) with your idea, it is something that would require a ton of resources to finalize. Over the last few years I've focused a bit on similar ideas. I don't rename sets, so I've spent most of my time trying to improve the project by creating tools and such that help us maintain and rename our sets and dats. Other ideas such as information extraction and verification were also explored, both by me and by other members.

Since the last (academic) year I've become a lot busier, so my work has slowed down a lot. I still have tons of ideas that I want to pursue, just waiting for the time to come. One of the things I've learned is that I need to take a more "agile" approach instead of trying to do / plan huge things all at once, or I will never finish any of them due to the time constraints.

So, we do support the idea of gathering more information and sharing it in a more manageable and free way, at least for our data. There will always be projects with differing ideas, which is great. However, first we need to change, sort out and improve some critical aspects of our project, and this is already going at a slow pace. Once we have those foundations ready, we may then start to think about your ideas.

Summing it up, a good idea and something we would / will be interested in (at least a part of it) but we do need to finish a lot of our planned changes before thinking about it. :)

Offline Cassiel

  • Administrator
  • Hero Member
  • Posts: 1450
Re: A proposal to TOSEC to solve the metadata problem in emulation
« Reply #2 on: February 02, 2012, 04:42:39 PM »
Well put....  :)