In this post, I'm proposing an idea to the emulation community in general, and to TOSEC in particular.
The emulation community has done a fantastic job of preserving and cataloging the ROM and media files associated with classic computer systems, and of creating emulators to run them and tools to verify and audit them. It has also collected or digitized a wealth of secondary media, such as books, magazines, manuals, interviews, reviews, boxes, screenshots, music, video, game walkthroughs and solutions, to preserve our understanding of old systems and their software, and of the human experience surrounding them.
Enormous progress has been made in creating a systematic accounting of the primary media files through projects such as TOSEC, and the same approach is now being applied to secondary media collections through TOSEC-PIX.
Unfortunately, no unified approach to metadata collection exists yet. Metadata includes not just information about the software itself, such as publisher, release year, programmers, graphic artists, musicians, game language and genre, but also a complete list of references to where files related to this software are located in standardized collections such as TOSEC. References to books and magazines would include page numbers; references to audio or video files might include time codes.
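To make the idea of such a record more concrete, here is a minimal sketch in Python. It is not a proposed standard: every class name, field name and example value below is a hypothetical illustration of the kinds of information listed above, nothing more.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class MediaReference:
    """Points at one file in a standardized collection (hypothetical layout)."""
    collection: str                  # e.g. "TOSEC" or "TOSEC-PIX"
    path: str                        # file name or path inside that collection
    page: Optional[int] = None       # page number, for books and magazines
    timecode: Optional[str] = None   # e.g. "00:12:30", for audio or video


@dataclass
class SoftwareRecord:
    """Metadata for one software title plus references to related files."""
    title: str
    publisher: Optional[str] = None
    release_year: Optional[int] = None
    programmers: List[str] = field(default_factory=list)
    graphic_artists: List[str] = field(default_factory=list)
    musicians: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    genre: Optional[str] = None
    references: List[MediaReference] = field(default_factory=list)

    def references_in(self, collection: str) -> List[MediaReference]:
        """Return only the references that point into one collection."""
        return [r for r in self.references if r.collection == collection]


# Made-up example data, purely for illustration.
record = SoftwareRecord(
    title="Example Game",
    publisher="Example Soft",
    release_year=1986,
    genre="Adventure",
    references=[
        MediaReference("TOSEC", "Example Game (1986)(Example Soft).dsk"),
        MediaReference("TOSEC-PIX", "Example Magazine Issue 01.pdf", page=42),
    ],
)

# asdict() turns the record into plain dicts and lists, ready for JSON or XML.
as_plain_data = asdict(record)
```

The point of the sketch is that the descriptive fields and the references live in the same record, so a frontend that knows a title also knows where every related file is.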
It's not that hyperlinked databases for software haven't been created. You can find many of them on the web, such as Mobygames, Lemon and HOL, and others in freely distributable, downloadable form, such as the GameBase projects.
However, frontend projects and developers usually cannot make use of such pre-existing metadata collections and have to painstakingly build their own databases from scratch. The reasons are the lack of a unified data standard, the fragmentation of the data over many different collections, the collection-specific nature of the referenced secondary media and, in the case of online databases, the technical and social difficulties associated with scraping.
No frontend exists yet that unites the totality of existing emulation files and databases into a comprehensive, fully hyperlinked whole. Imagine a program that basically just needs the path(s) to your (complete) collection of emulation files. It spends a few hours scanning them and then you have the perfect frontend. It can show you lists of games by any conceivable category, and for each title, it can launch any version for any system with a single click, using the correct emulation settings and requiring no individual configuration. For each title, it also offers a comprehensive library of secondary media such as manuals, reviews, screenshots or in-game video, available for instant viewing.
Right now, this type of frontend is unrealistic. A universal, flexible metadata format for emulation-related data would make this a practical vision. Once such a format has been defined, existing, partial collections of metadata would be converted and merged into a comprehensive database, and references to existing media collections such as the TOSEC ones would be added. The latter would not be as time-consuming as it might at first appear, since it could be performed semi-automatically by matching the metadata records against the metadata embedded in filenames.
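The semi-automatic matching step could look roughly like the Python sketch below. It assumes a deliberately simplified filename layout of the form "Title (date)(publisher)...", which covers only part of what real collection filenames can contain. The record layout, the function names and the example data are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Simplified assumption: "Title (date)(publisher)" followed by anything else.
NAME_PATTERN = re.compile(
    r"^(?P<title>.+?) \((?P<date>[^)]*)\)\((?P<publisher>[^)]*)\)"
)

Key = Tuple[str, str, str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so small differences still match."""
    return " ".join(text.lower().split())


def parse_name(filename: str) -> Optional[Key]:
    """Extract a (title, year, publisher) key, or None if the name does not fit."""
    m = NAME_PATTERN.match(filename)
    if m is None:
        return None
    # Keep only the first four characters of the date, i.e. the year.
    return (normalize(m["title"]), m["date"][:4], normalize(m["publisher"]))


def match_files(records: List[dict], filenames: List[str]):
    """Attach filenames to records; return (matched, unmatched)."""
    index: Dict[Key, str] = {}
    for rec in records:
        key = (normalize(rec["title"]), str(rec["year"]), normalize(rec["publisher"]))
        index[key] = rec["id"]

    matched: Dict[str, List[str]] = {}
    unmatched: List[str] = []   # left over for manual review
    for name in filenames:
        key = parse_name(name)
        rec_id = index.get(key) if key is not None else None
        if rec_id is None:
            unmatched.append(name)
        else:
            matched.setdefault(rec_id, []).append(name)
    return matched, unmatched


# Made-up example data, purely for illustration.
records = [
    {"id": "rec-1", "title": "Example Game", "year": 1986, "publisher": "Example Soft"},
]
filenames = [
    "Example Game (1986)(Example Soft).dsk",
    "Example Game (1986)(Example Soft)[a].dsk",
    "Other Thing (19xx)(-).dsk",
    "readme.txt",
]
matched, unmatched = match_files(records, filenames)
```

Only exact matches on the normalized key are linked automatically; everything else lands in the unmatched list for a human to check, which is what makes the process semi-automatic rather than fully automatic.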
The result would be a giant step forward for emulation. No more reinventing the wheel for frontend developers, no more slow and painful manual community efforts to build data files of supported games. All work done in gathering metadata would be permanent and cumulative. As a side effect, information that now exists only precariously on websites that could go offline at any time, due to lack of funds or legal challenges, could be backed up and preserved in standardized form to ensure its long-term survival.
I think that TOSEC is uniquely positioned and qualified to create such a metadata format and that this would be the logical next step after expanding the project's coverage from primary to secondary media.