Topic: Why does the TNC exist in its current form? It should be updated or obsoleted.

Offline Casteele

  • Newbie
  • *
  • Posts: 6
The majority of image identifications are based on CRC, MD5, and SHA hashes, not on filenames, which people can change to anything. I could write my recipe for pancakes to a file named "Super Mario Bros.nes", and most tools would ignore the name and check the hash anyhow. So I keep wondering why the TNC exists in its current form, and why the dat files use it extensively, and worse--exclusively.

It makes the dat files extremely difficult to parse and maintain. It provides no proof or security against malformed (accidental or intentional) names. It suffers the many limitations of naming files, such as problems with quotes, colons, braces, and other special characters. And other issues.

What if files/images were identified exclusively by their known hashes? You could then separate the title from the other details:

<file name="TOSEC Naming Convention"><hashset>...</hashset><date>20150323</date><publisher>TOSEC</publisher><flags><obsoleted/></flags><version>4</version></file>

(Or it can use attributes instead of full elements.)

Sure, humans may have difficulty reading that, but less so than all the encoded data, which is position-dependent. There are many XML tools which can read the XML code in any order, and output a standard order. Users could name their files any name they want, since it is identified by hashset anyhow--and if they so desire, they can name their files following the current TNC: the data is still there, but compartmentalized into its parts instead of all mashed and strung together.
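As a rough sketch of that point, assuming a hypothetical entry layout modelled on the example above (a `name` attribute plus `<date>` and `<publisher>` child elements, not an existing TOSEC format), a few lines of Python can read the fields in whatever order they appear and compose a TNC-style name from them. The `compose_name` helper is made up for illustration and ignores every other TNC field.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry layout modelled on the example above; not a real TOSEC format.
ENTRY_A = ('<file name="TOSEC Naming Convention">'
           '<date>20150323</date><publisher>TOSEC</publisher></file>')
# The same data with the child elements in a different order.
ENTRY_B = ('<file name="TOSEC Naming Convention">'
           '<publisher>TOSEC</publisher><date>20150323</date></file>')


def compose_name(xml_text):
    """Build a 'Title (date)(publisher)' style name from one entry."""
    entry = ET.fromstring(xml_text)
    title = entry.get("name")
    # Fields are looked up by element name, so their order does not matter.
    date = entry.findtext("date", default="")
    publisher = entry.findtext("publisher", default="")
    return f"{title} ({date})({publisher})"


name_a = compose_name(ENTRY_A)
name_b = compose_name(ENTRY_B)
print(name_a)
print(name_a == name_b)
```

Both entries produce the same name because nothing in the lookup depends on position.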

Sure, some people may want to look at a directory listing and have all the details in one name--again, they can always choose to output/rename following the TNC. Or they could output filenames by hashset/UUID/etc, and alongside the files, output an index.html file that provides the same information and can be sorted, sliced, diced, chopped, whatever, using HTML/JS/CSS and tables, lists, whatever. Furthermore, HTML indexes can use HTML (entities and images, etc) to format things like titles as closely to the original as possible, without all the "noise" TNC adds, or the difficulties certain characters cause when being part of a file name. It can also bring more clarity. More extensive notes and comments are possible that would just clutter a filename, or make it unsuitable for use, or make it so horribly long that it exceeds the pathname limits of the filesystem.

The TNC also makes it harder for maintenance. I often find titles that are malformed or missing information. "Adventures of Batman & Robin, The", "Adventures of Batman and Robin v2.1, The ", and "Barman and Robin v2.1" may all be the same title and version, just with different naming and hashsets. A maintainer might not even catch such issues. If the title was de-coupled from the image file, it could simply be a UUID reference to another dat file that provides consistent titles, and only requires editing/correcting a title in one place instead of having to search through hundreds of dat files trying to find every malformed instance. It also de-couples the *title* from the *platforms* that carry it. IBM PC, Apple Macintosh, Sony PlayStation, etc, may have the same title published, but there only needs to be *one* reference to it in each system's catalog, along with the unique hashsets for that system. In a database, this is called database normalization, and done correctly can greatly reduce overall database size and improve performance.

There are additional benefits. A single title may have, for example, a Wikipedia entry. The title dat file can include the URI to it without needing to find every instance of the title to add such a URI to its data (or worse, to a filename that cannot include a ':', '/', or even a '?' query string). Other URIs, such as MobyGames pages on game titles, can be included this way, and maintainers can add additional information and comments about a title that they would otherwise have no way to give under the TNC.

Even ISO image file cue sheets can be provided as *templates* instead of hard-coding file names into them. Tools can identify an image file by its hash, find the matching cue sheet, insert the appropriate file name and other data, and be good to go.
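A minimal sketch of the template idea, assuming a single-track data disc and a made-up `$image_name` placeholder (neither is an existing TOSEC convention). The hash lookup that selects the template is left out; only the fill-in step is shown.

```python
from string import Template

# Hypothetical cue sheet template; "$image_name" is a placeholder invented
# for this sketch and would be filled in with the user's own file name.
CUE_TEMPLATE = Template(
    'FILE "$image_name" BINARY\n'
    '  TRACK 01 MODE1/2352\n'
    '    INDEX 01 00:00:00\n'
)


def render_cue(image_name):
    """Return cue sheet text that points at the given image file name."""
    return CUE_TEMPLATE.substitute(image_name=image_name)


cue_text = render_cue("my_disc.bin")
print(cue_text)
```

The user keeps whatever file name they like, and the cue sheet is generated to match it.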

Tools which use the TOSEC data can also be simplified. They only need to work with the hashsets, which need to be computed regardless, and their authors no longer have to write complex code to parse the local file names *and* then parse the TOSEC names. Less code, faster run times, more accuracy.
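To illustrate identification by hash alone, here is a small sketch using Python's standard `zlib` and `hashlib` modules. The `CATALOGUE` dictionary and the `identify` helper are assumptions for the example; a real tool would load its catalogue from the dat files.

```python
import hashlib
import zlib


def hashset(data: bytes) -> dict:
    """Compute the CRC32, MD5, and SHA-1 of a block of bytes as hex strings."""
    return {
        "crc": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


# Hypothetical catalogue keyed by SHA-1; the entry is built from sample
# bytes here so that the sketch is self-contained.
sample = b"pancake recipe"
CATALOGUE = {hashlib.sha1(sample).hexdigest(): "Example Title"}


def identify(data: bytes, catalogue: dict):
    """Look the data up by hash only; the file name is never consulted."""
    return catalogue.get(hashset(data)["sha1"])


found = identify(sample, CATALOGUE)
missing = identify(b"something else", CATALOGUE)
print(found, missing)
```

Whatever the file is called on disk plays no part in the lookup.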

Other options become available or made trivial, too. The dat files can become SQLite database files which can be quickly queried. Updates to the database can be done with less work and affect many rows of the database with a single SQL statement. And changelogs can be kept of such statements to permit easier undo/rollbacks. Overall TOSEC files can be made smaller, using less bandwidth to download. Updates can be made into small update files that only update changed data from yearly point-releases (differential updates), or even more fine-tuned to "once-a-year" point-releases, followed by monthly updates, followed by daily or as-needed updates. Using a DBMS/SQL can also add other functionality such as "full text" searches on the data.
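A small sketch of the normalized layout, using Python's built-in `sqlite3` module and an in-memory database. The table names, column names, and placeholder hash values are all assumptions made for illustration, not an actual TOSEC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: each title is stored once, and every image on every
# platform refers to it by id.
conn.executescript("""
    CREATE TABLE titles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE images (
        sha1     TEXT PRIMARY KEY,
        title_id INTEGER NOT NULL REFERENCES titles(id),
        platform TEXT NOT NULL
    );
""")

# One misspelled title shared by three platform entries (placeholder hashes).
conn.execute("INSERT INTO titles (id, title) VALUES (1, 'Barman and Robin')")
conn.executemany(
    "INSERT INTO images (sha1, title_id, platform) VALUES (?, 1, ?)",
    [("aaaa", "IBM PC"), ("bbbb", "Apple Macintosh"), ("cccc", "Sony PlayStation")],
)

# A single statement corrects the title for every image that refers to it.
conn.execute("UPDATE titles SET title = 'Batman and Robin' WHERE id = 1")

rows = conn.execute(
    "SELECT i.platform, t.title FROM images i "
    "JOIN titles t ON t.id = i.title_id ORDER BY i.platform"
).fetchall()
for platform, title in rows:
    print(platform, "->", title)
```

The correction touches one row, yet every platform entry picks it up through the join.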

Another option is a web interface to the database, which would make updating it almost trivial. Hundreds, if not thousands of users can help keep the database current, accurate, and usable, instead of only a handful of dedicated maintainers "giving a little love" here and there as their time permits... And as an added bonus, such updates may be made by those more familiar with the data, improving its completeness and accuracy. I may know of every revision/release of the Zork text-based adventures, and thus can contribute many more details than someone who has not heard of or played the game.

And I have only touched on some of the pros/benefits that have been rolling around in my head for many years now.

Some potential problems exist, too. Older tools may not cope well with a re-implementation. But that is not a real problem: since all the data is still there, a tool can be provided to output a "TNC-compatible" dat file for such tools. Another is the work involved in converting the current data to a new standard. As I understand the TNC, part of the rationale behind its structure and ordering was to make it possible to consistently parse the data--so creating a tool to do just that and output a new data set should be trivial. What about detecting errors and inconsistencies? The current TNC does little or nothing in that regard already. I already gave an example above where a "The ..." was changed to a "..., The", but notice that it ended up putting the version string "v2.1" in there _and_ it resulted in a TNC "violation" because the version string should have appeared after the title.

The current TNC cannot handle such issues as-is. Separating and de-coupling the parts would actually make those kinds of errors stand out more and be more obvious. Titles could further be split into title, sub-title, sequence/series, and so on. The example I gave could be entered in the titles data as:

<title>Batman &amp; Robin<prefix>The Adventures of</prefix></title>

This allows one singular title, regardless of whether or not the prefix--or only part of the prefix--is found and matched.
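One small piece of this can be sketched in Python: treating "X, The" and "The X" as the same title. The `normalize_title` helper and the list of articles are assumptions for illustration, and the sketch does not handle stray version strings or partial prefixes.

```python
# Articles that are commonly moved to the end of a title; an assumed list.
ARTICLES = ("The", "A", "An")


def normalize_title(title: str) -> str:
    """Move a trailing ', The' style article back to the front of the title."""
    title = title.strip()
    for article in ARTICLES:
        suffix = ", " + article
        if title.endswith(suffix):
            return f"{article} {title[:-len(suffix)]}"
    return title


first = normalize_title("Adventures of Batman & Robin, The")
second = normalize_title("The Adventures of Batman & Robin")
print(first)
print(first == second)
```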

So, again, can anyone give any good reasons _why_ the TNC continues, and should continue, to exist in its current form? Or does anyone agree/disagree that it needs a long-due overhaul/rewrite?



Offline mictlantecuhtle

  • Global Moderator
  • Full Member
  • *****
  • Posts: 107
Let me first suggest that coming in here with a frankly supercilious and obnoxious approach to asking things is not going to get you the positive response you're hoping for.

The simple fact is, most people want their files to be human-readable, and TNC provides that. Even if we look at something like MAME which doesn't bother particularly with human-readable filenames for their sets, I suspect most people use a frontend which surfaces that information - that works for MAME which has a large and active contributor base and ecosystem around it.

We are aware that the method of storing information within the filename has limitations - we're not stupid, and if you think we haven't discussed exactly these issues endlessly within the team then you're even more ignorant than you appear at first glance.

Quote
I often find titles that are malformed or missing information. "Adventures of Batman & Robin, The", "Adventures of Batman and Robin v2.1, The ", and "Barman and Robin v2.1" may all be the same title and version

There absolutely are instances where this occurs, and I've spent a lot of time recently working to try and normalise titles across the Apple II database for example. However, there are equally instances where all three of these could represent distinct titles and need to be named differently as such. If you have found malformed entries, by all means please submit a post with this information on the forums and we'll do our best to look at it and fix any issues.

Quote
The TNC also makes it harder for maintenance.

I'm assuming you're basing this on your many hundreds of hours of work put into maintaining TOSEC dats? The filename method does have its limitations, but one of its many benefits is that it is tremendously fast and easy for maintenance compared to editing a complex XML file.

Quote
Tools which use the TOSEC data can also be simplified.

By all means go and tell the maintainers of these tools that. One of the things which has come up when considering using custom XML files is that we would need the major ROM managers (at least ClrMamePro and ROMVault) to support this format. That's not out of the question, but neither is it a trivial ask.

Quote
Other options become available or made trivial, too. The dat files can become SQLite database files which can be quickly queried. Updates to the database can be done with less work and affect many rows of the database with a single SQL statement. And changelogs can be kept of such statements to permit easier undo/rollbacks.

Another option is a web interface to the database, which would make updating it almost trivial.

Again, you're acting as if we haven't thought about these issues before, and they are certainly anything other than "trivial". Would a web-based interface be fantastic and helpful for getting additional people to input and provide corrections? Sure. Who is going to build and maintain it, pay hosting costs, etc. etc.? You have to understand that this is a voluntary effort by a small team and that we can only work with the resources we have.

Quote
So, again, can anyone give any good reasons _why_ the TNC continues, and should continue, to exist in its current form? Or does anyone agree/disagree that it needs a long-due overhaul/rewrite?

This is the kind of attitude I'm talking about - we as a team have had long discussions about how we might overhaul/rewrite TNC in a way that works for everyone. Phrasing your question this way is unnecessarily hostile and arrogant. We welcome discussion about the project and we're always looking for help to maintain and improve things, but we certainly don't owe you any explanations.

Offline Casteele

I put this aside for the last 24 hours to carefully consider my response. I will not argue that my phrasing and presentation could have been better--but I will point out that this is text. There is no "tone", and "attitude" can easily be misinterpreted--which is the case here. I continue to wonder why the TNC exists and is applied as it is. That is genuine, and I gave many of my reasons why I question it.

On its own, it is not really a major point. We all have conventions and standards we follow daily. My problem is that the TNC seems to go beyond a convention, and is applied as if it were a requirement that I must follow on my personal computer, or if I disagree and want to use my own file names, then I have to do extra work to do so. I have continuously been frustrated with this for several years. And that is also a significant reason why I have not contributed my own many *years* of maintaining data sets that include data beyond just the TOSEC sets. As a human, I can manually and visually read the names and process items one by one. But with a couple thousand rows of data, I am looking for a way to make it machine readable and automate it--in minutes or hours instead of months and years.

I will not say "human readable" is any less important--but if the data is machine readable, it can be translated by machine into whatever format you prefer on your computer. I can use the format I prefer. John Doe can use an entirely different format. No one is dictating to anyone else about what they can and cannot do. Technically, the TNC does not directly mandate the naming scheme either, but it does so indirectly through the way it is used to format the data files.

Also note that my focus is _not_ about the filenames. I could not care less what the file names are. It is the data itself that I want. Once I have the data, I can format it according to the TNC rules far more easily than trying to parse it back into data. Add in the complications of malformed names and other "edge cases", and it quickly escalates. And to be honest, if I were to write some kind of ROM manager, I would use the TNC as-is simply because trying to use or suggest anything else... Well, I look at this thread... I have been directly and personally attacked--told I appear ignorant, and accused of being arrogant--which is your interpretation based on emotionless text. Far less hassle if I just use it as-is, instead of trying to improve it or do it differently.

But what if you offered something better? Say you kept the TNC names in the data files but offered a libtosec.so/.dll to handle the processing and such, so that the other tool writers can focus on their own tool, instead of needing to learn all the complexities of parsing complex data. This is the kind of thing I say makes writing tools easier. Nearly every programming language in popular use has libraries for working with XML data. JSON is also fast becoming a popular format. CSV, INI, and so on, are not that difficult to find a library for, or "roll your own".

That is also why I say maintenance would be easier. You stated you do not want to edit complex XML data. You do not have to. There are many generic tools available to help you focus on the data itself, and not worry about the underlying structure of the XML. How do you edit the files currently? Do you have and use any such tools? Do any exist specifically for TOSEC dat files and TNC names? I believe if you make it easier to work with the data _programmatically_, someone will likely start writing tools to assist.

A real case scenario with several of these ideas at play: FTP. How often do you see or use FTP clients (and servers) anymore? Web browsers have long included the ability to work with the FTP protocol. And they made it easier. Users no longer have to worry about the different sockets (control and data), data modes (text or binary), and in many cases, even line endings (CR on Mac, LF on Un*x, CRLF on MSWin) are handled transparently. That is what I mean by "making it easier"--If it comes to looking at and working with raw XML files in a text editor, I would agree and say "I'd rather get stabbed in the eye with a rusty spoon!"

The same ideas apply to making web-based tools to assist. You mentioned limited resources, which I will address first, because that is one of the major points in my mind: The TOSEC dat files are _HUGE_, even with compression. They also contain a lot of repeated and redundant data:

<game name="this extremely long name is repeated multiple times">
<description>this extremely long name is repeated multiple times</description>
<rom name="this extremely long name is repeated multiple times.rom" ...></game>

1. The name is repeated, taking up space and bandwidth--even with compression. Zip/gzip/etc are not magical, and can only do so much.
2. Every time someone wants/needs to edit part, they have to make sure they edit it in all three places.
3. If the name occurs on multiple files and multiple dat sets (for example, the National Geographic 100 years anthology on Mac and Win), you have multiplied the work by three times as many entries to find and fix across multiple files.
4. Back to the immediate topic: all that data uses bandwidth. And bandwidth generally costs money, as well as computing resources.
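Point 2 can at least be checked by machine. The sketch below uses Python's standard `xml.etree.ElementTree` on a made-up two-entry dat fragment in the layout shown above, and reports entries where the three copies of the name have drifted apart. It assumes one rom per game whose name is the game name plus an extension, which will not hold for every set; `inconsistent_games` is a name invented here.

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the layout shown above; the second entry has a
# description that was edited without updating the other two copies.
DAT = """<datafile>
  <game name="Game A (1987)(Pub)">
    <description>Game A (1987)(Pub)</description>
    <rom name="Game A (1987)(Pub).rom" size="1" crc="00000000"/>
  </game>
  <game name="Game B (1988)(Pub)">
    <description>Game B (1988)(Publisher)</description>
    <rom name="Game B (1988)(Pub).rom" size="1" crc="00000000"/>
  </game>
</datafile>"""


def inconsistent_games(dat_text):
    """Return the names of games whose description or rom name disagrees."""
    bad = []
    for game in ET.fromstring(dat_text).iter("game"):
        name = game.get("name")
        description = game.findtext("description")
        # Drop the extension from each rom name before comparing.
        rom_stems = [rom.get("name").rsplit(".", 1)[0] for rom in game.iter("rom")]
        if description != name or any(stem != name for stem in rom_stems):
            bad.append(name)
    return bad


problems = inconsistent_games(DAT)
print(problems)
```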

Legend has it that a commercial airline once saved millions of dollars a year by simply putting one less olive in each passenger's salad. That is the kind of thing I am thinking about here. Not only to reduce resources used, but also to get the optimal use from unavoidable resource usage. Those few bytes saved may very well give enough flexibility to redirect resources towards that "web based interface" which you agree would be great to have.

So I will ask you again, "why does the TNC still exist", and this time I will clarify myself to avoid further misunderstandings: I see so many advantages to, at the very least, downgrading it to a "recommendation" that only applies at the file system level. It need not (and in my opinion, should not) apply within the dat files which are files, not file systems. However, there may be other reasons I neither see nor am aware of--So I am asking "why" to find out. For example, you claimed "many benefits"--what are those benefits? What am I missing/not seeing? And yes, you do not "owe" me an explanation or answer. You could simply say "because we like it this way", and I would have to accept that. Or you could provide an answer (even something like "we as a team are unable to reach a consensus") that I can either understand, or potentially provide something more useful to address it--such as maybe offer my own tools I already use and maintain to parse (or attempt to parse) the names and put them into a SQLite database. Or the one that takes that data and makes nicely formatted web pages, parsing error reports, detection of malformed names or problem names... Or even the one that constructs fully TNC-compliant file names.

If you still feel _I_ am being hostile or negative, then please help me to understand how/why/where so I can correct that.

Offline mictlantecuhtle

If you don't understand how coming in to a project and immediately telling the people working on it that they're doing things wrong and demanding that the standard they use be "obsoleted" can be taken to be arrogant or ignorant, there's very little I can do to help you. I would also note that it is entirely possible to convey both tone and attitude through the written word - we've been doing it for millennia - and that the tone of your initial enquiry was rude.

The simplest answer as to why the TNC exists and will continue to exist for now in its current form is that, while we are aware of the limitations, we are equally aware of the difficulties in implementing and working with some other system for doing things. Put bluntly, the TNC and the way we put things together works for 99% of use cases, and is pretty resilient to boot. Do we occasionally have issues with malformed file names, or where it's hard to convey a particular bit of information? Absolutely. But across a project which has a database of over a million files, these are vanishingly small occurrences.

The other answer is that we have tried a lot of the things you've suggested before, and for one reason or another they have not worked well or in some cases have led to data loss. We've rolled our own tools before, but IIRC this was stymied by lack of time/technical knowledge - see my point previously about this being a small and voluntary effort. We even had a web-based database to deal with TOSEC-ISO specifically back in the day (before my time), but I believe that went spectacularly wrong. The system we have might be simple and inefficient in places, but it is one which does not have a single point of failure - you can use "off the shelf" tools right now to start updating and editing TOSEC DATs.

As far as the point about the naming convention being enforced, you'll also notice that every other project out there, whether that be MAME, Redump or No-Intro also has a naming standard which is enforced by their DAT files. To be honest I'm not sure how you visually represent to the end user that they have a given file without presenting that information in some kind of human-readable fashion which is standardised across all users.

To address some of your other points below:

Quote
<game name="this extremely long name is repeated multiple times">
<description>this extremely long name is repeated multiple times</description>
<rom name="this extremely long name is repeated multiple times.rom" ...></game>

1. The name is repeated, taking up space and bandwidth--even with compression. Zip/gzip/etc are not magical, and can only do so much.
2. Every time someone wants/needs to edit part, they have to make sure they edit it in all three places.
3. If the name occurs on multiple files and multiple dat sets (for example, the National Geographic 100 years anthology on Mac and Win), you have multiplied the work by three times as many entries to find and fix across multiple files.
4. Back to the immediate topic: all that data uses bandwidth. And bandwidth generally costs money, as well as computing resources.

1. This is literally part of the standard format used by the major existing ROM managers (see https://github.com/SabreTools/SabreTools/wiki/DatFile-Formats#logiqx-xml-format).
2 & 3. This can be an issue at times, although it's quite easily addressed in my experience by using find/replace all in tools like Notepad++
4. While I appreciate efficiency can be good, in the grand scheme of the modern internet we're not talking about that much bandwidth

As far as the offer of using tools you've developed yourself, I'd certainly be interested in seeing what these are and how they work. Part of the issue when I talk about limited resources is simply lack of time and technical knowledge to do things like this. I can't promise we'll end up using anything you give us - I'm just one of a team - but at the very least I'm interested in seeing what is on offer.

Offline Casteele

Quote
As far as the point about the naming convention being enforced, you'll also notice that every other project out there, whether that be MAME, Redump or No-Intro also has a naming standard which is enforced by their DAT files. ...
That part alone now confirms to me something I have suspected but was not certain of. First, not one of the tools enforces any format in which I have to name my own files. They enforce file formats so that they can read and parse the data within the data file, but that is a format, not a file name. They _may_ enforce the file names of the data files (as in, it must be named "tosec.dat" and placed in your home directory) they need to find and read for operation, and the format of such a file, but they do not force me to name my other files in any way.

When I extracted the TOSEC dat file archive, I was able to extract it where I wanted it, specifically, under a "tosec" folder in my "emu" folder. Would you feel you can tell me I must extract it to a folder named "TOSEC DAT FILES", or anything else? That was the impression the TNC gave me--that I had to use a properly approved name.

Using ClrMAMEPro's example from their documentation, a data file entry looks like:

set (
name pacman
cloneof pacman
description "PuckMan (Japan set 1)"
rom ( name namcopac.6e size 4096 crc fee263b3 md5 3f84d78d59147b9b3c816da72110e55f)
sample shot.wav
sampleof galaxian
)

I am guessing that ClrMAMEPro is fully capable of renaming files to follow TNC, since the data above has most of the information needed. (I am on Linux, so I cannot use CMP under my system wine or mono configuration.) But nowhere in that example does it even show what the TNC name _would_ be. It clearly separates the name, description, and other data instead of pushing it all together into a single name, without dictating to me how the final composed name must be.

The way the TOSEC dat files treat the _text_ inside of the dat file as if it were an actual existing file on my hard drive is what has frustrated me and made me believe that TNC is being forced and enforced. I may not even have that ROM image on my hard drive, so there is no file with that name and no file that can/should be renamed to that.

So I am not talking about an actual existing (or not) file on my drive. I am talking about a line of _text_ that I am trying to parse to identify the data about that ROM, if it were to exist, possibly under a different name. In TOSEC dat files, I have to write code to figure out which part of the name is which, by reversing the TNC rules. It would be simpler if the TOSEC dat files had name="title" date="date" publisher="publisher" extension="rom", with which I could then do something far simpler (JavaScript-like pseudo code):

// "game" is assumed to be a DOM Element for one <game> entry.
const name = game.getAttribute("name");
const date = game.getAttribute("date");
const publisher = game.getAttribute("publisher");
const extension = game.getAttribute("extension");
const newName = name + " (" + date + ")(" + publisher + ")." + extension;

Five simple lines of code that even a non-technical person can probably figure out what it does.

But the TOSEC dat files encode the name in the dat file, even though it's just a line of text: <rom name="game (date)(publisher)extension" ...>

For that, instead of five lines of simple string concatenation, I first have to write several lines of code to figure out which part is the date. Is "(date)" the date? Or is "(publisher)"? As a human, we can immediately see it and know. A computer program, however, has to do many small steps to figure it out. TNC rules state that the "(date)" must be the first parenthesized part--except if there's a "(demo)" tag which precedes it. So I have to write several more lines of code to check if there is a "(demo)" tag, and if there is, then it has to assume the second ()'d item is the date. I have to check if the date matches the date format: numerical text instead of alphabetical text--except in the case of a single "-" or an "x" to indicate an unknown number at that position. Some dates may use "JAN" instead of "01", so I have to write more code to handle that. Some may use YYYYMMDD, some use YYYY-MM-DD, some use YYYY.MM.DD (despite the TNC giving specific rules about how to format dates), adding even more code to check for those possibilities.

And some entries are missing the date tag/component entirely. More code to check for that.
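To make the parsing burden concrete, here is a deliberately simplified Python sketch that handles only the well-formed case: an optional "(demo)" tag, a date made of digits and "x" placeholders, and a publisher. The pattern and the `parse_tnc` helper are assumptions for illustration, not the full TNC grammar; every malformed variant described above (month names, YYYY.MM.DD, a missing date) would need extra handling on top of this.

```python
import re

# Simplified pattern: title, optional (demo...) tag, (date), (publisher), rest.
TNC_PATTERN = re.compile(
    r"^(?P<title>.+?) "
    r"(?:\((?P<demo>demo[^)]*)\) ?)?"
    r"\((?P<date>[0-9x]{4}(?:-[0-9x]{2}(?:-[0-9x]{2})?)?)\)"
    r"\((?P<publisher>[^)]*)\)"
    r"(?P<rest>.*)$"
)


def parse_tnc(name):
    """Split a well-formed name into its parts, or return None if it does not fit."""
    match = TNC_PATTERN.match(name)
    return match.groupdict() if match else None


plain = parse_tnc("Winter Wally (1987)(Alternative Software)(GB)[h Paul Foster]")
demo = parse_tnc("Some Game (demo) (1990)(Acme)")
undated = parse_tnc("Broken Name [a]")
print(plain)
print(demo)
print(undated)
```

The third name has no date component, so the sketch simply gives up on it, which is exactly where the extra code starts to pile up.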

Start adding in all the other possible tags/components, and how to handle them if they are present, missing, or malformed... And the code has grown to several thousand lines. And all that code takes time to execute. The most recent dat file set has nearly 875000 <game ...> entries. If the code takes one second to fully parse and process the name from the dat file, it would take a little over 1 week and 3 days to process the 875K entries in the dat files.

If the code takes 1/10th of a second per name, it would still take a little more than 24 hours to process the entire set. At 1/100th of a second per name, about 2 and a half hours. Every bit of additional code slows it down even more.
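The arithmetic behind those estimates, as a quick Python check. The per-entry timings are the hypothetical figures from the text, not measurements.

```python
ENTRIES = 875_000  # approximate number of <game> entries, as stated above


def hours_needed(seconds_per_entry):
    """Total processing time in hours at a fixed cost per entry."""
    return ENTRIES * seconds_per_entry / 3600


for cost in (1.0, 0.1, 0.01):
    hours = hours_needed(cost)
    print(f"{cost} s per entry: {hours:.1f} hours ({hours / 24:.2f} days)")
```

At one second per entry that is about 243 hours, or roughly 10.1 days; at a tenth of a second about 24.3 hours; at a hundredth about 2.4 hours.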

But now I see there is a misunderstanding. The names in the dat files are _not_ actual file names, they are text data encoded in an XML format and encoding. They are only what the names _would_ be _if_ I actually had the file and named it according to the TNC. I have no issue with that, as I pointed out--only five lines of code (a few more, actually, depending on how many tags/flags/components there are, but still far less than the hundreds of lines needed to parse and decode the name), which would execute very quickly, and only on the files I actually have.

I am not even certain I am expressing this very well.

Here is a line from "./TOSEC/Atari 8bit - Games - [XEX] (TOSEC-v2014-10-30_CM).dat":
"Winter Wally (1987)(Alternative Software)(GB)[h Paul Foster]"

Is either of those lines the actual file? Did I have to upload them in order to copy/paste them into this post? No, they are just text. Likewise, the data in the dat files is not the files, it's just text data about the files--metadata.

But the fact that the text data is encoded in the TNC format is what has continued to make me believe that the TNC _must_ be used, without exception, and that it exists primarily to enforce that. That is what I am opposed to, why it makes me feel as if the TNC dictates to me how I must name my files, _and_ _must_ format any metadata about the file. I am _not_ allowed to:

<file name="Winter Wally" date="1987" publisher="Alternative Software" region="GB" flag-h="Paul Foster">

Nor am I allowed to name the file something else on my hard drive. "The text in the dat file follows the TNC, so must everything else."

If our positions were reversed, and you were under the same understanding I was, would you not also oppose it? Would you not come in here asking why?

( And I am still not certain I am being clear... )

Offline Kirkland

  • Newbie
  • *
  • Posts: 1
Ok, I'll take one for the team.  ;D
First, a little trip down memory lane.
Twenty some odd years ago, there was chaos.  We were living in a world where random Joe Blow would cobble together a hardware device to dump the images of his favorite video game cartridges.  It may have been for his own personal archive, or as part of a release group to spread his fame and notoriety.  And from here, mistakes were made.
First, it was probably using a command line interface, or was hooked to some janky device that only recognized 8.3 filenames.  And thus was born "SPRMARIO.SMC"
But there was a problem.  Jane Doe also had a backup device, and she also created a file called "SPRMARIO.SMC" to run on her favorite emulator.  But those images of the cartridges were vastly different.
Jane Doe just wanted to play her game on her favorite emulator.  She renamed it to "Super Mario Brothers.smc" and slid it into a roms directory under her emulator.  Jane was in the United States, and thus her rom was created from a cartridge meant to run on NTSC systems.
Joe Blow on the other hand, considered himself a "l33t hax0r" and lived in Germany.  His cartridge came from England, and was of the PAL format.  He patched his rom image to work on NTSC systems so it would play on his favorite emulator.  He called it "SMBros.bin" and gave it to his buddy in France.  The guy in France says, "It won't run on my PAL-based hardware copier."  Because it had been patched to NTSC.  So the guy in France gives a copy to his friend in Italy, who realizes this is a NTSC hack.  He renames it to "Super Mario Bros (NTSC fixed).smc"  And so it sits on his hard drive until he sends it to a guy in the United States.  Another "L33t HaX0r" goes in and hard-codes the game genie code for unlimited health and calls it "SMB1-Unlimted Health hack.bin"  And this is the crap we were handed 10 years later to decipher.
So 20 years ago, fresh off of assisting Cowering with GoodN64 (and a fair amount of renaming SNES roms), Grendel approached me about creating a standardized system for renaming roms and discs that would be resilient enough to handle all the variations of fuckery that were afoot.  So for the next 8 months I dug through system after system, poring over personal collections of the old hacking scene to try and narrow down all the possible combinations and permutations to be able to accurately define just what game image we were looking at, and thus TNC v1 was born.
Was it perfect? Far from it.  But what we needed was a Kingdom/Phylum/Class structure.. a librarian's "Dewey decimal system" to be able to document and classify what was out there.
Now at the time, Cowering had his GoodTools- which would scan your directory and, based on the crc of each file, rename it according to his internal database.  The rest of the world didn't have that ease-of-use luxury.  The only tools that were available at the time were ClrMame Pro and RomCenter- which both worked exclusively with MAME.  The way they worked was that if you executed "MAME -listinfo", it would output all sorts of information about the roms (and rom sets) it was expecting to have available to emulate a game.  Since I saw the potential, I fired up Turbo Pascal and created a dummy MAME.exe that output the contents of my custom rom.dat file, which had Atari 2600 rom name and crc info, and after a little convincing of Roman Scherzer that people could and would create databases, rom dat files were born.  RomCenter support soon followed.
TOSEC stepped up and individual contributors that were passionate collectors in their fields of expertise started to comb through 30 years of cartridge and disk information, trying to make sense of the 8.3 filenames and custom naming in the hopes of getting a standardized one-size-fits-most baseline.
So rolling back to where we started.  Jane Doe has a simple "Super Mario Bros. (1989)(Nintendo).bin" while Joe Blow's rom is now "Super Mario Bros. (1989)(Nintendo)(PAL)[f NTSC fixed][t Unlimted health].bin" - but before it had the unlimited health hack added, it was "Super Mario Bros. (1989)(Nintendo)(PAL)[f NTSC fixed].bin" and when it was originally dumped by Joe it would have just been "Super Mario Bros. (1989)(Nintendo)(PAL).bin"
That's the whole Kingdom/Phylum/Class structure.  So it might seem like overkill, but that's what it takes to differentiate these copies, going from the most generic description to the most specific.  This also allows us to group sets of multi-game discs that have similar attributes (all hacked by one guy, or all that have unlimited money patches), keeping them separate from the original images that have no such alterations.
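To make the layering concrete, here is a rough Python sketch that pulls one of the example names above apart. It is deliberately simplified and is not the full TNC: it just assumes a title, followed by parenthesized fields, followed by bracketed flags, which is all the examples in this post use. The function names are made up for illustration.

```python
import re

def split_tnc_style_name(filename):
    """Split 'Title (a)(b)[x][y].ext' into parts (simplified, not the full TNC)."""
    # Everything after the last dot is treated as the extension.
    stem, _, ext = filename.rpartition(".")
    # The title is whatever comes before the first "(" or "[".
    title = re.split(r"\s*[(\[]", stem, maxsplit=1)[0]
    # Parenthesized fields: in the examples here, year, publisher, video standard.
    fields = re.findall(r"\(([^)]*)\)", stem)
    # Bracketed flags: fixes, trainers and other alterations.
    flags = re.findall(r"\[([^\]]*)\]", stem)
    return {"title": title, "fields": fields, "flags": flags, "ext": ext}

def same_base_image(a, b):
    """True if two names describe the same dump, ignoring the bracketed flags."""
    pa, pb = split_tnc_style_name(a), split_tnc_style_name(b)
    return pa["title"] == pb["title"] and pa["fields"] == pb["fields"]

original = "Super Mario Bros. (1989)(Nintendo)(PAL).bin"
fixed = "Super Mario Bros. (1989)(Nintendo)(PAL)[f NTSC fixed].bin"
janes = "Super Mario Bros. (1989)(Nintendo).bin"

print(split_tnc_style_name(fixed))
print(same_base_image(original, fixed))  # same dump, one has a fix applied
print(same_base_image(janes, fixed))     # different dumps
```

The point is that the altered copies sort right next to the original they came from, and you can group or filter on any layer without touching the others.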
You ask if it should be obsoleted- I think you might not grasp how much effort has gone into being able to correctly and consistently classify a vast array of disparate systems under one naming umbrella.  There will always be fringe case outliers, but as we've proven, most images can be renamed to fit the TNC's structure.
I suggest you take a look at the actual naming convention, and where you think there can be improvements feel free to make suggestions. You will quite likely find that similar arguments have been raised before, and that sufficient counter-arguments have shown where there are other systems/instances that would break your idea.  Like I said, there are outliers (those pesky Amiga guys) where there need to be per-system adjustments, but overall, a standard is a standard.
Now circling back again, you had mentioned before about the redundancy of the names being in the dat files 3 times over.  This has nothing to do with our naming and is solely based on how RomCenter stores its information.  As a lot of collectors enjoy the ease of setup and interface of RomCenter (ClrMame is trickier to set up), we are shoe-horned into Eric's way of listing rom information in dat files for *HIS TOOL*.  This involves the name displayed in his graphical interface (the first instance), the name of the zip file your rom will be placed into (the second instance), and the name of the individual rom file (the 3rd instance).  I think you might be incorrectly assuming we have anything to do with how his data files are formatted.  ClrMame can parse RomCenter's .dat files, so we go with the lesser evil as opposed to maintaining two sets.
So rolling back around to your "name/date/publisher/extension" XML idea.  We've been down this road a couple of times, and the main issue was database integrity and the fact that there are so many possibilities.  We tried a GUI once, but once you got beyond a simple name/date/publisher, it got weird with PAL/NTSC, what's the version number? Is it cracked? Patched? There was just an overwhelming number of options and it scared people away.
Most maintainers simply load a rom, play it, copy down the relevant info from the splash screen and physically rename the rom on their hard drive.  When it comes time to update, they use dir2dat to just generate a dat file and it gets posted.  There are no special tools- just good old fashioned file renaming.
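For anyone wondering what "generate a dat file from a directory" boils down to, here is a minimal Python sketch of the idea. This is not dir2dat and does not reproduce its output or any real dat format; it only assumes that each entry needs the file name, size and CRC32, and the function names are made up for illustration.

```python
import os
import zlib

def crc32_of_file(path, chunk_size=65536):
    """Return the CRC32 of a file as 8 lowercase hex digits."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running CRC back in so large files are read in pieces.
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")

def scan_directory(directory):
    """Collect name, size and CRC32 for every regular file in a directory."""
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append({
                "name": name,
                "size": os.path.getsize(path),
                "crc": crc32_of_file(path),
            })
    return entries
```

The file name on disk is the only place the descriptive info lives in this workflow, which is exactly why the maintainers only ever have to rename files.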
Most of the backlash you are getting is because you are telling guys who have put hundreds of hours into a hobby they are passionate about that "your shit is obsolete and shouldn't exist" and they take that personally.  I think you are aiming more at the tools we are forced to use to get accepted by the masses.
As for making it better?  Feel free.  This is all at-will- pick up a shovel and dig in territory.
Do you *have* to use our naming?  Well, no one is forcing you to.  Rename your files to FuckAll.Wankers.bin- trust me, I don't care.  This is like a library.  We try to sort and organize to the best of our ability in the free time we have to contribute.  We'll sort and rename everything to our own Dewey Decimal system (do you guys even have that overseas?) but you are completely free to throw your books on your personal bookshelf in whatever haphazard, non-standard way that suits you.  But don't be surprised by some backlash when you come into our house and start pointing fingers about how we've done things and improved upon them for 20 years.