Hey,
I've been quite busy, so my reply might be a bit short. I can see you've put a lot of work into this, so if you have the time, go ahead.
There are indeed many information sources, and some may be less credible. Here, too, we have sets that are badly named and are constantly being improved (see all the recent updates to Commodore).
Some time ago we tried to implement some of the things you talk about, but the problem common to all of us is always the lack of time, so the ideas stalled (some are only paused). The issues I've mentioned are related to our naming convention and have been there since those flags were created and introduced (dinosaur Cassiel or others might know the reasons). For instance, all TOSEC sets are renamed according to those rules, and there is a flag named "Media Label", used when needed to hold the name or text printed on the label of the disk/disc/tape/whatever (e.g. "Installation Disk"). The problem with such a flag is that it must accept any text, so any typo in the other flags will (normally) end up being parsed as a media label.
[Example: "Title (1999-10-10)(Publisher)(US)" is a correct TNC name. "Title (1999-10-10)(Publisher)(Us)" is still valid, but "Us" now represents a media label. This is a limitation of TNC, and more generally of storing information in strings/set names: we can't easily save everything there. We can parse names automatically, but some of the flags may end up in the wrong field. Another example is the dump flags: many of them support extra info (e.g. modification details) as well as the author of that change. Still, if you have [f PAL], you cannot know with 100% certainty, automatically or even manually, what it means. You can have a set of rules and use common sense: in this case most people would say it means a fix to make it work on PAL systems, but in rare cases it could also mean an undescribed fix done by some group or person named PAL.] We have tools that can parse, check and generate set names, but they suffer from the limitations and issues explained above, which are caused by the complexity of our naming scheme. We still hope to find time to solve these issues and bring some nice new things to the project.
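To illustrate the ambiguity, here is a minimal Python sketch of how a naive parser could behave. It assumes a heavily simplified subset of TNC (only date, publisher, a handful of country codes and a free-text media label, plus one heuristic for [f ...] flags); the names parse_name, guess_fix_meaning, COUNTRY_CODES and VIDEO_SYSTEMS are hypothetical and this is not taken from the actual TOSEC tools.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, heavily simplified subset of TNC; not the real TOSEC tools.
COUNTRY_CODES = {"US", "EU", "JP", "DE", "FR", "GB"}  # illustrative subset only
VIDEO_SYSTEMS = {"PAL", "NTSC"}


@dataclass
class ParsedName:
    title: str
    date: Optional[str] = None
    publisher: Optional[str] = None
    country: Optional[str] = None
    media_label: Optional[str] = None


def parse_name(name: str) -> ParsedName:
    """Split 'Title (date)(publisher)(...)' into fields."""
    m = re.match(r"^(.*?) \((.*)\)$", name)
    if not m:
        return ParsedName(title=name)
    parsed = ParsedName(title=m.group(1))
    fields = m.group(2).split(")(")
    # Date and publisher are the two mandatory fields and come first.
    parsed.date = fields[0]
    if len(fields) > 1:
        parsed.publisher = fields[1]
    for value in fields[2:]:
        if value in COUNTRY_CODES and parsed.country is None:
            parsed.country = value
        else:
            # Anything not recognised falls through to the free-text media
            # label, so a typo like "Us" is silently accepted as a label
            # (if several fields are unrecognised, the last one wins).
            parsed.media_label = value
    return parsed


def guess_fix_meaning(flag: str) -> str:
    """Heuristic reading of an '[f ...]' dump flag; never 100% certain."""
    inner = flag.strip("[]")
    if not inner.startswith("f"):
        raise ValueError(f"not a fix flag: {flag}")
    info = inner[1:].strip()
    if not info:
        return "fixed (no details)"
    if info in VIDEO_SYSTEMS:
        # Common-sense rule: most likely a fix to run on that video system,
        # though it could also be an undescribed fix by a group or person
        # with that name.
        return f"probably fixed for {info} systems"
    return f"fixed: {info}"


print(parse_name("Title (1999-10-10)(Publisher)(Us)"))
print(guess_fix_meaning("[f PAL]"))
```

Both example names parse without any error; only the second one ends up with no country and a media label of "Us", which is exactly the kind of misplacement that is hard to catch automatically.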
Using XML (or another such format) is an old goal, but it hasn't been done yet. One of the major obstacles is the lack of time on our side to update or create tools for it, but especially the way TOSEC works. For renamers, it is far more practical to pick sets, play with the files (emulators, hex editors, disassemblers or other tools) and then rename them accordingly. In previous discussions they always hated the idea of dealing with an extra tool or form.