Hey Kodoichi,
That is a complex topic, and something we should improve in the coming years. There are at least two distinct parts of it that we need to solve: the language/charset of the information (set names, documentation and so on) and the language/charset of the data (mostly rom/file names).
First, we cannot forget that, since the beginning, English has been used as the lingua franca of computing. Quoting a random Wikipedia page:
Due to the technical limitations of early computers, and the lack of international standards on the Internet, computer users were limited to using English and the Latin alphabet. However, this historical limitation is less present today. Most software products are localized in numerous languages and the use of the Unicode character encoding has resolved problems with non-Latin alphabets.
Given that, we can start treating the first point as something feasible. Since most recent OSes (Linux, Win7) support Unicode, it is possible to have documentation, and even setnames, in their original form. Still, this carries a lot of drawbacks (mainly the setnames part) and few benefits imho. Having setnames in different charsets makes them impractical to use (anywhere from harder to impossible). For instance, even in Win7, although I can rename files with Unicode characters, those chars don't appear correctly in the console. That's one of the reasons why, when we look at MAME and other projects, we see most non-English titles romanized. Typing (renaming, launching, searching, ...) titles / files that use chars most keyboards do not have, and that people don't know how to type, is a real pain.
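To make the console problem a bit more concrete, here is a minimal Python sketch that checks whether a name survives a given encoding. The two set names are made up for illustration, and cp437 is just the classic US console code page; yours may well be a different one.

```python
def can_encode(name: str, encoding: str) -> bool:
    """Return True if every character of `name` exists in `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical set names: one romanized, one in the original script.
for name in ("sutoriito", "ストリート"):
    for enc in ("ascii", "cp437", "utf-8"):
        # !a prints an ASCII-safe form, so this runs on any console.
        print(f"{name!a} fits in {enc}: {can_encode(name, enc)}")
```

The romanized name fits everywhere, while the original one only fits in UTF-8, which is roughly why romanized setnames are the practical choice.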
Point 2: We all want the roms/files in each set to keep their *original* names. That is essential to preserve them and, in many cases, to make them work on the original hardware. With media types that were correctly dumped into images, this is always (I guess) preserved. The main problems are with multi-rom sets, where the set is just a zipfile holding the contents of some directory or disk/disc. In that case, while rebuilding with any tool, the files can be renamed, timestamps can be altered, and all that kind of thing that is hostile to data preservation. This is our main problem currently. Until recently, cmp would not work correctly with those sets, since the charset was not specified in the datfiles (even in xml dats). The latest builds seem to support it now, but I haven't tested them. There may also be problems at a lower level, with proper support from the OS or from other software such as packers (zip/rar), etc.
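As a rough illustration of the packer side, here is a Python sketch using the standard zipfile module. A zip entry carries a flag (general purpose bit 11) saying its name is stored as UTF-8; when the flag is missing, Python falls back to cp437, so the original bytes can be recovered and decoded with a guessed charset. The function names and the `legacy_codec` parameter are my own, not something cmp or any dat format defines.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name is stored as UTF-8


def redecode(name: str, codec: str) -> str:
    """Undo zipfile's cp437 fallback and decode the raw bytes with `codec`."""
    return name.encode("cp437").decode(codec)


def list_names(zip_bytes: bytes, legacy_codec: str):
    """Yield (name, was_flagged_utf8) for every entry in the archive.

    `legacy_codec` is only a guess (an assumption of this sketch) for
    entries that do not declare UTF-8.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            flagged = bool(info.flag_bits & UTF8_FLAG)
            if flagged:
                name = info.filename
            else:
                name = redecode(info.filename, legacy_codec)
            yield name, flagged
```

The guess passed as `legacy_codec` (say, "shift_jis") is exactly the piece of information that a zip without the flag, or a datfile without a declared charset, cannot give you.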
Point 2 is the main reason why some of the romnames were renamed, something that IS incorrect and may break them. Unfortunately there is no easy solution, as those kinds of sets just suck.
