Author Topic: Why is Tosec so English-centric?  (Read 4667 times)

Offline Kodoichi

  • Full Member
  • ***
  • Posts: 162
Why is Tosec so English-centric?
« on: November 27, 2011, 10:26:48 AM »
Any special characters of non-English languages have been removed from the filenames in the dats and it is assumed that English is the primary language in the computer world. "Foreign" games get the according language flag, but English ones don't.

I noticed that Tosec is going more and more into a direction where only English people have a say and everyone else has to adapt to their history/archiving rules.

:(



Offline Diaboł

  • TOSEC Member
  • Full Member
  • ***
  • Posts: 204
Re: Why is Tosec so English-centric?
« Reply #1 on: November 27, 2011, 11:15:49 AM »
There was / is a problem with using DAT files containing specific diacritic marks. It was tricky to create one and it was problematic to scan / rebuild files.

Offline Cassiel

  • Administrator
  • Hero Member
  • *****
  • Posts: 1574
Re: Why is Tosec so English-centric?
« Reply #2 on: November 27, 2011, 04:26:30 PM »
Quote
Any special characters of non-English languages have been removed from the filenames in the dats and it is assumed that English is the primary language in the computer world. "Foreign" games get the according language flag, but English ones don't.

Personally, I agree with your criticism. I think the Language flag should ALWAYS be defined (even if English), and only language-neutral software should have a missing language flag. However, the group decision was to have it set up as it is now.

Quote
I noticed that Tosec is going more and more into a direction where only English people have a say and everyone else has to adapt to their history/archiving rules.

Well, that's a huge assumption on your part, don't you think? You know as well as anyone the diverse geographic locations and primary languages of the key players in this project. Not exactly ten white, middle class, English speaking only Americans, hmmm?

As for going "more and more into the direction", the way the TNC has ALWAYS been is no 'special characters', such as diacritics etc. This is an (archaic) ASCII issue and not a 'political' one.

Regardless, the ASCII only issue is likely to be resolved soon once I can find some time to dedicate to testing some Unicode DATs (yeah, I know I've been saying that for a while!).

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1332
Re: Why is Tosec so English-centric?
« Reply #3 on: November 27, 2011, 08:01:24 PM »
Hey Kodoichi,

That is a complex topic and something that we should improve in the coming years. There are at least 2 distinct parts of it that we need to solve: the language/charset of the information (set names, documentation and others) and the language/charset of the data (this means mostly rom/file names).

First, we cannot forget that, since the beginning, the English language has been used as the lingua franca of computing.

Citing some random Wikipedia page:
Quote
Due to the technical limitations of early computers, and the lack of international standards on the Internet, computer users were limited to using English and the Latin alphabet. However, this historical limitation is less present today. Most software products are localized in numerous languages and the use of the Unicode character encoding has resolved problems with non-Latin alphabets.

Given that, we can start thinking about the first point as something achievable. Thanks to Unicode support in most recent OSes (Linux, Win7), it is possible to have documentation and even setnames in their original form. Still, this carries a lot of drawbacks (mainly the setnames part) and few benefits imho. Having setnames in different charsets makes them impractical to use (anywhere from harder to impossible). For instance, even in Win7, although I can rename files using UTF-8 characters, those chars don't display correctly in the console. That's one of the reasons why MAME and other projects romanize most non-English titles. Typing (renaming, launching, searching, ...) titles / files that use chars most keyboards do not have, and that people have no idea how to type, is a real pain.

Point 2: We all want the roms/files in each set to be correctly named with their *original* names. That is essential to preserve them and, in many cases, to make them work on the original hw. With media types that were correctly dumped into images this is always (I guess) preserved. The main problems are with multi-rom sets, where the set is just a zipfile holding the contents of some directory or disk/disc. In this case, while rebuilding with any tool, the files can be renamed, timestamps can be altered, and all those kinds of things that are harmful to data preservation can happen. This is our main problem currently. Until recently, cmp would not work correctly with those sets since the charset was not specified in the datfiles (even in xml dats). The latest builds seem to support it now, but I haven't tested that. There may also be problems at a lower level, with proper support from the OS or other software such as packers (zip/rar), etc.
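
To make the charset point concrete, here is a minimal sketch (not how cmp or any other tool actually handles it) of an XML dat that states its encoding explicitly, so the reader doesn't have to guess. The element names, the set name and the file name `Größe.bin` are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Element/attribute names and the sample values are illustrative assumptions,
# not a statement of the exact dat schema any tool expects.
root = ET.Element("datafile")
game = ET.SubElement(root, "game", name="Example Set")
ET.SubElement(game, "rom", name="Größe.bin", size="1024")

# xml_declaration=True (Python 3.8+) writes the encoding into the first line,
# so whoever parses the dat does not have to guess the charset.
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Parsing the bytes back recovers the original file name unchanged.
parsed = ET.fromstring(data)
rom_name = parsed.find("game/rom").get("name")

print(data.splitlines()[0].decode("ascii"))
print(rom_name)
```

This only covers the dat itself; whether the name survives the filesystem and the zip/rar packer is a separate question.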

Point 2 is the main reason why some of the romnames were renamed, something that IS incorrect and may break them. Unfortunately there is no easy solution, as those kinds of sets just suck :)

Offline Kodoichi

  • Full Member
  • ***
  • Posts: 162
Re: Why is Tosec so English-centric?
« Reply #4 on: December 03, 2011, 01:48:25 PM »
Quote
the way the TNC has ALWAYS been is no 'special characters'

No, I remember older dats had German umlauts (ä/ö/ü); in the last two releases they were all changed to "ae/oe/ue".
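
Roughly, the change amounts to something like the sketch below. This is only an illustration of the substitution, not the procedure the renamers actually used; the uppercase forms and ß are my own additions, and the sample title is just an example string.

```python
# Hypothetical mapping: the thread only mentions ä/ö/ü -> ae/oe/ue;
# the uppercase forms and ß are added here as an assumption.
GERMAN_TO_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def transliterate_german(name: str) -> str:
    """Replace German umlauts and ß with their usual ASCII digraphs."""
    return name.translate(GERMAN_TO_ASCII)

print(transliterate_german("Fünf Freunde"))
```

Note that this is one-way: a word like "Abenteuer" already contains "ue", so the original spelling cannot be reliably restored from the ASCII name.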

Quote
You know as well as anyone the diverse geographic locations and primary languages of the key players in this project.

Yes, and I don't understand why nobody else whose primary language isn't English has brought up that topic. We make sure that every flag and other info is absolutely correct, but when it comes to the name of the game/tool, it's not that important?

Projects like MusicBrainz list the names of "foreign" musicians/bands both in the original language (for example Japanese/Kanji) and in plain English in the database. Their MP3 tagging software (Picard) has an option to choose in which language the names should be saved on your hard disk.

Would it be possible to do the same with the entries in the Tosec dats? This means that renamer tools like ClrMame and RomCenter etc. would have to support that language option in the future.
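
As a rough sketch of what I mean (purely hypothetical: no dat format or renamer tool mentioned here supports this, and the `Entry` class and the "original"/"latin" labels are made up for the example):

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    # Keys such as "original" and "latin" are made-up labels for this sketch.
    names: dict = field(default_factory=dict)

    def name_for(self, preferred: str, fallback: str = "latin") -> str:
        """Return the name in the preferred form, else the fallback form."""
        return self.names.get(preferred, self.names[fallback])

entry = Entry(names={"original": "ドラゴンクエスト", "latin": "Dragon Quest"})

print(entry.name_for("original"))  # user chose original-script names
print(entry.name_for("latin"))     # user chose romanized names
```

The renamer would then just ask the user once which form to use when writing filenames.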

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1332
Re: Why is Tosec so English-centric?
« Reply #5 on: December 04, 2011, 07:39:26 PM »
The same issue has been raised before by some members (TKaos, Diabol and gorski for instance) and, as I said, it is quite complex. In your example (the umlauts) it might look like we took a step back, but imho we haven't; nothing about that has changed in the TNC yet. Those changes (and a few others) were just to clear out the problems/inconsistencies that had existed for years in our dats (and the current TNC).
Since I joined the project, the TNC has stated that only low ASCII chars are accepted. Like many of the rules stated there, this was not checked at all over the years, and this lack of enforcement let some renamers insert 'non-acceptable' chars into dats.

This leads to problems when scanning (files are renamed differently from what is expected) and also when creating dats, with dir2dat for instance (some chars were not accepted and were replaced with strange, even worse chars such as "??").
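
A minimal sketch of both halves of the problem, assuming "low ASCII" means code points 0-127. It is not how dir2dat or cmp work internally, and `Größe.bin` is a made-up file name: the first function is the kind of check that was never enforced, the second shows how forcing a name into ASCII produces the "?" replacements.

```python
def non_ascii_chars(name: str) -> list[str]:
    """Return the characters in name that fall outside 7-bit ASCII."""
    return [ch for ch in name if ord(ch) > 127]

def lossy_ascii(name: str) -> str:
    """Force name into ASCII, replacing every unsupported char with '?'."""
    return name.encode("ascii", errors="replace").decode("ascii")

name = "Größe.bin"  # hypothetical file name

print(non_ascii_chars(name))  # chars a TNC check would have to reject
print(lossy_ascii(name))      # what a lossy conversion leaves behind
```

Once the name has gone through the lossy step, the original characters cannot be recovered from the dat.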

We rediscovered this issue recently due to some new Diabol dats. He added some Texas TI roms with strange chars; this dat did not work as expected, and since the renamed sets would have differently named roms, the software might not work. cmp has been updated since then, so I don't know how it behaves now.

Bottom line, there's no easy solution. We can not save all that information to a single line which needs to be used as a filename. We do have some ideas but it takes something we currently lack: lots of time :D.

Offline gorski

  • TOSEC Member
  • Jr. Member
  • **
  • Posts: 74
Re: Why is Tosec so English-centric?
« Reply #6 on: December 05, 2011, 08:15:46 PM »
Please, everyone, stay away from non-English chars for good :)
Because I will add броненосец and many more, and that will be trouble .. :)
I tried the new Unicode XML dats and I'm scared; trust me, they do not rebuild properly on my HDD (clrmame, Win7 64-bit with Bulgarian localization) and are not displayed properly either .. :)
Engrish ruLEz :)

Greets,
Gorski
p.s.
[Kodoichi]
I noticed that Tosec is going more and more into a direction where only English people have a say and everyone else has to adapt to their history/archiving rules.
[Kodoichi]
80% of TOSEC workers do not have English as their native language but use English in the dats.. :)
It's a free world :) If you want to make a Japanese or Chinese branch (разклонение) of TOSEC, that is not a bad idea :)