Topic: TOSEC Dat Explorer

PandMonium

  • Administrator
  • Hero Member
  • Posts: 1332
Re: TOSEC Dat Explorer
« Reply #15 on: January 18, 2011, 04:41:26 PM »
Although the tool has several problems, this time it is not my fault :P

The problem is in the dat itself: it actually contains a ³ rather than a ł.
If you open it in a text editor you will see:
<author>Diabo&#179;</author>
rather than
<author>Diabo&#322;</author>

&#179; is the character reference for ³; it should be &#322; (ł) (see here and here).

Just try writing the author tags here without the [ code ] tags and you will see what I mean.
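A possible explanation for the 179, offered as an assumption rather than something confirmed in this thread: in the Central European code pages Windows-1250 and ISO-8859-2, byte 0xB3 (decimal 179) is ł. If the DAT writer emitted the code-page byte value instead of the Unicode code point (U+0142, decimal 322), any XML parser will read the reference as U+00B3, which is ³. The Python sketch below checks that mapping and adds a hypothetical `repair_cp1250_refs` helper (not part of cmp or the TDE) that rewrites such references under that assumption.

```python
import re
import xml.etree.ElementTree as ET


def parse_author(xml_text):
    """Parse an <author> element; the parser resolves &#NNN; as Unicode code points."""
    return ET.fromstring(xml_text).text


def repair_cp1250_refs(xml_text):
    """Hypothetical fix-up (assumption): treat every decimal &#NNN; with
    128 <= NNN <= 255 as a Windows-1250 byte value and replace it with the
    reference for the Unicode character that byte stands for in that code page.
    Hex references (&#x...;) are left untouched."""

    def fix(match):
        value = int(match.group(1))
        if not (128 <= value <= 255):
            return match.group(0)  # ASCII or already a real code point
        try:
            char = bytes([value]).decode("cp1250")
        except UnicodeDecodeError:
            return match.group(0)  # byte is undefined in Windows-1250
        return f"&#{ord(char)};"

    return re.sub(r"&#(\d+);", fix, xml_text)


# Usage: the reference as found in the DAT, then the repaired version.
broken = "<author>Diabo&#179;</author>"
print(repair_cp1250_refs(broken))  # <author>Diabo&#322;</author>
```

The repair is only safe if every reference in the 128 to 255 range really came from Windows-1250; a DAT that legitimately contains ³ would be corrupted by it.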


Cassiel

  • Administrator
  • Hero Member
  • Posts: 1574
« Reply #16 on: January 18, 2011, 05:02:12 PM »
I really wouldn’t bother looking at this…

IMO (which I’m sure will come as no surprise) we should move over to using XML exclusively in future releases anyway… this issue (and the others) will simply fall away. PandMonium can simply amend the TDE to parse XML DATs instead (which will display all these characters correctly…)

(well, provided he's still willing of course! Sounds like I’m dumping a major redevelopment project in his lap   :)  )

PandMonium

  • Administrator
  • Hero Member
  • Posts: 1332
« Reply #17 on: January 18, 2011, 07:33:16 PM »
I guess you didn't notice but we are indeed talking about XML dats :P
The tool already opens both [although you have to say which type they are, which sucks since this could be done automatically]. Diabol was reporting a bug: when opening his XML dats, the name still appears as "Diabo³" instead of "Diaboł" in the interface, just as with the deprecated-format dats.

In this case the problem is in the dat itself: it has a ³ instead of ł there. This raises new questions. Diaboł must have created it with cmp, entering Diaboł in the author field, so why the ³? Is this a bug in cmp? Without an explanation for this, I have doubts that XML dats are the answer to the Texas problem :P
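On the point that the DAT type could be detected automatically: a minimal sketch, assuming XML DATs begin with `<` (an `<?xml ...?>` declaration or the root element) and legacy ClrMamePro DATs begin with a `clrmamepro (` header block. `detect_dat_format` is a hypothetical name, not an existing TDE function.

```python
def detect_dat_format(data):
    """Guess the format of a DAT from its first bytes.

    Heuristic (assumption): XML DATs start with '<', legacy ClrMamePro
    DATs start with a 'clrmamepro' header block.
    Returns "xml", "legacy" or "unknown".
    """
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]  # skip a UTF-8 byte order mark
    head = data.lstrip()[:32].lower()  # ignore leading whitespace and case
    if head.startswith(b"<"):
        return "xml"
    if head.startswith(b"clrmamepro"):
        return "legacy"
    return "unknown"


# Usage with a file on disk (the path is illustrative):
# with open("example.dat", "rb") as f:
#     print(detect_dat_format(f.read(4096)))
print(detect_dat_format(b'<?xml version="1.0"?>\n<datafile></datafile>'))  # xml
print(detect_dat_format(b'clrmamepro (\n\tname "Example"\n)'))  # legacy
```

Anything matching neither pattern (a UTF-16 XML file with its own byte order mark, for instance) falls through to "unknown", so the tool could still fall back to asking the user.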

Diaboł

  • TOSEC Member
  • Full Member
  • Posts: 204
« Reply #18 on: January 19, 2011, 07:58:16 AM »
I did ask Roman about that "feature". Let's hope it's just a simple mistake. The funny thing is that if you load the DAT into cmp and click "Show info", you will see ł displayed correctly.

Diaboł

  • TOSEC Member
  • Full Member
  • Posts: 204
« Reply #19 on: January 19, 2011, 11:16:21 AM »

PandMonium

  • Administrator
  • Hero Member
  • Posts: 1332
« Reply #20 on: January 19, 2011, 04:05:39 PM »
Nice, he seems helpful, so let's see if the problem can be sorted, whatever it is. Still, it would be good to have an option to support UTF-8 by default.

This takes us yet again to the non-low-ASCII characters issue, something we need to discuss properly. Using them in rom names may be needed to preserve some weird pieces of software, but in set names they may/will cause several unneeded problems. :P
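The policy described here (non-low-ASCII characters tolerated in rom names only where unavoidable, never in set names) could be checked mechanically before a dat is released. A minimal sketch, assuming "low ASCII" means the printable range 0x20 to 0x7E; `non_low_ascii` and `check_names` are hypothetical helpers.

```python
def non_low_ascii(name):
    """Return the characters of `name` outside printable low ASCII (0x20-0x7E)."""
    return [c for c in name if not (0x20 <= ord(c) <= 0x7E)]


def check_names(set_name, rom_names):
    """Set names must be pure low ASCII (error); rom names only get a warning."""
    problems = []
    if non_low_ascii(set_name):
        problems.append(("error", set_name))
    for rom in rom_names:
        if non_low_ascii(rom):
            problems.append(("warning", rom))
    return problems


print(non_low_ascii("Game (1991)(Diabol).zip"))  # []
print(check_names("Example Set", ["a.bin"]))  # []
```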

Diaboł

  • TOSEC Member
  • Full Member
  • Posts: 204
« Reply #21 on: January 22, 2011, 01:17:13 PM »
I guess we can skip the UTF idea for now. It looks like there are no tools that can handle it properly. It makes the whole TI situation a bit complicated. Maybe I will have to skip all the sets containing files with characters other than "low ASCII". Very annoying.

Cassiel

  • Administrator
  • Hero Member
  • Posts: 1574
« Reply #22 on: January 26, 2011, 05:39:34 PM »
This character encoding is all becoming very frustrating…..  >:(

I think we need to do some basic testing ourselves, because what Roman is saying contradicts what he said to me before. And when I did some rudimentary testing re TI-Nspire I didn’t notice any of these issues.

From what I understand or have observed:
- When creating either a legacy DAT or an XML DAT (I will refer to these as ‘DAT’ and ‘XMLDAT’ from here on for clarity), CMP does not automatically declare the XMLDAT as UTF-8 (“Unicode”). This is to be expected: XML doesn’t automatically mean UTF-8, and you can ‘declare’ an XML document to be encoded as pretty much anything. This is what I originally mentioned here: http://www.tosecdev.org/index.php/forum/index.php?topic=191.msg2283#msg2283. This flexibility is kind of the point, and you will see this encoding declaration in XML-based documents (like on the web, docx, ini’s etc). I do agree, however, that a simple toggle option in the CMP DAT2DIR module would be useful, but hey-ho…
- When creating an XMLDAT through CMP, the “name”/“rom” entries with High-ASCII characters are created correctly (at least when I have tried, and when Diaboł has tried, provided the XMLDAT has the correct header).
- Diaboł is saying that the header is not created correctly (with High-ASCII authors)? I don’t know about this, since I didn’t realise anyone even used this part of DAT2DIR. I always leave it blank, then open the new DAT in a text editor (Notepad++), copy the existing header into it and simply increase the date counter.
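One detail worth keeping in mind for the test: under the XML 1.0 specification, a document stored in anything other than UTF-8 or UTF-16 must declare its encoding, and an undeclared document is read as UTF-8. So an XMLDAT whose bytes were written in a local code page without saying so will not parse as intended, whereas the same text saved as UTF-8 will. The Python sketch below shows both cases; the `<datafile><header><author>` layout and the helper names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def utf8_dat(author):
    """Build a minimal (assumed) DAT header and encode it as declared UTF-8."""
    text = ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f"<datafile><header><author>{escape(author)}</author></header></datafile>")
    return text.encode("utf-8")


def undeclared_cp1250_dat(author):
    """Same document, but the bytes are Windows-1250 and nothing declares it."""
    text = f"<datafile><header><author>{escape(author)}</author></header></datafile>"
    return text.encode("cp1250")


def read_author(data):
    """Return the header author, or None if the bytes are not well-formed XML."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    return root.findtext("header/author")


print(read_author(utf8_dat("Diaboł")) == "Diaboł")  # True
print(read_author(undeclared_cp1250_dat("Diaboł")))  # None: byte 0xB3 is not valid UTF-8
```

Declaring `encoding="windows-1250"` in the header is also legal XML, but whether every DAT tool honours such a declaration is exactly what the proposed test would need to find out.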

When I get a chance I think I’ll create a test XMLDAT and a test group of Low/High-ASCII files. It would be very useful if everyone tested it and gave feedback. Since pretty much everyone here (the ‘regulars’ I mean) is from a different part of the globe, this should be a very effective/broad test!

I know this may all be a bit of a painful learning experience in the short term, but in the long term I really think it will benefit us.

This is all far from impossible of course… MAME has been using XML for years, and the latest MESS Software XMLDATs even include roms with Japanese Kanji!

We can get this licked too (for our uses)…..

Tim2460

  • Newbie
  • Posts: 28
« Reply #23 on: January 26, 2011, 07:01:04 PM »
Quote from: Cassiel
When I get a chance I’ll create a test XMLDAT and test group of Low/High-ASCII files I think. Would be very useful if everyone tests it and gives feedback. Since pretty much everyone here (the ‘regulars’ I mean) are from different parts of the globe, this should be a very effect/broad test!

Count me in if you need some testers!

PandMonium

  • Administrator
  • Hero Member
  • Posts: 1332
« Reply #24 on: January 27, 2011, 03:29:11 AM »
Indeed, that needs to be checked, as I already said.
There are 2 distinct parts to address:
1) creating dats with non-low-ASCII characters (in set and rom names) with the existing tools;
2) testing those dats to see how they work and what problems they cause across OSes. You already know my position on set names with non-low-ASCII characters; in rom names they may unfortunately be necessary for some weird, rare old stuff :P

As for the MAME/MESS XML part, AFAIK their lists are created manually (in MAME's case) or with other tools (in MESS's case), and they use kanji and other characters in several fields (description, for example) but obviously not in filenames: set and rom names are always low ASCII, and not long ago they enforced strict (not relaxed) 8.3 names for MAME.

We all already know that it is possible to use (almost) any set of characters by using UTF(-8), as, for example, in this post:
 浅き ТУФ זה כיף סתם ל υγμία ζω tę łódź Áḋaiṁ пошкодить कोई पीडा नहीं होती أنا قادر ม่ทำให้ฉันเจ็บ 않아요 傷身體 ້ຍເຈັບ איך קען میں کانچ کھا வராது. ি হয় না। नाही. ನ್ನಡವೇ ನಿತ್ಯ माम् აველი

So, it is not hard for me to manually create an XML file with that kind of information. The problem here is taking files on any OS, reading them to create a datfile, and later using that datfile to rename them somewhere else. That seems tricky / hard / impossible(?) to guarantee 100%, and I really don't feel comfortable with it; unfortunately, it may be needed (in rom files, not sets!) for some weird cases.
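One concrete way that round trip can break, which the thread itself does not raise: the same visible name can be stored as different code point sequences. The composed (NFC) and decomposed (NFD) forms of "Łódź" look identical, but macOS's older HFS+ file system, for example, stores names in a decomposed form, so a name read back from disk may not match the one in the DAT byte for byte. A minimal Python sketch with a hypothetical `same_filename` helper that compares names after NFC normalization:

```python
import unicodedata


def same_filename(a, b):
    """Treat two names as equal if they match after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = unicodedata.normalize("NFC", "Łódź.bin")
decomposed = unicodedata.normalize("NFD", composed)  # ó and ź split into letter + accent

print(composed == decomposed)  # False
print(len(composed), len(decomposed))  # 8 10
print(same_filename(composed, decomposed))  # True
```

Normalizing only fixes this particular mismatch; case-insensitive file systems and characters that are simply not allowed in file names on some OSes are separate problems, which is part of why a 100% guarantee is hard.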