Topic: Encoding error in XML?

Offline SpaceTaxiInvader

  • Newbie
  • *
  • Posts: 3
Encoding error in XML?
« on: October 01, 2015, 10:53:40 AM »
Hi

In TOSEC/Atari/ST/Applications/[ST]:


        <game name="Desktop-Icons f^Ár TOS 2, 3 und 4 v2.1 (1993)(-)(de)">
                <description>Desktop-Icons f^Ár TOS 2, 3 und 4 v2.1 (1993)(-)(de)</description>
                <rom name="Desktop-Icons f^Ár TOS 2, 3 und 4 v2.1 (1993)(-)(de).st" size="901120" crc="c3ca70cf" md5="98a43a95f618a43c5cac2dca36c7aacd" sha1$
        </game>

     
There seems to be an incorrect character (it occurs multiple times). It is supposed to be a "ü" (German umlaut).



Offline tomse

  • TOSEC Member
  • Full Member
  • ***
  • Posts: 118
  • Amiga ISO
    • Retro Commodore
Re: Encoding error in XML?
« Reply #1 on: October 01, 2015, 10:57:21 AM »
I can confirm this. I checked in several applications, opening the file as UTF-8 (Linux).

« Last Edit: October 01, 2015, 11:11:58 AM by tomse »
Amiga ISO Maintainer:
Dumping using Pioneer DVR-111D

Offline SpaceTaxiInvader

  • Newbie
  • *
  • Posts: 3
Re: Encoding error in XML?
« Reply #2 on: October 01, 2015, 11:17:16 AM »
Quote: "Which program are you opening the xml file with? and is it set to read the right encoding?"

I used clrmamepro, did a rebuild + scan, and got a strange filename.

"nano" does not display the umlaut correctly (using UTF-8 encoding).

Using "hexedit" I see that the bytes of the character are 0xC2 0x81. This looks like UTF-8 encoding, which is what I would expect for XML. Decoded as UTF-8, it results in Unicode code point U+0081, which is a control character. The correct character would be U+00FC ("ü"), which is 0xC3 0xBC in UTF-8.
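
The byte analysis above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are illustrative and are not taken from any TOSEC tooling.

```python
import unicodedata

# The two bytes seen in hexedit at the position of the broken character.
observed = bytes([0xC2, 0x81])

# They form a valid UTF-8 sequence that decodes to a single code point.
decoded = observed.decode("utf-8")
code_point = ord(decoded)

# General category "Cc" means a control character.
category = unicodedata.category(decoded)

# What a correctly encoded "ü" (U+00FC) looks like in UTF-8.
expected = "ü".encode("utf-8")

print(f"observed: U+{code_point:04X}, category {category}")
print(f"expected: U+{ord('ü'):04X}, bytes {expected.hex(' ')}")
```

So the file is well-formed UTF-8 at the byte level; it simply contains the wrong code point.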

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1303
Re: Encoding error in XML?
« Reply #3 on: October 03, 2015, 05:27:45 PM »
Hummmm strange. When generating dats in XML the proper character is used, right?

Offline SpaceTaxiInvader

  • Newbie
  • *
  • Posts: 3
Re: Encoding error in XML?
« Reply #4 on: October 06, 2015, 05:55:17 PM »
Quote from PandMonium: "Hummmm strange. When generating dats in XML the proper character is used, right?"

I do not know how TOSEC dats are created.

But I tried creating a set with clrmamepro, and the umlauts were correct.

Maybe the ROM filenames themselves had the wrong encoding when they were scanned? This can easily happen with a zip packer that is not 100% compliant, or when copying files over a network, etc.
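
One way such a wrong filename encoding could produce exactly the bytes 0xC2 0x81 is sketched below. It assumes, purely as a hypothesis, that the original filename was stored in a DOS-style code page such as CP437, where byte 0x81 is "ü", and that some tool later misread that byte as Latin-1 before writing UTF-8. Whether this is what actually happened to the TOSEC dat is not confirmed. The helper names `simulate_misread` and `repair` are made up for illustration.

```python
def simulate_misread(name: str) -> bytes:
    """Hypothetical failure path: CP437 bytes misread as Latin-1, then saved as UTF-8."""
    legacy_bytes = name.encode("cp437")        # "ü" becomes the single byte 0x81
    misread = legacy_bytes.decode("latin-1")   # 0x81 is taken as code point U+0081
    return misread.encode("utf-8")             # U+0081 is written as 0xC2 0x81


def repair(text: str) -> str:
    """Undo the misread: recover the raw bytes via Latin-1, then decode as CP437."""
    return text.encode("latin-1").decode("cp437")


broken = simulate_misread("für")
fixed = repair(broken.decode("utf-8"))

print(broken.hex(" "))
print(fixed)
```

A repair like this should only be applied to names known to be affected: it raises an error for characters outside Latin-1 and would corrupt text that was encoded correctly in the first place.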

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1303
Re: Encoding error in XML?
« Reply #5 on: November 01, 2015, 03:58:25 PM »
I suppose it could be that too. AFAIK most dats are created with cmp, mirroring folder contents, or edited manually in some text editor.