TOSECdev Forum

TOSEC Project => TOSEC Tools => Topic started by: Cassiel on May 03, 2010, 07:23:13 PM

Title: TOSEC Dat Explorer
Post by: Cassiel on May 03, 2010, 07:23:13 PM
Think I might have spotted a potential bug.

When trying to batch extract all [more info] flags from a folder of DATs, nothing is output despite there definitely being some [more info] flags in the DATs.
Title: Re: TOSEC Dat Explorer
Post by: PandMonium on May 03, 2010, 08:08:15 PM
You are right indeed, I missed something there :P
Anyway, it is fixed now; you can get it from the same place.
Title: Re: TOSEC Dat Explorer
Post by: Cassiel on May 03, 2010, 08:43:02 PM
Much appreciated....
Title: Re: TOSEC Dat Explorer
Post by: Cassiel on May 03, 2010, 08:52:57 PM
Just tested, works great...

(and there are no [more info] issues in my DATs... woohoo!)
Title: Re: TOSEC Dat Explorer
Post by: Diaboł on May 18, 2010, 05:16:31 PM
I have a little request. The Explorer works well, but it doesn't display my nickname correctly :D Can that be fixed?
Title: Re: TOSEC Dat Explorer
Post by: PandMonium on May 18, 2010, 06:39:09 PM
Hi there,
it is possible/works indeed, but the tool just extracts the name from the datfile. If you open your datfiles in a text editor, you will see (at least I do) "Diabo³".
I just edited the file and saved it with "Diaboł" in UTF-8 format instead of ANSI/ASCII. That one displays correctly in the tool, so it is something (format) related to the dats and their creation, and not me this time ;D
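The "Diabo³" symptom is consistent with a code-page mismatch. A minimal Python sketch, assuming (this is not confirmed in the thread) that the DAT was written in Windows-1250, the usual ANSI code page on Polish Windows systems, and later read back as Windows-1252/Latin-1: the byte that means ł in one table means ³ in the other. The helper name `misread` is hypothetical.

```python
def misread(text: str, written_as: str = "cp1250", read_as: str = "cp1252") -> str:
    """Encode text with one code page, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)

# In Windows-1250, ł is the single byte 0xB3...
print("ł".encode("cp1250"))                      # b'\xb3'
# ...but in Windows-1252, byte 0xB3 is the superscript three.
print(misread("Diaboł"))                         # Diabo³
# UTF-8 stores ł as two bytes, which round-trip safely when read back as UTF-8.
print("Diaboł".encode("utf-8"))                  # b'Diabo\xc5\x82'
print("Diaboł".encode("utf-8").decode("utf-8"))  # Diaboł
```

This is why re-saving the file as UTF-8 fixed the display: reader and writer then agree on the encoding.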
Title: Re: TOSEC Dat Explorer
Post by: Diaboł on May 18, 2010, 07:08:47 PM
It looks like UTF-8 support in clrmamepro is enabled only for DAT files created in the XML format. Still, when I open any of my DATs in a text editor (I just checked Notepad and Notepad++ on Windows 7 EN), I can see the nickname displayed properly... Not a big problem anyway... actually, not a problem at all :D
Title: Re: TOSEC Dat Explorer
Post by: Cassiel on May 18, 2010, 08:57:40 PM
The original ClrMamePro 'DAT' format is old and deprecated; that's why XML has been the default for some time now.

IMO we should really be using modern XML-based DATs anyway, but I know others find them more awkward to edit manually (though I have made the point before that there are a number of very good XML editors out there ;) ).
Title: Re: TOSEC Dat Explorer
Post by: Diaboł on May 18, 2010, 09:48:21 PM
What would the advantages be over the old format? Let's skip UTF for now, since it could easily be added to the old format if Roman wanted to do that.
Title: Re: TOSEC Dat Explorer
Post by: Cassiel on May 25, 2010, 11:10:36 AM
He won't: the old DAT format is unsupported/deprecated.

My whole point is (and has been) that the old DAT format is exactly that: old (and unsupported).

It’s not a massive thing, but if we have the two options (old/deprecated format or modern/current format) and exactly the same amount of ‘effort’ is needed to create/host either, then why not go with the modern one?

- More flexible/extendable since XML based
- Who knows when Roman will decide to dump old format DAT creation in CMP
- Fully Unicode compatible
- Gives more flexibility for the future, especially for changes in project direction/expansion
- and, at the end of the day: why use an outdated format when a modern standards based one is already available and in use?

Like I say, it’s not a massive ‘issue’; it’s just about being a little forward thinking... and being forward thinking is always good (even for a project obsessed with the past).
Title: Re: TOSEC Dat Explorer
Post by: Diaboł on May 25, 2010, 06:19:13 PM
I guess I agree with Cassiel about the reasons for using the new format for our DAT files. Anyway, I think I found a bug in the Explorer. After I load and check a DAT, I want to highlight all sets with unknown publishers, but for some reason all sets get a yellow background and the statistics say that all sets are wrong. Obviously, right after the check all of them were OK. Not surprisingly, if I then click Settings-->Show Sets-->Wrong Only, all of my sets are highlighted.

One more thing: the app doesn't remember the last used path.
Title: Re: TOSEC Dat Explorer
Post by: PandMonium on May 25, 2010, 06:36:34 PM
Well, the unknown publishers, scene groups and so on shouldn't work very well (or at all). The idea is/was to use the XML lists in the lists\ path; you may try putting them there. If there are none, there won't be any values present when you go to View -> (something) List.
Unfortunately, I have yet to finish that, and I don't even remember how I left it, so I can't tell you what will happen using the lists :P

The path part is true and is annoying.
Title: Re: TOSEC Dat Explorer
Post by: Symmo on January 07, 2011, 07:30:42 AM
Hi
For anyone on Linux, there's no need to use Wine to run it.
I installed Mono (the .NET runtime) from my package manager. Then, from a command line in the folder where you installed it, type: mono "Dat Explorer.exe"
Or, on the desktop, add Mono to the "Open With" list so you can select it for .NET exes (or use it to open .dat files).
It works fast and is stable (under Mono); a must for checking the names in your DATs when you are starting out.
cya
 
Title: Re: TOSEC Dat Explorer
Post by: PandMonium on January 07, 2011, 03:17:21 PM
I guess everyone should take a look at it from time to time, not only when starting out.
Still, I would not call it a 100% complete, stable app, since it started as just a test :P
Title: Re: TOSEC Dat Explorer
Post by: Diaboł on January 18, 2011, 04:04:08 PM
This is going to be about UTF once again :P I have checked one of my XML DATs and I see all the names displayed correctly, so it looks like TDE can read UTF DATs fine. The problem is in the "DatFile Details" section, where UTF is not displayed correctly (see attachment).

It would be nice if you could implement some sort of automatic DAT type recognition, so we wouldn't need to choose between (old) and (xml) in the Explore menu. That would also allow batch jobs on a folder containing both types of DATs.
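Automatic recognition could be as simple as sniffing the first non-whitespace characters of each file. A minimal Python sketch, assuming that XML DATs begin with "<" (an XML declaration or the root element) and legacy ClrMamePro DATs begin with a "clrmamepro (" header or a "game (" block; the names `detect_dat_format` and `sort_dats` are hypothetical and not part of TDE.

```python
from pathlib import Path

def detect_dat_format(text: str) -> str:
    """Heuristically classify DAT text as 'xml', 'legacy' or 'unknown'."""
    # Skip a UTF-8 byte order mark and any leading whitespace.
    head = text.lstrip("\ufeff \t\r\n")
    if head.startswith("<"):
        return "xml"
    # Assumption: legacy DATs open with a 'clrmamepro (' header or a 'game (' block.
    if head.lower().startswith(("clrmamepro", "game")):
        return "legacy"
    return "unknown"

def sort_dats(folder: str) -> dict[str, list[Path]]:
    """Group every *.dat in a folder by detected format, e.g. for a batch job."""
    groups: dict[str, list[Path]] = {"xml": [], "legacy": [], "unknown": []}
    for path in sorted(Path(folder).glob("*.dat")):
        # Only the start of the file is needed; errors="replace" keeps odd
        # encodings from aborting the scan.
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            groups[detect_dat_format(fh.read(4096))].append(path)
    return groups
```

Reading only the first few kilobytes keeps a batch scan cheap even for very large DATs; files landing in "unknown" can then be checked by hand.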
Title: Re: TOSEC Dat Explorer
Post by: PandMonium on January 18, 2011, 04:41:26 PM
Although the tool has several problems, this time it is not my fault :P

The problem is in the dat itself: there is actually a ³ rather than a ł.
If you open it in a text editor, you will see:
Code:
<author>Diabo&#179;</author>
rather than
<author>Diabo&#322;</author>

&#179; is the numeric character reference for ³; it should be &#322;, which is ł (see here (https://secure.wikimedia.org/wikipedia/en/wiki/List_of_XML_and_HTML_character_entity_references) and here (http://www.texaswebdevelopers.com/examples/xmlentities/xml_entities.asp)).

Just try writing the author tags here without the [ code ] tags and you will see what I mean.
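The two references differ only in the number: &#179; is code point U+00B3 (³) and &#322; is U+0142 (ł). Note that 179 is 0xB3, the byte that encodes ł in Windows-1250, so one possible explanation (a guess, not confirmed in the thread) is that the DAT writer emitted the raw code-page byte as if it were a Unicode code point. A small Python sketch of both readings; `author_text` and `as_char_ref` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def author_text(snippet: str) -> Optional[str]:
    """Parse an <author> element and return its text with references resolved."""
    return ET.fromstring(snippet).text

def as_char_ref(ch: str, codepage: Optional[str] = None) -> str:
    """Build a numeric character reference for a single character.

    codepage=None uses the Unicode code point (correct XML).
    A codepage uses the character's single encoded byte instead,
    reproducing the suspected bug.
    """
    value = ord(ch) if codepage is None else ch.encode(codepage)[0]
    return f"&#{value};"

print(author_text("<author>Diabo&#179;</author>"))  # Diabo³
print(author_text("<author>Diabo&#322;</author>"))  # Diaboł
print(as_char_ref("ł"))                             # &#322;
print(as_char_ref("ł", "cp1250"))                   # &#179;
```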

Title: Re: TOSEC Dat Explorer
Post by: Cassiel on January 18, 2011, 05:02:12 PM
I really wouldn’t bother looking at this…

IMO (which I’m sure will come as no surprise), we should move over to using XML exclusively in future releases anyway… this issue (and the others) will simply fall away. PandMonium can simply amend the TDE to parse XML DATs instead (which will display all these characters correctly…)

(well, provided he's still willing, of course! Sounds like I’m dumping a major redevelopment project in his lap :) )
Title: Re: TOSEC Dat Explorer
Post by: PandMonium on January 18, 2011, 07:33:16 PM
I guess you didn't notice but we are indeed talking about XML dats :P
The tool already opens both [although you have to say which type they are, which sucks, since this could be done automatically]. Diabol was reporting a bug: when he opens his XML dats, the name still appears as "Diabo³" instead of "Diaboł" in the interface, just like with the deprecated-format dats.

In this case the problem is in the dat itself: it has a ³ instead of a ł there. This raises new questions. Diaboł must have created it with cmp, entering Diaboł in the author field, so why the ³? Is this a bug in cmp? Without an explanation for this, I have doubts that XML dats are the answer to the Texas Instruments (TI) problem :P
Title: Re: TOSEC Dat Explorer
Post by: Diaboł on January 19, 2011, 07:58:16 AM
I did ask Roman about that "feature". Let's hope it's just a simple mistake. The funny thing is that if you load the DAT into cmp and click "Show info", you will see the ł displayed correctly.
Title: Re: TOSEC Dat Explorer
Post by: Diaboł on January 19, 2011, 11:16:21 AM
http://www.emulab.it/forum/index.php?topic=441.0
Title: Re: TOSEC Dat Explorer
Post by: PandMonium on January 19, 2011, 04:05:39 PM
Nice, he seems helpful, so let's see if the problem can be sorted out, whatever it is. Still, it would be good to have an option to support UTF-8 by default.

This takes us yet again to the non-low-ASCII chars issue, something we need to discuss properly. Using them in romnames may be needed to preserve some weird pieces of software, but for setnames they may/will cause several unneeded problems. :P
Title: Re: TOSEC Dat Explorer
Post by: Diaboł on January 22, 2011, 01:17:13 PM
I guess we can skip the UTF idea for now. It looks like there are no tools that can handle it properly. That makes the whole TI situation a bit complicated. Maybe I will have to skip all the sets containing files with characters other than "low ASCII". Very annoying.
Title: Re: TOSEC Dat Explorer
Post by: Cassiel on January 26, 2011, 05:39:34 PM
This character encoding is all becoming very frustrating…..  >:(

I think we need to do some basic testing ourselves, because what Roman is saying contradicts what he told me before. And when I did some rudimentary testing on TI-Nspire, I didn’t notice any of these issues.

From what I understand or have observed:
- When creating either a legacy DAT or an XML DAT (I will refer to these as ‘DAT’ and ‘XMLDAT’ from here on, for clarity), CMP does not automatically declare the XMLDAT as UTF-8 (“Unicode”). This is to be expected: XML doesn’t automatically mean UTF-8, and you can ‘declare’ an XML document to be encoded as pretty much anything. This is what I originally mentioned here: http://www.tosecdev.org/index.php/forum/index.php?topic=191.msg2283#msg2283. This flexibility is kind of the point, and you will see this encoding declaration in XML-based documents (on the web, in .docx files, etc.). I do agree, however, that a simple toggle option in CMP’s DAT2DIR module would be useful, but hey-ho….
- When creating an XMLDAT through CMP, the “name”/”rom” entries with High-ASCII characters are created correctly (from what I have tried, and what Diaboł has tried), provided the XMLDAT has the correct header.
- Diaboł is saying that the header is not created correctly (with High-ASCII authors)? I don’t know about this, since I didn’t realise anyone even used this part of DAT2DIR. I always leave it blank, then open the new DAT in a text editor (Notepad++) and copy the existing header into it, simply increasing the date counter.
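A quick way to see why the declaration matters: the same parser returns different text depending on whether the declared encoding matches the bytes actually written. A sketch using Python's standard xml.etree.ElementTree on a cut-down, hypothetical XMLDAT (only the <datafile>/<header>/<author> path is shown). The second document deliberately stores Windows-1250 bytes under an ISO-8859-1 declaration to reproduce the mismatch; `read_author` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def read_author(xml_bytes: bytes) -> Optional[str]:
    """Return <header><author> text; the parser decodes bytes per the XML declaration."""
    return ET.fromstring(xml_bytes).findtext("header/author")

# Declaration and bytes agree: UTF-8 declared, UTF-8 written.
good = ('<?xml version="1.0" encoding="UTF-8"?>'
        '<datafile><header><author>Diaboł</author></header></datafile>').encode("utf-8")

# Mismatch: ISO-8859-1 declared, but the name was written as Windows-1250 bytes.
bad = (b'<?xml version="1.0" encoding="ISO-8859-1"?><datafile><header><author>'
       + "Diaboł".encode("cp1250")
       + b'</author></header></datafile>')

print(read_author(good))  # Diaboł
print(read_author(bad))   # Diabo³
```

The point for testing is that "XML" alone guarantees nothing; the header declaration and the tool that writes the bytes have to agree.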

When I get a chance, I think I’ll create a test XMLDAT and a test group of Low/High-ASCII files. It would be very useful if everyone tested it and gave feedback. Since pretty much everyone here (the ‘regulars’, I mean) is from a different part of the globe, this should be a very effective/broad test!

I know this may all be a bit of a painful learning experience in the short term, but in the long term I really think it will benefit us.

This is all far from impossible, of course…. MAME has been using XML for years, and the latest MESS Software XMLDATs even include roms with Japanese Kanji!

We can get this licked too (for our uses)…..
Title: Re: TOSEC Dat Explorer
Post by: Tim2460 on January 26, 2011, 07:01:04 PM
Quote from: Cassiel on January 26, 2011, 05:39:34 PM
When I get a chance, I think I’ll create a test XMLDAT and a test group of Low/High-ASCII files. It would be very useful if everyone tested it and gave feedback. Since pretty much everyone here (the ‘regulars’, I mean) is from a different part of the globe, this should be a very effective/broad test!

Count me in if you need some testers!
Title: Re: TOSEC Dat Explorer
Post by: PandMonium on January 27, 2011, 03:29:11 AM
Indeed, that needs to be checked, as I already said.
There are 2 distinct parts to address:
1) creating dats with non-low-ASCII chars (in set and rom names) using the existing tools
2) testing the dats to see how they work, and what problems they cause across OSes. You already know my position on setnames with non-low-ASCII chars; in romnames they may unfortunately be necessary for some weird, rare old stuff :P

As for the MAME/MESS XML part, AFAIK their lists are created manually (in the case of MAME) or with other tools (in MESS), and they use kanji and other scripts in several fields (description, for example), but obviously not in filenames: set and rom names are always low ASCII, and not so long ago they enforced strict (not relaxed) 8.3 naming for MAME.
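To check names against the two rules mentioned here, a hypothetical validator in Python. It assumes "low ASCII" means printable 7-bit characters (0x20 to 0x7E), and "strict 8.3" means a base of one to eight characters plus an optional extension of one to three characters, with no spaces or extra dots. MAME's actual naming rules may be narrower, so treat these limits as placeholders.

```python
import re

# Assumed "strict 8.3": 1-8 char base, optional 1-3 char extension,
# no whitespace and no additional dots.
STRICT_83 = re.compile(r"[^.\s]{1,8}(\.[^.\s]{1,3})?")

def is_low_ascii(name: str) -> bool:
    """True if every character is printable 7-bit ASCII (0x20-0x7E)."""
    return all(0x20 <= ord(c) <= 0x7E for c in name)

def is_strict_83(name: str) -> bool:
    """True if the name is low ASCII and fits the assumed strict 8.3 pattern."""
    return is_low_ascii(name) and STRICT_83.fullmatch(name) is not None

for name in ["pacman.zip", "galaxian", "Diaboł.zip", "longsetname.zip"]:
    print(name, is_low_ascii(name), is_strict_83(name))
# pacman.zip True True
# galaxian True True
# Diaboł.zip False False
# longsetname.zip True False
```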

We all already know that it is possible to use (almost) any set of characters by using UTF(-8), as in this post:
 浅き ТУФ זה כיף סתם ל υγμία ζω tę łódź Áḋaiṁ пошкодить कोई पीडा नहीं होती أنا قادر ม่ทำให้ฉันเจ็บ 않아요 傷身體 ້ຍເຈັບ איך קען میں کانچ کھا வராது. ি হয় না। नाही. ನ್ನಡವೇ ನಿತ್ಯ माम् აველი

So, it is not hard for me to manually create an XML file with that kind of information. The problem here is trying to use the files on any OS: read them to create a datfile, and later use that datfile to rename them somewhere else. That seems tricky/hard/impossible(?) to guarantee 100%, and I really don't feel comfortable with it; unfortunately, it may be needed (in romfiles, not sets!) for some weird cases.
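One concrete way a name can fail that round trip even when every tool uses UTF-8: the same visible name can be stored with composed or decomposed accents (Unicode normalization forms NFC and NFD; for example, the older macOS HFS+ file system stores names in a variant of NFD). A Python sketch using the standard unicodedata module, with a hypothetical `same_name` helper that compares names after normalizing both:

```python
import unicodedata

def same_name(a: str, b: str) -> bool:
    """Compare two file names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# "łódź.bin" written with precomposed characters (NFC), as a DAT might contain it.
composed = "\u0142\u00f3d\u017a.bin"
# The same name with ó and ź split into base letter + combining accent (NFD).
decomposed = unicodedata.normalize("NFD", composed)

print(composed == decomposed)           # False: different code point sequences
print(len(composed), len(decomposed))   # 8 10
print(same_name(composed, decomposed))  # True
```

A plain string comparison between DAT entries and files on disk would report such a file as missing, which is one reason guaranteeing a 100% round trip is hard.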