Author Topic: New version of TiM  (Read 28475 times)

Offline sp33dy

  • Newbie
  • *
  • Posts: 10
Re: New version of TiM
« Reply #15 on: January 04, 2010, 07:57:04 AM »
Time is the critical issue! I just wish I was a student again!!! All that spare time I used to have, but way before the internet thing. Lots of hacking on the Amiga, but it was just that; playing and wasting valuable energy....

I think the database is the critical item in all of this; after all, this is what you guys are kindly creating (I just don't have the time to do this, although I've got approximately 300 Amiga disks in the loft to dump to PC: many backups of demo disks [SAE #1->#100's] and a whole load of others. For some reason I liked to collect these; not sure if any are corrupt now). If the database is designed well, then the rest will fit around it.

I don't see new versions of releases as a problem. In fact, I believe I'd already solved this problem!!! It's one of the driving reasons why I've been considering doing this. My theory/logic? If I load v0.01 of a dat, the database in my view will have one table that holds a single copy of every rom, while other tables know about sets and the linkage between the two. Then a v0.02 dat is loaded (this could be the same set [i.e. Amiga games] or one by a different renamer [i.e. a Good Amiga set]). When that set is loaded into the database, it creates a new set entry in the appropriate table and then, as it inserts the roms/disks, it checks uniqueness on CRC32/MD5/SHA/size and either adds a new version or makes an entry in a history/alternative version list. In this way, it will be easy to maintain different sets with crossovers, history (i.e. rolling back sets) and other issues. For me, this is very, very important. I may not have explained it well enough and I may have to draw a few diagrams to explain. However, I really think it's a simple problem to resolve, and solving it would make all our lives much easier.
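To make the table layout described above a bit more concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is only an illustration under assumptions: the table names (rom, dat_set, set_rom), the column names and the helper functions are made up for this example, and SQLite simply stands in for whichever database ends up being used.

```python
import sqlite3

# Hypothetical schema: one row per unique rom, one row per loaded dat version,
# and a link table recording which name each rom has in each dat version.
SCHEMA = """
CREATE TABLE rom (
    id    INTEGER PRIMARY KEY,
    crc32 TEXT NOT NULL,
    md5   TEXT NOT NULL,
    sha1  TEXT NOT NULL,
    size  INTEGER NOT NULL,
    UNIQUE (crc32, md5, sha1, size)
);
CREATE TABLE dat_set (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (name, version)
);
CREATE TABLE set_rom (
    set_id   INTEGER NOT NULL REFERENCES dat_set(id),
    rom_id   INTEGER NOT NULL REFERENCES rom(id),
    filename TEXT NOT NULL,
    PRIMARY KEY (set_id, rom_id, filename)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_dat(conn, name, version, entries):
    """Load one dat version; entries are (filename, crc32, md5, sha1, size)."""
    cur = conn.execute(
        "INSERT INTO dat_set (name, version) VALUES (?, ?)", (name, version)
    )
    set_id = cur.lastrowid
    for filename, crc32, md5, sha1, size in entries:
        key = (crc32, md5, sha1, size)
        # A rom that is already known is not stored a second time.
        conn.execute(
            "INSERT OR IGNORE INTO rom (crc32, md5, sha1, size) VALUES (?, ?, ?, ?)",
            key,
        )
        rom_id = conn.execute(
            "SELECT id FROM rom WHERE crc32=? AND md5=? AND sha1=? AND size=?",
            key,
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO set_rom (set_id, rom_id, filename) VALUES (?, ?, ?)",
            (set_id, rom_id, filename),
        )
    conn.commit()
    return set_id


def names_for_rom(conn, crc32, md5, sha1, size):
    """Return the naming history of one rom across all loaded dat versions."""
    rows = conn.execute(
        "SELECT s.name, s.version, sr.filename "
        "FROM set_rom sr "
        "JOIN dat_set s ON s.id = sr.set_id "
        "JOIN rom r ON r.id = sr.rom_id "
        "WHERE r.crc32=? AND r.md5=? AND r.sha1=? AND r.size=? "
        "ORDER BY s.id",
        (crc32, md5, sha1, size),
    )
    return rows.fetchall()
```

Loading a second dat version that renames a file then adds only a link row, not a second copy of the rom, and names_for_rom gives the rename history for that image.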

Your description below of how/what you are doing to search the database is exactly what I would like to do. I'm soooo geeky with games that I'd love to find the author who has written the most games etc. However, I'd also love to have ratings from both magazines, popular websites and also open up to the community to rank and rate.

Touching on community, you've had the same thoughts as I have! This tool, in my opinion, should be twofold: a local tool that performs the rom renaming etc., and a server version that is online. Professional websites are my daily job, which is why I know Java inside and out (it's just the technology the company uses). This is why I started the local tool in Java with an open source database and have had a play with some JSP to also make it available via a server. It works very well. This reinforces one other point. The database technology is irrelevant! It doesn't matter if I use HSQLDB, MySQL, DB2, Oracle or any other database; the important point is the database schema. It'd be very easy to allow the user to select the database of preference. Building in one free database (HSQLDB in my view) makes sense, as an out-of-the-box bundled version that is ready to go makes life easy for those that aren't bothered about which database they prefer or have a license for.

The more I type, the more I want to make this work. Although, I've got a local school's website and a membership system for snooker club to write in my free time before this. I've also got real life and a lot of house DIY to complete... I've started planning to see what I could do over the next 6 months..

Regards

Sp33dy


Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1332
Re: New version of TiM
« Reply #16 on: January 04, 2010, 07:38:36 PM »
(don't have much time to detail some parts a lot atm :P)

I think you already understood that time is the key, and since this is a hobby for all of us it is hard to start something very complex; it has to start with small steps (at least this is my view and what i've done). The next huge problem is how TOSEC and other collections work: you will have to pick info from dats that may change and that in most cases contain mistakes, you can't control most of the rules or change them when they are bad, and a small change somewhere may end up forcing a schema change or something else that takes time.

The most important part is planning it right, but in my opinion how deep and complex the planning gets depends on the team developing it. In this project i did some planning but nothing huge or always documented, because it is just me and i would end up wasting all my time planning something and doing zero. So i started with a big idea but with some aspects not well defined; it was just like a big test that kept getting updated as parts were added to it. Now i've come to a point that shows me that i should just rewrite this core properly so i can use it as a base for something and not keep rewriting and duplicating code in ad hoc development :P

Now, a bit more about your last post, i'm not sure i understood it all (my English isn't great):
Storing release information (and so datfile history) is a great idea that i haven't finished yet, and i don't have any idea if i will ever have time to do so (i grabbed some older packs, have some older tncs, tugids, etc., but for now just have small stats about these packs and not full information).
The main problems with that idea that i can recall right now range from the huge size it would take in the DB to the millions of errors you find in old dats, plus information that is impossible to parse.

I mean, TNC changed a lot and the older sets have flags not recognized now; information wouldn't be parsed there until a parser for that was done (and there is no great documentation about the older rules + they weren't always followed).
Next, if by any means we managed to extract all information from flags you would end up with a TON of invalid values, from invalid dates that are easy to check to completely invalid groups of information like: nonexistent publishers that are just garbage and have since been fixed; descriptors, countries and language codes that are invalid and don't even exist; scene groups; persons whose names are not written according to "(Last Name, Name)" and so on. In short, a ton of errors. A db that covers all details of a set needs this kind of information, and having the details of older sets too means that every typo and error that once got into one of these flags needs to be added, giving for example a list of scene groups where more than half would need to be tagged as invalid/errors.

In my view that is a little too much: too much work and too much information (half of it unneeded). So my planned approach would be having that many details only for the current/actual datfiles; older versions of them should only have datfile details and stats, eventually also the existing setnames and roms, but not flag details for each of the older sets.

After that you have to take into consideration that datfile names change over time to fix company or system names that were wrong, and these changes need to be manually tracked (you would need to tell that "COMPANY System128 - Dat (2000-00-00)" was an older version of "Company System 128 - Dat (2001-00-00)", where the datfile was renamed), as does correctly marking category changes and datfiles that were merged or split, because an update is not always (as we can see over time) a direct update where only the version changed. Identifying this automatically is near impossible, or will be dangerous and cause errors.

Datfile updates like you described would be easy but would also generate a huge db with tons of duplicated sets. We have nearly 350,000 setfiles now, each one with AT LEAST one image (software image, not picture), one title, year, publisher, crc, md5/sha1, filesize, filename, extension and tons of extra info. Having all that info just for the latest release already takes tons of table entries and MBs; with dozens of releases all this info would be immense.

IMO this leads to the idea of storing datfile and setfile names, with set details only for the latest dats (or for sets that didn't change much). To avoid such duplication, identifying changes would be needed, so adding 2 versions of the Amiga Games ADF dat would not add 2x 20000 sets but 20000 sets first and then 1000 new ones + 500 renamed ones and so on (+ rom changes + bla bla). Even that is already complex for the time most of us have available; dealing with datfile renames and splits into more than one dat, set moves between dats and so on is also complicated.
It is not that hard to figure out setfile renames or moves for single-rom sets just by comparing hashes + size, but with multi-rom sets things get harder and harder because of files shared between sets (crack intros, readmes, something else), redumps of some files and so on, which make it impossible to always know whether sets are related or not.
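The single-rom comparison mentioned above can be sketched roughly as follows. This is only an illustration in Python under assumptions: the function name, the dictionary layout and the result keys are invented for the example, and it deliberately ignores multi-rom sets, which are the hard part. It also assumes each hash + size key appears only once per dat; exact duplicates inside one dat would collapse into one entry.

```python
def diff_dat_versions(old, new):
    """Compare two versions of a dat made of single-rom sets.

    old and new map a set name to its (crc32, md5, size) key.
    Sets are matched by key, so a changed name shows up as a rename.
    """
    old_by_key = {key: name for name, key in old.items()}
    new_by_key = {key: name for name, key in new.items()}

    unchanged, renamed, added, removed = [], [], [], []
    for key, new_name in new_by_key.items():
        old_name = old_by_key.get(key)
        if old_name is None:
            added.append(new_name)                # image not seen before
        elif old_name == new_name:
            unchanged.append(new_name)            # same image, same name
        else:
            renamed.append((old_name, new_name))  # same image, new name
    for key, old_name in old_by_key.items():
        if key not in new_by_key:
            removed.append(old_name)              # image dropped from the dat

    return {
        "unchanged": sorted(unchanged),
        "renamed": sorted(renamed),
        "added": sorted(added),
        "removed": sorted(removed),
    }
```

With a diff like this, only the "added" and "renamed" entries would need new rows when a second version of a dat is loaded, which is the saving described above.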

Adding even more community-related stuff will just make it even more complex. You can have those goals, but you need to define what the basic, important part is and what should be done later if it is still a good idea.

My view is that currently the important part for me and the project is a way to easily browse and relate the existing information (systems, companies, publishers, groups, etc.) in the latest datfiles, so it can be checked and renamers can fix the identified errors, if possible adding a bit of information on older releases (+ thinking of a WIP system that is easier for renamers one day). This is what i've done lately (when i had more free time; currently it is in urgent need of rewrites so i don't waste so much time repeating stuff + to make it more secure).

I will not talk about any technical aspects of tools or so, just answered SQLite for an app because it seemed good to use with options like Qt & c++, that is not relevant for now anyway.

...and note that i didn't even talk about the problems with using datfile values. For example, with publishers you just have a string there (the name), and there are a lot of publishers that may share the same name; this is really bad when it happens within the same system (duplicated person names, sceners, and so on). Adding details for setfiles is also complicated when you will later end up loading newer setfiles.


That's it, hope you can get something out of this pile of text, also if you like i can show you what we've got now, just pm me.

Offline Diaboł

  • TOSEC Member
  • Full Member
  • ***
  • Posts: 204
Re: New version of TiM
« Reply #17 on: January 06, 2010, 06:37:34 PM »
I really like the idea of having a real TOSEC db. I read the discussion and it just came to my mind that using names for DATs and files on... say a low level of the whole structure (?) can be a cause of many troubles. It would be easier to operate on some kind of ID strings. The name of a file is the result of a few values (flags) and could easily be created by getting all the information from the DB. The final name of a file should be the last step of the whole process. It can change from time to time, but if there is a unique ID you will always know what file you are working on. I guess it could be similar with the DAT names. Changing a DAT name is sometimes confusing for people who try to rebuild / update their sets, so having a unique ID along with the names sounds good to me. Actually we already have a sort of ID for files: CRC/MD5.
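As a rough illustration of the "ID first, name last" idea, here is a small Python sketch. Everything in it is an assumption made for the example: the ImageRecord class, its fields and the simplified "Title (year)(publisher)(media label)" layout, which covers only a handful of flags and is not the full naming convention.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ImageRecord:
    """One image, identified by its hashes; the name is derived, never stored."""
    crc32: str
    md5: str
    title: str
    year: str
    publisher: str
    media_label: Optional[str] = None

    @property
    def image_id(self) -> str:
        # Hypothetical ID string built from the hashes, as suggested above.
        return f"{self.crc32}:{self.md5}"

    def file_name(self, extension: str) -> str:
        # Simplified name layout: only a few flags are handled here.
        name = f"{self.title} ({self.year})({self.publisher})"
        if self.media_label:
            name += f"({self.media_label})"
        return f"{name}.{extension}"


# Fixing a typo in one field changes the generated name but not the ID.
record = ImageRecord("0a1b2c3d", "md5value", "Example Game", "1990",
                     "Exmaple Soft", "Disk 1 of 2")
fixed = replace(record, publisher="Example Soft")
```

Here record.image_id and fixed.image_id are the same string, while fixed.file_name("adf") reflects the corrected publisher.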

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1332
Re: New version of TiM
« Reply #18 on: January 06, 2010, 07:16:36 PM »
Hi Diabol,

Yes, the idea is not new and has been discussed a few times. Actually, for the setnames it could already be done in TiM's renamer tool, where info could be entered in each field (even if it sucked anyway) and in the end the setfile would be generated.
The same idea is also used (afaik) on the iso side: the toseciso guys add info like that, inserting new dumps and filling in the information (it doesn't handle all flags, but they only need a few of them anyway). A (+-) similar system is used by some other projects, like dat-o-matic for example, i think.

The idea is indeed great and would help in a lot of aspects (that is the main reason i defended it long ago). The main problem is for renamers, since it will give them a LOT more work (or a complex / powerful solution would need to be found). Unlike iso, where sets are being dumped and added from the start, with disks and old stuff we already have a gazillion files, and the main work left is fixing all the errors, not adding new stuff. The process of fixing them in a db and relating the correct file/set to the right set in the db, plus adding new sets with details such as size, hashes and all those flags, is way more time-consuming than just editing setnames in files or in the dat itself and later checking the dats again and again (which is +- what i have to do now in our tool for validation purposes).

Finally, datfile ids for dat renames could indeed be useful, but cmp doesn't work like that, so just warning that dat X = dat Y now is enough; it wasn't done this time because there were too many changes to track. Also, a great part of the dat changes (most, i guess) are not simple renames but splits or other changes, so they would generate new ids, like dat id 1 -> ids 2 and 3 and so on.

Thanks for the suggestion anyway, i also like the idea in some points :)

Offline sp33dy

  • Newbie
  • *
  • Posts: 10
Re: New version of TiM
« Reply #19 on: January 10, 2010, 04:43:03 PM »
As a quick update (seeing you responded to my other thread offer): although I've not got any real serious time, I've had a dig around some of the free database designers. This one caught my eye:

http://www.dbschema.com/

I like it as it's java based, free (to a given restriction) and works very well. It's also compatible with many different databases, including HSQLDB ! I got it hooked up over the weekend and am currently creating a few schemas whilst modifying some of my java code to load it. It's extremely good at showing the relationships of the tables and the data within!! So you can actually see the table foreign keys, blah blah blah.

Just thought i'd let you know. You never know, I might create a 1st pre alpha schema for a pre pre pre alpha tool for general loading of dat and doing renaming (zip and possibly 7z level only)... At least this would save me having to load each dat profile into clrmame and then doing a scan to see what I do/don't have..

Incidentally, what's the process for reporting having lots of roms that a set doesn't recognise? I know from many years ago I pulled down lots of roms that seemed interesting but didn't have time to deal with. CPC stuff springs to mind. No idea if the stuff I have is of any interest to anyone. Just wondering whether there is a process to post it (say newsgroups) or a way to report the files. This probably isn't the right part of the forum to ask... so I'll accept a flame here...

Regards

sp33dy

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1332
Re: New version of TiM
« Reply #20 on: January 10, 2010, 06:04:15 PM »
Hi,
don't know that tool but i might take a look, since i was looking for some alternatives too. Until now all the small tools and tests i've done were using PowerDesigner; guess having a free (and equally good) alternative would be cool. Hope i manage to have the time and will to test it out one of these days, maybe in the next month :/

About the 2nd question, i guess i didn't understand it very well. If they are unidentified files you just need to list them: keep them in a separate folder and get a list of it. You can also try to scan them using other dats, so you might discover what you have there (see the unrenamed / garbage dats over at romshepherd.com). If they are of interest i'm sure you can put them somewhere to wait until someone can check them :P

Offline sp33dy

  • Newbie
  • *
  • Posts: 10
Re: New version of TiM
« Reply #21 on: January 11, 2010, 01:35:11 PM »
The more I play with dbschema, the more I like it.

I agree with some of your previous points. Ultimately, I think the database tooling should be client/server based. The tool in my view (being a java dev) would be easy to create in a JSP application that could be deployed to any given J2EE container (i.e. Tomcat, WASCE, etc, etc). I've always assumed the tool should be a standalone app (i.e. Java Swing GUI based or of such design). The problem with this is the time it takes to do all the GUI elements. My background is web based solutions, so this fits in better with where I am.

I'm currently looking (time allowing) at making the renaming of roms with the current sets priority 1 as I want to point a directory of lots of different images to a database of dats.

If/When I get this up and running. I'm then going to look at the naming tool. I also figure that using a client/server based solution would lend itself better to the TOSEC site. I.E. it could be hosted, in order for a collaborative means of renaming/submission....

Again, wild thoughts and where I am today.

Regards

Sp33dy

P.S. What is the problem with the TNC issues? I remember coding a TNC interpreter of the files once (unfortunately lost code due to a hard drive crash). It was very simple to do and for the life of me, can't see what problems there might be.. Except if flags are reused for meaning. I.E. if B-Flag for bad was changed to B-Flag for bootleg (lame example). This is the only problem I see.
« Last Edit: January 11, 2010, 01:41:08 PM by sp33dy »

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1332
Re: New version of TiM
« Reply #22 on: January 11, 2010, 03:23:43 PM »
Hi there,

In my opinion you're having ideas that are way more than the needs of the project currently and the time we have to do such a thing :P
Really, my goals currently are to find time to do some small apps to help me/renamers, or to work on some sort of db that a few may want to use to manage some info. i had way more free time before and was still short of time for it all; that's why i think you may end up planning something you will not have time to finish, using some heavy tech when we probably don't need such an enterprise solution. Someday we will be talking about J2EE, SOA, BPEL, Hibernate, Oracle and so on ;D Also, running such technologies usually costs more and needs more power (Tomcat, GlassFish, dedicated servers, etc.).

Even if the idea is something small, we should pay attention to our time and needs, and to what the project really needs and what people are willing to use, before trying to test technologies and talk about servers, clients and integrations. The last "too big" idea was TIM :P

For now i guess i will keep trying to find a bit of free time between days in order to advance with the small apps / ideas i have, i really don't have a lot more time unfortunately and so the db work i've planned is halted a bit, anyway feel free to have these ideas, i will always give you my opinion, answer possible questions and give you my support when i can.

About TNC:
The issues are in the last page of the pdf i think, not sure if they are online. It is known that our naming conventions are far from "very simple", and still we have tried to improve them. The issues are related to the complexity and rules of all the flags put together; there isn't anything wrong as such, it just isn't anything robust either (or even close to that).

Some examples from what i recall:
I will not talk about existing errors from wrong use of tnc in the past (e.g. dates where someone decided to use YYYY-YY to symbolize something, i don't know what, ending up with 1989-91; in cases like 1998-01 we will never know what they mean now without checking again) because those aren't even tnc errors, just a lack of checking back then.
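The date example can be illustrated with a few lines of Python. This is just a sketch under assumptions: the function name and the three result labels are made up here, and it only looks at the four-digits, dash, two-digits shape from the examples above.

```python
import re


def classify_short_date(value):
    """Classify a date flag of the form NNNN-NN (hypothetical helper)."""
    match = re.fullmatch(r"(\d{4})-(\d{2})", value)
    if match is None:
        return "other"          # not the shape discussed here
    second = int(match.group(2))
    if 1 <= second <= 12:
        # Could be a month, but could also be an old-style year range.
        return "ambiguous"
    # 00 or 13..99 cannot be a month, so it is at least detectable.
    return "not-a-month"
```

So "1989-91" can be flagged automatically, while "1998-01" passes as a perfectly valid year and month and can only be settled by checking the original again.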

The version flag is too weak: just expecting " vSOMETHING" isn't great, while forcing a number after the "v" will leave out " vA.0" and some strange cases. Revisions aren't parsed at all and are part of the title; there is no rule for that, but i don't like the idea of adding more and more rules to something that is already complex and in some cases broken.
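To show the trade-off between the two approaches to the version flag, here is a small Python sketch. The two regular expressions are only my guess at what a "loose" and a "strict" rule might look like, not the patterns any existing tool uses, and the set names are made up.

```python
import re

# Loose rule: anything after " v" at the end of the title part counts.
LOOSE_VERSION = re.compile(r" v(\S+)$")
# Strict rule: the "v" must be followed by dot-separated numbers.
STRICT_VERSION = re.compile(r" v(\d+(?:\.\d+)*)$")


def title_part(set_name):
    """Return everything before the first ' (' flag of a set name."""
    return set_name.split(" (", 1)[0]


def parse_version(set_name, strict):
    """Return the version string found in the title part, or None."""
    pattern = STRICT_VERSION if strict else LOOSE_VERSION
    match = pattern.search(title_part(set_name))
    return match.group(1) if match else None
```

The strict pattern returns None for "Game vA.0 (1990)(Pub)", while the loose one accepts it but also treats the last word of a made-up title like "Tank vDestroyer (1988)(Pub)" as a version. Neither sees "Rev 2" at all, since revisions are just part of the title.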

Most of the flags have some fixed values considered "valid". The media label is the last ( ) flag and is also a flag that can have any value (the label present on the media, doh :P). This means that any wrong flag that appears before it is caught as a valid media label (imagine "Diak 1 of 2", a valid media label), so errors are not easy to check automatically without analyzing all the parsed media labels...

Dump flag rules aren't precise either: fix and modification flags can have 2 types of values mixed together (they have descriptors + groups). With [f PAL] you will never know if PAL was some scene group or the fix descriptor. Currently we use fixed lists of allowed descriptors that were gathered from the existing values in the db, but that is still broken, because new values need to be identified and inserted + there could be cases where a value isn't a descriptor but will be identified as one.
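The fixed-list approach for the fix flag might look roughly like the Python sketch below. It is an illustration only: the contents of KNOWN_FIX_DESCRIPTORS, the function name and the returned tuples are assumptions for the example, not the real list or the real checker.

```python
import re

# Illustrative list only; the real one is gathered from values in the db.
KNOWN_FIX_DESCRIPTORS = {"PAL", "NTSC"}

# Matches "[f]" or "[f something]" anywhere in a set name.
FIX_FLAG = re.compile(r"\[f(?: ([^\]]+))?\]")


def classify_fix_flag(set_name, known=KNOWN_FIX_DESCRIPTORS):
    """Classify the value of a fix flag using a fixed descriptor list."""
    match = FIX_FLAG.search(set_name)
    if match is None:
        return None                       # no fix flag at all
    value = match.group(1)
    if value is None:
        return ("fix", None, None)        # bare [f]
    if value in known:
        # Treated as a descriptor, even if a group happened to use that name.
        return ("fix", "descriptor", value)
    # Either a group or a descriptor that is not in the list yet.
    return ("fix", "unknown", value)
```

Both weaknesses described above are visible here: a group that happened to be called PAL would be reported as a descriptor, and a genuinely new descriptor ends up as "unknown" until someone adds it to the list.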

These are the ones i recall, and the worst ones when trying to parse stuff. i'm not saying it isn't possible to parse it all; it IS indeed possible and not that hard (wouldn't say very easy either :P), but it will be an 'okay' parser, not something really robust that will always get all flags separated correctly with 100% precision. Just what happens now :)

Offline sp33dy

  • Newbie
  • *
  • Posts: 10
Re: New version of TiM
« Reply #23 on: February 02, 2010, 08:12:08 AM »
:(

Just to say I'm still lurking, but life really has gotten stupid again. I can see the end of the tunnel, but nothing moving here. I wish I was a millionaire!!! In fact, not quite the same, but I'm looking at getting out of the IT industry and into teaching. That would give me more free time for doing other things!

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1332
Re: New version of TiM
« Reply #24 on: February 02, 2010, 03:58:39 PM »
Hey, i've been really busy too for the last weeks, finally have some free days now but will use most to do other stuff, real life comes first :P
As for teaching, at least here in Portugal it is really hard to get in and grab a good job/position unfortunately but good luck with your goals :)

Offline TKaos

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 539
Re: New version of TiM
« Reply #25 on: February 02, 2010, 10:55:22 PM »
It seems like we are all a little bit busy atm. I haven't renamed a single file for 2 weeks now, but that should change soon. :)

Offline Kodoichi

  • Full Member
  • ***
  • Posts: 162
Re: New version of TiM
« Reply #26 on: February 05, 2010, 10:18:18 AM »
Bit off-topic. I barfed out this mock-up for a very simplistic renamer:

http://i50.tinypic.com/214tdf6.png

I tried coding it in Just Basic and Rebol, but gave up after a few minutes as I'm too stupid for these coding languages. If anybody likes my mock-up, feel free to code it :)

Offline PandMonium

  • Administrator
  • Hero Member
  • *****
  • Posts: 1332
Re: New version of TiM
« Reply #27 on: February 05, 2010, 05:59:26 PM »
Thanks for your suggestion :)

Tools to aid renaming of files and such are debatable, because most renamers tend to dislike them and prefer to rename manually. Some don't, and if there was something really powerful and helpful yet really simple to use, that wouldn't add a few more steps to renaming one file, i guess most would think about it.

Anyway i've done something that can highlight / verify set names according to our tnc, it is not specific to rename sets but is useful sometimes to avoid obvious errors (currently not content errors :P):
http://www.tosecdev.org/media/tde05.png
http://www.tosecdev.org/media/tncchecker_20091113.png

It has a different purpose than what you proposed really, i like the idea of having a tool to help renaming, at least to aid new renamers that usually have questions about flag options and order. Unfortunately i (we) am really busy currently with rl projects as well as a ton of different tosec ideas and other projects :(

Offline sp33dy

  • Newbie
  • *
  • Posts: 10
Re: New version of TiM
« Reply #28 on: February 11, 2010, 08:27:27 PM »
Keep these coming. I've nearly finished the private website for local school and the java app for snooker club. Once these are done, this is going to receive some more of my attention. The TNC checker is exactly what i'd expect in the tool.

Offline sp33dy

  • Newbie
  • *
  • Posts: 10
Re: New version of TiM
« Reply #29 on: March 22, 2010, 10:55:02 AM »
I'm still around, but overloaded with RL.. However, someone on Pleasuredome has put a few screenshots of a tool he's writing.

http://forum.pleasuredome.org.uk/index.php?showtopic=17382&st=0

Not sure if you have a logon here (if you don't and need an invite, let me know).. Certainly seems interesting. The guy is looking for some web hosting help. I recommended he post details here, in case anyone has the time and energy to help..

Regards

Sp33dy