TOSECdev Forum

TOSEC Project => TOSEC Naming Convention => Topic started by: peo on July 28, 2018, 01:42:23 PM

Title: TOSEC-like naming convention for documents and books
Post by: peo on July 28, 2018, 01:42:23 PM
I was not able to find any suggestions for naming documents, books, magazines and videos using the TOSEC-style naming convention.

This is partially done in TOSEC-PIX, but not documented for adding information like ISBN and individual author(s).

Is there any updated information on naming the above using the TOSEC-style naming conventions?
Title: Re: TOSEC-like naming convention for documents and books
Post by: Maddog on July 29, 2018, 01:55:45 PM
No, no such standard has been developed. Our current ideas about renaming docs etc. are what you see in PIX.

However TNC is versatile enough that you can use it to add this type of information.
ISBN could easily be added in a [more info] flag at the end of the filename. Something along the lines of Title (Year)(Publisher)[ISBN 12345678] or even Title (Year)(Publisher)[12345678].
Individual authors are a little more difficult if you want to use a different publisher at the same time. Otherwise, Author would probably make sense to occupy the Publisher field, since it's fundamentally similar.
Like Title (Year)(Author).
If you are looking to use both author name and a company as publisher, then it could be done either by abusing the Publisher system as Title (Year)(Publisher - Author) or as another [more info] flag, which probably is the best way IMHO.
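
To make the options above concrete, here is a minimal Python sketch of composing such a name. The function name build_name, its parameters, and the order of the two [more info] flags are assumptions for illustration only; TNC does not define any of them.

```python
def build_name(title, year, publisher, isbn=None, authors=()):
    """Compose 'Title (Year)(Publisher)' plus optional [more info] flags.

    Hypothetical helper: the flag order (authors first, then ISBN) is an
    assumption, not something TNC specifies.
    """
    name = f"{title} ({year})({publisher})"
    if authors:
        # Several authors are joined with " - ", like multiple publishers.
        name += "[" + " - ".join(authors) + "]"
    if isbn:
        name += f"[ISBN {isbn}]"
    return name


print(build_name("Title", 1988, "Publisher", isbn="12345678"))
print(build_name("Title", 1988, "Publisher", authors=["Smith, R.", "White, P.S."]))
```

If only an author is known, passing the author as the publisher argument gives the simpler Title (Year)(Author) form.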
Title: Re: TOSEC-like naming convention for documents and books
Post by: peo on July 31, 2018, 10:43:02 PM
Thanks for the reply and the ideas on how to name document files. I was (almost) going for the abuse variant of the Publisher field, but I agree that it is better to put the authors as "more info" at the end of the file name.

I just thought of another idea, which might or might not be messy.
Based on the "Title (Year)(Publisher)", how about adding "more info" fields nested in Publisher, like:
"Title (Year)(Publisher [Author - Author])"
or, as the example in TNC 2015:
Legend of TOSEC, The (1988)(Delphine - U.S. Gold [Smith, R. - White, P.S.])

It's a machine parseable name, and allows both the publisher (company) and the individual author(s) as specified by TNC..
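
As a rough check that the nested form really is machine parseable, here is a minimal Python sketch. The regular expression and the function name parse_nested are my own assumptions; it only handles the short "Title (Year)(Publisher [Authors])" form and assumes the title itself contains no parentheses.

```python
import re

# Assumed layout: Title (Year)(Publisher - Publisher [Author - Author])
NESTED = re.compile(
    r"^(?P<title>.+?) \((?P<year>[^()]+)\)"
    r"\((?P<publishers>[^()\[\]]+?)(?: \[(?P<authors>[^\[\]]+)\])?\)$"
)


def parse_nested(name):
    """Split a nested-author name into its fields, or return None."""
    m = NESTED.match(name)
    if m is None:
        return None

    def split(part):
        # " - " separates multiple publishers or authors.
        return [p.strip() for p in part.split(" - ")] if part else []

    return {
        "title": m["title"],
        "year": m["year"],
        "publishers": split(m["publishers"]),
        "authors": split(m["authors"]),
    }


print(parse_nested(
    "Legend of TOSEC, The (1988)(Delphine - U.S. Gold [Smith, R. - White, P.S.])"
))
```

Names without the bracketed part still parse, with an empty author list.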
Title: Re: TOSEC-like naming convention for documents and books
Post by: peo on August 25, 2018, 10:09:16 AM
I am going forward with my own idea above (using "extra info" for individual authors inside the author/publisher field). Just to mention, I am not renaming anything currently in TOSEC; this is for material that is not TOSEC related (books and video tutorials from other collections).

How about parentheses as part of a title?
Do the current parsers take care of this in some way?

Also, what's the advantage (if there is any) of reversing the author's name:
If individual person names need to be used, these should be entered in the format "Surname, First name" or "Surname, Initials".

Not knowing every surname in every country of the world leaves room for errors in where to place the comma between surname and first name (compared to a name given with only spaces as separators).
Title: Re: TOSEC-like naming convention for documents and books
Post by: Maddog on August 30, 2018, 09:29:50 AM
@PandMonium may be able to explain better than me, as he's the wizard behind coding TOSEC tools and other technical stuff. I am a simple renamer peon.  ;D

The way I get it, the content of a parenthesis matters when parsing a name; you don't only look at the existence of the parentheses.
The long version as defined by TNC is Title version (demo) (date)(publisher)(system)(video)(country)(language)(copyright status)(development status)(media type)(media label). So if the first parenthesis on any TNC-compliant name doesn't contain the word "demo" or the (mandatory!) date in any of the acceptable forms, then the parser should be able to assume that this specific parenthesis is part of the title. I have no idea how something like that might be coded though, as I said I am not a coder.
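
For what it's worth, the heuristic described above could be sketched in Python roughly like this. The patterns are simplified assumptions (a date is taken to be YYYY, YYYY-MM or YYYY-MM-DD with "x" allowed for unknown digits, and a demo flag is "demo" optionally followed by a hyphenated word), and split_title is a hypothetical name, not an existing TOSEC tool.

```python
import re

# Simplified assumptions about what the first real flag can look like.
DATE = re.compile(r"^(?:19|20)[0-9x]{2}(?:-[0-9x]{2}){0,2}$")
DEMO = re.compile(r"^demo(?:-[a-z]+)?$")
GROUP = re.compile(r"\(([^()]*)\)")


def split_title(name):
    """Return (title, flags), cutting at the first (demo) or (date) group.

    Any parenthesis before that point is treated as part of the title.
    """
    for m in GROUP.finditer(name):
        content = m.group(1)
        if DATE.match(content) or DEMO.match(content):
            return name[:m.start()].rstrip(), name[m.start():]
    # No date or demo flag found, so the name is not TNC-compliant.
    return name, ""


print(split_title("Some Demo (Part 2) (1991)(Group)"))
```

A title that itself contains something date-like in parentheses, e.g. "(1984)", would still be cut in the wrong place, so this is only a starting point.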

For names, it makes sense to use surname first, as for virtually all languages this is more important and official than the name. The end result for TOSEC naming is not terribly different in most cases, even if you go with Name-Surname sequence, I agree. I am not sure how the decision to use the Surname, Name approach first came to be. Probably lost in the mists of time by now...
Of course, as with everything, it's not possible to know everything and we make mistakes. Whenever such a mistake is identified, it gets corrected in a later version of the dats.

Luckily, most of the software and other stuff relevant to TOSEC is done by Westerners or Japanese and we are mostly familiar with the rules for these types of names.
For weird cases, we would use Google to try and find more info, e.g. Wikipedia lists extensive info for Vietnamese:
Thankfully, not many Vietnamese have coded for the Amiga, Megadrive etc, so we have some slack despite our ignorance.  ^-^
Title: Re: TOSEC-like naming convention for documents and books
Post by: PandMonium on August 30, 2018, 03:20:41 PM
@Maddog already gave a good explanation but just to answer very briefly to the two questions:

1) Parentheses and brackets, () and [], have been adopted in TNC as the separators between flags. Because of that, they are not a great idea to use inside the flags themselves (as flag content). We have a parser to check for TNC errors, but it's not strictly enforced atm, and TNC actually has some issues that make it hard to set strict rules for extracting each field correctly (hint: distinguishing between media labels and errors, or deciding whether the hack flag content is a descriptor, the name of the hacker, or both).
Still, there are original titles containing parentheses, and such titles exist in some sets (e.g., C64 Demos). If I scan the sets with the parser it reports them as errors, but they are still in the datfiles. There is no easy solution: a) remove the parentheses from the title and thus compromise the accuracy of the information, or b) adapt the parser to those cases and increase its complexity. It's still on my TO DO list but, as said, there's no easy solution.

2) The "surname, name" form for author names reminds me of what's common in scientific literature (and elsewhere) and in English/British traditions. I don't see much advantage or disadvantage in using it. It's just the way it has been designed, and I guess it helps with sorting people by their surname.
Title: Re: TOSEC-like naming convention for documents and books
Post by: NLS on August 31, 2018, 07:54:25 AM
Let me pop in just to add a vote to "surname, name" notation!  0:)
I also support "The", "A", "An" going to the end (as people are never sure if a title contains it).

As for () vs [] (and vs {}) I would love if TNC separated their use. Having three different ways to split things is always better than considering them one.

If I was to redesign TNC (and please don't take this the wrong way), I would packetize almost everything in {} (which is VERY rarely used in anything) and have ALL flags mandatory, with 0 for information that is not valid for that file and X for information missing for that file: "File, A {1995,NLS,0,EN,0,0,X,H}.zip"
This would make a parser WAY less error prone.
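
To illustrate why I think so, here is a minimal Python sketch of reading such a packet. The names parse_packet and describe are made up for this example, the meaning of each position is deliberately left open, and it assumes no field value contains a comma.

```python
import re

# Assumed layout: "Title {field,field,...}.ext" with a fixed number of fields.
PACKET = re.compile(r"^(?P<title>.+?) \{(?P<fields>[^{}]*)\}(?P<ext>\.[^.]+)?$")

NOT_APPLICABLE = "0"  # information not valid for this file
MISSING = "X"         # information missing for this file


def parse_packet(filename):
    """Return (title, list of raw field values), or None if no packet is found."""
    m = PACKET.match(filename)
    if m is None:
        return None
    return m["title"], m["fields"].split(",")


def describe(value):
    """Translate the two placeholder values into words."""
    if value == NOT_APPLICABLE:
        return "not applicable"
    if value == MISSING:
        return "unknown"
    return value


title, fields = parse_packet("File, A {1995,NLS,0,EN,0,0,X,H}.zip")
print(title, [describe(f) for f in fields])
```

Because every position is always present, the parser never has to guess which flag a value belongs to; it only has to count commas.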

BTW, I wish someone updated TOSEC tools...

Sorry to pop in the thread like that.

Title: Re: TOSEC-like naming convention for documents and books
Post by: Maddog on September 02, 2018, 07:31:51 PM
No need to be sorry for "popping in", opinions are welcome.

However, IMHO making filenames the way you propose would create a monstrosity, actually making them very hard to read for humans. We need to keep a balance between what a parser needs to work easily and what a human needs to actually use the file easily. TOSEC has already been "accused" of overly long and cryptic filenames, even without making ALL flags mandatory. :)

Would you be able to remember that the 7th X in a filename equals "information about file being a [h] is not available", while the 9th zero means "file is not an alternate [a]", or anything along these lines? Only a computer would be happy with this; humans just need to code more intelligent parsers and "dance" around the needs...  ;)