TOSECdev Forum

TOSEC Project => TOSEC Tools => Topic started by: chyyran on July 21, 2021, 07:43:58 AM

Title: shiratsu-naming, a complete implementation of the TNC (in practice) in Rust.
Post by: chyyran on July 21, 2021, 07:43:58 AM
shiratsu-naming (https://crates.io/crates/shiratsu-naming) has been in use for a while in my own projects and in a few others, so I thought I'd post about it here. It is a Rust library that implements the TNC (as well as other naming conventions) as it is actually used in practice. Besides TNCv4, shiratsu-naming supports legacy names, including those with the ZZZ-UNK- prefix and publishers beginning with "by". It has been tested against the 2021-02-14 TOSEC release and parses all but a handful of truly degenerate names, ZZZ-UNK- names included, successfully extracting the title, publisher, date, and every other field.
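To give a feel for what extracting the title, date, and publisher involves, here is a deliberately tiny sketch. It is not shiratsu-naming's API and handles far less than the library does: the SimpleTosec type and parse_simple function are hypothetical, the sketch assumes the title ends at the first " (", treats the next two parenthesized flags as date and publisher, and only recognizes the legacy ZZZ-UNK- prefix.

```rust
/// Hypothetical, heavily simplified result type; not shiratsu-naming's API.
#[derive(Debug, PartialEq)]
struct SimpleTosec<'a> {
    /// True if the legacy "ZZZ-UNK-" prefix (unidentified entry) was present.
    zzz_unk: bool,
    title: &'a str,
    date: &'a str,
    publisher: &'a str,
}

/// Parses "Title (Date)(Publisher)..." and ignores any flags after the publisher.
/// Returns None if the date or publisher flag is missing or unterminated.
fn parse_simple(name: &str) -> Option<SimpleTosec<'_>> {
    let (zzz_unk, rest) = match name.strip_prefix("ZZZ-UNK-") {
        Some(stripped) => (true, stripped),
        None => (false, name),
    };
    // Assumption: the title (including any version) ends at the first " (".
    let open = rest.find(" (")?;
    let title = &rest[..open];
    // The date flag comes first...
    let after_title = &rest[open + 2..];
    let close = after_title.find(')')?;
    let date = &after_title[..close];
    // ...and the publisher flag must follow immediately, with no space.
    let after_date = after_title[close + 1..].strip_prefix('(')?;
    let close = after_date.find(')')?;
    let publisher = &after_date[..close];
    Some(SimpleTosec { zzz_unk, title, date, publisher })
}
```

A real TNC name can carry many more optional flags after the publisher (system, video, country, dump info and so on), which this sketch simply ignores.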

The API docs are here (https://docs.rs/shiratsu-naming/0.1.3/shiratsu_naming/naming/tosec/index.html). shiratsu-naming is a fully lossless, zero-copy parser: it keeps track of trivia wherever needed so that the input string can be reconstructed exactly. Violations of TNCv4 are reported as warning trivia (https://docs.rs/shiratsu-naming/0.1.3/shiratsu_naming/naming/tosec/enum.TOSECWarn.html), and names with warnings can be mechanically converted into TNCv4-conforming names (https://docs.rs/shiratsu-naming/0.1.3/shiratsu_naming/naming/tosec/struct.TOSECName.html#method.into_strict) on a best-effort basis (see the caveats in the documentation).
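The idea behind a lossless, zero-copy parser with warning trivia can be illustrated with a small tokenizer. This is a hypothetical sketch, not shiratsu-naming's code: the names Token, Warning, tokenize, reconstruct and to_strict are made up here, and the only rule checked is that TNCv4 writes flags back to back. Every token borrows a slice of the input, whitespace is kept as trivia so the original string can be rebuilt byte for byte, whitespace after a flag is recorded as a warning, and to_strict performs a best-effort repair.

```rust
/// Hypothetical token type; every variant borrows a slice of the input (zero-copy).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token<'a> {
    Title(&'a str),
    /// A raw flag, parentheses included, e.g. "(1990)".
    Flag(&'a str),
    /// Whitespace, kept only so the original string can be rebuilt.
    Trivia(&'a str),
    /// Text that is neither a flag nor whitespace, kept verbatim so nothing is lost.
    Unknown(&'a str),
}

/// Hypothetical warning: TNCv4 writes flags back to back, so whitespace after a flag is a violation.
#[derive(Debug, PartialEq)]
enum Warning {
    WhitespaceAfterFlag { offset: usize },
}

fn tokenize(name: &str) -> (Vec<Token<'_>>, Vec<Warning>) {
    let mut tokens = Vec::new();
    let mut warnings = Vec::new();
    // The title is everything before the first " (" (or the whole name if there are no flags).
    let title_end = name.find(" (").unwrap_or(name.len());
    tokens.push(Token::Title(&name[..title_end]));
    let mut pos = title_end;
    let mut after_flag = false;
    while pos < name.len() {
        let rest = &name[pos..];
        let len = if rest.starts_with('(') {
            // A flag runs through its closing ')'; an unterminated flag takes the rest.
            let len = rest.find(')').map_or(rest.len(), |i| i + 1);
            tokens.push(Token::Flag(&rest[..len]));
            after_flag = true;
            len
        } else if rest.starts_with(char::is_whitespace) {
            let len = rest.find(|c: char| !c.is_whitespace()).unwrap_or(rest.len());
            if after_flag {
                warnings.push(Warning::WhitespaceAfterFlag { offset: pos });
            }
            tokens.push(Token::Trivia(&rest[..len]));
            len
        } else {
            let len = rest
                .find(|c: char| c == '(' || c.is_whitespace())
                .unwrap_or(rest.len());
            tokens.push(Token::Unknown(&rest[..len]));
            len
        };
        pos += len;
    }
    (tokens, warnings)
}

/// Lossless: concatenating every token reproduces the input exactly.
fn reconstruct(tokens: &[Token<'_>]) -> String {
    tokens
        .iter()
        .map(|t| match *t {
            Token::Title(s) | Token::Flag(s) | Token::Trivia(s) | Token::Unknown(s) => s,
        })
        .collect()
}

/// Best-effort strict form: title, one space, then flags back to back.
/// Trivia and unknown text are dropped, so this direction is lossy by design.
fn to_strict(tokens: &[Token<'_>]) -> String {
    let mut out = String::new();
    let mut first_flag = true;
    for t in tokens {
        match *t {
            Token::Title(s) => out.push_str(s),
            Token::Flag(s) => {
                // Exactly one space separates the title from the first flag.
                if first_flag {
                    out.push(' ');
                    first_flag = false;
                }
                out.push_str(s);
            }
            Token::Trivia(_) | Token::Unknown(_) => {}
        }
    }
    out
}
```

For example, "Game (1990) (Publisher)" round-trips unchanged through reconstruct, produces one warning for the space between the two flags, and to_strict turns it into "Game (1990)(Publisher)".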

shiratsu-naming can also parse unambiguous multi-set names (https://docs.rs/shiratsu-naming/0.1.3/shiratsu_naming/naming/tosec/struct.TOSECMultiSetName.html). Ambiguous multi-set names are parsed eagerly: 'Tom & Jerry & Other' will always be parsed as three separate titles, even if 'Tom & Jerry' was meant as a single title.
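Eager splitting amounts to treating every " & " as a separator. The helper below is a hypothetical illustration, not shiratsu-naming's API, and it works on the title portion only; it shows why a single title that itself contains " & " cannot be told apart from two titles.

```rust
/// Hypothetical helper, not shiratsu-naming's API: eagerly splits a multi-set
/// title run on every " & ". A title that itself contains " & " is split too,
/// because nothing in the string distinguishes it from two separate titles.
fn split_multiset_titles(titles: &str) -> Vec<&str> {
    titles
        .split(" & ")
        .map(str::trim)
        // Skip empty pieces produced by a leading or trailing separator.
        .filter(|t| !t.is_empty())
        .collect()
}
```

Like the rest of the sketches, this returns slices of the input rather than new strings.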

shiratsu-naming is licensed under MIT, so anyone can take a look at the parser (https://github.com/SnowflakePowered/shiratsu/blob/master/shiratsu-naming/src/naming/tosec/parsers.rs) and port it to another language. I am currently working on a C# port called bunkai (https://github.com/SnowflakePowered/bunkai) with slightly different goals: bunkai will not support warning trivia and will not keep the entire input string reconstructable.

I hope this will be useful for developing tools that extract data from and verify TNC names. A few of the top threads were asking for exactly such a tool, and while the TNC is deceptively complicated to parse (especially once the violations present in the dataset are taken into account), I hope my parser is complete enough to be relied upon programmatically.
Title: Re: shiratsu-naming, a complete implementation of the TNC (in practice) in Rust.
Post by: Cassiel on August 02, 2021, 09:13:28 AM
Very cool, will check it out. Thanks for the update.