Author Topic: shiratsu-naming, a complete implementation of the TNC (in practice) in Rust. (Read 1411 times)

chyyran · « **on:** July 21, 2021, 07:43:58 AM »

Now that it has been in use for a while in my own personal projects as well as others', I thought I'd post here about shiratsu-naming, a Rust library that implements the TNC (as well as other naming conventions) as seen in practice. shiratsu-naming supports not only TNCv4 but also legacy names, including ZZZ-UNK- names and publishers beginning with "by". Tested against the 2021-02-14 dataset, it parses all but a handful of really degenerate names, extracting data such as the title, publisher, date, and everything else.

API docs are here. shiratsu-naming is a fully lossless, zero-copy parser that keeps track of trivia where needed to reconstruct the input as is. Violations of TNCv4 are also flagged as warning trivia, and a violating name can be turned (on a best-effort basis; see the caveats in the documentation) into a TNCv4-conforming name mechanically.
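
For anyone unfamiliar with what "lossless and zero-copy" means here, this is a rough standalone sketch of the idea, not shiratsu-naming's actual API. The `Token` type and the `tokenize` and `reconstruct` functions are hypothetical names made up for illustration, and the rule for what counts as trivia (spaces, parentheses, brackets) is a deliberately crude assumption. Every token borrows a slice of the input, and concatenating all tokens gives back the original name byte for byte.

```rust
/// Hypothetical token type for illustration only; not the library's real types.
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    /// Meaningful data, such as a word of the title or the contents of a field.
    Text(&'a str),
    /// Delimiters and whitespace, kept only so the input can be rebuilt.
    Trivia(&'a str),
}

/// Split `input` into alternating runs of text and trivia.
/// Each token is a slice of `input`, so nothing is copied.
pub fn tokenize(input: &str) -> Vec<Token<'_>> {
    // Assumption: a toy definition of trivia, far simpler than real TNC rules.
    let is_trivia = |c: char| matches!(c, ' ' | '(' | ')' | '[' | ']');
    let mut tokens = Vec::new();
    let mut rest = input;
    while let Some(first) = rest.chars().next() {
        let kind = is_trivia(first);
        // The current run ends at the first character of the other class.
        let end = rest
            .find(|c: char| is_trivia(c) != kind)
            .unwrap_or(rest.len());
        let (run, tail) = rest.split_at(end);
        tokens.push(if kind { Token::Trivia(run) } else { Token::Text(run) });
        rest = tail;
    }
    tokens
}

/// Rebuild the original string by concatenating every token in order.
pub fn reconstruct(tokens: &[Token<'_>]) -> String {
    tokens
        .iter()
        .map(|t| match t {
            Token::Text(s) | Token::Trivia(s) => *s,
        })
        .collect()
}
```

Because trivia is never thrown away, `reconstruct(&tokenize(name))` equals `name` for any input, which is the property that makes mechanical rewriting of a name possible.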

shiratsu-naming can also parse unambiguous multi-set names. Ambiguous multi-set names will be eagerly parsed: 'Tom & Jerry & Other' will always be parsed as three separate titles.
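
To make the eager behaviour concrete, here is a minimal sketch that assumes titles in a multi-set name are separated by " & ". `split_titles_eagerly` is a hypothetical helper written for this post, not a function from shiratsu-naming, and the real parser handles far more than the title segment.

```rust
/// Hypothetical helper for illustration; not part of shiratsu-naming.
/// Eagerly treats every " & " as a boundary between titles, so an
/// ampersand that really belongs to a single title is still split.
pub fn split_titles_eagerly(name: &str) -> Vec<&str> {
    name.split(" & ")
        .map(str::trim)
        .filter(|title| !title.is_empty())
        .collect()
}
```

With this rule, `split_titles_eagerly("Tom & Jerry & Other")` yields three titles, even if "Tom & Jerry" was meant as one title.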

shiratsu-naming is licensed under MIT, so anyone can take a look at the parser and port it to another language. I am currently working on a C# port called bunkai with slightly different goals (bunkai will not support warning trivia and will not keep the entire input string reconstructable).

I hope this will be useful in developing tools to extract data from and verify TNC names. A few of the top threads were looking for a tool to do exactly that, and while the TNC is deceptively complicated to parse (especially including violations that exist in the dataset), I hope my parser can serve as a complete enough implementation to rely upon programmatically.

Cassiel · « **Reply #1 on:** August 02, 2021, 09:13:28 AM »

Very cool, will check it out. Thanks for the update.
