Author Topic: shiratsu-naming, a complete implementation of the TNC (in practice) in Rust.  (Read 261 times)

Offline chyyran

shiratsu-naming has been in use for a while now in my own personal projects as well as others, so I thought I'd post about it here. It is a Rust library that implements the TNC (as well as other naming conventions) as seen in practice. shiratsu-naming supports not only TNCv4 but also legacy names, including ZZZ-UNK- names and publishers beginning with "by". It has been tested against the 2021-02-14 dataset and parses all but a handful of really degenerate names, extracting data such as the title, publisher, date, and everything else.

API docs are here. shiratsu-naming is a fully lossless and zero-copy parser: it keeps track of trivia where needed so that the input string can be reconstructed exactly as is. Violations of TNCv4 are also reported as warning trivia, and a name with violations can be turned into a TNCv4-conforming name mechanically (on a best-effort basis; see the caveats in the documentation).
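To give a rough idea of what "lossless and zero-copy" means here, below is a minimal sketch. It is not the shiratsu-naming API: the `Token` enum and the `tokenize` and `reconstruct` functions are hypothetical names made up for this illustration, and the splitting rules are far simpler than real TNC parsing. Every token only borrows a slice of the input, and concatenating the slices gives back the original string byte for byte.

```rust
/// Hypothetical token type for illustration only (not the library's own).
/// Each variant borrows a slice of the input, so nothing is copied.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Text(&'a str),    // anything outside a flag, whitespace included
    Paren(&'a str),   // "(...)" with its delimiters
    Bracket(&'a str), // "[...]" with its delimiters
}

/// Split a name into borrowed tokens without dropping any bytes.
fn tokenize(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        // Which closing delimiter, if any, does the next token need?
        let close = match rest.as_bytes()[0] {
            b'(' => Some(')'),
            b'[' => Some(']'),
            _ => None,
        };
        // A flag runs up to and including its closing delimiter.
        // An unterminated flag is kept as plain text instead of being dropped.
        let flag_len = close.and_then(|c| rest.find(c)).map(|i| i + 1);
        let len = match flag_len {
            Some(n) => n,
            None => {
                // Plain text runs until the next opening delimiter. The first
                // character is skipped so the loop always makes progress.
                let first = rest.chars().next().unwrap().len_utf8();
                rest[first..]
                    .find(|c: char| c == '(' || c == '[')
                    .map_or(rest.len(), |i| i + first)
            }
        };
        let (head, tail) = rest.split_at(len);
        tokens.push(match (flag_len, close) {
            (Some(_), Some(')')) => Token::Paren(head),
            (Some(_), Some(_)) => Token::Bracket(head),
            _ => Token::Text(head),
        });
        rest = tail;
    }
    tokens
}

/// Concatenate the borrowed slices to get the input back unchanged.
fn reconstruct(tokens: &[Token<'_>]) -> String {
    tokens
        .iter()
        .map(|t| match t {
            Token::Text(s) | Token::Paren(s) | Token::Bracket(s) => *s,
        })
        .collect()
}
```

Calling `tokenize` on an illustrative name such as "Legend of TOSEC, The (1986)(Devstudio)(US)[a]" and passing the result to `reconstruct` returns the same string, including the space before the first flag. A malformed name such as "Foo (bar" also round-trips, because the unterminated flag is kept as text.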

shiratsu-naming can also parse unambiguous multi-set names. Ambiguous multi-set names will be eagerly parsed: 'Tom & Jerry & Other' will always be parsed as three separate titles.
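As a small illustration of what eager parsing implies, here is a sketch that splits a title on every " & " separator. The `split_titles` function is a hypothetical helper written for this post, not part of shiratsu-naming, and it ignores flags and everything else a real name carries.

```rust
/// Hypothetical helper: split eagerly on every " & " separator.
/// The returned slices borrow from the input; nothing is copied.
fn split_titles(name: &str) -> Vec<&str> {
    name.split(" & ").collect()
}
```

With this rule, 'Tom & Jerry & Other' yields "Tom", "Jerry" and "Other", even if "Tom & Jerry" was meant to be a single title. That is the ambiguity an eager split cannot resolve.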

shiratsu-naming is licensed under MIT, so anyone can take a look at the parser and port it to another language. I am currently working on a C# port called bunkai with slightly different goals (bunkai will not support warning trivia and will not keep the entire input string reconstructable).

I hope this will be useful in developing tools that extract data from and verify TNC names. A few of the top threads were looking for a tool to do exactly that. The TNC is deceptively complicated to parse, especially once you include the violations that exist in the dataset, but I hope my parser is a complete enough implementation to rely upon programmatically.
« Last Edit: July 21, 2021, 07:46:53 AM by chyyran »



Offline Cassiel

Very cool, will check it out. Thanks for the update.