shiratsu-naming has been in use for a while now on my own personal projects as well as others, so I thought I'd post about it here.
It is a Rust library that implements the TNC (as well as other naming conventions) as seen in practice. shiratsu-naming supports not only TNCv4 but also legacy names, including ZZZ-UNK- names and publishers beginning with "by". It has been tested against the 2021-02-14 dataset and parses all but a handful of really degenerate names, extracting data such as the title, publisher, date, and everything else.
API docs are here. shiratsu-naming is a fully lossless, zero-copy parser that keeps track of trivia where needed so the input string can be reconstructed exactly as it was. Violations of TNCv4 are also reported as warning trivia, and a violating name can be mechanically turned into a TNCv4-conforming name (on a best-effort basis; see the caveats in the documentation).
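To give a rough idea of what "lossless" and "zero-copy" mean here, below is a minimal standalone sketch. It is not shiratsu-naming's actual API: the `Token` enum and the `tokenize` and `reconstruct` functions are hypothetical names made up for illustration, and the grammar is reduced to a title followed by parenthesised flags. The point is only that every token borrows a slice of the input, and that keeping whitespace as trivia lets the input be rebuilt exactly.

```rust
/// A token that borrows from the input string; no text is copied.
/// (Hypothetical type for illustration, not the library's token type.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token<'a> {
    /// Meaningful text, such as a title or the contents of a flag.
    Text(&'a str),
    /// A `(` or `)` delimiter.
    Delimiter(&'a str),
    /// Whitespace kept only so the input can be rebuilt.
    Trivia(&'a str),
}

impl<'a> Token<'a> {
    /// The slice of the original input that this token covers.
    fn as_str(&self) -> &'a str {
        match *self {
            Token::Text(s) | Token::Delimiter(s) | Token::Trivia(s) => s,
        }
    }
}

/// Pushes a run of non-delimiter text, splitting off trailing whitespace
/// as trivia so that no byte of the input is lost.
fn push_text<'a>(tokens: &mut Vec<Token<'a>>, segment: &'a str) {
    let text = segment.trim_end();
    if !text.is_empty() {
        tokens.push(Token::Text(text));
    }
    let trivia = &segment[text.len()..];
    if !trivia.is_empty() {
        tokens.push(Token::Trivia(trivia));
    }
}

/// Splits a name into tokens that together cover the whole input.
fn tokenize(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        match rest.find(|c: char| c == '(' || c == ')') {
            // A delimiter is next: emit it as its own one-byte token.
            Some(0) => {
                let (delim, tail) = rest.split_at(1);
                tokens.push(Token::Delimiter(delim));
                rest = tail;
            }
            // Otherwise take everything up to the next delimiter (or the end).
            next => {
                let end = next.unwrap_or(rest.len());
                let (segment, tail) = rest.split_at(end);
                push_text(&mut tokens, segment);
                rest = tail;
            }
        }
    }
    tokens
}

/// Rebuilds the original string by concatenating every token in order.
fn reconstruct(tokens: &[Token<'_>]) -> String {
    tokens.iter().map(|t| t.as_str()).collect()
}
```

With this sketch, `tokenize("Tom & Jerry (1990)(Acme)")` starts with `Text("Tom & Jerry")` followed by `Trivia(" ")`, and passing the tokens to `reconstruct` gives back the original string. A real parser for the TNC has to deal with far more than this (dates, flags, warnings), so treat it as a picture of the idea only.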
shiratsu-naming can also parse unambiguous multi-set names. Ambiguous multi-set names will be eagerly parsed: 'Tom & Jerry & Other' will always be parsed as three separate titles.
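As a toy illustration of what eager parsing means for the example above (again, not the library's code: `split_titles_eagerly` is a hypothetical helper, and treating " & " as the only separator is an assumption taken from that example), every separator is treated as a title boundary, with no attempt to guess whether "Tom & Jerry" was meant as a single title:

```rust
/// Eagerly splits a multi-set name on every " & " separator.
/// Hypothetical helper for illustration; it never tries to decide
/// whether an ampersand belongs to a title.
fn split_titles_eagerly(name: &str) -> Vec<&str> {
    name.split(" & ")
        .map(|title| title.trim())
        .filter(|title| !title.is_empty())
        .collect()
}
```

Calling `split_titles_eagerly("Tom & Jerry & Other")` yields the three titles `"Tom"`, `"Jerry"` and `"Other"`, which matches the behaviour described above for ambiguous names.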
shiratsu-naming is licensed under MIT, so anyone can take
a look at the parser and port it to another language. I am currently working on a C# port called
bunkai with slightly different goals (bunkai will not support warning trivia and will not keep the entire input string reconstructable).
I hope this will be useful in developing tools to extract data from and verify TNC names. A few of the top threads were asking for a tool to do exactly that. The TNC is deceptively complicated to parse, especially once you include the violations that exist in the dataset, but I hope my parser can serve as a complete enough implementation to rely upon programmatically.