Indeed, you should PM Cassiel, or just wait for him to show up and post here too.
I'm not sure if he has any WIP dats; you can probably start with the latest public ones. Using cmp (ClrMamePro) just to create a datfile really isn't that hard: put all the ROMs in one folder, point cmp's dir2dat at it, fill in the fields and you're done. You can also go the other way and edit the dats by hand, adding game/rom entries manually (carefully).
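To get a feel for what dir2dat produces, here is a rough Python sketch of the same idea. It assumes the Logiqx-style XML layout that TOSEC dats use (`<datafile>`, `<header>`, `<game>`, `<rom>`), writes only a minimal header, and only handles loose, unzipped files in a single folder. `hash_file` and `dir_to_dat` are hypothetical names, not part of cmp or any other tool; for real dats, cmp is still the way to go.

```python
import hashlib
import os
import zlib
import xml.etree.ElementTree as ET


def hash_file(path):
    """Return size, CRC32, MD5 and SHA-1 (lowercase hex) of one file."""
    size, crc = 0, 0
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            size += len(chunk)
            crc = zlib.crc32(chunk, crc)
            md5.update(chunk)
            sha1.update(chunk)
    return {"size": str(size), "crc": format(crc & 0xFFFFFFFF, "08x"),
            "md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def dir_to_dat(folder, name, description, version, author):
    """Build a minimal Logiqx-style datafile with one <game> per loose file.

    Hypothetical helper: it ignores zips and subfolders, and the game name
    is simply the file name without its extension."""
    root = ET.Element("datafile")
    header = ET.SubElement(root, "header")
    for tag, text in (("name", name), ("description", description),
                      ("version", version), ("author", author)):
        ET.SubElement(header, tag).text = text
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if not os.path.isfile(path):
            continue  # skip subfolders
        game_name = os.path.splitext(entry)[0]
        game = ET.SubElement(root, "game", name=game_name)
        ET.SubElement(game, "description").text = game_name
        ET.SubElement(game, "rom", name=entry, **hash_file(path))
    return ET.ElementTree(root)
```

Something like `dir_to_dat("roms", "My set", "My set", "1.0", "me").write("my_set.dat", encoding="utf-8", xml_declaration=True)` would write the file; the header values there are placeholders.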
As for NES headers (and other systems' headers), I think Cassiel used only files with clean headers, but wait for his answer (if needed, you can then clean them with ucon64).
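Before deciding anything, it can help to check which of your NES files carry an iNES header and whether it looks clean. The sketch below only inspects; ucon64 would still do the actual cleaning. What counts as "clean" here is an assumption (unused iNES 1.0 bytes 11-15 set to zero, NES 2.0 headers accepted as-is), not necessarily the rule Cassiel used, and the function names are hypothetical.

```python
import hashlib

INES_MAGIC = b"NES\x1a"   # "NES" followed by the byte 0x1A
INES_HEADER_SIZE = 16     # iNES and NES 2.0 headers are 16 bytes long


def split_ines(data):
    """Return (header, body); header is b"" when the magic is absent."""
    if len(data) >= INES_HEADER_SIZE and data[:4] == INES_MAGIC:
        return data[:INES_HEADER_SIZE], data[INES_HEADER_SIZE:]
    return b"", data


def header_is_clean(header):
    """Rough heuristic, not a full validator.

    NES 2.0 headers (bits 2-3 of byte 7 equal to 0b10) use bytes 8-15,
    so they are accepted as-is. In plain iNES, bytes 11-15 are unused and
    should be zero; junk such as "DiskDude!" overwrites that area."""
    if header[7] & 0x0C == 0x08:
        return True
    return header[11:16] == bytes(5)


def hashes_both_ways(data):
    """SHA-1 with and without the header, so a dat can be tried either way."""
    header, body = split_ines(data)
    return {
        "has_header": bool(header),
        "clean_header": header_is_clean(header) if header else None,
        "headered_sha1": hashlib.sha1(data).hexdigest(),
        "headerless_sha1": hashlib.sha1(body).hexdigest(),
    }
```

Hashing both ways is handy while you wait for Cassiel's answer, since a match on either hash tells you which convention a given dat follows.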
If two ROMs have different hashes, they are not the same, so indeed we tend to catalog both and figure out what the difference is: is it a hacked version? A different version of the same software? A different language? A bad dump, or some other kind of (un)intentional modification?
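A cheap first step in working out how two dumps differ is to list the byte ranges where they disagree. This is only a sketch with hypothetical function names; reading the result is still up to you. As a rough heuristic, a difference confined to the first 16 bytes of an NES file points at the iNES header, while a few scattered changes may hint at a hack, a patch or a bad dump.

```python
import hashlib


def diff_ranges(a, b):
    """Return half-open (start, end) byte ranges where a and b differ,
    comparing only the overlapping length of the two buffers."""
    ranges, start = [], None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i  # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, n))
    return ranges


def compare_roms(path_a, path_b):
    """Summarise two dumps: identical or not, sizes, differing ranges."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    return {
        "same": hashlib.sha1(a).digest() == hashlib.sha1(b).digest(),
        "sizes": (len(a), len(b)),
        "ranges": diff_ranges(a, b),
    }
```

Different sizes show up in `"sizes"` rather than in `"ranges"`, since bytes past the shorter file are not compared.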
Generally, the plan is to put copies of those files [in the right format, with the right headers and the other details Cassiel will share] in a folder and scan them with the TOSEC dats, so that only the unknown ones are left (cmp has an option to remove matched source files, and RomVault does this automatically, I think, if you put them in the ToSort folder). Then you work through what's left: check what each file is and where it belongs, rename it according to the TOSEC naming convention, and then either a) put it in the right folder and generate the dats at the end, or b) edit the right datfile, adding or updating the entry with the correct size, MD5, CRC and SHA-1.
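The "scan and keep only the unknowns" step, plus option b), could look roughly like this. It is a sketch under a few assumptions: the dats are Logiqx-style XML with a `sha1` attribute on each `<rom>`, files are loose rather than zipped, and matching is by SHA-1 only (cmp and RomVault have their own, more complete matching). `known_sha1s`, `unknown_files` and `rom_entry` are hypothetical helpers, and unlike cmp this only lists files, it doesn't move or delete anything.

```python
import hashlib
import os
import zlib
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def known_sha1s(dat_path):
    """Collect every sha1 attribute from the <rom> elements of one dat."""
    tree = ET.parse(dat_path)
    return {rom.get("sha1").lower() for rom in tree.iter("rom") if rom.get("sha1")}


def unknown_files(folder, dat_paths):
    """List loose files in folder whose SHA-1 is in none of the given dats."""
    known = set()
    for dat in dat_paths:
        known |= known_sha1s(dat)
    leftovers = []
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                if hashlib.sha1(f.read()).hexdigest() not in known:
                    leftovers.append(entry)
    return leftovers


def rom_entry(path, rom_name):
    """Build a <rom .../> line to paste into a dat by hand (option b)."""
    with open(path, "rb") as f:
        data = f.read()
    crc = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    return (f'<rom name={quoteattr(rom_name)} size="{len(data)}" crc="{crc}" '
            f'md5="{hashlib.md5(data).hexdigest()}" '
            f'sha1="{hashlib.sha1(data).hexdigest()}"/>')
```

`quoteattr` escapes characters like `&` in the name, which TOSEC names can contain; you would pass it the name already renamed to the TOSEC convention.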
Generating the dats automatically might save you time and trouble: instead of adding hashes by hand, you just drop the new sets into the right folder. Still, real renamers might have better tips.