Yes, I understood the process and I find it really useful. Even more so if you can do it for other file types on other systems too, but that would need extra time to understand the existing image formats, and I'm not sure all of them are well documented.
My last sentence just meant that even if the process dramatically improves the consistency of the dats, that won't mean the information itself was correct from the start. It is badly needed, but it should be combined with manual verification of many of these sets in the future.
An example from C64:
With this, instead of having "Barmy Bills Flight of Fun (198x)(Publisher)" and "Barmy Bill's Flight of Fun (1984-10-20)(Publisher)", you could fix both and have all the sets named "Barmy Bill's Flight of Fun (1984-10-20)(Publisher)". Still, that does not mean the name is correct, and in the end it could turn out to be: "Barmy Bill's Flight of Fun, The (1994-10-20)(Publisher)[cr Oracle]"
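To make the idea concrete, here is a rough sketch of how sets with identical content but different names could be grouped. It is only an illustration, not the actual tool: the function name `find_name_variants`, the (set name, file hashes) input layout and the hash values are all made up for the example.

```python
from collections import defaultdict


def find_name_variants(sets):
    """Group set names by identical content.

    `sets` is an iterable of (set_name, file_hashes) pairs. This layout is
    an assumption for the sketch, not a real dat structure.
    Returns one sorted list of names for every content group that carries
    more than one distinct name.
    """
    by_content = defaultdict(set)
    for name, hashes in sets:
        # frozenset ignores file order, so only the contents matter
        by_content[frozenset(hashes)].add(name)
    return [sorted(names) for names in by_content.values() if len(names) > 1]


# Placeholder hashes, invented for illustration only.
example_sets = [
    ("Barmy Bills Flight of Fun (198x)(Publisher)", ["a1", "b2"]),
    ("Barmy Bill's Flight of Fun (1984-10-20)(Publisher)", ["b2", "a1"]),
    ("Some Other Game (1985)(Publisher)", ["c3"]),
]

for group in find_name_variants(example_sets):
    print("Same content, different names:")
    for name in group:
        print("  " + name)
```

Note that this only reports that the names disagree. It cannot tell which name, if any, is the right one, so the manual check is still needed.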
Anyway, I really like the idea. TOSEC is full of variations in titles (and other fields) across different dats, and any step towards improving quality is good, especially for these easily noticeable errors.
TKaos also makes a valid point, which is one more reason to be careful with it. Still, since you say you're comparing the content of each image file by file, it will only match exactly identical software across different formats, so you will catch a lot of inconsistencies with it. I can easily find a lot of these in C64, even within the same dat. These two look very similar (at least the set names):
Commodore C64 - Games - [T64]
Duomato - Wheel of Fortune (1995)(Assassin Software)[cr F4CG]
Duomato Wheel of Fortune (1995)(Assasin)[cr F4CG][tr en]
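
Pairs like this could also be flagged by comparing the names alone, before any content check. Below is a minimal sketch using Python's standard `difflib`. The helper names `title_of` and `similar_pairs`, the rule that the title is everything before the first "(", and the 0.9 threshold are all assumptions for the example, not part of any TOSEC tool.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def title_of(setname):
    """Return a normalized title: the text before the first "(",
    lowercased, with punctuation collapsed to single spaces."""
    title = setname.split("(", 1)[0]
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def similar_pairs(setnames, threshold=0.9):
    """Return (name_a, name_b, ratio) for every pair of different set
    names whose normalized titles are at least `threshold` similar."""
    pairs = []
    for a, b in combinations(setnames, 2):
        if a == b:
            continue
        ratio = SequenceMatcher(None, title_of(a), title_of(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b, ratio))
    return pairs


names = [
    "Duomato - Wheel of Fortune (1995)(Assassin Software)[cr F4CG]",
    "Duomato Wheel of Fortune (1995)(Assasin)[cr F4CG][tr en]",
    "Barmy Bill's Flight of Fun (1984-10-20)(Publisher)",
]

for a, b, ratio in similar_pairs(names):
    print(f"{ratio:.2f}  {a}\n      {b}")
```

A match here is only a hint that two sets deserve a closer look. The flags can differ for a good reason (the second one has a trainer), so the content comparison and a manual check still decide.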