Just for fun, I wrote a short script that lists duplicate entries based on SHA1 and CRC. I can't post in the dedicated topic, so I'm posting it here instead.
I have attached the output for all Sinclair*.dat files in TOSEC DAT 2020-10-31. The columns are: SHA1, CRC, file size, MD5, number of rom entries within the game entry, and dat file:rom entry name.
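A minimal Python sketch of the same idea follows. It is not the original script. It assumes the DAT files use the Logiqx-style XML layout (`<game>` elements containing `<rom>` elements with `name`, `size`, `crc`, `md5` and `sha1` attributes). The function names and the tab-separated output are illustrative choices. Rows are grouped by the (SHA1, CRC) pair and printed in the column order listed above.

```python
import glob
import os
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_dat(root, dat_name):
    """Return one record per <rom> in a parsed DAT (assumed Logiqx-style XML)."""
    records = []
    for game in root.iter("game"):
        roms = game.findall("rom")
        for rom in roms:
            records.append({
                "sha1": (rom.get("sha1") or "").lower(),
                "crc": (rom.get("crc") or "").lower(),
                "size": rom.get("size", ""),
                "md5": (rom.get("md5") or "").lower(),
                # Number of rom entries in the enclosing game entry
                "rom_count": len(roms),
                "where": f"{dat_name}:{rom.get('name')}",
            })
    return records


def find_duplicates(records):
    """Group records by (SHA1, CRC) and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["sha1"], rec["crc"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


def format_row(rec):
    # Columns: SHA1, CRC, size, MD5, rom count, dat file:rom entry name
    return "\t".join([rec["sha1"], rec["crc"], rec["size"], rec["md5"],
                      str(rec["rom_count"]), rec["where"]])


if __name__ == "__main__":
    all_records = []
    # Hypothetical location: Sinclair*.dat files in the current directory
    for path in sorted(glob.glob("Sinclair*.dat")):
        root = ET.parse(path).getroot()
        all_records.extend(parse_dat(root, os.path.basename(path)))
    for _key, recs in sorted(find_duplicates(all_records).items()):
        for rec in recs:
            print(format_row(rec))
        print()  # blank line between duplicate groups
```

Grouping on the SHA1 and CRC pair, rather than on either hash alone, makes an accidental collision practically impossible. The MD5 and size columns are carried along only for display.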
So any duplicate whose game entry has only one rom entry is very likely a true duplicate. If the game entry has more than one rom entry, you are probably looking at a multipart game entry, where duplicates can be normal (for example, the same GIF or HTML file included in several games).
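The rule of thumb above can be expressed as a small helper. This is a sketch only: `classify_group` is a hypothetical name, and it takes the "number of rom entries" column for every row that shares the same SHA1/CRC.

```python
def classify_group(rom_counts):
    """Hypothetical helper: label one duplicate group by its rom-count column."""
    # Every copy sits in a single-rom game entry -> very likely a real duplicate
    if all(n == 1 for n in rom_counts):
        return "likely true duplicate"
    # At least one copy is in a multipart entry -> may be a shared GIF/HTML file
    return "check manually: possibly multipart"


print(classify_group([1, 1]))
print(classify_group([1, 4]))
```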
I can't attach the file, so only the text part remains...