Cross Comparison of documents
Posted by Wojciech
Feb 22, 2020 at 09:26 PM
I am sorry for such an off-topic question, but I could not find the answer by myself :)
I have dozens of documents in doc/docx, pdf and txt formats. These are my notes, summaries, quotations, full articles, shorter drafts etc. I want to compare them to find repetitions, borrowings, content I reused in full or only in part, and so on. Of course, there is a Word feature ‘Compare documents’, but it only compares two documents at a time, while I need to compare each file with all the rest. There is also ‘Plagiarism Checker’ software that has a ‘Bulk Search’ option, but it only shows the percentage of duplicate content, while I want to know precisely which parts are duplicated and which are not. Do you know of a solution to this problem? Maybe there is an outliner that can import all these documents and cross-compare their content against each other?
I will be grateful for any help and advice.
Posted by Andy Brice
Feb 22, 2020 at 10:52 PM
My favourite file comparison tool is the excellent ‘Beyond Compare’ by Scooter Software. I think it will only let you compare pairs of documents or pairs of folders though.
Posted by Prion
Feb 23, 2020 at 08:46 AM
If you need an overview of similar documents (i.e. one against many), the excellent Devonthink is a strong contender. It does not show you which parts of the documents are similar or identical, but you get a list of documents ordered by their similarity to the document you are comparing against. It also gives you a list of terms that contribute to the similarity, although only after you select a document in that list of candidates (i.e. a pairwise comparison).
macOS only, I am afraid. There is an iOS companion app, but it lacks this “see also” feature.
Posted by satis
Feb 23, 2020 at 09:46 PM
You can compare one document to another with the ‘diff’ tool from the Mac command line:
https://osxdaily.com/2018/02/06/use-diff-compare-files-command-line-mac/
There are apparently graphic front-ends for it:
https://stackoverflow.com/questions/7871702/is-there-any-graphical-binary-diff-tool-for-mac-os-x
Here are some apps that do diff:
https://www.git-tower.com/blog/diff-tools-mac/
I’ve done diff on text/html documents using BBEdit, which has a free tier; not sure what is or isn’t in the free tier, though it’s a great app. (Currently $10 off, to $39, with coupon SMILEWORTHY)
https://www.barebones.com/products/bbedit/
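Since diff-style tools work on one pair at a time, the "each file against all the rest" part of the question could be covered by looping over every pair. Below is a minimal sketch in Python using the standard library's difflib. It assumes the documents have already been converted to plain text (doc/docx and pdf would need a separate extraction step), and the function names shared_lines and compare_all are made up for illustration. It reports the lines two texts have in common rather than the lines that differ.

```python
import difflib
from itertools import combinations


def shared_lines(text_a, text_b):
    """Return the lines that appear, in the same order, in both texts."""
    a_lines = text_a.splitlines()
    b_lines = text_b.splitlines()
    # autojunk=False stops difflib from ignoring very frequent lines
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    shared = []
    for block in matcher.get_matching_blocks():
        # each block marks a run of identical lines; the final block has size 0
        shared.extend(a_lines[block.a:block.a + block.size])
    return shared


def compare_all(docs):
    """docs maps a name to its plain text; every pair is compared once."""
    results = {}
    for name_a, name_b in combinations(sorted(docs), 2):
        results[(name_a, name_b)] = shared_lines(docs[name_a], docs[name_b])
    return results
```

This only catches whole identical lines, so reflowed or lightly edited passages would be missed. The number of pairs also grows quickly: 50 files means 1,225 comparisons.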
Posted by Lucas
Feb 23, 2020 at 10:19 PM
A bit of googling turned up this free (Windows) option, which might be worth a try:
https://plagiarism.bloomfieldmedia.com/software/wcopyfind/
“WCopyfind is an open source windows-based program that compares documents and reports similarities in their words and phrases.”
See the instructions page:
https://plagiarism.bloomfieldmedia.com/software/wcopyfind-instructions/
“When the process finishes, a browser window will open, allowing you to examine the pairs of matching files. You can click on the files individually for ease of printing or you can click on the “side-by-side” option to display the pair of file together in adjacent panels of new browser window.
“When you view the files side-by-side, all the matching phrases are actively linked between the two files. If you click on a matching phrase in the left file panel, the corresponding phrase in the right file will move to the top of the right panel, and vice versa.”
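For anyone who prefers to script this, the general idea of finding phrases shared between documents can be sketched with overlapping word sequences ("shingles"). This is not WCopyfind's actual algorithm, just a rough illustration of the same kind of report. It assumes plain-text input, and the names shingles and shared_phrases and the default phrase length of 6 words are arbitrary choices.

```python
import re
from itertools import combinations


def shingles(text, n=6):
    """Return the set of all n-word phrases in the text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(docs, n=6):
    """docs maps a name to its plain text.

    Returns a dict keyed by (name_a, name_b) listing the n-word phrases
    found in both documents; pairs with nothing in common are omitted.
    """
    sets = {name: shingles(text, n) for name, text in docs.items()}
    report = {}
    for a, b in combinations(sorted(sets), 2):
        common = sets[a] & sets[b]
        if common:
            report[(a, b)] = sorted(common)
    return report
```

A smaller n finds more (and noisier) matches, while a larger n only reports longer verbatim borrowings. Punctuation and capitalisation are ignored, so a quotation with altered punctuation still matches.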