Cross Comparison of documents
Posted by xtabber
Feb 27, 2020 at 12:22 AM
I don’t think file comparison software is likely to be very helpful for your purpose.
If you can define the text patterns that you are looking for, even approximately, a good full text search program will allow you to search for them across multiple files and return both the number of hits and the files where the patterns occur.
File Locator Pro and dtSearch will both do that, but both require Windows. There may be Mac solutions available too, but I wouldn’t know about them.
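As a rough, platform-independent illustration of that kind of search (not how File Locator Pro or dtSearch work internally), here is a minimal Python sketch that counts regex hits per file and reports only the files where the pattern occurs. The function names, the `*.txt` filter and the example pattern are my own assumptions, and it only handles plain-text files; PDFs or Word documents would need their text extracted first.

```python
import re
from pathlib import Path


def load_texts(folder, glob="*.txt"):
    """Read every matching plain-text file in a folder into {file name: text}."""
    return {
        p.name: p.read_text(encoding="utf-8", errors="ignore")
        for p in sorted(Path(folder).glob(glob))
    }


def count_hits(texts, pattern):
    """Return {name: number of hits} for each text where the pattern occurs."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = {}
    for name, text in texts.items():
        count = sum(1 for _ in regex.finditer(text))
        if count:  # skip files with no hits at all
            hits[name] = count
    return hits


# Hypothetical usage: "my_documents" is a placeholder folder name.
# for name, count in count_hits(load_texts("my_documents"), r"force\s+majeure").items():
#     print(f"{name}: {count} hit(s)")
```

Because the pattern is a regular expression, "approximately defined" patterns such as variable whitespace or alternative spellings can be covered in one pass.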
If you want to analyze files to isolate text patterns that you might then want to search for, you should be looking at text mining software. There are many text mining systems targeted at the marketing research and legal professions, but they tend to be both very pricey and complex.
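To give a feel for what "isolating text patterns" can mean at its very simplest, here is a crude Python sketch that lists word sequences (n-grams) occurring in more than one document. This is only an approximation I'm assuming for illustration, nowhere near what commercial text mining systems do; the function name and parameters are hypothetical.

```python
import re
from collections import defaultdict


def shared_phrases(docs, n=3, min_docs=2):
    """Map each n-word phrase to the documents containing it,
    keeping only phrases found in at least min_docs documents."""
    seen = defaultdict(set)
    for name, text in docs.items():
        # Lowercase and split into words so punctuation and case don't matter.
        words = re.findall(r"\w+", text.lower())
        for i in range(len(words) - n + 1):
            seen[" ".join(words[i:i + n])].add(name)
    return {
        phrase: sorted(names)
        for phrase, names in seen.items()
        if len(names) >= min_docs
    }
```

The phrases it returns could then be fed into a full text search as the patterns to look for. On large collections the number of n-grams grows quickly, so this is only practical for modest amounts of text.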
Posted by MadaboutDana
Feb 27, 2020 at 10:40 AM
This is a really interesting issue, I must say.
Translation software uses the “concordance” feature to identify matching phrases in multiple documents, but it wouldn’t really work for the use case you’re describing.
The trouble is, most of the free/cheap concordance software you’ll find online only shows occurrences of individual words or predefined phrases; after that, as far as I can tell, you’d have to tackle the Very Expensive stuff lawyers love to use.
However, this might be a useful starting point. There’s an interesting list of free (Windows) software here, for example: https://listoffreeware.com/best-free-concordance-software-windows/
I’m not sure how many of them work across multiple documents, however. That’s where the Very Expensive analytics software comes in, I fear.
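For anyone wondering what a concordance across multiple documents boils down to, here is a minimal keyword-in-context (KWIC) sketch in Python. It is my own illustration, not taken from any of the tools mentioned; the function name and the `width` parameter are assumptions, and, like the free tools, it only handles a predefined phrase.

```python
import re


def kwic(docs, phrase, width=30):
    """Return (document name, left context, match, right context)
    for every occurrence of phrase in every document."""
    regex = re.compile(re.escape(phrase), re.IGNORECASE)
    rows = []
    for name, text in docs.items():
        for m in regex.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            rows.append((name, left, m.group(0), right))
    return rows
```

Here `docs` is a plain dictionary of document name to text, so the same phrase can be lined up across as many documents as you care to load.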
Another option would be to talk to the nice gentleman who developed, but then shelved, a Windows concordance app that was clearly very popular in the day: http://www.concordancesoftware.co.uk
There are a few open-source concordance programs about, it appears, but I have no idea how good they are. The Wikipedia article on “Concordancers” might provide you with some interesting starting points? https://en.wikipedia.org/wiki/Concordancer
Actually, I’ve just found an interesting page on all kinds of concordancers: http://martinweisser.org/corpora_site/concordancers.html
Might be worth a look!
Cheers,
Bill
Posted by MadaboutDana
Feb 27, 2020 at 10:48 AM
Alternatively, as xtabber suggests, you could use a good search engine.
My favourite on macOS is FoxTrot Pro, and I use it for trawling through my own corpus of research documents; you can customise indices to a considerable degree if you wish - and it’s not ridiculously expensive. dtSearch is an amazing tool, but only available on Windows (or as a web server).
Cheers!
Bill
Posted by MadaboutDana
Feb 27, 2020 at 11:10 AM
Now this looks more like the kind of thing you need (Windows only, unfortunately): https://tshwanedje.com/corpus/
There’s also the respected WordSmith (again, Windows only), here:
And an interesting open-source tool that looks pretty serious: http://www.ddc-concordance.org (available for Windows and Linux).
This might actually be your best bet - development looks lively!
Posted by Alexander Deliyannis
Feb 27, 2020 at 05:54 PM
MadaboutDana wrote:
>Now this looks more like the kind of thing you need (Windows only,
>unfortunately): https://tshwanedje.com/corpus/
Apparently it also works on macOS with some limitations.
>There’s also the respected WordSmith (again, Windows only), here:
I assume you mean this:
https://lexically.net/wordsmith/index.html
Interesting, thanks!