Mining E-mail Newsletters for Interesting Topics

Started by David Garner on 7/20/2020
David Garner 7/20/2020 8:08 pm
I've subscribed to a lot of e-mail newsletters through the years. Recently, I've discovered a bunch of them have pointers to interesting projects on the web. Unfortunately, I am getting so many newsletters, that I can't keep up with them. I'd like to automate the location and extraction of the data, but until I can get my brain around enough NLP (Natural Language Processing), ML (Machine Learning), and other AI (Artificial Intelligence) technology, to create something to do this, I should probably figure out if someone has already done this.

I've not seen any mention of this kind of software in the Open Source community, but that does not mean that it does not exist. Anyone know of anything that might address even part of this effort?

Thanks.

satis 7/21/2020 2:47 am
If you used a Mac you could put the files into DevonThink or use DevonSphere Express, which stores and organizes files/emails/scans then indexes, auto-groups and auto-classifies info using AI - basically a sophisticated algorithm that examines the text in a given document, compares it to the text in all other documents, and mathematically computes the degree of similarity for each one.
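
To make the "degree of similarity" idea concrete, here is a minimal sketch of one common way to compare documents: cosine similarity over word counts. It is only an illustration of the general technique, not DevonThink's actual algorithm (which is not described in this thread). The function names and sample texts are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Return the cosine similarity (0.0 to 1.0) of two texts' word-count vectors."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(target, documents):
    """Score every document against the target text, most similar first."""
    scored = [(name, cosine_similarity(target, body)) for name, body in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical newsletter snippets, invented for this example.
    newsletters = {
        "python-weekly": "New open source project for parsing email archives",
        "garden-news": "Tomato planting tips for the summer season",
    }
    for name, score in rank_similar("open source email parsing project", newsletters):
        print(name, round(score, 2))
```

Real tools typically go further (weighting rare words more heavily, stemming, and so on), but the basic "compare every document to every other and compute a score" step looks roughly like this.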

In another post you indicated you were using Windows so you perhaps could check out DevonThink, see if it's relevant and if so research any Windows alternatives.

That said, this really has nothing to do with outliner software or writing productivity, which is what this forum focuses on, so I'll stop. You might want to check out a forum more closely related to this subject, perhaps https://www.reddit.com/r/productivity/
Franz Grieser 7/21/2020 7:35 am
At the end of this thread someone recommended Ultrarecall for working with Outlook emails:
https://www.outlinersoftware.com/topics/viewt/9101/15

MadaboutDana 7/21/2020 8:05 am
It’s not as sophisticated as DEVONthink, but has certain advantages of its own: UltraRecall is very capable.

I’d argue that e-mail clients do actually fit the definition of “outliner”; as for information mining/retrieval, that’s an area where e-mail clients (as outliners) are notoriously unreliable. Hence the recent spate of messages about transferring data out of e-mail clients and into some kind of info management app (or outliner).

There was an all-in-one app for Windows that acted as e-mail client/file manager/basic notetaker, but for the life of me I can’t remember the name. I remember experimenting with it a little while before we switched to Apple and being mildly impressed, but it doesn’t seem to have gained any traction.

I think the transformation of e-mail clients into fully fledged information managers is well overdue. Some interesting efforts have been made, but the most impressive ones all appear to be aggregating online systems (i.e. web clients that suck data from all your e-mail accounts into a central web client and then allow you to process e-mails from the individual accounts). For a wide variety of reasons, I hate that idea!

I’ve tried some of the more promising recent entries to the field (for macOS/iOS) such as Airmail, Readdle Spark, Canary Mail, etc., but even these supposed paragons of efficiency tend to fall down on the most important function of all: finding and retrieving information. The most efficient tool I’ve found for handling very large e-mail repositories is MailStore (an e-mail archival tool by a German developer; only runs on Windows). It’s an outstanding tool with an outstanding search function, and there is a free version.

Cheers,
Bill
MadaboutDana 7/21/2020 8:06 am
Sorry, “aggregating online systems” = “online aggregators”
Leib Moscovitz 7/21/2020 9:11 am
I suspect that you might be thinking about Zoot.

Earlier iterations of UR enabled one to use it as an email client (by using embedded Outlook), but this does not work in the most recent release, and it apparently doesn't work in earlier releases with the May 2020 update of Windows 10 installed.

Other admittedly imperfect Windows options at integrating email with PIMs include Essential Pim (probably the best such option for Windows), Leadertask (POP mail only, IIRC), and Harmony PIM (now DoogiePIM, haven't tried the DoogiePIM version yet).

Also, the latest version of EM Client (Pro version) confers a certain amount of PIM functionality, although nothing that could compare with a regular PIM.
Dr Andus 7/21/2020 11:32 pm
David Garner wrote:
Unfortunately, I am getting so many newsletters,
that I can't keep up with them. I'd like to automate the location and
extraction of the data

This is an interesting problem and probably analogous to any other situation where more information is coming your way than is humanly possible to deal with.

Without discounting your genuine need for a solution for this (I don't know much about it but sounds like Zoot might be relevant for this?), the question arises: what is the benefit of being exposed to such a constant flow of information (or rather data) as opposed to looking for that information only if and when you actually need it?

I have a similar problem with a service called academia.edu, which sends me research articles it thinks I might be interested in on a daily basis. They end up cluttering my Outlook inbox at work: many of the titles do look very interesting, but in the end I never manage to look through them all because there is just not enough time in the day.

On the other hand all these emails simply are the results of automated searches that the service sends me, and I would probably find much more relevant information if I did a purposeful search on their site, where all this content is hosted and continues to exist.

There is the issue of serendipity, and that's the only reason I haven't unsubscribed from it yet, as I might discover some unexpected stuff occasionally. But then wouldn't I do that anyway if I did a purposeful search? Serendipity may still happen.

I realise not all content that arrives in newsletters will forever exist on the server that sent it, so there is archival value in it, but it also could be that one day all content will be discoverable again on the net, even deleted information, in a similar way that decades (and even centuries) old cold cases are now solved because of improvements in DNA analysis.

Anyway, I haven't got any answers, just problems, sorry... :)
Chris Thompson 7/22/2020 8:26 pm
As satis suggested, DevonThink is good for this, in that you can let it ingest newsletters without having to read them, and when you're writing about a topic or looking at a note you've written, it will identify ingested newsletters you've stored that also discuss the topic. It also supports auto-classification of material according to a taxonomy of topics you identify. It also can automatically ingest material from RSS feeds. This is most useful if the feeds include abstracts or full text, rather than just links.

If you find any solutions like this for Windows, please share them. I'm surprised there aren't more applications to fill this niche. DevonThink is one of the last non-cross-platform tools I'm using these days.
Jon Polish 7/23/2020 1:02 pm
Ultra Recall gets my vote too. I have used it in the past to archive vast amounts of email and the occasional drag/drop I needed for projects I was working on. Search and retrieval is versatile and robust.

A long time ago, I used askSam with varying degrees of success for this purpose. Others were far more successful. Fookes Software used to sell an excellent mail archiver, Mailbag Assistant. They just released Aid4Mail 5 which I understand is the replacement for Mailbag Assistant.

www.fookes.com

Jon
Jon Polish 7/23/2020 2:05 pm
Another option is MailStore. The home version is free.

https://www.mailstore.com/

Jon
Leib Moscovitz 7/31/2020 8:22 am
Don't know how well suited this would be for your needs, but X1 Search can be used for various types of email searches - might be worth giving it a try.
mdlynam 7/31/2020 1:26 pm


Leib Moscovitz wrote:
Don't know how well suited this would be for your needs, but X1 Search
can be used for various types of email searches - might be worth giving
it a try.


+1 on X1 Search.

Another option could be dtSearch (dtsearch.com). What's neat about it? You can set it to return results synonymous with your term, set 'fuzzy' parameters, etc. This can help uncover concepts.
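
For anyone wondering what synonym and 'fuzzy' matching amount to in practice, here is a rough sketch using only Python's standard library. It is not how dtSearch works internally (that is not described here); the synonym table, the function names, and the 0.8 cutoff are all assumptions made up for illustration.

```python
import difflib
import re

# Hypothetical, hand-written synonym table; a real tool would ship a full thesaurus.
SYNONYMS = {"newsletter": ["bulletin", "digest"]}


def expand_terms(term):
    """Return the search term followed by any synonyms listed for it."""
    return [term] + SYNONYMS.get(term, [])


def fuzzy_search(term, text, cutoff=0.8):
    """Return words in text that approximately match term or one of its synonyms."""
    words = sorted(set(re.findall(r"[a-z]+", text.lower())))
    hits = set()
    for candidate in expand_terms(term.lower()):
        # get_close_matches keeps words whose similarity ratio is at least `cutoff`.
        hits.update(difflib.get_close_matches(candidate, words, n=10, cutoff=cutoff))
    return sorted(hits)


if __name__ == "__main__":
    sample = "Our weekly newsleter and monthly digest"
    print(fuzzy_search("newsletter", sample))
```

Lowering the cutoff makes the search more tolerant of typos but also lets in more unrelated words, which is presumably the trade-off a 'fuzzy' setting controls.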

And a third option: noggle search (noggle.online). It makes use of Carrot2 (carrot2.org) clustering.
MadaboutDana 8/1/2020 2:11 pm
And if you want something friendly on about the same level as X1, you can try Copernic Desktop Search, which is the same vintage as X1 (actually, I think it predates it by a bit). Pleasant, but dtSearch is the real powerhouse (and much more expensive than either X1 or CDS). All these are for Windows.

Oh, and there’s a new kid on the block that apparently indexes and searches e-mail: Lookeen. It integrates with Outlook, apparently – the website (www.lookeen.com) is pretty interesting. The desktop version (for Windows) is very reasonably priced.

FoxTrot Pro is the bees’ pyjamas for macOS, of course.

Cheers,
Bill
eurobubba 8/3/2020 3:21 pm
Interesting. Can you (or anyone else here) comment on how it compares to HoudahSpot?

MadaboutDana wrote:
FoxTrot Pro is the bees’ pyjamas for macOS, of course.

Nomatica 3/22/2021 10:16 pm
David Garner,
I know this is an older thread, but I am curious if a solution was found? Interesting angles suggested in this thread.

Cyganet 3/23/2021 7:31 am
I would also be interested in hearing about solutions, e.g. is Zoot a good option?

Related to the desktop search programs mentioned above, there's a huge range in pricing so perhaps also in functionality.

Can anyone speak from experience about how they compare? I'm on Windows and would want to search inside OneDrive, OneNote, office documents, emails and PDFs.

Thank you.
MadaboutDana 3/23/2021 2:39 pm
Sorry, @eurobubba, I’ve only just seen this – many apologies!

I’ve used HoudahSpot too, but there’s no real comparison between it and FoxTrot Pro. The basic version of FoxTrot Search does more or less the same thing as HoudahSpot – in fact, I’d say the latest version of HoudahSpot is rather more sophisticated. But once you move over to FoxTrot Pro, there really isn’t any comparison. Notably the ability to include/exclude specific indexes is amazingly useful, as is the word clusters function (this term near that term, to a specified degree). The one thing I’d love FoxTrot Pro to have is HoudahSpot’s rather neat “folding” capability (i.e. HoudahSpot folds up all paragraphs between one search “hit” and the next search “hit”, so you can see them close together but also in context – neat).
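
To illustrate the "this term near that term" idea, here is a small sketch of a proximity check. It is a generic illustration, not FoxTrot Pro's implementation; the function name and the default distance of five words are assumptions.

```python
import re


def near(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance words of each other."""
    words = re.findall(r"\w+", text.lower())
    # Record the word positions at which each term appears.
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)


if __name__ == "__main__":
    sentence = "The invoice was sent to the client yesterday"
    print(near(sentence, "invoice", "client", max_distance=5))
    print(near(sentence, "invoice", "client", max_distance=3))
```

The "specified degree" is the max_distance argument here: the smaller it is, the closer together the two terms have to appear for a document to count as a hit.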

It really depends on your needs – if you’re a heavy text producer/user for many different clients, you’re going to go for FoxTrot Pro. If you’ve got broader/less specialised needs, you’re going to opt for HoudahSpot. I also occasionally use PDF Search, which I can thoroughly recommend. The limitations are effectively stated in the name ;-)

There’s a useful rundown of the differences between the different versions of FoxTrot Pro here: http://foxtrot-search.com/compare.html

Cheers!
Bill

eurobubba wrote:
Interesting. Can you (or anyone else here) comment on how it compares to
HoudahSpot?

MadaboutDana wrote:
>FoxTrot Pro is the bees’ pyjamas for macOS, of course.

MadaboutDana 3/23/2021 2:43 pm
Oh, the only other thing to mention is that HoudahSpot effectively uses Spotlight indexing behind the scenes (it amounts to a vastly more flexible version of Spotlight, really), whereas FoxTrot Pro creates its own indices.

Cheers!
Bill
Smithers 3/23/2021 9:42 pm
There's a service out there, I can't remember its name, that was specifically designed to map autoresponders.

It was designed for internet marketers, I believe. It was expensive. I used it once, to map the autoresponders of one of my client's competitors.

Now I've gotta see what it was! I can't remember the name!