Outliners indexing external links

Started by Anthony on 4/20/2024
Anthony 4/20/2024 10:11 am
In recent days, I've experimented with several outliners that, to my knowledge, were capable of indexing files. However, I only tested them with external links. The concept was to connect my ebooks and articles (mainly PDFs and ePubs) to the outliner and search their contents within it.

The testing was limited to a few programs using a PDF file. It's possible that I overlooked some features in the options. Therefore, please correct me if I'm mistaken, and please suggest additional outliners capable of this function.

Here are my findings:
MyInfo: no indexing
RightNote: no indexing
MyBase: no indexing
OneNote Desktop: no indexing (in the latest versions, Microsoft even removed the 'link to file' pop-up option when a PDF is dragged and dropped into OneNote).
Anthony 4/20/2024 12:02 pm
Let me partially correct and complete my findings on:

RightNote: no indexing if a link is placed inside another note. Indexing is available (as an option) if the note is created directly from an external link. RightNote's search detects that the searched word occurs in the linked file, but not where in the file it occurs.
Robin 4/20/2024 12:07 pm
Hi, super interesting topic. Isn't this conceptually related to the idea of providing a preview of linked URLs, for example in messenger apps? I.e., the outliner would fetch some data from the (local or remote) URL and include it inside the document, for example to provide a preview or for search. I could imagine several ways of exposing or using such indexed information.
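
To make the idea concrete, here is a minimal sketch of how an outliner might index the files its notes link to. Everything in it is an assumption for illustration: the `links` mapping, the function names, and the `extract_text` callable, which stands in for whatever PDF or ePub text extractor a real program would use. It is not how any of the outliners mentioned here actually work.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set


def build_link_index(
    links: Dict[str, str],
    extract_text: Callable[[str], str],
) -> Dict[str, Set[str]]:
    """Map each lowercase word to the ids of notes whose linked file contains it.

    links: hypothetical mapping of note id -> path or URL of the linked file.
    extract_text: hypothetical extractor returning the plain text of a target.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for note_id, target in links.items():
        for word in re.findall(r"\w+", extract_text(target).lower()):
            index[word].add(note_id)
    return index


def search(index: Dict[str, Set[str]], term: str) -> List[str]:
    """Return the ids of notes whose linked file contains the term."""
    return sorted(index.get(term.lower(), set()))
```

Only the word-to-note mapping is stored, so the outliner file stays small, but a search can say which note's link matches and not where in the file the match is.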
Cyganet 4/20/2024 1:30 pm
What an interesting question! I decided to check with the outliners that I have access to. My assumptions about your use case:
- you're on Windows
- you would like to find a PDF that is linked (not embedded) in the outliner
- you would like to have a highlight of the matching text inside the PDF

Here are my findings:

Yes
===
Zotero: will find text inside a PDF that is linked to a Zotero item. Will not show you the matching text.
UltraRecall: if you drag a PDF into it, it turns it into a plain-text (all lowercase) note that is searchable, and the search term is highlighted. UR retains a shortcut to the original PDF in its URL field.

No
===
UpNote
InfoQube
ConnectedText
Zim-wiki
The Journal
TheBrain
CherryTree
Trilium Notes
Obsidian

Non-outliners
===========
Windows Explorer (search file contents) - will show the PDF but not highlight content
OneDrive search (on both web and Android) - will find the PDF but not highlight content

I hope this is useful for your search.

Pierre Paul Landry 4/20/2024 2:55 pm
Hi cyganet,

InfoQube can index web pages and PDFs, both local and online. Use Item > Index linked file to perform the indexing,
but it seems to be broken in the current version. This will be fixed in the next release.

Pierre Paul Landry
IQ Designer
Anthony 4/21/2024 7:45 am
@Robin @Cyganet @Pierre thank you for your great feedback, testing and clarification. And yes, I failed to mention that I was referring to Windows.

About indexing URLs: I recall MyBase v.5.x at the beginning of the millennium. It was well integrated with Internet Explorer, showing, saving and updating web pages within the outliner. That is no longer the case.

Very interesting tests, Cyganet. I knew about UltraRecall's ability to capture the text of PDFs. Thanks for confirming that with the latest version. However, I guess this is not exactly "indexing" but importing the whole PDF text, although the result is similar, or even better.
Using iFilters to search PDFs would probably take less space than importing the whole text. Pierre could probably say more on that, given his wonderful skills.

Let me say something more about RightNote. Overall, it does a good job of matching searched words.
- In its results it shows 2 words before and 2 words after the matched word.
- As I already noted, it does not open the PDF at the right place. That should not be impossible, since many PDF readers accept parameters in a link so as to open at an exact page.
- It does not index ePubs, though, only PDFs, and only if you enable the option.
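
The two behaviours in the list above (a result line with 2 words on either side of the match, and a link that opens a PDF at a given page) can be sketched roughly like this. The function names are made up for illustration and this is not RightNote's code. The `#page=N` suffix is the open parameter documented for Adobe's readers and honoured by many browsers; whether a given PDF reader respects it is something to test.

```python
import re
from typing import List


def context_snippets(text: str, term: str, width: int = 2) -> List[str]:
    """Return each match of `term` with `width` words of context on both sides."""
    words = re.findall(r"\S+", text)
    hits = []
    for i, word in enumerate(words):
        # Ignore surrounding punctuation and letter case when comparing.
        if word.strip(".,;:!?()\"'").lower() == term.lower():
            start = max(0, i - width)
            hits.append(" ".join(words[start:i + width + 1]))
    return hits


def pdf_page_link(file_uri: str, page: int) -> str:
    """Append a page fragment to a PDF's URI (viewer support is an assumption)."""
    return f"{file_uri}#page={page}"
```

For the second function to be useful, the index has to record the page number of each match, which is more than a plain "this file contains the word" index keeps.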

In my opinion, the two best non-outliner programs for indexing files (in the Windows world) remain dtSearch and Archivarius 3000.
Slartibartfarst 4/21/2024 4:42 pm
From memory, OneNote (as part of MSOffice 2016/7) could do this if integrated on the Desktop with Windows Search.
It will also index decipherable speech in audio files dropped into OneNote.

However, Qiqqa might be what you are looking for: it also works with ePubs etc., and it OCRs and indexes any image-only PDF files it finds.
https://en.wikipedia.org/wiki/Qiqqa - with all its features, it can be a really useful research tool.

Hope this helps or is of use.
- Slartibartfarst.
MadaboutDana 4/23/2024 8:27 pm
... and on Mac, I suspect you’d be looking primarily at EagleFiler and DEVONthink Office (or on the pure search engine front, FoxTrot Pro or even HoudahSpot).
satis 4/24/2024 9:59 pm
If I wanted to use an app to ingest, index and search PDFs on a Mac I'd switch back to DevonThink (from EagleFiler). Between its OCR tech, concordance, and semantic and associative statistical data processing of custom indices, it's better built for searching and summarizing PDFs and finding relations between documents during search. I don't know of a Windows app that can do what it does.

My own needs for a 'shoebox app' are not that sophisticated so in 2016 I switched from it to EagleFiler (which appears to be nearing a new version this year), finding it easier and more pleasant to work in. DevonThink has advanced over the years and bugs I found annoying in it have been addressed, and it seems to have advanced its UI.
Anthony 4/28/2024 1:53 pm
@Slartibartfarst thanks for adding some suggestions.
In Windows you can set which folders to index from the Control Panel, but Windows Search is far from ideal, and I think at that point whether a file is linked from OneNote or not makes no difference. Moreover, in recent versions, to encourage cloud use, Microsoft has disabled in OneNote (as I already noted in a previous post) a good feature that was present until v. 2010, i.e. the drag-and-drop pop-up menu with the option of "linking the file". Now there is only "attach the file". To link a file you have to go through the menus, with several clicks.

About Qiqqa, which belongs to the same league as Zotero, Mendeley and EndNote: it is good to know that it became open source. But the latest version on GitHub dates from 2021, and it has one big drawback. After indexing thousands of files, the exe goes "out of memory" and becomes unusable (as explained here: https://github.com/GerHobbelt/qiqqa-open-source/releases). To correct the error it would probably be necessary to make the app 64-bit, but there is no sign of that.

While looking for similar software, I came across Komga (https://komga.org/). It organizes comics (so no content search, I believe), but I read that it is also good for ePUBs and PDFs. I just wonder if anyone has ever used it, since it is also multi-platform.
Paul Korm 4/28/2024 8:17 pm
I would add MarginNote to the list. (Mac and Apple mobile platforms, only.)

MarginNote can index, in its internal library, any number of PDF, ePUB or other documents. The library can be searched quickly, with search terms shown in context. A related feature, "title link", will automatically highlight in PDFs or in outlines any terms that match the title of any note in the database. So you can hover over the highlighted text and view the related notes in a floating popover. The related notes can be edited in that display. Notes are viewable as an outline or a mind map, or both at once.

I've used MarginNote for years, but I am not sure if it will someday stop working. The non-Chinese version seems to be abandoned by the owner, but it keeps running year after year.
nathanb 5/14/2024 5:04 pm
This topic always makes me take another look at https://www.tagspaces.org/

It takes a different approach: text-based metadata files stored alongside your files, so that you have stable tags, links, and notes that live natively with your files.
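
As a rough illustration of the sidecar idea in general, here is a minimal sketch that keeps tags and a note in a small JSON file next to the document. The `.meta.json` naming and the field names are invented for this example and are not TagSpaces' actual on-disk format.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def sidecar_path(file_path: Path) -> Path:
    """Hypothetical naming scheme: 'book.pdf' -> 'book.pdf.meta.json'."""
    return file_path.with_name(file_path.name + ".meta.json")


def write_sidecar(file_path: Path, tags: List[str], note: str = "") -> Path:
    """Store tags and a note as plain-text JSON next to the file itself."""
    target = sidecar_path(file_path)
    payload = {"tags": sorted(tags), "note": note}
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return target


def read_sidecar(file_path: Path) -> Dict[str, Any]:
    """Return the stored metadata, or empty metadata if no sidecar exists."""
    target = sidecar_path(file_path)
    if not target.exists():
        return {"tags": [], "note": ""}
    return json.loads(target.read_text(encoding="utf-8"))
```

Because the metadata is an ordinary text file, any desktop search tool that indexes the folder will pick up the tags and notes along with the documents.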