AI-infused
Started by satis
on 9/15/2025
satis
10/5/2025 6:26 pm
Dr Andus wrote:
satis wrote:
>
>Dr Andus wrote:
>
>>every time I tried to sign up and have read through their privacy policy during sign-up, it seemed to me that they were asking me to allow them complete and absolute access to all my data and everything that I'm doing
>
>I don't think that's accurate.
I was referring to the various agreements during the signing-up process specifically, which gave me this impression.
I think you got a misimpression. The documentation and links show pretty clearly that 'complete and absolute access to all my data and everything that I'm doing' does not square with what they actually publish about user privacy and user control and ownership of their data.
Dr Andus
10/19/2025 11:26 am
I'm very late to the AI party, in fact I'm still just looking in through the window, wondering whether to go in...
But I did find the latest AI extension for RoamResearch intriguing, especially the feature (3. Ask Your Graph Agent) where you can ask the agent to query and summarise your notes in the Roam graph:
https://roam-research.kit.com/posts/commentarii-roamani-roam-live-ai
I haven't used Roam as a knowledge base (I'm mainly using it for organising my daily todos and as a journal of what I've got done), but with this extension I can suddenly see the appeal of Roam as a Zettelkasten.
While this extension offers many different uses of AI, my main interest is being able to query my knowledge bases (such as my reading notes, my own research notes, and my personal diary entries) to gain insights that would otherwise be difficult or very time-consuming to get, or that I might not even think of.
I gather that NotebookLM offers this capability, plus I've seen other options where you can create a library for an AI solution to query locally (such as the HP AI Companion) or point an AI agent some other way to query a local folder on one's hard drive.
So I'm shopping around for a solution where I could accomplish the above, in a way where I could control what level of privacy I'm willing to give up for each query (e.g. HP AI Companion does give you the option to keep the query on the machine, or to use a cloud AI).
For instance I wouldn't want to share my personal journal entries with an online AI (though I could query them with a local small language model on my laptop), but I'm less precious about my reading notes.
Interestingly, the above Roam extension does allow for specifying different privacy levels, but it's hard to know what that means and in any case Roam data is already in the cloud.
Anyway, I'm wondering what your experience has been with this sort of "querying your own notes" use of AI agents, if anyone here has anything to share on the topic.
satis
10/19/2025 3:52 pm
I've experimented with querying my own notes using Perplexity Pro, and recently started using Perplexity's Comet web browser. (Full local file integration is not yet publicly available in Comet, only in dev builds.)
You can point Perplexity to a local folder and have it analyze the contents, but this requires using the Model Context Protocol filesystem access feature in the Perplexity Mac App, with specific setup requirements.
https://www.perplexity.ai/help-center/en/articles/11502712-local-and-remote-mcps-for-perplexity
https://www.perplexity.ai/comet/gettingstarted
https://apps.apple.com/us/app/perplexity-ask-anything/id6714467650
I've used it to look at folders of text files and PDFs to extract insights and summaries and overviews. You can say things like "Summarize this document" or "Explain the findings on pages 10 to 15". For some arcane PDFs I can get clear and simple explanations of complex topics, and can translate materials and explore related topics on the web. I've had it proofread documents and I've experimentally used it to get feedback, brainstorm ideas, and refine/rewrite my writing - though it usually rips all personality out of the rewrites.
(ChatGPT is often a lot better for rewriting suggestions, though it too churns out bland academic text.)
Perplexity lets you select up to 10 files at once (up to 40MB each) to perform cross-document queries and comparisons. It analyzes short files entirely, while "longer" files are processed to extract "the most relevant parts for your specific query," meaning it sometimes loses context or misses meaning in larger documents.
I just copy-pasted the above into Comet/Perplexity and asked it to rewrite for clarity, and it instantly gave me this mild rewrite (with its usual tic of inserting a '—' throughout its text):
I've been testing Perplexity Pro's ability to query my personal notes, and I recently started using their Comet web browser. Note that Comet's full local file integration is currently limited to developer builds—it's not yet available to the public.
The Perplexity Mac app can analyze local folders through the Model Context Protocol (MCP) filesystem access feature, though this requires specific setup steps detailed in their documentation.
I've used it to analyze folders containing text files and PDFs, extracting insights, summaries, and overviews. You can ask questions like "Summarize this document" or "Explain the findings on pages 10 to 15." This has been particularly useful for dense PDFs—I can get clear explanations of complex topics, translate materials, and explore related information on the web. I've also used it for proofreading and experimentally for feedback and brainstorming, though the rewrites typically strip out all personality. (ChatGPT generally handles rewrites better, though it also produces bland, academic-sounding text.)
Perplexity supports batch processing of up to 10 files simultaneously, with each file capped at 40MB. This enables cross-document queries and comparisons. While it analyzes short files in their entirety, longer documents are processed selectively to extract "the most relevant parts for your specific query"—which means it can sometimes lose context or miss meaning in larger files.
Dr Andus
10/19/2025 11:12 pm
satis wrote:
Perplexity lets you select up to 10 files at once (up to 40MB each) to
perform cross-document queries and comparisons. It analyzes short files
entirely, while "longer" files are processed to extract "the most
relevant parts for your specific query," meaning it sometimes loses
context or misses meaning in larger documents.
Thanks for sharing your experience.
I came across an article today which talked about the need to combine hundreds of Obsidian files into a single file, so that NotebookLM can read them:
https://www.xda-developers.com/move-obsidian-knowledge-vault-to-notebooklm-with-plugins/
You mention that Perplexity also limits you to 10 files.
It makes me wonder if it's better to combine thousands of files into a single file when I'm importing them into Obsidian, in case I want to use AI at some point to analyse them.
For instance I have probably around 5000 individual diary entries in ConnectedText, and I was planning to import them into Obsidian, but now I'm wondering if it's better to combine them into a single text file.
Well, as I'm typing this here, I'm thinking 'why am I bothering my fellow forum members with this question, when I could probably ask this directly in one of the AI "answer engines"'...
We're getting to a situation where one could spend the entire day posting one query and prompt after another while going about one's day...
Andy Brice
10/20/2025 12:44 pm
Just this morning I searched "Capri" on Google. The AI overview told me it is a fictional island. But it's very much a factual island.
Amontillado
10/20/2025 1:26 pm
Just curious - how would one concatenate all the markdown files in Windows?
Linux and Mac are easy - in a terminal window, cd to the Obsidian vault and run find . -name \*.md -exec cat {} >> bulk.txt \;
I really need to learn more things like PowerShell. I'm stuck in a bash world.
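I suppose a quick Python script would also be portable enough, assuming Python 3 is installed; the vault path and output name here are just placeholders:

from pathlib import Path

vault = Path(r"C:\Users\me\ObsidianVault")   # placeholder: your vault folder
out = vault / "bulk.txt"                     # .txt so it doesn't match *.md itself

with out.open("w", encoding="utf-8") as bulk:
    for md in sorted(vault.rglob("*.md")):   # recurse through the whole vault
        bulk.write(md.read_text(encoding="utf-8", errors="replace"))
        bulk.write("\n\n")                   # blank line between notes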
satis
10/20/2025 2:36 pm
Andy Brice wrote:
Just this morning I searched "Capri" on Google. The AI overview told me
it is a fictional island. But it's very much a factual island.
Before I finished typing the 5-letter word, the Google suggestions bar offered "an island located in the Tyrrhenian..." with a tiny thumbnail from Wikipedia. When I finished, I got the Wikipedia page as the first link (with thumbnail), a list of "People also ask" questions like "Is Capri very expensive?", a "Things to know" section with dropdowns for population, culture, pronunciation, history and location, and a sidebar on the right with a photo carousel, current weather, a travel pricing link, and a maps link.
satis
10/20/2025 3:23 pm
Dr Andus wrote:
satis wrote:
>Perplexity lets you select up to 10 files at once (up to 40MB each) to
>perform cross-document queries and comparisons. It analyzes short files
>entirely, while "longer" files are processed to extract "the most
>relevant parts for your specific query," meaning it sometimes loses
>context or misses meaning in larger documents.
I came across an article today which talked about the need to combine hundreds of Obsidian files into a single file, so that NotebookLM can read them:
You mention that Perplexity also limits you to 10 files.
It makes me wonder if it's better to combine thousands of files into a single file when I'm importing them into Obsidian, in case I want to use AI at some point to analyse them.
Even with concatenation I do not think you will get the full analysis you want on those files. Definitely not for free, and probably not in 2025 as a consumer.
LLMs, whether used directly or through an Obsidian plugin, are limited by context length (how much text the LLM can “see” at once). For big vaults, the plugin or LLM fetches and summarizes what it thinks are the most relevant segments for the model to use, and won't fully evaluate all your text.
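To put rough numbers on it, here's a back-of-the-envelope sketch in Python; the ~4-characters-per-token figure is a common rule of thumb and the 128,000-token window is just an illustrative size, not any particular product's limit:

from pathlib import Path

vault = Path("MyVault")          # placeholder: folder of markdown notes
context_window = 128_000         # illustrative context size, in tokens
chars_per_token = 4              # rough average for English prose

total_chars = sum(len(p.read_text(encoding="utf-8", errors="replace"))
                  for p in vault.rglob("*.md"))
est_tokens = total_chars // chars_per_token

print(f"~{est_tokens:,} tokens of notes")
print(f"Fits in one {context_window:,}-token window: {est_tokens <= context_window}")

A few thousand diary entries will usually blow well past a single window, which is why these tools fall back to retrieving only the chunks they judge relevant.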
NotebookLM lets you upload up to 50 sources (300 on paid plans), but each file has a 500,000-word and 200MB cap. If you combine thousands of documents into one massive file, you risk hitting formatting and organization problems. Its processing depends on its context window, so it can't analyze everything in a giant aggregation fluidly: it sometimes only covers the chunk or segment currently loaded, missing details outside its window and breaking up citations and logical divisions. Merging files can lead to wrong citations and unreliable answers for large-scale projects, large numbers of files, or very large files.
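If you did go the concatenation route, one workaround is to split the export into several sources rather than one giant file. A rough sketch, assuming the 500,000-word-per-source cap mentioned above (it splits on raw word counts, so a more careful script would split on note boundaries instead):

from pathlib import Path

bulk = Path("bulk.txt")          # placeholder: the concatenated export
max_words = 450_000              # stay comfortably under the 500k-word cap

words = bulk.read_text(encoding="utf-8", errors="replace").split()
for i in range(0, len(words), max_words):
    part = Path(f"bulk_part_{i // max_words + 1:02d}.txt")
    part.write_text(" ".join(words[i:i + max_words]), encoding="utf-8")
    print(f"wrote {part} ({min(max_words, len(words) - i):,} words)")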
Perplexity Pro, which costs $200/year and has access to several premium AI models like GPT-5 and Claude Sonnet 4.5, specifically states that short files are "usually" analyzed in their entirety and that Perplexity can directly check the whole content to generate answers. But, Perplexity tells me, "Long files or large documents are processed differently—Perplexity will extract and analyze the most important sections or highlights, focusing on the parts most relevant to your query. This means it does not always use the entire content for its response, especially if the file is extensive."
One option for hobbyists, SOHO users, and academics has been installing (lesser) open source, locally-run models like Llama or Gemma on a PC, but you'll need a pretty beefy computer with gobs of RAM and GPU for that, and performance can vary radically based on model and model size. Off-the-shelf prebuilt PCs that are ready for local AI processing are in the $4,000 range and have RTX 4090 or 5090 GPUs in them and 64GB RAM.
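As an illustration of what that looks like in practice, here's a sketch only, not a recommendation: it assumes you've installed Ollama, pulled a model such as llama3.1, and installed the ollama Python package; the folder name and question are placeholders:

from pathlib import Path
import ollama  # pip install ollama; assumes the local Ollama server is running

# gather the notes to query (everything still has to fit in the model's context window)
notes = "\n\n".join(p.read_text(encoding="utf-8", errors="replace")
                    for p in Path("MyNotes").glob("*.md"))

reply = ollama.chat(
    model="llama3.1",  # any model you've pulled locally
    messages=[
        {"role": "system", "content": "Answer only from the notes provided."},
        {"role": "user", "content": f"Notes:\n{notes}\n\nQuestion: What themes recur in my journal?"},
    ],
)
print(reply["message"]["content"])

Nothing leaves the machine with a setup like that, which is the main appeal for things like private journal entries.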
So, absent your own high-end PC running your own locally managed LLM, concatenating files probably won't give you a complete analysis of your files.
It might yield decent results, but they will be incomplete, and you won't really be sure where the gaps are.
Andy Brice
10/21/2025 9:24 am
satis wrote:
Before I finished typing the 5-letter word the Google suggestions bar
offered "an island located in the Tyrrhenian..." with a tiny thumbnail
from Wikipedia. When I finished I got the Wikipedia page as the first
link (with thumbnail) a list of "People also ask questions" like 'Is
Capri very expensive?", a "Things to know" section that has dropdowns
for population, culture, pronunciation, history and location, and a
sidebar on the right with a photo carousel, current weather, travel
pricing link, and a maps link.
I just typed "Capri" into the same app on the same device. This time it gave me a completely different answer (correct this time).
Dr Andus
10/21/2025 5:32 pm
@satis, thank you very much for the thorough answer, it clarified a lot.
Well, I got myself an education discount and bought one of those high-end PCs (they are cheaper in the UK than the US, strangely, I guess because of the whole tariff situation)...
Maybe I'm a complete fool for still spending a lot of money on it, but my previous HP mobile workstation lasted me 10 years, so I'm hoping this one will also last for a while.
It comes with the Phi 3.5 LLM installed, but I'm only just beginning to play with it.
So you can put me down in that category of "hobbyists and SOHO and academics" :-)
satis wrote:
One option for hobbyists, SOHO users, and academics has been installing (lesser) open source, locally-run models like Llama or Gemma on a PC, but you'll need a pretty beefy computer with gobs of RAM and GPU for that, and performance can vary radically based on model and model size. Off-the-shelf prebuilt PCs that are ready for local AI processing are in the $4,000 range and have RTX 4090 or 5090 GPUs in them and 64GB RAM.
So, absent your own high-end PC running your own locally managed LLM, concatenating files probably won't give you a complete analysis of your files.
It might yield decent results, but they will be incomplete, and you won't really be sure where the gaps are.
Dr Andus
10/21/2025 5:38 pm
In the article I linked to, the author used an Obsidian plugin that did the job (the Vault 2 Book plugin):
https://www.xda-developers.com/move-obsidian-knowledge-vault-to-notebooklm-with-plugins/
Amontillado wrote:
Just curious - how would one concatenate all the markdown files in
Windows.
Linux and Mac are easy - in a terminal window, cd to the Obsidian vault
and run find . -name \*.md -exec cat {} >> bulk.txt \;
I really need to learn more things like Powershell. I'm stuck in a bash
world.
Dr Andus
12/1/2025 2:09 pm
I realise now why I got this impression. I had confused the Gemini sign-up with the NotebookLM sign-up process.
It's the Gemini terms that seemed too invasive to me. I just tried to sign up again and was confronted with this:
"What activity is collected
Google collects your chats and what you share with Gemini (like files, videos, screens and photos). It also collects transcripts and recordings of your Gemini Live interactions, your feedback, info from websites that you visit with Gemini, product usage and location info (including your device's general location, IP address or Home or Work addresses in your Google Account)."
And you either accept this or you can't use Gemini. This is what seemed way too invasive to me.
satis wrote:
Dr Andus wrote:
>every time I tried to sign up and have read through their privacy policy during sign-up, it seemed to me that they were asking me to allow them complete and absolute access to all my data and everything that I'm doing
I don't think that's accurate.
NotebookLM does not use user data to train its AI models. Google baldly states, "NotebookLM does not use your personal data, including your source uploads, queries, and the responses from the model for training."
satis
12/1/2025 9:13 pm
Google offers a setting called “Keep Activity” and turning it off prevents new chats from being stored long-term and excludes them from model-training and human review. You can also remove precise-location permissions in your device settings; in that case Gemini receives only coarse/general IP-based location info.
I keep location services disabled for nearly all my apps in iOS, and the only effect is that an app will request one-time access when it actually needs it (if I allow it to ask in per-app settings) such as the Starbucks app asking for location so I can select a pickup store.
You can further reduce data exposure by leaving connected-app integrations disabled and not using features like “Gemini in Chrome,” which tend to funnel browsing information into the service.
With business/enterprise/government Gemini accounts (e.g. Google Workspace), conversations remain within the organization and data is not used for external model training by Google... but retention and deletion follow the organization’s administrative policies.
Most competing large-language-model services like ChatGPT and Claude have data-privacy provisions that are broadly similar to those of Gemini in that they collect user inputs, metadata, and (depending on settings) may use user data for model training, with opt-outs in (occasionally obscure) settings.
