RFC - New Software Project: Infosqueezer
Posted by Lothar Scholz
Sep 11, 2019 at 11:15 AM
Hello,
I want to announce that, starting in October, I will work full-time for two years (now fully financed) on my information/knowledge application. I spent my spare time over the last years implementing some prototypes, and I have made enough mistakes to now be wise enough to create something great.
This Request For Comments is for you to provide me with your ideas, critiques and suggestions, as I’m going to implement a few totally new concepts and combine them with well-known ones.
It will be a 1-, 2- and even 3-pane outliner for information, a PDF+HTML reader with qualitative text analysis capability, a web clipper to provide that HTML, and some kind of file management tool.
Development will happen on macOS and Linux first. I am keeping portability to Android and iOS in mind, but that will come later. The same goes for Windows: Microsoft GUI development currently totally sucks, and until they have decided on their direction I am postponing all work on that platform.
*** Cards and Panes
Information is stored on little index cards, which contain Markdown text and, optionally, a represented file (its purpose is described later).
A card can link to other files but represent at most one file. Wiki-style linking among cards is possible.
There are two kinds of cards: free-floating „Knowledge“ cards and cards that are the nodes of an outliner. Free-floating cards can be displayed in multiple outliners and at multiple locations in one outliner, or they can live outside of any outliner organization and only be revealed by search and browsing features.
The left-pane outline is a full-featured outliner with columns, like OmniOutliner. It allows clones, mark+gather operations and all the features discussed and found useful on this board in the past. Each node in the outliner can also reference a free-floating card, which is shown in the 2nd pane.
Compared to other 2-pane outliners, this concept makes the outline a full data document and not just the table of contents of a collection of cards. This is a unique feature, and the reason is easy to understand. For example, if you write a book, you can use the outline pane to organize your chapters and add comments about progress and todos, while the cards contain the content of your book. This decoupling is important because an organized set of content items is much more than the sum of its content, and information often only makes sense because of the combination of the items. No outliner so far could model this.
I also want to mention that there will be no difference between folders and items in the outline. This concept was taken from the technical implementation of file systems, and I don’t know why so many outliners just copied it.
The third pane shows cards that are related to the current card shown in the 2nd pane.
These can be the destination/source cards of links on the current card, or cards that some AI algorithm considers similar.
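The link-based half of the third pane amounts to a forward and a backward link index. Below is a minimal Python sketch, assuming cards are keyed by title and use the `[[Target|Label]]` wiki-link syntax that appears in the movie card example below; the function names are hypothetical, and the AI-similarity half is not covered.

```python
import re
from collections import defaultdict

# Assumed wiki-link syntax: [[Target]] or [[Target|Label]]; only the target matters here.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def outgoing_links(text: str) -> list:
    """Return the titles of all cards linked from a card's Markdown text."""
    return [m.group(1).strip() for m in WIKI_LINK.finditer(text)]


def build_link_index(cards: dict):
    """Build forward (destination) and backward (source) link maps for all cards."""
    destinations = defaultdict(set)
    sources = defaultdict(set)
    for title, text in cards.items():
        for target in outgoing_links(text):
            destinations[title].add(target)
            sources[target].add(title)
    return destinations, sources


def related_cards(title: str, destinations, sources) -> set:
    """Cards the third pane could show: everything linked to or from `title`."""
    return destinations.get(title, set()) | sources.get(title, set())


cards = {
    "Star Wars": "Luke joins a [[Jedi Knight|Jedi]] and a [[Wookiee]].",
    "Jedi Knight": "Trained in the [[Force]].",
}
destinations, sources = build_link_index(cards)
print(sorted(related_cards("Jedi Knight", destinations, sources)))  # ['Force', 'Star Wars']
```

Keeping both directions in memory means backlinks cost nothing at display time, which fits the "everything in RAM" approach described later in the thread.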
Each outline node can also contain an explicit list of related cards. In the book-writing example you would use this list to attach cards to a node with knowledge that you want to mention in the text you write in the 2nd pane. This is a feature I’ve seen in the IBM DOORS requirements engineering tool, where you keep the source material, like legal requirements or technical specs, and reference it from there. Sure, you could just add them as intrinsic links on the outline node, but having a separate list of documents is IMHO cleaner.
The 2nd and 3rd pane will show multiple cards at once.
*** Data Fields and Sections
As a huge fan of the abandoned askSam and its ability to add data fields directly inside the document wherever you want, I have to add this feature.
And in the current design it has turned out to be the key element.
Unlike a normal database, the design rule is that the user entering information has priority over the correctness of the database. If you have a year field in an art database, you don’t need to enter a 4-digit year; you can also write „painted during his orange phase living in Washington“. This will be stored, just underlined with an error line. In the end it’s better to write it down than to forget it.
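The lenient rule can be modeled by never rejecting input and only attaching an error flag to it. Below is a minimal Python sketch, assuming a hypothetical per-field-type validator (`parse_year`) that always keeps the raw text.

```python
import re
from dataclasses import dataclass
from typing import Optional

FOUR_DIGIT_YEAR = re.compile(r"\d{4}")


@dataclass
class FieldValue:
    raw: str              # always stored exactly as typed
    value: Optional[int]  # parsed year, or None if the text is free-form
    error: bool           # True -> the editor would draw an error underline


def parse_year(raw: str) -> FieldValue:
    """Accept anything the user typed; only flag it if it is not a 4-digit year."""
    text = raw.strip()
    if FOUR_DIGIT_YEAR.fullmatch(text):
        return FieldValue(raw, int(text), False)
    return FieldValue(raw, None, True)


print(parse_year("1977"))
print(parse_year("painted during his orange phase living in Washington"))
```

Queries can then use `value` where it exists, while the card still shows `raw` unchanged.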
Fields can be added one or multiple times. The content of a field can contain multiple data values, like a comma-separated list or a ZIP code/city/country combination that is automatically broken into data pieces by a pattern matcher.
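Below is a minimal Python sketch of such a pattern matcher. The address pattern (`<ZIP> <City>, <Country>`) and the function names are hypothetical. A value that does not match simply stays raw, in line with the lenient rule above.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pattern for "<ZIP> <City>, <Country>", e.g. "10115 Berlin, Germany".
ADDRESS = re.compile(r"^\s*(?P<zip>\d{4,5})\s+(?P<city>[^,]+?)\s*,\s*(?P<country>.+?)\s*$")


def split_list(raw: str) -> List[str]:
    """Break a comma-separated field value into individual data values."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def split_address(raw: str) -> Optional[Dict[str, str]]:
    """Break an address into zip/city/country pieces, or return None if it doesn't match."""
    match = ADDRESS.match(raw)
    return match.groupdict() if match else None


print(split_list("Harrison Ford, Carrie Fisher, Alec Guinness"))
print(split_address("10115 Berlin, Germany"))
```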
For example, if you write the following on a card in your movie database, it will create a „Movie“ record (declared by the @@ line) with these fields:
@@Movie:
@Title: Star Wars: Episode IV - A New Hope
@Director | Writer: George Lucas
@Year: 1977
@Rating: 10
@Genre: SciFi
@Actor: Mark Hamill
@Actor: Harrison Ford, Carrie Fisher, Alec Guinness
@Synopsis:
*Luke Skywalker* joins forces with a [[Jedi Knight|Jedi]], a cocky pilot,
a Wookiee and two droids to save the galaxy from the Empire’s world-destroying battle station,
while also attempting to rescue Princess Leia from the mysterious Darth Vader.
As you can see, it’s possible to write in condensed form, using the @Fieldname | Fieldname: syntax or a comma-separated list.
There are tons more things and fine tunings, but this explanation should be enough to understand the idea. A field can contain any Markdown text, including images and links. In fact, fields just split the text into different sections.
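Below is a minimal Python sketch of how a card like the one above could be split into a record type and fields. These rules are assumptions: a field with an empty inline value (like `@Synopsis:`) takes the following lines as its content, `|` introduces aliases that all receive the same value, and only fields listed in the hypothetical `LIST_FIELDS` set are split on commas, so that titles and synopses containing commas stay intact.

```python
import re
from collections import defaultdict

RECORD = re.compile(r"^@@(\w+):\s*$")          # "@@Movie:" starts a record
FIELD = re.compile(r"^@(?!@)([^:]+):\s*(.*)$")  # "@Name | Alias: value"
LIST_FIELDS = {"Actor"}  # assumption: fields whose values are comma-separated lists


def parse_card(text: str):
    """Return (record type, {field name: [values]}) for one card's Markdown."""
    record_type = None
    fields = defaultdict(list)
    block_names, block_lines = None, []  # a field whose value continues on the next lines

    def flush_block():
        nonlocal block_names, block_lines
        if block_names:
            body = "\n".join(block_lines).strip()
            for name in block_names:
                fields[name].append(body)
        block_names, block_lines = None, []

    for line in text.splitlines():
        record = RECORD.match(line)
        field = FIELD.match(line)
        if record:
            flush_block()
            record_type = record.group(1)
        elif field:
            flush_block()
            names = [n.strip() for n in field.group(1).split("|")]
            value = field.group(2).strip()
            if value:  # single-line value, stored under every alias
                for name in names:
                    parts = value.split(",") if name in LIST_FIELDS else [value]
                    fields[name].extend(p.strip() for p in parts)
            else:  # empty value: the following lines form the field's content
                block_names = names
        elif block_names is not None:
            block_lines.append(line)
    flush_block()
    return record_type, dict(fields)


card = """@@Movie:
@Title: Star Wars: Episode IV - A New Hope
@Director | Writer: George Lucas
@Actor: Mark Hamill
@Actor: Harrison Ford, Carrie Fisher, Alec Guinness
@Synopsis:
*Luke Skywalker* joins forces with a [[Jedi Knight|Jedi]], a cocky pilot,
a Wookiee and two droids to save the galaxy."""
kind, fields = parse_card(card)
print(kind, fields["Writer"], fields["Actor"])
```

Because `[^:]+` stops at the first colon, a title such as "Star Wars: Episode IV" keeps its own colon in the value.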
Now back to the outline: you can add children to a node automatically. For example, if you have your curated outline of movies and a node „Best SciFi“, you can add „@@Movies@Rating >= 9 AND @@Movies@Genre == SciFi SORT BY @Rating“ and the outline will automatically fill the child nodes with the top movies.
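The query language itself is not specified yet, so the sketch below skips parsing and shows only the evaluation step. Conditions are written as hypothetical `(field, operator, value)` tuples and the records are made-up samples. Sorting in descending order is an assumption, since the "top movies" should come first. Free-form values that are not numbers simply fail numeric conditions instead of raising errors.

```python
import operator

OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt, "==": operator.eq}


def as_number(raw):
    """Lenient numeric view of a field; free-form text simply doesn't match numeric tests."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def matches(record, field, op, expected):
    value = record.get(field)
    if isinstance(expected, (int, float)):
        value = as_number(value)
        if value is None:
            return False
    return OPS[op](value, expected)


def run_query(records, conditions, sort_by, descending=True):
    """AND all conditions, then sort by a numeric field (descending = best first)."""
    hits = [r for r in records if all(matches(r, *c) for c in conditions)]
    return sorted(hits, key=lambda r: as_number(r.get(sort_by)) or 0.0, reverse=descending)


movies = [  # made-up sample records
    {"Title": "Star Wars: Episode IV", "Genre": "SciFi", "Rating": "10"},
    {"Title": "Alien", "Genre": "SciFi", "Rating": "9"},
    {"Title": "Casablanca", "Genre": "Drama", "Rating": "10"},
    {"Title": "Some B-Movie", "Genre": "SciFi", "Rating": "great fun"},
]
# @@Movies@Rating >= 9 AND @@Movies@Genre == SciFi SORT BY @Rating
best = run_query(movies, [("Rating", ">=", 9), ("Genre", "==", "SciFi")], "Rating")
print([m["Title"] for m in best])
```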
„@@Movies@Rating GROUP BY @Genre, @Year“ would fill the outline three levels deep and create items like
Movies
- SciFi
  - 1977
    - Star Wars: Episode IV
If you know databases, this is how the SQL GROUP BY clause works for selecting data. The children will be automatically inserted into the outline when you expand the item containing this query (and then stay this way until it is collapsed or explicitly refreshed).
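Turning GROUP BY into outline levels is a short recursion: group by the first key, then recurse on each group with the remaining keys. Below is a minimal Python sketch with hypothetical function names. It assumes each grouping field holds a single value. A multi-valued field such as @Actor would need the record placed under every one of its values.

```python
def group_outline(records, keys, label="Title"):
    """Nest records by each key in turn, like SQL GROUP BY turned into outline levels."""
    if not keys:
        return [r.get(label, "(untitled)") for r in records]
    first, rest = keys[0], keys[1:]
    groups = {}
    for r in records:
        groups.setdefault(r.get(first, "(none)"), []).append(r)
    return {value: group_outline(members, rest, label) for value, members in groups.items()}


def render(tree, depth=0):
    """Turn the nested groups into indented outline lines."""
    lines = []
    if isinstance(tree, dict):
        for heading, subtree in tree.items():
            lines.append("  " * depth + "- " + heading)
            lines.extend(render(subtree, depth + 1))
    else:
        lines.extend("  " * depth + "- " + leaf for leaf in tree)
    return lines


movies = [{"Title": "Star Wars: Episode IV", "Genre": "SciFi", "Year": "1977"}]
print("\n".join(["Movies"] + render(group_outline(movies, ["Genre", "Year"]))))
```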
I have never seen or heard of any outliner that automatically generates the data shown as children.
If nodes are added automatically, you can’t add any individual text or attach cards to them (there might be ways, but I’m not thinking about this at the moment). But you could add a description of the database search to the top-level „Movies“ node, so you know what happens when you expand the node.
*** Outliner and Columns
The data fields can be used to fill columns in the outline, taking fields either from the referenced free-floating knowledge cards or from the outline node's own card. In the book example you could add a Progress column, put a „@Progress: mostly done“ line into your outline node, and the field value will show up in that column for that row.
Each pane will have a slider at the bottom to control how many lines of the markup are shown. So you can compact the outline to one line of text per node and then use the columns to get an easy status overview.
*** Tags
Outlines and Tags are complementary ways to organize notes.
Infosqueezer will support tags in an autogenerated tag tree view (I saw it in a PhD thesis some time ago but can’t find the reference anymore; AFAIK it has never been implemented in a real product so far). This is not the almost useless approach Bear takes, but something really smart that makes it possible to browse through a document collection very effectively. Studies have shown that browsing and scrolling are still far more popular than direct search for finding items.
If you add multiple hashtags to a card, say #Politics #Trump #BrExit, this will generate a tag tree/outline with the following nodes:
- Politics
  - Trump
    - BrExit
  - BrExit
    - Trump
- Trump
  - Politics
    - BrExit
  - BrExit
    - Politics
- BrExit
  - Trump
    - Politics
  - Politics
    - Trump
This means all permutations of tags are created in the tag tree. Each level means that cards must have at least the tags specified by the current node and all nodes above.
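Below is a minimal Python sketch of building such a tree from a collection of cards, assuming each card is reduced to a set of hashtags; the function names are hypothetical. A node's children are the remaining tags that still co-occur with every tag on the path on at least one card. This yields all permutations for a single card, and prunes combinations that no card actually has.

```python
def tag_tree(cards, path=()):
    """Nested dict of tag -> subtree; each node's cards carry every tag on its path.

    cards maps a card title to its set of hashtags. Children are the remaining tags
    that still co-occur with the whole path on at least one card.
    """
    on_path = set(path)
    matching = [tags for tags in cards.values() if on_path <= tags]
    children = sorted({tag for tags in matching for tag in tags} - on_path)
    return {tag: tag_tree(cards, path + (tag,)) for tag in children}


def cards_at(cards, path):
    """Cards shown under a node: those that have at least all tags on the path."""
    return sorted(title for title, tags in cards.items() if set(path) <= tags)


def count_leaves(tree):
    return 1 if not tree else sum(count_leaves(sub) for sub in tree.values())


cards = {"Note": {"Politics", "Trump", "BrExit"}}
tree = tag_tree(cards)
print(count_leaves(tree))  # 6 = 3! orderings of the three tags
```

Since n tags on one card produce n! paths, this eager version only suits small examples. A real tree view would more likely compute a node's children lazily when it is expanded.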
You can filter the tags used to build the tree based on fields and search queries. For example, you could base the tag tree on every tag found inside the @@Movies@Synopsis field. This opens endless opportunities to fine-tune your database.
*** Qualitative Text Analysis
This is another unique aspect I have never seen anywhere, but it will fit nicely and easily into the overall structure of Infosqueezer.
You can add PDF files or captured HTML web pages to the database. Each file gets its own free-floating card, which represents it.
This allows you to easily add metadata, like bibliography fields, to the files. If you use the normal PDF annotation feature, the marked text and your own added text are added automatically to the card as a field inside an @@Annotation record. This will use the block quote and cite source syntax from the MultiMarkdown specification.
You can add hashtags to each annotation and use the tag tree and all the other powerful methods above.
*** Multiple Pages per Card
Each card has multiple pages. Currently I am thinking of: Foreground, Background, References and Annotations.
The use case for a foreground and a background page can easily be seen in Wikipedia, where you have the article page and the discussion page for each topic. Or in IMDb, where the main page contains an overview (with a selection of the cast members) and the background page can contain the very long full data list.
It makes sense to add a References page for footnotes (which become endnotes), and for PDF-based cards the annotations go to the Annotations page.
In Markdown it’s easy. We already have the horizontal rule syntax (three dashes). A page break will be specified by three tilde characters followed by the page name, like
~ ~ ~
This is main page
~ ~ ~ Discussion
Let's talk about it
~ ~ ~ References
[^bible]: Isaiah 66:11 That ye may suck and be satisfied with the breasts of her consolations
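Below is a minimal Python sketch of splitting a card into pages with this convention. It accepts both `~~~` and the spaced `~ ~ ~` form shown above. It assumes, hypothetically, that text under an unnamed break belongs to a default page called "Foreground" (taken from the page list earlier in this post).

```python
import re

# Accepts "~~~" as well as the spaced "~ ~ ~" form; the rest of the line is the page name.
PAGE_BREAK = re.compile(r"^~ ?~ ?~\s*(.*?)\s*$")


def split_pages(text, default="Foreground"):
    """Split a card's Markdown into named pages; unnamed breaks go to the default page."""
    chunks = []
    name, lines = default, []
    for line in text.splitlines():
        match = PAGE_BREAK.match(line)
        if match:
            chunks.append((name, lines))
            name, lines = match.group(1) or default, []
        else:
            lines.append(line)
    chunks.append((name, lines))

    pages = {}
    for page, body in chunks:
        content = "\n".join(body).strip()
        if content:  # skip empty pages, e.g. before a leading page break
            pages[page] = pages[page] + "\n" + content if page in pages else content
    return pages


card = "~ ~ ~\nThis is main page\n~ ~ ~ Discussion\nLet's talk about it\n~ ~ ~ References\n[^bible]: Isaiah 66:11"
print(split_pages(card))
```

One design point to keep in mind: in CommonMark, a line of three or more tildes opens a fenced code block. The page splitting therefore has to happen before the text reaches a standard Markdown parser.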
I think this is enough for the first presentation pitch.
Posted by Alexander Deliyannis
Sep 11, 2019 at 05:40 PM
Quick reaction: I’m very impressed. There’s much in there that I miss in existing software (or find scattered across different programs, but never all at once).
More detailed feedback will follow.
Lothar Scholz wrote:
>I think this is enough for the first presentation pitch.
Posted by Franz Grieser
Sep 11, 2019 at 07:28 PM
Sounds interesting. And you have a track record of creating and maintaining a stable app :-)
I will surely dig deeper and comment on some points.
First question(s): You talk about a database. Will the data be stored in a proprietary file format? What about the PDF files and HTML data that can be added? Where will they be stored? And what about images, audio, video, equations…?
Posted by Lothar Scholz
Sep 11, 2019 at 09:12 PM
> Sounds interesting. And you have a track record of creating and
>maintaining a stable app :-)
No, I do not. But I think we have all learned never to buy software on promised features and timelines, only on what is available right now.
Products come and go, single developer or big corporation. This is not a kickstarter project.
I don’t ask for your money in advance; I ask for your thoughts.
I’m an experienced programmer: I have been programming since I was 14, and I turned 50 a few months ago.
So I think I’m at least more qualified than the guy from Polywick Storyserver.
Oh yeah, my German computer science master’s thesis was writing a search engine for Usenet news. It was used by the once-popular German search engine “Fireball” in the early days of the internet in 1998. And my interest in information processing never stopped afterwards.
>First question(s): You talk about a database. Will the data be stored in
>a proprietary file format? What about the PDF files and HTML data that
>can be added? Where will they be stored? And what about images, audio,
>video, equations…?
There is no “database”. I like the NoSQL “movement” because they have shown the world that SQL and relational databases are not the only way to do things.
I have developed a preprocessed format to store the markup text and index the data field / hashtag parts. This is good enough. The Markdown of cards and outlines will be kept completely in memory (mmapped, so it can be swapped out by the system under memory pressure) without special indexes. Data size is hardly a problem: a few hundred megabytes is fine, and even a few gigabytes will be OK. Just remember that all threads and messages on this board are less than 20 MB in size. People often overestimate this a lot.
By the way, this exact question was the reason I asked here in February this year: https://www.outlinersoftware.com/topics/viewt/8580
The data itself is written generational, so only the modified delta is stored to reduce write operations on SSD.
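One simple way to get generational, delta-only writes is an append-only log. Each save appends just the cards changed since the previous save, and loading replays the generations in order. The preprocessed format mentioned above is not described, so below is a minimal Python sketch of that general idea, not of Infosqueezer's actual format. It assumes JSON lines and a hypothetical `GenerationalStore` class. Deletions and log compaction are left out.

```python
import json
import os


class GenerationalStore:
    """Append-only card store: every save writes only the delta since the last save."""

    def __init__(self, path):
        self.path = path
        self.cards = {}
        self.dirty = set()
        self.generation = 0
        if os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:  # replay generations oldest to newest
                    entry = json.loads(line)
                    self.generation = entry["gen"]
                    self.cards.update(entry["delta"])

    def put(self, card_id, text):
        if self.cards.get(card_id) != text:  # unchanged cards are never rewritten
            self.cards[card_id] = text
            self.dirty.add(card_id)

    def save(self):
        """Append one generation containing only the modified cards; return their count."""
        if not self.dirty:
            return 0
        self.generation += 1
        delta = {card_id: self.cards[card_id] for card_id in sorted(self.dirty)}
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"gen": self.generation, "delta": delta}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.dirty.clear()
        return len(delta)
```

A real store would periodically compact the log into a fresh snapshot so that replay time and file size stay bounded.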
Because the program will run purely in single-user mode on your own local database on your SSD, there is no need for database optimizations. We now have disks with transfer rates of 3 GB/s, and CPUs with 40 GB/s memory throughput and 8 or more cores in mainstream desktops and even phones. It’s time to use them.
The program will not be cloud-based, but I want to implement a peer-to-peer synchronization feature or an on-premise synchronization server.
PDF and HTML will be stored externally and so will any full text index. HTML snapshots are stored in a proprietary format to eliminate duplicate items.
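The proprietary snapshot format is not described. A common technique for eliminating duplicates is content addressing: each resource is stored under a hash of its bytes, so a stylesheet or image shared by many clipped pages is kept only once. Below is a minimal Python sketch of that swapped-in technique, using a hypothetical `SnapshotStore` class and example.com URLs.

```python
import hashlib
from typing import Dict


class SnapshotStore:
    """Content-addressed blob store: identical resources are stored only once."""

    def __init__(self):
        self.blobs: Dict[str, bytes] = {}  # SHA-256 hex digest -> content

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # a duplicate just reuses the existing blob
        return key

    def save_snapshot(self, resources: Dict[str, bytes]) -> Dict[str, str]:
        """Store a page's resources; return a manifest mapping each URL to its blob key."""
        return {url: self.add(data) for url, data in resources.items()}


store = SnapshotStore()
css = b"body { font-family: serif; }"
page1 = store.save_snapshot({"https://example.com/a.html": b"<p>A</p>", "https://example.com/site.css": css})
page2 = store.save_snapshot({"https://example.com/b.html": b"<p>B</p>", "https://example.com/site.css": css})
print(len(store.blobs))  # 3 blobs for 4 resources: the shared stylesheet is stored once
```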
I know very well that some people here love to have their data in the file system as normal Markdown so that it can be accessed via Spotlight etc. Therefore I thought about storing a duplicate of the data in the file system, or about the very overengineered but fun idea of implementing a custom user file system, mounted via FUSE, that could provide very interesting access patterns to the stored data, in case anyone wants to run a script on them or import them elsewhere. Anyone old enough to remember the MH mail client? That was nerd fun. But there is no FUSE on Windows, so I doubt it will happen.
Video and audio will be implemented as simple file links; nothing else is on the agenda at the moment.
For equations, I looked at how ConnectedText handles LaTeX. It is open source and I think I could integrate that. It’s not on my agenda at the moment either, but I’d say it has a much higher probability of getting there than many other features. In the second round of the Markdown editor, tables and equations will be added. But this is 2+ years in the future.
Posted by Franz Grieser
Sep 11, 2019 at 09:40 PM
Thanks for the clarifications.
>Lothar Scholz wrote:
>> Sounds interesting. And you have a track record of creating and
>>maintaining a stable app :-)
>
>No. I do not.
Oh, sorry. I confused you with Christian Tietze who (co-)created The Archive and a Markdown table editor (https://zettelkasten.de/tools/).
To follow up on my “database”/file storage question: So the data will be stored in a proprietary format on disk? No way to get the data out when I decide to stop using your app or when you (one fine day) stop updating the software?