Re: File Searching Blues
Note: This message is from the outliners.com archive kindly provided by Dave Winer.
Outliners.com Message ID: 1707
Posted by ureadit
2003-10-28 14:00:28
Well, I sent my comments to five search software authors and eight PIM/outliner vendors and authors. None of the search program authors responded. Two PIM vendors sent meaningless replies. MyBase indicated that it had a plug-in or something similar for its database structure. The author of KeyNote, Marek Jedlinski, responded to each of my ideas as follows (he gave permission to post this). By the way, KeyNote is a great (and free) PIM. See
http://www.tranglos.com/free/index.html
ME:
“> 1. PIMs would compress graphics, but the compression of text would be a user-chosen option.”
MAREK:
“This is only possible if text is stored separately from graphics. Any program that uses the Windows standard RTF editor (e.g. KeyNote, but not TreePad) cannot do this: in RTF data, images are stored together with text, and you can’t easily separate one from the other.
“Additionally, you can’t always use a search engine even with RTF text, because some special characters (national characters for non-Western languages, and special codes such as © or ô) are encoded. So a search engine needs to be able to fully parse RTF data and interpret it correctly - which is not a simple task.”
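To illustrate the encoding problem Marek describes, here is a minimal Python sketch. It assumes the two common RTF escape forms, `\'hh` (a byte in the document's code page, assumed here to be Windows-1252) and `\uN` followed by a single `?` fallback character. The function name `decode_rtf_escapes` is made up for this example, and this is far from a full RTF parser: it ignores control words, groups, and embedded images.

```python
import re

# \'hh  -> one byte in the document code page (assumed cp1252 here)
# \uN?  -> Unicode code point N, followed by an optional "?" fallback character
_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})|\\u(-?\d+) ?\??")


def decode_rtf_escapes(fragment: str, codepage: str = "cp1252") -> str:
    """Replace RTF character escapes in a text fragment with real characters."""

    def replace(match: re.Match) -> str:
        hex_byte, unicode_value = match.groups()
        if hex_byte is not None:
            return bytes([int(hex_byte, 16)]).decode(codepage, errors="replace")
        code = int(unicode_value)
        if code < 0:  # RTF writes code points above 32767 as negative numbers
            code += 65536
        return chr(code)

    return _ESCAPE.sub(replace, fragment)


raw = r"caf\'e9 \'a9 2003, r\u244?le"
print("café" in raw)                      # a naive byte-level search misses the word
print("café" in decode_rtf_escapes(raw))  # the decoded text can be searched
```

A search tool that only scans the raw file sees `caf\'e9`, not `café`, which is why a plain text search can miss words that are clearly visible in the editor.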
ME:
“> 2. Some software standardization. In particular, if developers of
> PIMs (or other software used for text information storage) could
> create DLL or plug-in files that would “tell” search programs how to
> read their compressed files, and if file search developers would
> modify their software to accept such plug-ins, then the problem
> would be solved.”
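As a rough illustration of the plug-in idea in point 2, the sketch below shows one way a search program could accept format-specific text extractors. Everything here is hypothetical: the `.pimz` extension, the zlib-compressed UTF-8 layout, and the names `register_extractor` and `file_contains` are invented for the example and do not describe any real PIM or search tool.

```python
import os
import zlib
from typing import Callable, Dict

# An extractor turns the raw bytes of a file into searchable plain text.
Extractor = Callable[[bytes], str]

# Registry kept by the (hypothetical) search program: extension -> extractor.
_extractors: Dict[str, Extractor] = {}


def register_extractor(extension: str, extractor: Extractor) -> None:
    """Called by a PIM vendor's plug-in to teach the search tool a format."""
    _extractors[extension.lower()] = extractor


def extract_text(filename: str, data: bytes) -> str:
    """Use a registered extractor if one exists, else treat the data as text."""
    extension = os.path.splitext(filename)[1].lower()
    extractor = _extractors.get(extension)
    if extractor is None:
        return data.decode("latin-1")
    return extractor(data)


def file_contains(filename: str, data: bytes, needle: str) -> bool:
    """Case-insensitive search over the extracted text."""
    return needle.lower() in extract_text(filename, data).lower()


# Example plug-in for an imaginary PIM that stores zlib-compressed UTF-8 text.
def pimz_extractor(data: bytes) -> str:
    return zlib.decompress(data).decode("utf-8")


register_extractor(".pimz", pimz_extractor)

packed = zlib.compress("Call Marek about outliner search".encode("utf-8"))
print(file_contains("notes.pimz", packed, "outliner"))
```

The search program never needs to know the file format itself. It only needs the vendor to supply the extractor, which is exactly the cooperation Marek doubts will happen.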
MAREK:
“I’m pretty much convinced this isn’t going to happen. I often receive requests for KeyNote to import data from this or that PIM, but it’s usually not possible, because almost nobody publishes the file format specs. Some authors do (e.g. TreePad’s file format is publicly available), but they’re a minority. Is it because authors are lazy, or is it because they are not interested in making their files easy to work with for competing applications? The shareware market is highly saturated. At www there are 302 (three hundred and two) products listed in the PIM category alone, many of them freeware. When you ask $10 or $20 for a program that takes months to write and years to maintain, you probably don’t want to make life easy for your competitors. Of course I can complain about this, since I’ve only written freeware myself, but that seems to be the case and it’s hard to expect otherwise, even as it inconveniences users.
“But this issue also has technical difficulties. Many PIMs or
outliners are in some ways similar, but sufficiently different to make
conversion tricky. For example, KeyNote can import TreePad files, but
it cannot import the images which TreePad stores in a separate file,
because KeyNote’s RTF editor isn’t capable of inserting these images
in the middle of text (or knowing where they should be inserted). So,
while each program could in theory come with a set of APIs to access
its data, in practice there is no clear one-to-one mapping between
various programs, because any two programs may well have completely
different concepts of the kinds of data they store.
“I once wrote an add-on for KeyNote that can import email or news
messages into a KeyNote file. It seemed like an obvious thing to do,
and a way to consolidate my email and Usenet archives, but it only
works to a point, because KeyNote has no concept of attachments,
doesn’t know what to do with HTML-formatted content, and of course has
no special provision for header fields like To, From or Subject, so
these fields are displayed as plain text and become useless for
organizing the archive.
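To show what a "special provision for header fields" might look like, here is a small Python sketch that uses the standard `email` module to keep From, To and Subject as structured fields instead of leaving them as plain text inside a note. The function name `split_message` and the sample message are invented for illustration, and this says nothing about how KeyNote or its add-on actually works.

```python
from email import message_from_string
from typing import Dict, Tuple


def split_message(raw: str) -> Tuple[Dict[str, str], str]:
    """Separate the header fields of a message from its plain-text body."""
    message = message_from_string(raw)
    headers = {name: message.get(name, "") for name in ("From", "To", "Subject")}
    if message.is_multipart():
        # Keep only the first text/plain part; attachments and HTML are skipped.
        parts = [p for p in message.walk() if p.get_content_type() == "text/plain"]
        body = parts[0].get_payload() if parts else ""
    else:
        body = message.get_payload()
    return headers, body


raw = (
    "From: reader@example.com\n"
    "To: author@example.com\n"
    "Subject: File searching\n"
    "\n"
    "Can search tools read compressed PIM files?\n"
)
headers, body = split_message(raw)
print(headers["Subject"])
print(body.strip())
```

With the fields held separately, an archive could be sorted or filtered by sender or subject, which is what is lost when they are imported as ordinary text.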
“Another example: if TreePad tried to import a KeyNote file, it would
have to create many .hjt files for one .knt file, because a .hjt file
can only contain one tree, not many as KeyNote can. This is of course
possible, even easy, technically - but in practice it would be
somewhat messy and potentially confusing. (I’m importing one file and
suddenly I have seven files in TreePad, what’s going on?). Even though the applications we’re looking at are “power tools,” I spend a lot of time answering really basic questions, such as how to open a file in KeyNote or how to change fonts. I believe other authors have a similar experience. So you end up avoiding certain features which, while useful, introduce a complexity that you just know is going to cause a lot of headaches for you.
“That said, the first problem is always going to be about the closed
file formats. It’s very nearly impossible to decipher a binary file if no documentation is available, so you’d first have to get many
shareware authors to decide that interoperability is better than
whatever profit, not much, they’re making.
“Best regards,
.marek jedlinski”