Information Archive Suggestions
Posted by Chris Eller
Sep 27, 2010 at 07:35 PM
I currently have information dating back to the early 1990s stored in a variety of formats, ranging from Word documents and PDFs to InfoSelect, OneNote, Evernote, and now Personal Brain. (All of my pre-1990s data was stored on 5.25-inch or 3.5-inch floppy disks, with most of it written on my trusty Epson QX computer that ran on CP/M. We won’t even talk about my college papers (1982-1986) written on my Atari 64 that were stored on cassette tape!) Some formats will probably never become extinct (e.g. Word docs and PDFs), while others are already starting to gather dust (e.g. InfoSelect). My experience with InfoSelect leaves me wary of putting all my eggs in one basket that may someday no longer exist and that has no simple export feature.
Has anyone on this forum thought through these issues and come up with a good long-term archival strategy for information? I am willing to make the effort to assemble and compile all of the various resources into one database, but I want some degree of confidence that it will still be easily accessible in 10 or 20 years. PDF files are an obvious choice, but GETTING information into PDF format is tedious and labor-intensive, and getting information OUT of PDF format is equally difficult. Moreover, there are few good database programs that handle PDF files well.
Thoughts?
Chris Eller
Posted by Hugh
Sep 28, 2010 at 07:50 AM
There was some discussion of this on the Scrivener forums a while ago. Plain text was the suggestion from people who know about these things better than I. This may not be what you want to hear!
Posted by MadaboutDana
Sep 29, 2010 at 10:29 AM
Interesting question. I’ve given this a lot of thought, as it happens (we have corporate records dating back to 1990, in a similar variety of formats), and I’ve concluded that the safest format is in fact HTML (if you want to preserve any formatting, that is!). The backward compatibility is a key benefit, and the format itself is unlikely to go out of fashion in the near future (although who knows what awaits in the 2030s, once Steve Jobs has managed to entice the entire world into his walled garden!).
The advantages of HTML are significant:
- you can import/export it into/out of a very wide variety of apps
- you can index it easily using a very wide variety of search engines/apps (including some excellent open-source ones)
- it’s nice and flexible - we have a lot of bitexts, for example (translations with source text in LH column, translation in RH column), and HTML handles liquid tables with panache (so they’re easy to view on any size of display)
- it supports Unicode (or any other character set, depending on your needs)
- you can output HTML to more or less anything (including PDF if you want to; we actually generate all our admin documents from our web-enabled admin database through Internet Explorer, using CutePDF - saves us a fortune. And before anybody asks why we don’t use Firefox or Chrome: we do, but they’re both rubbish at formatting print layouts - Chrome much worse than Firefox - whereas Explorer is actually quite competent).
Hope that’s helpful!
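To make the indexing and export points above concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the setup described in this post: the names TextExtractor, html_to_text and build_index are hypothetical, and the documents are assumed to be already loaded as strings keyed by file name.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects the text of an HTML document, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def build_index(documents):
    """Map each lowercase word to the set of document names that contain it.

    `documents` is assumed to be a dict of {file name: HTML string}.
    """
    index = defaultdict(set)
    for name, html in documents.items():
        for word in re.findall(r"\w+", html_to_text(html).lower()):
            index[word].add(name)
    return index
```

Calling build_index on a dict of archived pages and then looking up a word returns the names of the files that mention it. The same html_to_text step doubles as a crude export to plain text, at the cost of losing the formatting.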
Posted by $Bill
Sep 29, 2010 at 11:35 AM
I made the decision to save plain text files where appropriate and to use PDF/A, a file format designed for the long-term archiving of electronic documents, where it is necessary to preserve the formatting and/or non-text content. See http://en.wikipedia.org/wiki/PDF/A
Posted by Arnold
Sep 29, 2010 at 05:02 PM
Do not forget:
Emacs and LaTeX.
They give you plain text with formatting, and Emacs with Org mode is also good for calendars and tasks.
I have moved most documents to LaTeX or OpenOffice ODF format.
Plain text, when combined with document management (check in/out with revisions), may prove to be the long-term solution.
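One reason plain text pairs well with revision tracking is that changes between versions can be shown line by line. A minimal sketch in Python, assuming two revisions of a note are held as strings; revision_diff and the file name notes.txt are hypothetical, and a real document management or version control system would store and compare the revisions for you.

```python
import difflib


def revision_diff(old_text, new_text, name="notes.txt"):
    """Return a unified diff between two revisions of a plain text document."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=name + " (rev 1)",
            tofile=name + " (rev 2)",
        )
    )
```

Printing revision_diff(old, new) marks removed lines with a leading minus and added lines with a leading plus. Identical revisions give an empty string.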