more idle thoughts - about file metadata
Posted by jimspoon
Nov 12, 2012 at 04:12 AM
We need to be able to attach user-defined metadata fields to any type of file.
We do have some metadata capabilities with NTFS: file comments, extended attributes, and “alternate data streams”.
The problem is - if this metadata is stored in the file system, rather than inside the file itself - it may be lost whenever the file is copied to another drive or location, or sent to another person, etc.
The metadata needs to travel with the file itself. And it does - for some kinds of files. JPEG images have very good built-in metadata capabilities - you have EXIF, IPTC, and XMP fields to hold all kinds of metadata. And when you copy the file to another location, it is not lost.
Problem: if you added such metadata fields to a simple text file, other programs, other operating systems, etc. would not know how to deal with them. For example, if you loaded such a file into a Linux text editor, the metadata might show up as gibberish, and it could easily be corrupted.
Solution - there ought to be a cross-platform method of joining “application data” and “metadata” parts in one file. The operating system should mediate so that “dumb” applications would see only the “application data” part of the file, while “smart” programs would be able to see and manipulate the metadata section. The file could be sent anywhere without losing its metadata.
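One way to picture the proposed “application data + metadata” file is a trailer appended after the application data. The sketch below is a hypothetical format, not an existing standard: the marker `MDTRAILR`, the JSON payload, and the function names `attach_metadata` and `split_metadata` are all made up for illustration. A “smart” program would call `split_metadata` to get both parts; the OS-level mediation the post asks for would amount to handing “dumb” programs only the first part. Without that mediation, a plain text editor would still see the trailer bytes, which is exactly the problem the post describes.

```python
import json
import struct

# Hypothetical 8-byte marker identifying the end of a metadata trailer.
MAGIC = b"MDTRAILR"
# Trailer = 8-byte little-endian payload length + marker.
TRAILER_LEN = 8 + len(MAGIC)


def attach_metadata(data: bytes, meta: dict) -> bytes:
    """Append a JSON metadata payload, its length, and the marker to data."""
    payload = json.dumps(meta, sort_keys=True).encode("utf-8")
    return data + payload + struct.pack("<Q", len(payload)) + MAGIC


def split_metadata(blob: bytes) -> tuple[bytes, dict]:
    """Return (application data, metadata); files without a trailer get {}."""
    if len(blob) < TRAILER_LEN or not blob.endswith(MAGIC):
        return blob, {}
    (length,) = struct.unpack("<Q", blob[-TRAILER_LEN:-len(MAGIC)])
    start = len(blob) - TRAILER_LEN - length
    if start < 0:
        # Length field is inconsistent with the file size: treat as plain data.
        return blob, {}
    payload = blob[start:len(blob) - TRAILER_LEN]
    return blob[:start], json.loads(payload.decode("utf-8"))
```

Appending (rather than prepending) keeps the application data at offset 0, so formats that read from the start of the file are less likely to break, but this is a design assumption, not a guarantee for every format.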
Posted by Slartibartfarst
Nov 13, 2012 at 09:12 AM
“We need to be able to attach user-defined metadata fields to any type of file.”
Is this a general requirement?
To what purpose?
Could you take advantage of the “pack to go” files/folders, where the NTFS streams are bundled into the package, before the file is sent out somewhere or moved to non-NTFS media?
Posted by Cassius
Nov 13, 2012 at 05:05 PM
Several years ago, I looked into the possibility of using metadata in “file properties” as a means of tagging files (in Win XP). However, I could not get it to work, although I no longer remember why.
Posted by jimspoon
Nov 13, 2012 at 07:45 PM
Slartibartfarst wrote:
>“We need to be able to attach user-defined metadata fields to any type of file.”
>
>Is this a general requirement?
>To what purpose?
>Could you take advantage of the “pack to go” files/folders, where the NTFS streams are bundled into the package, before the file is sent out somewhere or moved to non-NTFS media?
Well, inevitably a large part of our information is stored in discrete files (notes, documents, photos, sounds, etc.), while other information is stored in structured databases (emails, notes, contacts, tasks, etc.). Ideally we would be able to search both types of “information stores” at once, and filter and sort the retrieved information in useful ways. Traditionally we have stored files in the directory tree structure. But since files have many different properties and attributes, they could be organized in many different ways. For example, you could arrange software programs by author/vendor (the default method in Windows in the Program Files folder), or by function (the method I try to use in structuring my program files). You could organize your photo files by camera, date taken, event, persons appearing, etc. Now we could use “hard links” or “symlinks” to categorize files in different ways, so they would appear in several different locations in the directory tree. But if that information is stored in the file or directory system rather than in the files themselves, it can easily be lost when the files are moved around.
With adequate file metadata capabilities, notes could be stored in individual files, and this metadata could be used to retrieve, filter, and sort both notes and other related files.
I think it’s long been a goal of Microsoft to transition from a tree-based structure to a database-type or tag-based structure, as far back as Vista. A number of third-party file-tagging programs have appeared, but these suffer from the same problem: the tags are usually lost when the files are moved out of the particular file system where they reside. Microsoft has come out with a new file system for Windows 8 called ReFS (Resilient File System), which allows for much more in the way of metadata capabilities. For now, though, it appears to be only for servers.
That’s an interesting idea about bundling NTFS data streams together when the data is sent out of the file system. The receiving system would have to know how to handle the bundled streams, of course. I suppose it’s the lack of a standard that has prevented this sort of thing from happening.
Just for example: when EXIF/IPTC/XMP metadata started to be included in image files, many image viewers/editors did not know how to handle this extra data in the file. Now most of them do, I suppose. A similar standard could be useful for other types of files too.
MP3 files have a well-established metadata system - players/editors on different operating systems know how to handle it.
A related point ... if “hard links” and symlinks were used to implement categorization of files ... that categorization would be lost if files are sent out without the other hardlinks/symlinks. Somehow the metadata needs to be contained in the files themselves.
Posted by jimspoon
Nov 13, 2012 at 07:54 PM
p.s. I have improvised a very crude metadata system for files: I put metadata into the file name. Specifically, I often begin file names with a date in the format “2012.11.13”. Then, using Voidtools Everything, I can instantly retrieve all files in the file system that have “2012.11.13” in the file name, and sort them by file name. I can drag and drop all the found files into an Xplorer2 “scrap container” to browse and view them. Very crude, but I use it all the time.
You might wonder why I don’t just search by file creation date / modified date ... those dates are easily modified by the system, and these dates are often not the date I am looking for. Say for example, I have files about items I have purchased, and I want to view these files sorted by date of purchase. The “creation date” and “modified date” fields are not proper locations for this info.
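The file-name convention from the p.s. can be illustrated with a small Python sketch. It does not use Everything or Xplorer2; it only works on a list of file names, and the helpers `prefix_date`, `files_on`, and `sort_by_prefix_date` are hypothetical names chosen for illustration. Because the date lives in the name, it travels with the file when copied and is independent of the creation/modified timestamps the file system maintains.

```python
import re
from datetime import date

# Matches a leading "YYYY.MM.DD" prefix, e.g. "2012.11.13 receipt.pdf".
DATE_PREFIX = re.compile(r"^(\d{4})\.(\d{2})\.(\d{2})")


def prefix_date(name: str):
    """Return the date encoded at the start of a file name, or None."""
    m = DATE_PREFIX.match(name)
    if not m:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Looks like a date but is not a valid one (e.g. 2012.13.40).
        return None


def files_on(names, day: date):
    """All names whose prefix date equals `day`, sorted by name."""
    return sorted(n for n in names if prefix_date(n) == day)


def sort_by_prefix_date(names):
    """Names that carry a date prefix, ordered by that date (then by name)."""
    dated = [(prefix_date(n), n) for n in names]
    return [n for _, n in sorted(x for x in dated if x[0] is not None)]
```

To apply it to a real folder, pass `os.listdir(folder)` as `names`. Files without a date prefix are simply skipped rather than guessed from their timestamps, which matches the point that creation/modified dates are not the right place for something like a purchase date.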