file searcher with user-defined metadata columns (Windows)

Started by jimspoon on 1/18/2021

jimspoon 1/18/2021 4:48 am

Is there a Windows file searcher / indexer / manager / cataloger that allows users to add their own metadata fields, edit the data, and use the data to quickly filter a list of files? It suddenly occurred to me that it would be awesome if there was a file search utility that operated like a spreadsheet for assigning and editing all kinds of metadata.

Voidtools Everything is my go-to file search utility. It's very fast and light on system resources. Unfortunately it indexes only path names, file names, file sizes, etc. and provides only a small selection of columns. It provides no ability to add, edit, or search user metadata fields.

Windows Search does index the content of files, emails, etc. And Windows File Explorer can be configured to show many different file metadata columns. Unfortunately it does not allow a user to add arbitrary fields:values - these are defined as a part of the file type definitions. For example, if I wanted to add a "Purchase Date" or "Vendor" column, there's no way for me to do that.

There are some file managers out there (xplorer2, xyplorer, total commander) that allow "programmable" columns. The value inserted in the column for a given file is determined by a script, based on other properties of the file. But I haven't found one that allows the creation of columns based on data entered by the user.

I've also tried a file cataloging program called WinCatalog. Now here I finally have the ability to add user-defined columns and fill them with my own data values. But it choked when I tried to index a very large number of files and search on the catalog.

Since the user-defined data fields and values I'm looking for have not made it into the Windows file system, I've thought a better solution might be to use my information manager of choice, which is Infoqube. I could search file items together with non-file items (notes etc.) With Infoqube I can create as many data fields as I want, and I can easily associate any single file with a single IQ item. I can assign any field:value combinations I want to each file. But the linking of files with Infoqube has to be done manually - and there is no way to know which files already have a corresponding Infoqube item, and which do not.

Any ideas?

Gorski 1/18/2021 2:01 pm

My first thought was TagSpaces. I have no experience with it though.

https://www.tagspaces.org/

Lucas 1/18/2021 3:07 pm

Good question. Such software would certainly be useful. I've used the excellent Tabbles (tabbles.net) for organizing files on Windows, but I think it's limited to tagging.

As you know, this question has already been discussed at length in this forum:

https://www.outlinersoftware.com/topics/viewt/4455/10

The only other possibility I'm aware of is the enterprise-oriented software Globodox, although I've never used it. For instance, see this article:

https://www.globodox.com/globodox-features/find-files-effortlessly/

From the article: "you can use GLOBODOX to add custom fields such as ‘Verified by’, ‘Amount’, and ‘Client’ to the Invoice document type. When you want to find the invoices as specified above, you can use GLOBODOX’s multi-condition search feature to search for invoice documents by entering the ’Amount’ as less than or equal to $1000, ‘Verified by’ as ‘Joe Smith’, and ‘Client’ as ‘LMN Inc.’ You will find all documents satisfying the criteria in an instant!"

However, being enterprise-oriented software, the price isn't even listed. You have to contact them.

Alternatively, if all of your documents are Microsoft Office documents, it is easy to add custom properties:

https://support.microsoft.com/en-us/office/view-or-change-the-properties-for-an-office-file-21d604c2-481e-4379-8e54-1dd4622c6b75

Achim 1/18/2021 6:14 pm

Hi,

maybe UltraRecall could be worth a try, if you import your files as links. Then you can create arbitrary fields and display them in a grid. Unfortunately no inline editing, this has to be done via input form. But it has a synchronize function and find-as-you-type.

Best regards
Achim

jimspoon 1/18/2021 7:19 pm

Thanks all! I've looked at tagging programs a lot. Although you can filter by tags, you can't sort on a given attribute by value, which I would like to have.

I haven't heard of Globodox, I'll have a look. I didn't know you could add custom columns to Office documents, that helps!

It's been a long time since I looked at Ultra Recall. I'll have a look.

Another option - an indexing / search program could use a named Alternate Data Stream to store attribute:value pairs, and these in turn could be used as columns and values. I haven't found a program that does that. I just looked at Directory Opus and Total Commander for that. Another concern with this is, would the attribute:value pairs be indexed or would a script have to be run anew each time the file list is loaded? I think it was xyplorer that offered such a scriptable column, but the documentation warned it would be slow.
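A minimal sketch of the idea, assuming Python on an NTFS volume, where a path of the form file:streamname addresses a named stream. The stream name "user.meta" and the JSON encoding are assumptions made up for this illustration; none of the programs mentioned in this thread is known to use them. On other file systems the same calls will either fail or create an ordinary file with a colon in its name.

```python
import json

STREAM_NAME = "user.meta"  # hypothetical stream name, chosen for this sketch


def stream_path(file_path: str) -> str:
    """Build the 'file:stream' path that NTFS interprets as a named stream."""
    return f"{file_path}:{STREAM_NAME}"


def write_metadata(file_path: str, pairs: dict) -> None:
    """Store attribute:value pairs as JSON in the named stream."""
    with open(stream_path(file_path), "w", encoding="utf-8") as f:
        json.dump(pairs, f)


def read_metadata(file_path: str) -> dict:
    """Return the stored pairs, or an empty dict if the stream is missing."""
    try:
        with open(stream_path(file_path), "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

Reading the pairs back this way means opening one stream per file, which is exactly the speed concern raised above: without an index, a file list with many entries needs many small reads.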

Orchid 1/18/2021 8:56 pm

Maybe this will fit your needs:

http://www.modus58.net/modusdoc/

rafael costacurta 1/20/2021 11:07 am

There is an app on sale at BitsDuJour that seems to do what you want:

https://www.bitsdujour.com/software/wincatalog-2020

SmallDog 1/21/2021 7:06 pm

If you use Directory Opus, I'm pretty sure this fits your description. The data is stored in the filenames. So, for example, you can have a file named "abc (category=cat1, duedate=20220101).docx"; the file will be displayed as abc.docx and have two extra columns containing the values of those two attributes.

https://resource.dopus.com/t/column-custom-text-and-regexp/18727
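To illustrate the filename convention described above, here is a small standalone Python sketch. It is not the Directory Opus script from the link; the regular expression and the function name split_filename are assumptions for this example, and it only handles a single "(key=value, key=value)" group placed directly before the extension.

```python
import re

# Matches a trailing "(key=value, key=value)" group before the extension.
META_RE = re.compile(
    r"^(?P<stem>.*?)\s*\((?P<meta>[^()]*=[^()]*)\)(?P<ext>\.[^.]+)?$"
)


def split_filename(name: str):
    """Return (display_name, {column: value}) for a file name."""
    m = META_RE.match(name)
    if not m:
        return name, {}
    columns = {}
    for pair in m.group("meta").split(","):
        key, sep, value = pair.partition("=")
        if sep:  # ignore fragments without an "=" sign
            columns[key.strip()] = value.strip()
    return m.group("stem") + (m.group("ext") or ""), columns


print(split_filename("abc (category=cat1, duedate=20220101).docx"))
# ('abc.docx', {'category': 'cat1', 'duedate': '20220101'})
```

Because the values are plain strings taken from the name, a date written as YYYYMMDD sorts correctly as text, which fits the wish to sort on a column by value.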

jimspoon 1/24/2021 4:12 am

Thanks very much! In fact I have tried out WinCatalog in the past and I am happy to report that I had a better experience with WC2020 than with WC2018. It was able to find 1,000,000 hits very quickly, I think because of a new database format and use of an index. Sorting the hits didn't work out so well though, and the developer says he'll try to optimize this.

I am interested in the Directory Opus script for sure. Tagspaces puts metadata in the file name, but it can only be used as tags, and this script allows data extracted from the filename to populate columns.

jimspoon 1/24/2021 5:53 am

I've been doing some digging and learned some things. Windows File Explorer columns display file "properties". These "properties" are contained somewhere in the file's data, usually the file's primary data stream. The properties for each file type (distinguished by file extension) can be defined in a .propdesc XML file. Explorer and Windows Search can extract a file's properties using a "property handler" DLL. Microsoft recommends that the properties be stored inside the file's primary data stream, so the metadata can not be separated from the data. I haven't had much luck finding a way for a user to use the Windows Properties System to add user-defined properties to those provided by the developer who defined the file format.

But I have found one interesting effort along these lines, a program called File Meta. File Meta allows a user to store properties inside an "alternate data stream" that is part of the file's data. File Meta provides a "property handler" for these properties, which Windows Search can use to index the metadata. Even so, File Meta by itself mainly seems to provide a way to attach existing Windows properties to files that do not have properties inside their primary data streams. The focus was not on providing a way for users to define their own properties to attach to files. But when encouraged by a user, in 2020 the author of File Meta produced a program called "Custom Windows Properties" for this purpose. The developer seems to have moved on to other things for now, but I would like this idea to be further developed.

Some links:

File Meta: https://github.com/Dijji/FileMeta
See screenshot here: https://github.com/Dijji/FileMeta/wiki/Using-the-File-Meta-Association-Manager
Download links: https://github.com/Dijji/FileMeta/releases/tag/v1.6

Custom Windows Properties:
Description and screenshot: https://github.com/Dijji/CustomWindowsProperties
Download link: https://github.com/Dijji/CustomWindowsProperties/releases

Pierre Paul Landry 1/24/2021 3:01 pm

jimspoon wrote:

With Infoqube I can create as many data fields as I want, and I can easily associate any single file with a single IQ item. I can assign any field:value combinations I want to each file. But the linking of files with Infoqube has to be done manually - and there is no way to know which files already have a corresponding Infoqube item, and which do not.

Indeed. As a minor improvement, in v117Pre3, a warning is shown when an item already points to that file/folder/URL and you'll be able to link to the existing item instead of creating a new link.

I already have working code to monitor NTFS folders and add/remove items to sync with the file system, but never got around to implementing it in IQ.
If there is a demand, I may spend the few hours needed to do it :-)

Pierre

jimspoon 1/24/2021 6:41 pm

That would be great Pierre !! I think lots of people - both IQ users and non-IQ users - would be interested in that. For example the people who have taken an interest in the File Meta and Custom Windows Properties projects.

Alexander Deliyannis 1/25/2021 9:00 am

Pierre Paul Landry wrote:

I already have working code to monitor NTFS folders and add/remove items
to sync with the file system, but never got around to implementing it in
IQ.
If there is a demand, I may spend the few hours needed to do it :-)

This sounds fantastic!

22111 1/30/2021 12:38 am

Have you read this thread? They talked the EV developer into creating indexed csv metadata tables (it's in the works); with that, EV will become even more valuable, and even today, it's incredibly valuable.

https://www.voidtools.com/forum/viewtopic.php?t=7007

Do you speak of numeric values, for files? That's not a common use case, but even today, it's feasible to get filtered numerical retrieval, in EV. (sort being another thing, you could do it by scripting, i.e. regex search for the numbers in EV > export into an array > sorting over there - IF there is a way to do it all IN EV, please let me know.)

Regex is compacted logic so to speak, regex has no "AND" and doesn't need it.

Eliminate "thousand dots/commata" (preserve decimal dots/commata, of course). Have a distinct code, e.g. for $e39826.34. [] are classes, with {min,max} occurrences (number of chars/digits).

You then want to search for files above 40 thousand; what does that mean?
- any 5 digits number (digits before possible decimals) with first digit min 4
- OR any min-6 digits number
Thus, you do an "OR" search for both possibilities, and if just one is a find, EV will list the file (no spaces in real code, dollar sign is "escaped"):
\$e ([4-9]{1}[0-9]{4} | [0-9]{6,})

You want under 100 thousand? This means max 5 digits, so you have to exclude 6 and more digits, by a negative look-ahead, the "+" meaning "at least one of":
\$e [0-9]{1,5}(?![0-9]+)

You want a range? I deliberately make it complicated (instead of e.g. 30-70,000) in order to state how to do it: 30,000 to 750,000 (in fact 749,999, the 750,000 would be another "OR"):
- 30,000 to 99,999 is 5-digits, first digit min 3
- 100,000 to 699,999 is 6-digits, first max 6
- 700,000 to 749,999 is 6-digits, first 7, second max 4
Thus, we have 3 "OR" groups (no spaces or line breaks in real code):
\$e ([3-9]{1}[0-9]{4}(?![0-9]+) |
[1-6]{1}[0-9]{5}(?![0-9]+) |
7[0-4]{1}[0-9]{4}(?![0-9]+))

Once again, in the form it should be fed to EV: \$e([3-9]{1}[0-9]{4}(?![0-9]+)|[1-6]{1}[0-9]{5}(?![0-9]+)|7[0-4]{1}[0-9]{4}(?![0-9]+))

It does NOT get any more complicated than that for any real-life use case, since you just need one "OR" group for any same-number digit range, plus one each for the "lesser-number digit" range below and above (in this example, for the 5-digit ranges at the ends of the 6-digit span).

It's just the real string which looks horrible; with a little macro for entering the real numbers into the "forms" above, and then deleting the spaces and linefeeds, it should be easy.
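The range search and the "export, then sort elsewhere" step can be tried out in a few lines of Python. This is only a sketch: it uses Python's re module rather than Everything itself, the sample file names are invented, and the three alternatives are wrapped in one group so that the "$e" prefix applies to each of them.

```python
import re

# Range 30,000 to 749,999 after the "$e" code. The alternatives share one
# group so that the "$e" prefix and the look-ahead apply to all of them.
RANGE_RE = re.compile(
    r"\$e(?:[3-9][0-9]{4}|[1-6][0-9]{5}|7[0-4][0-9]{4})(?![0-9])"
)

# Captures the whole amount (with optional decimals) for sorting.
AMOUNT_RE = re.compile(r"\$e([0-9]+(?:\.[0-9]+)?)")


def filter_and_sort(names):
    """Keep names whose $e amount is in range, sorted by that amount."""
    hits = [n for n in names if RANGE_RE.search(n)]
    return sorted(hits, key=lambda n: float(AMOUNT_RE.search(n).group(1)))


# Invented sample names, for illustration only.
names = [
    "offer $e749999.99.pdf",
    "offer $e750000.pdf",
    "offer $e29999.pdf",
    "offer $e100000.pdf",
    "offer $e39826.34.pdf",
]
print(filter_and_sort(names))
# ['offer $e39826.34.pdf', 'offer $e100000.pdf', 'offer $e749999.99.pdf']
```

The regex only filters; the numeric sort is done on the captured amount, which is the part that cannot be expressed in the search string alone.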

The ADS "comments" in DO, x2 and XY (which is 32-bit anyway) are in binary format, i.e. not human-readable, not scriptable / readable / writable from external scripts, whilst ADS which you write by scripts, can be read / retrieved by scripts, and in EV, it's on the ToDo list: https://www.voidtools.com/forum/viewtopic.php?t=7059 : "Indexing and maintaining this information would not be fast or efficient because the contents of named streams are scattered across the volume." - ADS are additional files (big or tiny), whilst everything within the file name (incl. tags, in case) comes from the MFT and as such is automatically indexed in EV.

Without indexing of ADS, forget it, especially if you want to use such tags all over your folders, instead of just for one specific folder, that info has to be retrieved one-by-one, you can watch that in x2 e.g. - it's just not functional. On the other hand, for additional file info which just must be at hand when the file itself is open, ADS is a very valuable means... except if you want to (also) store your files in the "cloud", since most (all?) cloud file space providers just cut them from the file - boo!

jimspoon 2/12/2021 12:50 am

Hi 22111, thanks for your reply, I hadn't seen it until just now.

I hope that Everything will soon support metadata in CSV tables as the developer has said.

I use Everything all the time and have often used regular expressions to find files and it works great. As you say metadata fields could be placed in the filename and regular expressions could be used to find those files.

Filename and path length limitations might be a concern - although the max path length can be increased to 32K with a registry change, it still seems that the max file name length is 255.

Sorting on values returned by a regex expression is not yet enabled in Everything but is on the to-do list. https://www.voidtools.com/forum/viewtopic.php?f=5&t=6656&p=20745&hilit=sort+by+regex#p20745

You're right, an ADS is no different from the unnamed $DATA stream when it comes to where it is stored - depending on length it might be resident in the MFT or non-resident. If non-resident a search program would have to follow pointers to clusters and index the data found there.

If tags could be placed in a "file attribute" (unfortunately named because easily confused with the read-only, hidden, system, archive attributes etc.) that would always be resident in MFT file record segments, perhaps Everything could index them easily. But as far as metadata is concerned Microsoft definitely seems to have settled on the Windows Property System in which properties are stored in the "unnamed" (primary) file data stream, and extracted using "Property Handlers", indexed by Windows Search. Unfortunately the Windows Property System doesn't give USERS the opportunity to define their own metadata fields and values - at least I haven't found it - this is up to app developers.

https://docs.microsoft.com/en-us/windows/win32/properties/windows-properties-system

https://docs.microsoft.com/en-us/windows/win32/properties/building-property-handlers-properties

jimspoon 2/25/2021 10:50 pm

UPDATE: from author of Voidtools Everything:

Display, searching, sorting by and indexing tags is mostly complete.

There's still a little more to do before an alpha release.
I still need to add support for editing tags within Everything which should happen during alpha/beta.

https://www.voidtools.com/forum/viewtopic.php?f=2&t=9689

WSP 2/26/2021 1:12 pm

It's very easy to do this sort of thing in MyInfo. You simply create a new attribute, which then appears (if you wish) as a separate column in the notes list panel. You can then type "done" or "yes" or (in my case) "y" in that column when you complete it.

I use this procedure daily. If I am compiling information for a blog, for example, I set "Blog" as a column so that I can readily spot if an entry has been posted yet. I can of course do the same thing with a tag in MyInfo, but a column in the notes list is better visually.

The "Blog" column, incidentally, can be sorted if I wish. That instantly arranges all my unfinished items together in the list.

22111 2/27/2021 10:58 pm

Hi jimspoon,
you say, "an ADS is no different from the unnamed $DATA stream when it comes to where it is stored - depending on length it might be resident in the MFT or non-resident. If non-resident a search program would have to follow pointers to clusters and index the data found there."

As for the former, I don't know, I have never heard of ANY ADS stored in the MFT, and then it would probably not be named as such?

I suppose though - I might be wrong here - that the NTFS "file attributes" are stored within the MFT; unfortunately, their list is finite, i.e. neither the developer nor the user can create any other.

For "attributes" within the files, you say it's up to the developer, which is true, but most file formats are standard formats, e.g. for pics, audio..., and I think they have a FIXED set of possible (internal) "attributes" which the developer then can "use" (i.e. make available to the user) or not, but the developer cannot (I might again be mistaken here) create further, specific (internal) "attributes"; this doesn't apply of course to new, proprietary file formats but users nowadays tend to avoid those and will certainly not accept them in the pic-vid-audio field.

EV has a (default = "on") setting for EV also indexing those NTFS file attributes; except for your "C:\" system drive which you should leave alone of course, you could use "hidden" and "system", and their combination, as 3 different "ToDo" indicators, which, thanks to EV, could bring IMMEDIATE, sorted "ToDo", "Project" lists, etc. onto the screen, even spanning elements from your whole system - the interest here is that you don't have to fiddle with the file names, and "ToDo" indicators and such should indeed leave the file names alone, tagging within the file names just should be made for non-ephemeral indications / referrals (to avoid link-breaking, among other things); those lists are also exportable, etc. (It's even possible to do it by command-line, i.e. to trigger STORED "filtered and ordered searches"...)
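How the "hidden" / "system" combinations could be read as three indicators can be sketched in Python. The labels "todo", "project" and "urgent" are purely hypothetical, and reading the attributes of a real file through st_file_attributes works on Windows only.

```python
import os
import stat

# Hypothetical meanings for the three combinations; pick your own.
LABELS = {
    (True, False): "todo",     # hidden only
    (False, True): "project",  # system only
    (True, True): "urgent",    # hidden + system
}


def classify(attributes: int):
    """Map a Windows attribute bit mask to a label, or None if neither bit is set."""
    hidden = bool(attributes & stat.FILE_ATTRIBUTE_HIDDEN)
    system = bool(attributes & stat.FILE_ATTRIBUTE_SYSTEM)
    return LABELS.get((hidden, system))


def classify_file(path: str):
    """Windows only: st_file_attributes is not available elsewhere."""
    return classify(os.stat(path).st_file_attributes)
```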

As for numbers in file names, in many cases and for private use cases, these will be years, i.e. 19xx to 20xx for movies, CDs, etc., or even years-and-months for pics (e.g. 2004-11); the necessary regexes will be utterly simple.

wsp, I had tried the MI fields in version 5 for "real use", and MI crashed. Then, in MI mid-v6 (i.e. not early 6, not the latest version either), I just "wanted to know", and on a modern PC (where everything else works just fine), I just created some 10, 15 (certainly not 20 or more) items, and with some 5 or 6 fields, with very simple data, numbers without decimals, short strings, simple (not even combined, and I don't remember if that would have been possible) filtering then IMMEDIATELY CRASHED that very tiny, very simple MI file, and I never spent another minute with that.

(I had a full version of v6 and had intended to apply it to a real whilst limited use case, so I was interested in it functioning well, not in confirming some bad prejudice ("it'll crash again": no, that was some 8 years and many minor versions later), but I don't have any use case for a program which crashes less than 5 minutes into its first use with minimal data.)

Now, I'll wait some other months, will then try again with a very simple, tiny trial set-up. Considering that v7 is said to have changed the underlying DB (SQLite now, like UR and RN?), those problems may have been resolved indeed, but prospects allured by those fields should be aware of possible problems before buying hastily though, all the more so since the (possible) use of SQLite is not, per se, a guarantee that all will be okay: It was you who told us here that once you brought the (also SQLite-driven) CintaNotes to its limits, things didn't go so well anymore...

And in v5/6, users complained about the lack of different sets of fields in different sub-folders within MI: Whenever they wanted to use a certain set of fields for a certain set of items, and another set of fields for another set of items, they thus were forced to create multiple DBs. (And no, that's not a limitation inherent to the use of a relational DB in general or of SQLite in particular, it's just lazy programming; I know how it's done, within a SIMPLE DB setup, NOT by multiplying DB tables or such, which would be as totally unnecessary as it would be totally ugly indeed.)

MadAboutDana (another current thread): Trying just another app? Hoho! There are spouses who've got 300 pairs of shoes within their walk-in wardrobe, but you don't call them "collectors", they swear they needed them, and they will certainly be useful again (both, spouse+shoes). That's an expensive hobby if those shoes are "Prada" and the like - men, on the other hand, would be ashamed to pretend that multiple items of (for the most of them, perfectly) IDENTICAL use would be needed in droves, so they do it all the like, but with pretended DIFFERING use - technical stuff, books, and apps vs applications even are quite cheap, by comparison - let alone the Pradas -, at least before you go into the subscription trap...

It's exactly the same phenomenon though, just in its more manly, less pretty shaping, since we all know that with app x, y and z (and then, it all will begin again with abc, over and over), there will be NO thinking enhancement either...

Newbie alone, in yet another current thread, being excused for earnestly hoping for that. And, to answer the question in which category to file TheBrain, in again another thread: Into the Scambox, naturally! And yes, if it really helped, instead of just pretending while in fact slugging your thinking, it'd be worth its annual subscription, per month... (But they know that, too, and wouldn't hesitate a sec to charge you accordingly if it worked.)

22111 2/28/2021 10:46 am

I forgot, in my post immediately above this one, and especially since you mentioned the path length problem: There is no dichotomy "ordering vs. searching", both should go hand in hand, ordering should FACILITATE searching, and then, searches are so much more simple and precise.

"Everything" differentiates between "in path" and "in filename" and "suffix": too complicated for me, I search mp4 by .mp4, wav by .wav, and so on. Music is m: and Jazz is m: \j\ and "sax or piano" (just an example) is m: \j\ \s\|\p\ (my EV preference is (default) OR over AND; you can switch that).

For guitar, I could also write, m:\j\g\ but to do the "tagging" by sub-paths allows for searching for these tags even when they do NOT follow precise other path sub-strings, e.g. k: is comedies, \c\ on drive k: is "other comedies", whilst \cc\ is criminal comedies, \cr\ is romcoms, etc., \a\ is anglo, \f\ is France, \i\ is Italy, and so on, then k: \i\ is ALL Italian comedies, whatever their subgenre - note that the real path could be k:\i\cr\... OR k:\cr\i\... or could contain even other sub-tagging, so "tagging by discrete subpaths" is much more powerful than just "search for path substrings" where "wrong order" would then make impossible the retrieval.
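The difference between "discrete" sub-path tags and a plain path-substring search can be shown with a short Python sketch. The function name has_tags and the sample paths are assumptions for illustration; the point is that each tag must be a complete folder name, in any position and any order.

```python
from pathlib import PureWindowsPath


def has_tags(path: str, *tags: str) -> bool:
    """True if every tag is a whole folder name somewhere in the path."""
    parent = PureWindowsPath(path).parent
    # Skip the drive/root part so that only real folder names are compared.
    folders = {part.lower() for part in parent.parts if part != parent.anchor}
    return all(tag.lower() in folders for tag in tags)


# Both orders are found when asking for the "i" (Italy) tag.
print(has_tags(r"k:\i\cr\film.mp4", "i"))   # True
print(has_tags(r"k:\cr\i\film.mp4", "i"))   # True
# "c" is not matched by the different folder "cc".
print(has_tags(r"k:\cc\film.mp4", "c"))     # False
```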

"Discrete" being the key-word here, but then, in totally different contexts (films vs. music here, on different drives anyway, i.e. k: or t: (thrillers) vs. m: (or o: for opera and j: for jazz, etc. in your possible, theoretical case)), "identical" further sub-tags could be of very different meaning. To bring a real-life example, "\j\" in film is "Japan", whilst Japan in music, for me, is "\jap\", since, for me again, Japan in music is "exotic" whilst in film, it's an important "sub-category" - if I had enough jazz music to fill, say, 4 TB, i.e. a drive on its own, the "j" would be in j: anyway, and I could and would rename the \jap\ to \j\ - "simplify your life" > "simplify your retrieval, by smart tagging".

In jazz again - I bring all those examples for hinting at the general basics which also apply to work environments -, \s\ and \p\ and \t\ (for trumpet there) are the CD's main classifications, and if some ripped \p\ CD (track) is also important for \s\ for example, I put a space£s into the album or file name, more precise searches would then be m: \j\ \s\ | £s i.e. m: \j\ (\s\ | £s).

Ditto for films, think of "black comedies" (if you don't want to create their own sub-genre), or of brilliant screenplays (space£sFamilyname), of outstanding photography (space£pFamilyname), or just of the most beautiful endings, so in "La dolce vita", there is a space£e) - the trick being, once you use such "codes" on a regular basis, you will not have to look them up anymore.

EV also lists any "finds" on currently non-connected drives, which is extremely helpful, but which is a problem for backups whenever you don't search for a tag (and where you would also put in the drive (k: or whatever), but for some "real text string" - name, whatever - e.g. in the fullpaths / file names, since most of the time, you would not want to have "de-doubled" all entries, e.g. the very last part of your path would be the artist name in the form (e.g.) \Nyro, Laura\ (the "\\" obviously not being necessary for retrieval here), and then, EV would list all her tracks on m: AND on the n: backup if you don't also enter the "m:" every and every time...

Similar for all those "\\" WHERE they are necessary for discrete searching, and then, entering those multiple "\\" in "\1-or-2-chars\" again and again is no fun either, and thus, you need just a little macro, with an inputbox to enter some "simplified string", and after pressing return, it would work on your string and amend it in the way that then EV does the "correct" search:

If you enter k: it would automatically add !l: (your backup drive for k:, ditto for t: and !u:, for m: and !n:, etc.: for "k:" in the string, add " !l:" to the string, etc), and for every 1- or 2-char string, it would put the necessary "\" before and after (replace every "[a-zA-Z]{1,2}" by "\$\") - note the regex is in the macro, not in the EV search-string, but you can also, for other use cases, combine regular and regex search within the latter.
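The "little macro" described above could look roughly like this in Python. It is a sketch under assumptions: the input is a space-separated string, the drive-to-backup mapping is the one given in the post (k:/l:, t:/u:, m:/n:), and the function name expand is made up. Tokens longer than two letters are passed through unchanged.

```python
import re

# Drive -> backup drive to exclude, as in the post (k:/l:, t:/u:, m:/n:).
BACKUPS = {"k": "l", "t": "u", "m": "n"}


def expand(query: str) -> str:
    """Turn a simplified string such as 'm: j s' into a full search string."""
    out = []
    for token in query.split():
        drive = re.fullmatch(r"([a-zA-Z]):", token)
        if drive:
            out.append(token)
            backup = BACKUPS.get(drive.group(1).lower())
            if backup:
                out.append(f"!{backup}:")  # exclude the backup drive
        elif re.fullmatch(r"[a-zA-Z]{1,2}", token):
            out.append(f"\\{token}\\")  # wrap short tags in backslashes
        else:
            out.append(token)
    return " ".join(out)


print(expand("m: j s"))  # m: !n: \j\ \s\
print(expand("k: i"))    # k: !l: \i\
```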

And then, any info already comprised, in whatever form, even by just a discrete \1-char\, within the path, will not need to be repeated within further-down folder, or then the file names, even though, in the above music examples (i.e. album names), it sometimes will ("An Evening with ..." or "Rachelle Ferrell Live in Montreux"), even when the sub-folder higher-up (i.e. on the "artist" level) already says "Ferrell, Rachelle", on the other hand, with the encore-higher-up folder naming "\j\v\w\b" instead of "\Jazz\Voice\Women\Black\", it's again 15 chars less than "written out", and that also and foremost applies to retrieval...

(And yes, that album is a very special one... and no, the "Black" is for the very different vocal character - it's almost "another instrument" in most cases -, and not for (easily alleged here) "racial" reasons, and then, in jazz and other (non-classical) music, around three quarters of my (now, ripped) CDs, voice and other instruments combined, are "black", so...)

22111 2/28/2021 10:49 am

Sorry, that would have been \$, not $.

22111 3/2/2021 8:46 am

My remarks above also show the difficulties in switching from a db data set (physical CD collection, etc. where not needed db fields in some subcategories are simply filled with the default "null" values and don't disturb) to a file-and-folder system (NTFS) where similar info (e.g. the "country" one), in different "sub-trees", either finds itself on different "indentation" levels of the subfolders, or you would have to insert ugly, empty, intermediate "0" folders in-between - what nobody does to my knowledge -, and even then, the relevant subfolder DATA (i.e. further subfolders (here: A, E, F... for "country" but which would be the name of their PARENT folders, or then the files) would not be "identified" by their names (A, E, F...), but only (and to some degree only) in combination with their (common) direct parent folder ("C" or "Country" in this example); these are problems unknown to DBs...

You could try some redundancy, by repeating the parent-folder name (the parent-folder itself being needed to contain the subfolder in question AND its "siblings") before the subfolder-name, e.g. the "country subfolders" would then be named CA, CE, CF instead of just A, E and F, or you would have to fiddle with \C\A\, \C\E\ and \C\F\; in any case, strict folder naming rules would be needed in order to ensure that the names stay distinct.
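To illustrate the CA / CE / CF idea, here is a minimal Python sketch. The function names (`qualify`, `split_qualified`) and the extra field code "G" are made up for the example; nothing here touches the file system, it only shows how the names could be built and read back, and why the field codes need strict rules (one code must not be the beginning of another).

```python
def qualify(parent: str, children: list[str]) -> list[str]:
    """Repeat the parent-folder name in front of each subfolder name."""
    return [parent + child for child in children]


def split_qualified(name: str, fields: set[str]) -> tuple[str, str]:
    """Split a name like 'CA' back into (field, value).

    Raises ValueError when no known field code, or more than one,
    matches the start of the name; that is the ambiguity strict
    naming rules have to exclude.
    """
    matches = [f for f in fields if name.startswith(f) and len(name) > len(f)]
    if len(matches) != 1:
        raise ValueError(f"cannot split {name!r} unambiguously: {sorted(matches)}")
    field = matches[0]
    return field, name[len(field):]


# Hypothetical field codes: C = country, G = genre.
print(qualify("C", ["A", "E", "F"]))
print(split_qualified("CA", {"C", "G"}))
```

With field codes such as {"C", "CA"}, a name like "CAB" could no longer be split reliably, and the sketch raises an error instead of guessing.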

In other words, whilst my CD and DVD DBs for the physical things are unambiguous, my corresponding HDD collections seem "orderly" but in fact their current organization is not optimized since NTFS simply does not offer the necessary elements to do that.

Which brought me some other ideas but without a straight "eureka!" effect yet - btw, for servers and the like, MS has created a new file system, ReFS, but from what I have read about it, it has NOT been improved, compared to NTFS, with regard to all those (now common) "it's not more db-like" failings; in some other respects, it's even worse than NTFS, depending on your POV...

For the time being and unless I get further with one of my new ideas on the matter, I suppose that the above-described repetition of the parent-folder name in the sub-category folder name, with (e.g.) \c\ca\ instead of \c\a\ (but also instead of a technically possible just \ca\ without the \c\ "above" it!) is a viable solution, since for far more extended systems than just a CD collection, the explicit naming possibilities would be much larger.

The inherent failing of NTFS, etc. being that any hierarchical position is determined only by counting its position within the path, and any non-used position remains visible all the time if not actively suppressed for display - let alone the lack of any file identifier except for the - current, possibly ever-changing! - full path of the file...
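A small Python sketch of what "counting positions" means in practice. The two paths are invented for the example (a jazz sub-tree and a shorter one, both with a "c" country folder); the point is only that a fixed position finds the country in one sub-tree and nothing in the other, whereas a lookup via the marker parent folder works in both.

```python
def segments(path: str) -> list[str]:
    """Split a backslash-separated path into its non-empty folder names."""
    return [s for s in path.split("\\") if s]


def value_by_position(path: str, index: int):
    """Read a 'field' by counting: the folder at a fixed depth, or None."""
    parts = segments(path)
    return parts[index] if index < len(parts) else None


def value_by_marker(path: str, marker: str):
    """Read a 'field' as the folder directly below a marker folder, or None."""
    parts = segments(path)
    for parent, child in zip(parts, parts[1:]):
        if parent == marker:
            return child
    return None


# Hypothetical paths: the country folder sits at different depths.
jazz = r"\j\v\w\c\F\Some Album"
other = r"\k\c\A\Some Album"

print(value_by_position(jazz, 4), value_by_position(other, 4))
print(value_by_marker(jazz, "c"), value_by_marker(other, "c"))
```

The marker lookup only works as long as "c" is never used as a folder name for anything else, which is again a naming rule rather than something the file system enforces.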

(And so, yes, with lots of "\"-countings, again and again, things would be possible, but after all, there ARE attributes within the MFT, for every file, every folder, so why not also a 9- or 12-digit ID? But then, we all know Bill has got other hobbies, some even pretend vaccinating 8 billion down to just 1 or less being among them - which is pure slander of course...)

22111 3/2/2021 11:44 am

I hadn't thought about the fact that db records are in reference to the db they're in, whilst folders and files are considered "floating", between several computers, of the same person, among persons / departments in a corporation, even among persons and institutions world-wide. Okay, that complicates things a little bit.

There could be some two-or-three-part ID, i.e. some "origin part" for the "creation device", a dot, a running number for that device, another dot, and then, only if necessary, the original string from the file-from-the-outside. That string - NOT a number, but alphanumeric and obviously of variable (composite) length - would be appended upon import onto the device, just as a - simpler - string would be created upon any file (and folder) creation on the device. Files from the "corporation" would be considered "semi-internal": a new (i.e. accordingly appended) string would only be assigned when the file is then modified in any way (automatic versioning, too), etc.
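Roughly, and only as a sketch of the idea in Python: the class names (`FileId`, `Device`), the device codes "A1" and "B7", and the start of the numbering at 1 are all assumptions for the example, and the "semi-internal" and versioning rules are left out.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class FileId:
    origin: str                    # code of the device that assigned the ID
    serial: int                    # running number on that device
    foreign: Optional[str] = None  # original ID string of an imported file

    def __str__(self) -> str:
        base = f"{self.origin}.{self.serial}"
        return f"{base}.{self.foreign}" if self.foreign else base


class Device:
    def __init__(self, origin: str) -> None:
        self.origin = origin
        self._serials = count(1)   # assumption: numbering starts at 1

    def create(self) -> FileId:
        """ID for a file (or folder) created on this device: two parts."""
        return FileId(self.origin, next(self._serials))

    def import_file(self, foreign_id: Optional[FileId]) -> FileId:
        """ID for a file from the outside; its old ID string is appended if it has one."""
        foreign = str(foreign_id) if foreign_id is not None else None
        return FileId(self.origin, next(self._serials), foreign)


a, b = Device("A1"), Device("B7")
made_on_a = a.create()
print(made_on_a)                   # A1.1
print(b.import_file(made_on_a))    # B7.1.A1.1
print(b.import_file(None))         # B7.2
```

Since the appended part itself contains dots, the composite string grows with every hop between devices; how such a string would be parsed or shortened is left open here.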

These are just some ideas of mine, less than 10 minutes after realizing that my idea above (9- or 12-digit numbers) was not practicable except for private use, and such a concept could easily be refined to make it perfectly usable.

And yes, this would blow up the MFT just a little bit, and so what for modern systems (NTFS is from 1993, when the technical environment was totally different), and yes, an enhanced NTFS could manage both situations in parallel, assigning ID strings to new files and even existing ones in the system / corporation (of internal or external origin), AND even to any "new" file coming "from the outside" and not bearing such an ID string already: It would process any such external string according to definite rules, and also assign/append further ID strings to external files created without.

At least WITHIN the organization in question, independently of its (current or future) size (from 1-person upward to 7-or-more-digit "seat" numbers), any folder and file would then be clearly identified.

Btw, such systems will come anyway, tax and other public administrations will make sure of it quite soon, and they will probably try to make those IDs unique worldwide - which for file management reasons - the interests of physical or legal persons - obviously (and as hinted at supra) is NOT necessary.

As with everything, it's all in the conception; the realization will then follow, forced upon in the end by THIRD-party interests...