file searcher with user-defined metadata columns (Windows)
Posted by Pierre Paul Landry
Jan 24, 2021 at 03:01 PM
jimspoon wrote:
> With Infoqube I can create as many data fields as I want, and I can easily associate any single file with a single IQ item. I can assign any field:value combinations I want to each file. But the linking of files with Infoqube has to be done manually - and there is no way to know which files already have a corresponding Infoqube item, and which do not.
Indeed. As a minor improvement, in v117Pre3, a warning is shown when an item already points to that file/folder/URL and you’ll be able to link to the existing item instead of creating a new link.
I already have working code to monitor NTFS folders and add/remove items to keep them in sync with the file system, but I never got around to implementing it in IQ.
If there is a demand, I may spend the few hours needed to do it :-)
Pierre
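Pierre's actual code isn't shown in the thread and probably relies on NTFS change notifications. As a rough illustration of the idea only, here is a polling-based Python sketch. It snapshots a folder tree, diffs two snapshots to find added and removed files, and lists files that no item links to yet (the question jimspoon raised). `linked_paths` stands in for a hypothetical export of the file links stored in IQ.

```python
from pathlib import Path


def snapshot(root: str) -> set[str]:
    """Return the relative paths of all files below root (recursive)."""
    base = Path(root)
    return {str(p.relative_to(base)) for p in base.rglob("*") if p.is_file()}


def diff(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) files between two snapshots."""
    return new - old, old - new


def unlinked(files: set[str], linked_paths: set[str]) -> set[str]:
    """Files that no item links to yet.

    Windows paths are case-insensitive, so both sides are compared case-folded.
    """
    linked = {p.casefold() for p in linked_paths}
    return {f for f in files if f.casefold() not in linked}
```

Calling `snapshot()` periodically and passing consecutive results to `diff()` yields the items to create or delete. A real implementation would more likely use change notifications (e.g. `ReadDirectoryChangesW` on Windows) than rescan whole trees.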
Posted by jimspoon
Jan 24, 2021 at 06:41 PM
That would be great, Pierre!! I think lots of people, both IQ users and non-IQ users, would be interested in that: for example, the people who have taken an interest in the File Meta and Custom Windows Properties projects.
Posted by Alexander Deliyannis
Jan 25, 2021 at 09:00 AM
Pierre Paul Landry wrote:
>I already have working code to monitor NTFS folders and add/remove items
>to sync with the file system, but never got around to implementing it in
>IQ.
>If there is a demand, I may spend the few hours needed to do it :-)
This sounds fantastic!
Posted by 22111
Jan 30, 2021 at 12:38 AM
Have you read this thread? They talked the Everything (EV) developer into creating indexed CSV metadata tables (it's in the works); with that, EV will become even more valuable, and even today, it's incredibly valuable.
https://www.voidtools.com/forum/viewtopic.php?t=7007
Are you talking about numeric values for files? That's not a common use case, but even today, filtered numerical retrieval is feasible in EV. (Sorting is another matter: you could do it by scripting, i.e. regex-search for the numbers in EV, export the results into an array and sort them there. IF there is a way to do it all IN EV, please let me know.)
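A minimal Python sketch of the "export and sort" step, assuming EV's result list has been exported as plain paths (e.g. with EV's export function) and that amounts are tagged with the `$e` prefix convention described below. The tag pattern and the sample paths are hypothetical.

```python
import re

# Hypothetical tag convention: "$e" followed by the amount, without thousands
# separators and with "." as the decimal separator (e.g. "$e39826.34").
AMOUNT = re.compile(r"\$e([0-9]+(?:\.[0-9]+)?)")


def sort_by_amount(paths: list[str]) -> list[tuple[float, str]]:
    """Extract the tagged amount from each path and sort numerically.

    Paths without a tag are skipped.
    """
    found = []
    for path in paths:
        m = AMOUNT.search(path)
        if m:
            found.append((float(m.group(1)), path))
    return sorted(found)


results = [
    r"C:\Invoices\acme $e125000.pdf",
    r"C:\Invoices\foo $e39826.34.pdf",
    r"C:\Invoices\bar $e9999.5.pdf",
    r"C:\Invoices\untagged.pdf",
]
for amount, path in sort_by_amount(results):
    print(f"{amount:>12,.2f}  {path}")
```

Sorting the extracted `float` values rather than the raw strings is the point: as text, "9999.5" would sort after "39826.34".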
Regex is compacted logic, so to speak: it has no “AND” operator and doesn't need one.
Eliminate thousands separators (dots/commas), but preserve decimal dots/commas, of course. Use a distinct prefix code, e.g. $e as in $e39826.34. [] are character classes, and {min,max} gives the number of occurrences (chars/digits).
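If you tag many files, a small helper keeps the format consistent. A Python sketch; the function names and the choice to put the tag just before the extension are illustrative assumptions, not something EV requires.

```python
def amount_tag(text: str, thousands: str = ",", prefix: str = "$e") -> str:
    """Turn a formatted amount into a searchable tag.

    Thousands separators are removed; the decimal separator is kept as-is,
    so "39,826.34" -> "$e39826.34" and, with thousands=".", "39.826,34" -> "$e39826,34".
    """
    return prefix + text.strip().replace(thousands, "")


def tag_filename(name: str, tag: str) -> str:
    """Insert the tag before the last extension: 'invoice.pdf' -> 'invoice $e39826.34.pdf'."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension: append the tag at the end
        return f"{name} {tag}"
    return f"{stem} {tag}.{ext}"


print(tag_filename("invoice.pdf", amount_tag("39,826.34")))
```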
You then want to search for files above 40 thousand; what does that mean?
- any 5-digit number (counting the digits before any decimals) whose first digit is at least 4
- OR any number with at least 6 digits
Thus, you do an “OR” search for both possibilities, and if either one is found, EV will list the file. The dollar sign is “escaped”, the outer parentheses make the $e prefix apply to both alternatives, and there are no spaces in the real code:
\$e ( [4-9]{1}[0-9]{4} | [0-9]{6,} )
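Note that `|` has the lowest precedence in regex, so the `\$e` prefix must stand in front of a group enclosing all alternatives; in `\$e(A)|(B)` only A requires the prefix. A quick way to check a pattern before pasting it into EV is Python's `re` module, which treats these basic constructs (classes, counted repetition, groups, look-ahead) the same way as most regex engines; verify in EV itself for anything fancier.

```python
import re

# The group wraps BOTH alternatives, so "\$e" is required for each of them.
grouped = re.compile(r"\$e([4-9]{1}[0-9]{4}|[0-9]{6,})")
# Without the outer group, "|" splits the whole pattern in two, and the second
# half matches any run of 6+ digits, with or without the "$e" prefix.
ungrouped = re.compile(r"\$e([4-9]{1}[0-9]{4})|([0-9]{6,})")

names = [
    "invoice $e39826.34.pdf",   # below 40,000
    "invoice $e40000.pdf",      # 40,000
    "invoice $e125000.00.pdf",  # above 40,000
    "scan 20210130.pdf",        # a date, no $e tag at all
]
for name in names:
    print(f"{name:<26} grouped={bool(grouped.search(name))} "
          f"ungrouped={bool(ungrouped.search(name))}")
```

The last file is the giveaway: the ungrouped form lists it because the date contains 6+ consecutive digits.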
You want under 100 thousand? That means at most 5 digits, so you have to exclude 6 or more digits with a negative look-ahead, the “+” meaning “at least one of”. (Write {1,5} rather than {,5}, since not every regex engine accepts an omitted minimum.)
\$e [0-9]{1,5}(?![0-9]+)
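The look-ahead is what makes this work. For `$e100000`, the engine tries 5, then 4, ... then 1 digits, and every attempt is followed by another digit, so the look-ahead rejects them all. A small Python check of that behaviour (function name hypothetical):

```python
import re

# "Under 100,000": 1 to 5 digits right after "$e", NOT followed by another digit.
UNDER_100K = re.compile(r"\$e[0-9]{1,5}(?![0-9]+)")


def under_100k(name: str) -> bool:
    return UNDER_100K.search(name) is not None


for name in ["a $e99999.pdf", "a $e39826.34.pdf", "a $e100000.pdf", "a $e.pdf"]:
    print(name, under_100k(name))
```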
You want a range? I deliberately make it complicated (instead of e.g. 30,000 to 70,000) to show how it's done: 30,000 to 750,000 (in fact 749,999; 750,000 itself would need another “OR”):
- 30,000 to 99,999 is 5 digits, first digit at least 3
- 100,000 to 699,999 is 6 digits, first digit at most 6
- 700,000 to 749,999 is 6 digits, first digit 7, second digit at most 4
Thus, we have 3 “OR” groups, wrapped in one outer group so that $e applies to all of them (no spaces or line breaks in real code):
\$e ( ([3-9]{1}[0-9]{4}(?![0-9]+)) |
([1-6]{1}[0-9]{5}(?![0-9]+)) |
(7[0-4]{1}[0-9]{4}(?![0-9]+)) )
Once again, in the form it should be fed to EV: \$e(([3-9]{1}[0-9]{4}(?![0-9]+))|([1-6]{1}[0-9]{5}(?![0-9]+))|(7[0-4]{1}[0-9]{4}(?![0-9]+)))
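Hand-built range patterns are easy to get wrong at the boundaries, so it's worth brute-forcing them before use. A Python sketch that checks the 30,000 to 749,999 pattern, with an outer group around all three alternatives, against plain integer comparison (assuming Python's `re` behaves like EV's regex for these constructs):

```python
import re

# 30,000 to 749,999, with one outer group so "\$e" applies to all three alternatives.
RANGE = re.compile(
    r"\$e("
    r"([3-9]{1}[0-9]{4}(?![0-9]+))"    # 30,000 to 99,999
    r"|([1-6]{1}[0-9]{5}(?![0-9]+))"   # 100,000 to 699,999
    r"|(7[0-4]{1}[0-9]{4}(?![0-9]+))"  # 700,000 to 749,999
    r")"
)


def in_range(n: int) -> bool:
    return RANGE.search(f"$e{n}") is not None


# Check 10 numbers on either side of every boundary.
boundaries = [b for edge in (30_000, 100_000, 700_000, 750_000)
              for b in range(edge - 10, edge + 10)]
for n in boundaries:
    assert in_range(n) == (30_000 <= n <= 749_999), n
print("all boundary checks passed")
```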
It doesn't get much more complicated than that for real-life use cases: you need one “OR” group for each digit count the range spans, plus extra groups where a bound isn't a round number (here, the 700,000 to 749,999 group at the upper end).
Only the final string looks horrible. With a little macro that enters the real numbers into the “forms” above and then deletes the spaces and line breaks, it should be easy.
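The "little macro" could also build the whole pattern from the two bounds. A Python sketch of that idea, assuming integer bounds (any decimals after the number are left unconstrained) and the `$e` prefix; the function names are hypothetical. It splits the range by digit count, as done by hand above, and adds extra groups wherever a bound isn't round:

```python
import re


def _digits(n: int) -> str:
    """Pattern for exactly n arbitrary digits."""
    return "" if n == 0 else "[0-9]" if n == 1 else f"[0-9]{{{n}}}"


def _span(lo: int, hi: int) -> str:
    """Character class for the single digits lo..hi."""
    return str(lo) if lo == hi else f"[{lo}-{hi}]"


def _same_length(a: str, b: str) -> list[str]:
    """Alternatives matching every number from a to b (equal-length digit strings, a <= b)."""
    if a == b:
        return [a]
    if len(a) == 1:
        return [_span(int(a), int(b))]
    if a[0] == b[0]:  # common leading digit: recurse on the rest
        return [a[0] + p for p in _same_length(a[1:], b[1:])]
    rest = len(a) - 1
    low_round = a[1:] == "0" * rest   # e.g. 30000: no partial group at the bottom
    high_round = b[1:] == "9" * rest  # e.g. 99999: no partial group at the top
    parts = []
    if not low_round:
        parts += [a[0] + p for p in _same_length(a[1:], "9" * rest)]
    first = int(a[0]) if low_round else int(a[0]) + 1
    last = int(b[0]) if high_round else int(b[0]) - 1
    if first <= last:
        parts.append(_span(first, last) + _digits(rest))
    if not high_round:
        parts += [b[0] + p for p in _same_length("0" * rest, b[1:])]
    return parts


def range_regex(lo: int, hi: int, prefix: str = "$e") -> str:
    """Regex matching prefix + any integer part from lo to hi (0 <= lo <= hi)."""
    parts = []
    for width in range(len(str(lo)), len(str(hi)) + 1):
        start = max(lo, 10 ** (width - 1) if width > 1 else 0)
        end = min(hi, 10 ** width - 1)
        parts += _same_length(str(start), str(end))
    # One outer group so the prefix applies to every alternative; the look-ahead
    # stops a longer number from matching on its first digits.
    return re.escape(prefix) + "(" + "|".join(parts) + ")(?![0-9])"


print(range_regex(30_000, 749_999))
```

For 30,000 to 749,999 it produces the same three groups as the hand-built pattern (minus the redundant `{1}`), with a single look-ahead at the end instead of one per group.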
The ADS “comments” in Directory Opus (DO), xplorer² (x2) and XYplorer (XY, which is 32-bit anyway) are stored in binary format, i.e. they are not human-readable and cannot be read or written by external scripts. ADS that you write with your own scripts, on the other hand, can be read back by scripts.
Indexing them in EV is on the ToDo list: https://www.voidtools.com/forum/viewtopic.php?t=7059 : “Indexing and maintaining this information would not be fast or efficient because the contents of named streams are scattered across the volume.” An ADS is essentially an additional file (big or tiny) attached to the main one, whereas everything within the file name (including tags, if any) comes from the MFT and is therefore indexed automatically by EV.
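Script-written ADS stay readable because, on NTFS, a named stream can be opened like a file using the `file:stream` syntax. A minimal Python sketch, assuming an NTFS volume on Windows (elsewhere the same path simply names a separate file, or fails); the stream name `tags` and the helper names are arbitrary choices.

```python
from typing import Optional


def write_tags(path: str, tags: str, stream: str = "tags") -> None:
    """Store plain-text tags in a named stream ("file.pdf:tags") on NTFS."""
    with open(f"{path}:{stream}", "w", encoding="utf-8") as f:
        f.write(tags)


def read_tags(path: str, stream: str = "tags") -> Optional[str]:
    """Read the tags back; None if the file has no such stream."""
    try:
        with open(f"{path}:{stream}", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

On Windows, `dir /r` in a command prompt lists the named streams attached to each file, which is a handy way to confirm the tags were written.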
Without indexing of ADS, forget it, especially if you want to use such tags all over your folders instead of in just one specific folder: that info has to be retrieved file by file (you can watch that happen in x2, for example), and it's just not practical. On the other hand, for additional file info that only needs to be at hand when the file itself is open, ADS are a very valuable means… except if you also want to store your files in the “cloud”, since most (all?) cloud storage providers simply strip them from the file - boo!
Posted by jimspoon
Feb 12, 2021 at 12:50 AM
Hi 22111, thanks for your reply; I hadn't seen it until just now.
I hope that Everything will soon support metadata in CSV tables as the developer has said.
I use Everything all the time and have often used regular expressions to find files, and it works great. As you say, metadata fields could be placed in the file name and found with regular expressions.
File name and path length limits might be a concern: although the maximum path length can be raised to about 32K characters with a registry change, the maximum file name length still seems to be 255 characters.
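When metadata goes into file names, it's worth checking the limits before renaming. A Python sketch using `ntpath`, so Windows path rules apply on any OS. The limits are the usual documented defaults (255 per name component, MAX_PATH of 260 including the terminating null, roughly 32,767 with long paths enabled); the helper names are hypothetical.

```python
import ntpath

MAX_COMPONENT = 255     # per file/folder name, in UTF-16 code units
MAX_PATH_CLASSIC = 259  # MAX_PATH (260) minus the terminating null
MAX_PATH_LONG = 32_767  # approximate, with long paths enabled (registry + app support)


def utf16_len(s: str) -> int:
    """Length as Windows counts it (characters outside the BMP take two units)."""
    return len(s.encode("utf-16-le")) // 2


def check_name(folder: str, name: str, long_paths: bool = False) -> list[str]:
    """Return a list of problems with storing `name` (incl. metadata) in `folder`."""
    problems = []
    if utf16_len(name) > MAX_COMPONENT:
        problems.append(f"name is {utf16_len(name)} units (max {MAX_COMPONENT})")
    full = ntpath.join(folder, name)
    limit = MAX_PATH_LONG if long_paths else MAX_PATH_CLASSIC
    if utf16_len(full) > limit:
        problems.append(f"path is {utf16_len(full)} units (max {limit})")
    return problems


print(check_name(r"C:\Docs", "invoice $e39826.34 client-acme.pdf"))
```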
Sorting by values extracted with a regular expression is not yet possible in Everything, but it is on the to-do list: https://www.voidtools.com/forum/viewtopic.php?f=5&t=6656&p=20745&hilit=sort+by+regex#p20745
You're right, an ADS is no different from the unnamed $DATA stream when it comes to where it is stored: depending on its length, it may be resident in the MFT or non-resident. If it is non-resident, a search program would have to follow the pointers to its clusters and index the data found there.
If tags could be placed in a “file attribute” (an unfortunate name, since it is easily confused with the read-only, hidden, system, archive attributes etc.) that is always resident in the MFT file record segments, perhaps Everything could index them easily. But as far as metadata is concerned, Microsoft definitely seems to have settled on the Windows Property System, in which properties are stored in the “unnamed” (primary) file data stream, extracted by “Property Handlers” and indexed by Windows Search. Unfortunately, the Windows Property System doesn't give USERS the opportunity to define their own metadata fields and values (at least I haven't found a way); this is up to app developers.
https://docs.microsoft.com/en-us/windows/win32/properties/windows-properties-system
https://docs.microsoft.com/en-us/windows/win32/properties/building-property-handlers-properties