Advice on research software

Started by Wayne K on 8/16/2013
Eduardo Mauro 8/19/2013 11:04 am
Should be released within a month.
Bernhard 8/19/2013 11:57 am


Eduardo Mauro wrote:
Should be released within a month.

Good news, thanks!

22111 8/20/2013 6:57 pm
I had overlooked your need for quantifying, and this is indeed a very important aspect of the overall task.

I have not tried this in AS, but it seems to me that at least some quantifying could be done there within reports, by adding report fields in which other fields or "search hits" are summed up, or simply counted. Unfortunately, the AS help section is rather poor, and, as said, the very instructive help from the forum disappeared along with the forum. Also, wouldn't all this quantifying, especially in combinations (i.e. creating tables), need some scripting or a relational database?

Anyway, the description of CT6 looks very promising, all the more so since the "source identification data" is already there (within the link info). So perhaps the developer will also make it available in some other way, for example via a toggle that shows it instead of the link before each clip.

Of course, this will not solve the quantification problem, so the following hint applies to AS, CT6, or any other such program.

Consider running two programs at the same time, on two screens or side by side on a large screen. Use AS, CT6 or similar to edit your data: formatting, coding, gathering clips into new collections, etc.

Then, for any quantifying analysis, export your data to a plain .txt file (ideally, though not necessarily, with each former paragraph as one very long line), and work on it with an editor like KEdit (look for more details about KEdit on the donationcoder.com site). Especially for this type of analysis, "how many #ac in combination with #pt but without #td", KEdit is one of the best tools there is, since you can go back and forth: you add other codes, withdraw some, etc. You just have to write your criteria down on paper so as not to get lost among your "more" and "less" choices, and at every moment you will know how many "paragraphs" (= lines) meet your combination of criteria.
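For readers who want to try the same "how many #ac with #pt but without #td" count outside KEdit, here is a minimal Python sketch. It assumes the exported .txt file has one former paragraph per line and that codes appear as separate words such as `#ac`; the function names and sample lines are made up for illustration.

```python
import re

def has_code(line: str, code: str) -> bool:
    """True if `code` (e.g. "#ac") occurs in `line` as a separate word."""
    # (?<!\S) and (?!\S): the code must be surrounded by whitespace or line ends,
    # so "#ac" does not match inside "#acx"
    return re.search(r"(?<!\S)" + re.escape(code) + r"(?!\S)", line) is not None

def select(lines, include=(), exclude=()):
    """Keep the lines that contain every code in `include` and none in `exclude`."""
    return [
        line for line in lines
        if all(has_code(line, c) for c in include)
        and not any(has_code(line, c) for c in exclude)
    ]

if __name__ == "__main__":
    # Made-up export: one former paragraph per line, source data first
    lines = [
        "Smith 1998, p. 12: first remark #ac #pt",
        "Smith 1998, p. 14: second remark #ac #pt #td",
        "Jones 2001, p. 3: third remark #ac",
        "Jones 2001, p. 7: fourth remark #pt #ac #acx",
    ]
    hits = select(lines, include=("#ac", "#pt"), exclude=("#td",))
    print(len(hits))  # 2
    # Refine the subset further, e.g. also withdraw lines coded #acx
    print(len(select(hits, exclude=("#acx",))))  # 1
```

Each further call to `select` on a previous result is one more "more" or "less" step, so intermediate subsets can be kept and refined, roughly like KEdit's cascading selections.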

As said, I know of nothing that even comes near KEdit for this kind of work, except, I assume, specialized software; and all these subsets of your data can be stored separately.

Of course, using KEdit this way presupposes preselections made in another tool, AS or CT6, that can put the "source identification data" before each such paragraph/line.

Or, if you can live without WYSIWYG formatting (you can toggle the long lines to paragraphs and back, to edit vs. analyze), run a very simple macro that puts the "source data" from line 1 of each "package" before each of the following lines, then deletes that first line. I suppose the records/"packages" of your original data are delimited by a special character, or by two blank lines, with paragraphs separated by just one blank line, or similar. In such a setup, your original data could be "prepared for KEdit" in about 10 minutes.
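The "prepare for KEdit" step could look roughly like this in Python, assuming (as guessed above) that records are separated by two blank lines, paragraphs by one, and that the first line of each record holds the source data. `prepare_for_line_analysis` and the " | " separator are hypothetical choices, not part of any tool.

```python
import re

def prepare_for_line_analysis(text: str, sep: str = " | ") -> list:
    """One output line per paragraph, prefixed with its record's first (source) line.

    Assumes records are separated by two or more blank lines and paragraphs
    by a single blank line.
    """
    out = []
    # Two or more consecutive blank lines mark a record boundary
    for record in re.split(r"\n[ \t]*\n(?:[ \t]*\n)+", text.strip()):
        source, _, body = record.partition("\n")
        # A single blank line marks a paragraph boundary
        for para in re.split(r"\n[ \t]*\n", body.strip()):
            flat = " ".join(para.split())  # the paragraph becomes one long line
            if flat:
                out.append(source.strip() + sep + flat)
    return out

if __name__ == "__main__":
    sample = (
        "Smith 1998, p. 12\nFirst paragraph,\nsecond line.\n\nSecond paragraph.\n"
        "\n\nJones 2001\nOnly paragraph."
    )
    for line in prepare_for_line_analysis(sample):
        print(line)
    # Smith 1998, p. 12 | First paragraph, second line.
    # Smith 1998, p. 12 | Second paragraph.
    # Jones 2001 | Only paragraph.
```

If the records are delimited by a special character instead, only the first `re.split` pattern would need to change.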

KEdit is certainly worth a very thorough look (they also have plenty of help material for the special commands that make it unique), be it as an add-on to AS or CT6, or as your only tool from now on.

Once you have your "source data" in each line, this comes quite close to what you would expect from a database, while sparing you all the programming/tweaking a database would require.

Bernhard 8/21/2013 7:39 am


22111 wrote:
Then for any quantifying analysis, export your data into a plain .txt
file (1 former paragraph here 1 very long line, but not necessarily),
and work on it by an editor like KEdit (look for more details about
KEdit within the donationcoder.com site). Especially for this type of
analysis "how many #ac in combination with #pt but without #td", KEdit
is one of the best tools there is, since you can go back and forth,
meaning you join other codes, you withdraw some, etc., etc. - you just
have to write your criteria down on paper in order to not get lost
within your "more" and "less" choices, and at every moment, you will
know how many such "paragraphs" = lines will meet your criteria
combination.
KEdit is certainly worth a very thorough look (and also, they have
plenty of help material for these special commands which make it
unique), be it as an add-on to AS or CT6, or as your only tool from now
on.


But you should be aware that KEdit is costly and on the road to abandonware (http://www.kedit.com/whatsnew.html). Perhaps THE, The Hessling Editor (http://hessling-editor.sourceforge.net/), could be a freeware alternative.

22111 8/21/2013 11:35 am
I should perhaps have added
- that these special abilities are not unique to KEdit but are shared by so-called "Eastern text editors"
- that there is an (original?) free editor, XEdit
- that the others, KEdit and the THE (The Hessling Editor) you mentioned, are probably derivatives of that XEdit

But
- I know those only by name, and presumably there are some differences
- for KEdit there is ample material explaining its special functionality
- so perhaps you could use that KEdit material to understand the corresponding functionality in XEdit and/or THE
- in practice (perhaps not legally), you can make ample use of KEdit, since the trial version just does not allow saving big files, and we are speaking of plain text files here...
- if you really use it on a "professional" basis, you might be so delighted by it that you end up paying the 149 dollars
- which would never happen if you worked with the above alternatives; so technically, you are not obliged to pay for KEdit, but you might pay out of respect for this fine program

Having said this, I should add that I have only "played around" with KEdit, since I need standardized procedures: I write the script once, then have it perform exactly the same manipulations on different data.

But your task is very different from mine: you have to analyse data without knowing beforehand in which way. You will see, during the analysis, how to refine (or even switch) your further analysis; your intermediate results will decide your further needs at any given moment. For such "plastic", non-standardized analysis, I have never encountered anything better than KEdit.

Or any of those free alternatives mentioned above, provided they offer exactly the same functionality as KEdit in this respect, or even more.

Anyway, I'm afraid every such solution will share the problem explained above: for "a = 1 or 2 or 3", you will have to write "a = 1 or a = 2 or a = 3"; neither "a in the range between 1 and 3" nor "a > 0 and a < 4" is possible, which is of course a big problem if the range is not "1 to 3" but "50 to 200".
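Spelling out "50 to 200" by hand is tedious, but it is exactly the kind of thing a small macro can generate. A Python sketch of such a generator; the `field=value` form and the ` or ` joiner are placeholders, not the query syntax of any particular program.

```python
def expand_range_query(field: str, lo: int, hi: int, joiner: str = " or ") -> str:
    """Spell out "field in lo..hi" as field=lo or field=lo+1 ... or field=hi.

    The "field=value" form and the " or " joiner are placeholders; adapt them
    to the query syntax of the program actually used.
    """
    if lo > hi:
        raise ValueError("lo must not be greater than hi")
    return joiner.join(f"{field}={v}" for v in range(lo, hi + 1))

if __name__ == "__main__":
    print(expand_range_query("a", 1, 3))  # a=1 or a=2 or a=3
    query = expand_range_query("a", 50, 200)
    print(query.count(" or ") + 1)        # 151 terms to paste into the search box
```

Whether a program accepts a query of 151 terms, and whether its plain-text search would let "a=5" also hit "a=50", depends on the program.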

Because of this limitation, you might in the end even need a third tool for such tasks, for example Excel (which perhaps also explains the prices of dedicated software). But for some intermediate needs, you could try tricks like
- macros that enter "search" strings like "a=1 or a=2...", and values that do not start at 1, 2, 3 but at 10 (or are written 01, 02, 03...): this way you could easily identify ranges of ten
- using the regular expression capabilities of KEdit (or perhaps of its free competitors; AS, for example, has none); if you are willing to delve into regex, you will even be able to replace the missing "a in the range from x to y" capability by combining a "greater than x" regex ("a = [...") with a "less than y" regex ("a=[..."), and this even for decimal numbers

Of course, the alternatives might do this as well, and this way, one hour spent working out the correct regex "codings" may spare you days of manual work.
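The "ranges of ten" trick from the list above relies on every code having the same number of digits: then "a=3" can only be the beginning of a=30 to a=39. A minimal Python sketch, assuming two-digit codes written as `a=NN`; `pad_code` and `in_decade` are hypothetical helper names.

```python
def pad_code(field: str, value: int, width: int = 2) -> str:
    """Write a code with a fixed number of digits, e.g. pad_code("a", 7) -> "a=07"."""
    return f"{field}={value:0{width}d}"

def in_decade(line: str, field: str, tens: int) -> bool:
    """With two-digit codes, "a=3" is the beginning of exactly a=30 .. a=39.

    Assumes every value was written with pad_code(..., width=2), i.e. 0..99.
    """
    return f"{field}={tens}" in line

if __name__ == "__main__":
    lines = [f"record {v}: {pad_code('a', v)}" for v in (3, 7, 31, 38, 45)]
    print([ln for ln in lines if in_decade(ln, "a", 3)])
    # ['record 31: a=31', 'record 38: a=38']
    print([ln for ln in lines if in_decade(ln, "a", 0)])
    # ['record 3: a=03', 'record 7: a=07']
```

Without the padding, "a=3" would also hit the single value a=3, and "a=1" would also hit a=100 and up, which is why the post suggests starting at 10 or writing 01, 02, 03.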

22111 8/21/2013 11:39 am
Sorry, I did not edit the above carefully enough. As you can see from the last paragraph there, you will probably NOT need any export to Excel, or only for more sophisticated analysis: number range tasks can definitely be done by regex within KEdit alone (but not in AS, for example). Just bear in mind that for every "search" "a in the range of x to y", you need TWO regex "searches", combined with "AND".
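To illustrate the "TWO regex searches, combined with AND" idea on the "50 to 200" range mentioned earlier, here is a Python sketch: one regex accepts any whole number of at least 50, the other any whole number of at most 200, and a line is selected only if both hit. It assumes integer values written as `a=<number>`, one such field per line and no leading zeros; the constant and function names are made up.

```python
import re

# Assumed field syntax: "a=<integer>", one such field per line, no leading zeros.
# \b keeps "ba=60" from counting; (?!\d) makes the regex cover the whole number.
AT_LEAST_50 = re.compile(r"\ba=(?:[5-9]\d|[1-9]\d{2,})(?!\d)")   # 50..99, or 3+ digits
AT_MOST_200 = re.compile(r"\ba=(?:\d|[1-9]\d|1\d\d|200)(?!\d)")   # 0..9, 10..99, 100..199, 200

def in_range_50_200(line: str) -> bool:
    """Two separate regex searches, combined with AND."""
    return bool(AT_LEAST_50.search(line)) and bool(AT_MOST_200.search(line))

if __name__ == "__main__":
    for value in (49, 50, 137, 200, 201, 2000):
        print(value, in_range_50_200(f"record: a={value}"))
    # expected, one per line: 49 False, 50 True, 137 True, 200 True, 201 False, 2000 False
```

If a line could hold several `a=` values, "a=10 a=300" would pass both searches although neither value is in range, so the one-field-per-line assumption matters.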

22111 8/21/2013 11:46 am
And sorry again: KEdit costs 129 dollars, not 149, should you ever decide to pay for it. Of course, a price of 99 dollars would probably double sales, to put my error into perspective (I only noticed because I checked whether they had perhaps lowered the price "from 149 dollars", then remembered it had always been 129 dollars; so if they raised it to 149, I suppose it would not harm sales any further).

22111 8/22/2013 9:14 am
Regex (correction of the above)

Regex (as in KEdit) is obviously extremely helpful for numeric ranges (whereas in AS, for example, only very short ranges can be emulated by a bunch of "OR" "searches"), but my description above was wrong. Let me try again:

If you have numbers with a decimal point/comma rather than integers, in most cases you will have the regex consider only the part before the point/comma.

1)
For some ranges, one single "search" would be sufficient:
range 100-600
search for exactly 3 digits (not 2 or 4), beginning with 1-6

2)
range 70-120
here, regex experts would be able to do it in a single "search", whereas beginners like us would use two, combined with an "OR":
(search for 2 digits and beginning with 7-9) OR (search for 3 digits beginning with 1 and then second digit 0 or 1)
this only covers numbers up to 119, so if you really need 120 as well, you have to add a third search, for 120, with a second "OR"
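Both examples can be tried out as Python regular expressions. In this sketch each example is a single regex, and the `|` inside the group plays the role of the OR between the separate searches; the lookbehind follows the advice above to consider only the part before the decimal point/comma. Note that example 1, as written, covers 100 to 699. The names are made up, and KEdit's own regex dialect may differ.

```python
import re

# Integer part of a number: not preceded by a digit, "." or "," (so digits after
# a decimal point/comma are skipped) and not followed by another digit.
START, END = r"(?<![\d.,])", r"(?!\d)"

# Example 1: exactly three digits, the first one 1-6 (this covers 100..699)
RANGE_100_699 = re.compile(START + r"[1-6]\d\d" + END)

# Example 2: 70..120. "|" plays the role of OR between the separate searches:
# two digits starting with 7-9 | three digits "1", then "0" or "1", then any | exactly 120
RANGE_70_120 = re.compile(START + r"(?:[7-9]\d|1[01]\d|120)" + END)

def matching_lines(lines, pattern):
    """Select the lines in which the pattern finds a number in the range."""
    return [ln for ln in lines if pattern.search(ln)]

if __name__ == "__main__":
    lines = ["weight 69.5", "weight 70", "weight 119.9", "weight 120",
             "weight 121", "ratio 0.115"]
    print(matching_lines(lines, RANGE_70_120))
    # ['weight 70', 'weight 119.9', 'weight 120']
```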

3)
In the example in my post above, I wanted to express a different approach but got it totally wrong. You would search for a range by excluding (!) both the numbers below the range and the numbers above it, using a combination of two, three, four or even more searches:

(I alternate () and [] for readability)

range 110-600
[3 digits] AND [ (first=2-5) OR (first=1 AND NOT second=0) OR (first=6 AND second and third =0 each) ]

or, as a last example, 110-1100
( [3 digits] AND [ (first=2-9) OR (first=1 AND NOT second=0) ] ) OR ( [4 digits] AND (first=1) AND [ (second=0) OR (second=1 AND third=0 AND fourth=0) ] )

Such Boolean/regex constructions are not beautiful and are easy to get wrong (then delivering wrong results without your knowing), but at least it is possible to "search for" numeric ranges this way (in fact, to select lines with values within those ranges), provided your tool offers both Boolean search (or, as in KEdit, cascading searches/selections) and regex.
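One way to catch such silent mistakes is to write the digit conditions down as code and compare them, number by number, with a plain numeric comparison. A Python sketch; it encodes the 110-600 construction as given, and for 110-1100 a version whose three-digit branch accepts first digits 2-9 and whose four-digit branch requires first=1. The function names are made up.

```python
def digits_110_600(d: str) -> bool:
    # [3 digits] AND [ (first=2-5) OR (first=1 AND NOT second=0) OR (first=6 AND second=third=0) ]
    return len(d) == 3 and (
        d[0] in "2345"
        or (d[0] == "1" and d[1] != "0")
        or (d[0] == "6" and d[1] == "0" and d[2] == "0")
    )

def digits_110_1100(d: str) -> bool:
    # ([3 digits] AND [ (first=2-9) OR (first=1 AND NOT second=0) ])
    # OR ([4 digits] AND first=1 AND [ (second=0) OR (second=1 AND third=0 AND fourth=0) ])
    three = len(d) == 3 and (d[0] in "23456789" or (d[0] == "1" and d[1] != "0"))
    four = len(d) == 4 and d[0] == "1" and (
        d[1] == "0" or (d[1] == "1" and d[2] == "0" and d[3] == "0")
    )
    return three or four

def find_mistakes(rule, lo: int, hi: int, upto: int = 100_000) -> list:
    """Every n in 0..upto-1 where the digit rule disagrees with lo <= n <= hi."""
    return [n for n in range(upto) if rule(str(n)) != (lo <= n <= hi)]

if __name__ == "__main__":
    print(find_mistakes(digits_110_600, 110, 600))    # []
    print(find_mistakes(digits_110_1100, 110, 1100))  # []
```

As a check of the checker: replacing "23456789" by "2345" in `digits_110_1100` makes `find_mistakes` return every number from 600 to 999.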

In most cases, your ranges will not be totally random but the same standard ranges again and again; for these, you construct your search once and then store it in your tool as a macro for any further use.

So this is a viable solution for many such analysis tasks if you don't have dedicated software at your disposal.

22111 8/22/2013 9:18 am
Sorry again, even my very first example here is wrong, since the range would be not 100-600 but 100-699; my examples further down explain how to do it right. You can see here that a single regex will almost never suffice; you will need Boolean combinations almost everywhere... (As said, regex experts can combine those within a single regex combination, but why not combine several regex combinations and have the advantage that, combined, they give correct results even for a non-expert?)

22111 8/22/2013 9:20 am
(I meant "regex expressions", of course.)
22111 8/22/2013 9:37 am
I should add some important info here regarding desktop search "engines". The only one I know of that can search different files for numeric ranges is dtSearch, very good in every respect but 3 or 4 times the price of its competitors.

If you don't need this functionality on a regular basis, note that some of the competitors support regex, which lets you select files by the numeric ranges they contain. This means you could use "virtual fields" in .txt or .rtf files and the like, "#a167" here, "#b236" there, to be found by regex searches along the lines of "b200-300".

This might be an important factor in deciding whether to divide your data into records in a database or to leave your "records" as multiple, separate files.
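The "virtual field" idea can be tried without any particular desktop search program: a small Python sketch scans a folder of .txt files with the kind of regex such a tool would receive. The `#b` plus digits syntax follows the examples in the post; the function name is hypothetical, and real desktop search tools each have their own regex dialect.

```python
import re
import tempfile
from pathlib import Path

def files_matching(folder: str, regex: str, pattern: str = "*.txt") -> list:
    """Names of the files in `folder` whose text contains a match for `regex`."""
    rx = re.compile(regex)
    return [
        p.name
        for p in sorted(Path(folder).glob(pattern))
        if rx.search(p.read_text(encoding="utf-8", errors="replace"))
    ]

# Virtual field "#b" holding a whole number from 200 to 300: "2" + two digits, or 300
B_200_300 = r"#b(?:2\d\d|300)(?!\d)"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "one.txt").write_text("notes ... #a167 #b236", encoding="utf-8")
        Path(tmp, "two.txt").write_text("notes ... #b2360 #b199", encoding="utf-8")
        Path(tmp, "three.txt").write_text("notes ... #b300", encoding="utf-8")
        print(files_matching(tmp, B_200_300))  # ['one.txt', 'three.txt']
```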

22111 8/26/2013 4:43 pm
People interested in KEdit/"text processing" of data, as well as those interested in the fate of "old" software, might like this link:

http://takimag.com/article/spare_a_thought_for_the_late_abandoners_john_derbyshire/print#axzz2d5wRbPLC

Then, there is the Manfred Kühn blog article:

http://takingnotenow.blogspot.be/2013/04/mcphees-workflow.html

Kühn quotes a rather long passage from McPhee there, and I soon understood why: the original New Yorker article is paywalled.

But you may be pleased to know that, by searching for a short passage from that quotation, another (presumably unauthorized) copy of the article, albeit without some maps and charts, can be found; at least I found one with Google, by accident, having searched for "citing its exceptional capabilities in sorting" (with the quotes). The KEdit information is in the second half of the McPhee article.

As it stands, KEdit, like askSam, is one of those "late abandoner" programs that do much more than similarly priced current software does.

If the original poster has found a way to process his data, I would be delighted to get some info about his new workflow.