`find' has been the mainstay of file and directory maintenance for many years; it is an extremely flexible and powerful program. However, it has two main limitations. First, it is not at all extensible; there is no straightforward way to add facilities to it. Second, it has an arcane and bewildering command-line syntax. For example, it has different command-line switches for tests of file access times in minutes and days; file properties are specified in commands using an odd selection of single-letter formatting codes (e.g., %f for filename, %F for filesystem type). No utility with the power of `find' is going to be trivial to use; however `find' is arcane even for a Unix file management utility, which is quite a claim.
KFind fixes both these problems. First, its tests and actions are specified in programming-language statements, using variables with sensible names (like `size' for the file size and `filename' for the name). The language itself is `SLang', an interpreted language with similar power to `C'. Second, all but its most fundamental operations are written in the same programming language, so it can be extended very easily. One reason for extending KFind is to allow it to select files for processing based on their contents. Examples are supplied showing how to do this.
A KFind command has the following basic structure:
kfind [files,directories] --test '[test_expression]'
--action '[action_statements]'
The `test_expression' is something which evaluates to an integer, where `0' is considered `false' and anything else `true'. Possible expressions include:
--test 'size > 1000'
(size of file is greater than 1000 bytes), and
--test 'fname("*.cpp")'
(filename matches the pattern `*.cpp').
Possible actions include:
--action rm
(remove (delete) the file or directory), and
--action 'system("play " + filename)'
(ask the operating system to execute the command `play [filename]', where `filename' is the current file (or directory) name). Note that `filename' is a variable which is set to the name of the file currently being examined. A full list of variables is given later in this document.
Note that the single-quote characters around the test expression and the action have nothing to do with KFind syntax; they are there to prevent the Unix shell being confused by the brackets and double-quotes in the expressions. There are various other ways to achieve this effect, including escaping (preceding by a backslash) the difficult characters, but the single-quotes seem to work best in practice. Of course, if you want to use a single-quote in your expressions you will need to consider the use of escape characters. Consult your shell (sh, or bash, or whatever) documentation for more details.
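As an illustration of the quoting problem (this is not part of KFind), the following Python sketch uses the standard `shlex' module to quote a test expression for a POSIX shell, including one that itself contains a single-quote. The helper name `kfind_command' and the example expressions are made up for the illustration.

```python
import shlex

def kfind_command(directory, test_expression):
    """Build a shell-safe kfind command line (illustrative helper only)."""
    return "kfind {} --test {}".format(
        shlex.quote(directory), shlex.quote(test_expression))

# Brackets and double quotes: plain single quotes are enough.
plain = kfind_command(".", 'fname("*.cpp")')

# An expression that contains a single quote needs extra escaping,
# which shlex.quote works out for us.
tricky = kfind_command(".", 'fname("it\'s*.txt")')

print(plain)
print(tricky)
```

Splitting either string again with `shlex.split' gives back the original expression unchanged, which is a quick way to check that the quoting is right before running a destructive action.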
message(size)
because `size' is an integer, not a string. There is little point carrying on to the next file as the error would simply occur again. It is recommended that --stoponerror be used until the logic of an expression is debugged.
test_expression can be any legal KFind statement, but it must evaluate to an integer where `0' means false and anything else means true. A return of true is taken to mean that the file matched some criteria. KFind's programming language is a superset of SLang, so any valid SLang statements can be used, as well as KFind extensions.
In this section, and the section on functions that follows, the data type of each variable is shown before it. For example `string filename' means that the variable `filename' is of type `string'. SLang strings (unlike `C' strings) are dynamically resizable, so there is no need to worry about their size.
string filename
The full pathname of the file selected, relative to the directory that was specified on the command line, or the current directory if no directory was given. For example, if I invoke kfind like this:
kfind /etc
filename will contain absolute pathnames, like /etc/conf.modules, etc. If I invoke it like this:
kfind .
filename will begin with the directory indicator `.'. See also `name' and `pathname'.
string name
The short filename, i.e., the terminal part of the filename, not including any directory information. See also `filename' and `pathname'.
string pathname
The full, absolute pathname of the current file. This pathname will always start with '/'
long size
The size of the file in bytes
int mode
The file mode, e.g., the mode flags obtained from the directory or i-node. This
field should _not_ normally be used to determine the
int uid
The (numeric) user ID of the file.
int gid
The (numeric) group ID of the file.
int nlink
The number of links to the file (normally 1)
long access
The time of last access of the file, in Unix format (seconds since midnight Jan 1 1970)
long modify
The time of last modification of the file, in Unix format (seconds since midnight Jan 1 1970)
long create
The time of creation of the file, in Unix format (seconds since midnight Jan 1 1970)
int totalFiles
The running totals of files and directories examined. This variable only assumes its final value when all processing is complete. To get a final count of files, print the value of totalFiles in a --postaction expression, like this:
--postaction 'message(string(totalFiles))'
int matchedFiles
The running total of files and directories that matched the --test expression. This variable only assumes its final value when all processing is complete. To get a final count of matching files, print the value of matchedFiles in a --postaction expression.
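The relationship between `filename', `name' and `pathname' described earlier in this section can be sketched in Python as follows. This is only an illustration of the definitions given above, not KFind's own code; the function `name_variables' and its parameters are hypothetical.

```python
import posixpath

def name_variables(start_dir, relative_path, cwd):
    """Return (filename, name, pathname) as the text above defines them.

    start_dir      directory given on the command line, e.g. "." or "/etc"
    relative_path  path of the file below start_dir
    cwd            working directory, used to resolve relative names
    """
    filename = posixpath.join(start_dir, relative_path)
    name = posixpath.basename(filename)
    # Joining with an absolute filename simply yields that filename.
    pathname = posixpath.normpath(posixpath.join(cwd, filename))
    return filename, name, pathname

absolute_start = name_variables("/etc", "conf.modules", "/home/fred")
relative_start = name_variables(".", "src/main.c", "/home/fred")
```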
Note that SLang syntax allows zero-argument functions to be written without terminating parentheses, so we can write, for example `dir()' or `dir'.
string dir()
returns true if the current selection is a directory
string reg()
returns true if the current selection is a regular file
string blk()
returns true if the current selection is a block device
string chr()
returns true if the current selection is a character device
string lnk()
returns true if the current selection is a symbolic link. Note that this function will never return true if the --follow command line switch is enabled, as the end of the link will be tested, not the link itself
string fifo()
returns true if the current selection is a FIFO pipe
int fnmatch(string text, string pattern)
returns true if the pattern matches the string, using filename matching convention. In this convention, `*' matches any sequence of characters and `?' matches a single character. This is the same convention that the Unix shell typically uses. A much more powerful matching mechanism -- POSIX regular expression matching -- is employed by the match() function. The match is case-sensitive.
int ifnmatch(string text, string pattern)
Like fnmatch, but case-insensitive
int fname(string pattern)
A shortened form of fnmatch, defined for convenience. It tests the current value of `name' (the name part of the filename) against pattern, so it is equivalent to fnmatch(name, pattern). This function exists simply to make the command line shorter. The match is case-sensitive.
int ifname(string pattern)
Like fname, but case-insensitive
int match(string text, string pattern)
Performs a POSIX-compliant regular expression match of `pattern' on `text'. The full regular expression syntax is supported, including branches. For example, the pattern 'Bach|Chopin|Handel' will match text containing the text `Bach' or `Chopin' or `Handel' (case sensitive; use imatch() for a case-insensitive version).
int imatch(string text, string pattern)
As `match()' but case-insensitive
string gidToGroup(int groupID)
Converts the specified group ID into a group name, if possible. If not possible, returns the string `unknown'.
string uidToUser(int userID)
Converts the specified user ID into a user name, if possible. If not possible, returns the string `unknown'.
string owner()
The user name of the owner of the file
string ownergroup()
The group name (string) of the group of the file
string fileRoot(string filename)
Returns the root part (that is everything except the final file extension) of the specified filename.
string nameRoot()
Returns the root part of the current short filename
string filenameRoot()
Returns the root part of the current full filename
void mv(string newName)
Renames (`moves') the current file or directory. The function `rename (oldName, newName)' can be used to rename a file other than the current file.
void rm()
Deletes the current file or directory. Any other file or directory can be deleted using the `remove (name)' or `rmdir(name)' functions respectively. Note that rm() will not prompt for user intervention before it deletes. You have been warned. For safety, rm() will not delete a directory containing files, it will only delete an empty directory.
void cp(string newName)
Copies the current file to a file with the specified name. The function `copy (oldName, newName)' can be used to copy files other than the current file. Note that cp() will not copy directories or devices, only files.
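The `root part' used by fileRoot(), nameRoot() and filenameRoot() above (everything except the final file extension) can be illustrated with the following Python sketch. It relies on the standard `posixpath.splitext' function; how KFind itself treats names with no extension, or names that begin with a dot, is not stated above, so the behaviour for those cases is Python's and is only an assumption here.

```python
import posixpath

def file_root(filename):
    """Strip only the final extension, keeping any directory part."""
    root, _extension = posixpath.splitext(filename)
    return root

roots = [file_root(n) for n in
         ("music/prelude.mp3", "archive.tar.gz", "README")]
```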
Access rights
KFind does not attempt to expand directories to which the user does not have access rights. These errors are reported, but they do not stop the search, even if the --stoponerror option is specified. KFind does not have to be able to read a file to get its basic properties (size, owner, group, etc). However, it does have to be able to read it to get any other information. Specifically, tests on the file's contents will fail with an error message if the file can't be read.
Note on variables and functions
When a function has no arguments, e.g., dir(), the distinction between variables and functions is somewhat technical, as both can be written the same way. SLang allows zero-argument functions to be written without the empty brackets that are essential in Java and C/C++. Thus, even though `dir' is a function and `size' a variable, I can say
kfind --test 'size > 1000'
or
kfind --test 'dir'
as if they were both variables. There is no need to place open/close parentheses for zero-argument functions.
Case-sensitivity
The functions `match' and `fnmatch' are case-sensitive. There are case-insensitive versions, `imatch' and `ifnmatch', if that's what you need.
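The difference between the two kinds of matching, and between their case-sensitive and case-insensitive forms, can be illustrated with the Python standard library. This is a sketch only: the function names `glob_match' and `regex_match' are hypothetical, and Python's `re' syntax is not identical to POSIX regular expressions, although simple alternation with `|' behaves the same way.

```python
import fnmatch
import re

def glob_match(text, pattern, ignore_case=False):
    """Shell-style match: the whole text must fit the pattern."""
    if ignore_case:
        text, pattern = text.lower(), pattern.lower()
    return fnmatch.fnmatchcase(text, pattern)

def regex_match(text, pattern, ignore_case=False):
    """Regular-expression match anywhere within the text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None

glob_results = (
    glob_match("main.cpp", "*.cpp"),
    glob_match("MAIN.CPP", "*.cpp"),
    glob_match("MAIN.CPP", "*.cpp", ignore_case=True),
)
regex_results = (
    regex_match("Chopin - Nocturne", "Bach|Chopin|Handel"),
    regex_match("chopin - nocturne", "Bach|Chopin|Handel"),
    regex_match("chopin - nocturne", "Bach|Chopin|Handel", ignore_case=True),
)
```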
A design goal has been to make KFind operate at as near as possible to the same speed as GNU find. KFind is usually slightly slower, because each test requires a call to the SLang interpreter, which has a certain overhead: most things that both KFind and GNU find can do are done about 10 percent faster by GNU find. Of course, there are many things that KFind can do that GNU find can't, so no comparison of speed is possible there.
There are some instances where the way in which a query is phrased can have a significant effect on performance. For example, consider testing whether some variable (call it X) contains the text 'abc' or the text 'def'. We could phrase this in two ways:
--test 'match(X, "abc") or match(X, "def")'
or
--test 'match(X, "abc|def")'
In practice the second query will execute much more quickly. There are two reasons for this.
First, the SLang run-time engine will always execute both parts of an `or' relationship, even if the first is true. This is different from some other programming languages (notably Java), where the second branch of an `or' operation is not executed if the first is true, since there is no logical reason to do so. Of course, there may be good reason to execute both branches, as they may have side-effects on other parts of the program. SLang (and therefore KFind) allows the user to control the way in which `or' and `and' branches are executed. The standard `or' and `and' binary operators provide full evaluation. Short-circuit evaluation is provided by the `orelse' and `andelse' structures. The first --test line could be re-written as follows, to avoid an unnecessary `match' call:
--test 'orelse{match(X, "abc")}{match(X, "def")}'
Note that `orelse' is not an infix operator; its syntax is as follows:
orelse {test1}{test2}...
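The difference between full and short-circuit evaluation can be made visible by counting how many tests actually run. The Python sketch below is an analogy only, not SLang: Python's own `or' short-circuits, so full evaluation is imitated by computing both operands first. The helper `check' and the labels are made up for the illustration.

```python
calls = []

def check(label, result):
    """Record that a test ran, then return its result."""
    calls.append(label)
    return result

def full_evaluation():
    # Both operands are computed before being combined, which is how
    # the text above describes the plain `or' operator.
    left = check("abc", True)
    right = check("def", False)
    return left or right

def short_circuit():
    # Python's `or' stops at the first true operand, which is how
    # the text above describes `orelse'.
    return check("abc", True) or check("def", False)

calls.clear()
full_result = full_evaluation()
full_calls = list(calls)

calls.clear()
short_result = short_circuit()
short_calls = list(calls)
```

Both forms give the same answer; the short-circuit form simply skips the second test when the first has already decided the outcome.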
The second reason that the second --test line will execute more quickly than the first is that the second allows caching of the regular expression `abc|def'. The regular expression test must parse the regular expression supplied, and it can only do that at run-time. Therefore we want to avoid parsing the same expression for each new file. KFind has a simple caching mechanism: it caches the most recent regular expression, and does not re-parse if the next expression is the same. That is, there is capacity for one regular expression in its cache (of course, a deeper cache could be employed, but this would introduce its own overheads). The worst-case situation is to have two regular expressions (as in the first --test example) and parse each one alternately.
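A one-entry cache of this kind, and the worst case described above, can be sketched in Python as follows. This is a model of the behaviour described in the text, not KFind's actual implementation; the class `OneSlotRegexCache' and the sample names are hypothetical. Python's `re' module has its own internal cache, so the sketch counts compilations rather than measuring time.

```python
import re

class OneSlotRegexCache:
    """Remember only the most recently compiled pattern."""

    def __init__(self):
        self.pattern = None
        self.compiled = None
        self.compile_count = 0

    def search(self, text, pattern):
        if pattern != self.pattern:
            # Cache miss: compile and replace the single cached entry.
            self.compiled = re.compile(pattern)
            self.pattern = pattern
            self.compile_count += 1
        return self.compiled.search(text) is not None

def two_patterns(cache, text):
    # Full evaluation of both tests, as with the plain `or' operator.
    first = cache.search(text, "abc")
    second = cache.search(text, "def")
    return first or second

def one_pattern(cache, text):
    return cache.search(text, "abc|def")

names = ["abc.txt", "xyz.txt", "def.txt", "ghi.txt"]
alternating = OneSlotRegexCache()
combined = OneSlotRegexCache()
results_two = [two_patterns(alternating, n) for n in names]
results_one = [one_pattern(combined, n) for n in names]
```

With four names, the alternating form compiles a pattern eight times (each test evicts the other's entry), while the combined form compiles once; the match results are identical.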
Here is another example. Suppose I wish to find all RPM package files on my computer which are compiled for the i386 processor architecture (RPM files are handled by the add-on module `rpm.kfind'). I could execute the following --test:
--test 'rpmPlatform=="i386"'
This would give correct results, even though many files will not be RPM files, and will not give a useful value for `rpmPlatform' (it will return an empty string). However, to find the value of rpmPlatform requires that KFind open the file and inspect its header. This is not a long job, but it could add up to a long time over a large volume of files. If we assume that all RPM files have names that end in `.rpm', then we can re-write the expression
--test 'andelse{fname("*.rpm")}{rpmPlatform=="i386"}'
Now KFind will only test the value of rpmPlatform if the filename ends in .rpm, and the file does not need to be opened to see if this is the case.
KFind is designed to be extensible, and it is relatively easy to add processing modules for new file types. These modules are written in SLang (see the file `rpm.kfind' for an example of an extension module), and must be compiled each time the KFind program is executed. On most systems KFind's start-up time is negligible but, if a large number of extension modules were included, it could become a problem. The KFind system configuration file `main.kfind' (which provides the basic functions) and all the extension modules can, if required, be pre-compiled and stored in compiled format. This will significantly reduce the start-up time. KFind itself can be used to do this, but if you're ready for that step then you're ready to figure out how yourself. See the SLang reference manual for more details. KFind configuration files are not distributed in pre-compiled format because ease of tweaking was considered to be more important than start-up time.
Search depth
The --maxdepth argument specifies the depth of expansion of subdirectories, with one expansion being `1', and so on. In GNU find one expansion is specified as a depth of `0'. So if I wanted to search the current directory, but none of its subdirectories, I would use --maxdepth=1. Setting maxdepth to zero guarantees that no directory expansion will happen. This can be useful if you want to test directories without expanding them at all.
Search order
By default, KFind expands directories before processing the directories themselves. This is the opposite order to GNU find. This behaviour can be overridden by the --dirfirst switch.
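The depth and ordering rules described above can be modelled with a small Python generator that walks a nested dictionary instead of a real filesystem. This is a sketch under stated assumptions, not KFind's code: the function `walk', its parameters `maxdepth' and `dir_first', and the sample tree are all made up, and it assumes that the starting directory itself is among the items reported.

```python
def walk(tree, path=".", maxdepth=None, dir_first=False, depth=0):
    """Yield paths from a nested dict (dict = directory, None = file).

    Listing a directory's contents counts as one expansion, so
    maxdepth=0 expands nothing and maxdepth=1 lists only the start
    directory's immediate contents.
    """
    is_dir = isinstance(tree, dict)
    if is_dir and dir_first:
        yield path                      # directory before its contents
    if is_dir and (maxdepth is None or depth < maxdepth):
        for child_name in sorted(tree):
            yield from walk(tree[child_name], path + "/" + child_name,
                            maxdepth, dir_first, depth + 1)
    if not (is_dir and dir_first):
        yield path                      # files, or directory after contents

tree = {"a.txt": None, "src": {"main.c": None, "lib": {"util.c": None}}}

no_expansion = list(walk(tree, maxdepth=0))
one_level = list(walk(tree, maxdepth=1))
one_level_dir_first = list(walk(tree, maxdepth=1, dir_first=True))
unlimited = list(walk(tree))
```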
Minimum depth
KFind has no option to limit searches to _greater_ than a certain depth of expansion. This is because there are only so many hours in the day and I couldn't think of a reason to use this. But it could easily be added if anyone wanted it.
1. Show all files that have been modified in the last 5 days. Note that `modify' and `now' are both given in seconds, so we need to convert 5 days to seconds
kfind / --test 'modify > now - 5 * 24 * 60 * 60'
2. Show all files in the current directory and its subdirectories whose names end in `.o'
kfind . --test 'fname("*.o")'
3. Display the full pathname and size of all files in the current directory and its subdirectories whose names end in mp3. Note that the `string' function converts the integer `size' to a string.
kfind . --test 'fname("*.mp3")' --action 'message(filename + " " + string(size))'
4. Display the `package name' for all files that can be interpreted as RPM files on the CD-ROM drive. Note that `rpmPackage' will produce an empty string if the file is not an RPM file, but in this case the test --test '"rpm"' prevents any non-RPM files being reported.
kfind /mnt/cdrom --test '"rpm"' --follow --action 'message(rpmPackage)' --print
5. Play all MP3 audio files (using the `splay' program) in the /media directory and its subdirectories, which have the text Chopin or Bach in their title fields
kfind /media --test 'match("Chopin|Bach")' --action 'system("splay " + filename)'
6. Find all `core' files and, if the user confirms, delete them.
kfind / --test 'fname("core")'
--action 'if (confirm("Delete file")) rm' --print
The --print option is useful here to ensure that the name of the file that is about to be deleted is shown to the user. The function `rm' deletes the current file. When the command is executed, the user will be presented with a message like this:
/home/fred/myprogs/core Delete file ([y]/n) ?
Note that, as it is currently defined, `confirm' defaults to `yes'.
7. Make a copy of all files in the current directory, adding `.bak' to the filename. But don't copy files that already end in `.bak'.
./kfind . --test '(not fname("*.bak"))' --action 'cp(filename + ".bak")'
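For comparison, the time arithmetic in example 1 (convert 5 days to seconds and compare with the modification time) looks like this in Python. This is an illustration using the standard library, not a replacement for KFind; the function `recently_modified' is hypothetical, and unlike the KFind example it reports regular files only, not directories.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 24 * 60 * 60

def recently_modified(root, days, now=None):
    """Return files under root modified within the last `days' days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    matches = []
    for directory, _subdirs, files in os.walk(root):
        for file_name in files:
            path = os.path.join(directory, file_name)
            if os.lstat(path).st_mtime > cutoff:
                matches.append(path)
    return sorted(matches)

# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    fresh = os.path.join(root, "fresh.txt")
    stale = os.path.join(root, "stale.txt")
    for path in (fresh, stale):
        with open(path, "w") as handle:
            handle.write("x")
    # Pretend that stale.txt was last modified ten days ago.
    ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
    os.utime(stale, (ten_days_ago, ten_days_ago))
    found = [os.path.basename(p) for p in recently_modified(root, 5)]
```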
/usr/lib/kfind/*.kfind KFind initialization files
Permission is granted to any individual or institution to use, copy, or redistribute this software so long as all of the original files are included, that it is not sold for profit, and that this copyright notice is retained.
LIKE MOST FREE SOFTWARE, KFIND IS PROVIDED `AS IS' AND COMES WITH NO WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED. IN NO EVENT WILL THE COPYRIGHT HOLDER BE LIABLE FOR ANY DAMAGES RESULTING FROM THE USE OF THIS SOFTWARE.
Please send bug reports and comments by email to: k.boone@kzone.eu.org. For bug reports, please include the version of KFind (see kfind -v ), the machine and operating system in use, and as much additional information as possible.