KFind

Section: User Commands (1)
Updated: April 2000

NAME

KFind - a file selection, searching and management tool  

SYNOPSIS

kfind {-adfpmtvx?} [paths]

 

WARNING AND DISCLAIMER

KFind is a powerful tool, and can be destructive if misused. It is intended to be used by experienced users and system managers, who will understand the implications of careless usage. Please be particularly cautious in the use of the copy, move and delete facilities. Please read the manual carefully before use. No responsibility is accepted for any loss or damage resulting from the use of this software. There is no warranty.

 

DESCRIPTION

KFind is a file searching, selection and management tool. It is intended to be an enhanced version of the standard Unix/GNU `find' utility.

`find' has been the mainstay of file and directory maintenance for many years; it is an extremely flexible and powerful program. However, it has two main limitations. First, it is not at all extensible; there is no straightforward way to add facilities to it. Second, it has an arcane and bewildering command-line syntax. For example, it has different command-line switches for tests of file access times in minutes and days; file properties are specified in commands using an odd selection of single-letter formatting codes (e.g., %f for filename, %F for filesystem type). No utility with the power of `find' is going to be trivial to use; however `find' is arcane even for a Unix file management utility, which is quite a claim.

KFind fixes both these problems. First, its tests and actions are specified in programming-language statements, using variables with sensible names (like `size' for the file size and `filename' for the name). The language itself is `SLang', an interpreted language with similar power to `C'. Second, all but its most fundamental operations are written in the same programming language, so it can be extended very easily. One reason for extending KFind is to allow it to select files for processing based on their contents. Examples are supplied showing how to do this.

 

OVERVIEW OF OPERATION

KFind searches a directory hierarchy, tests each file or directory against a specified expression and, if the expression is true, applies some action to the file. The expression can include tests of many attributes of the file, including its contents, and the action can be anything that can be expressed in KFind's built-in script language. Both the expression and the action are specified in an interpreted programming language called `SLang'. SLang has a syntax rather like JavaScript, but has the same basic facilities as `C'. KFind extends the basic SLang function set with a number of functions specific to file maintenance.
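The search, test and action cycle described above can be sketched in Python. This is only an illustration of the model, not KFind's implementation: the names `walk_and_apply`, `test` and `action` are hypothetical, and the sketch does not test the starting directory itself. It visits the contents of a directory before the directory, which matches the default order documented under --dirfirst.

```python
import os
import tempfile


def walk_and_apply(root, test, action):
    """Visit everything under root; call action(path) wherever test(path) is true."""
    matched = 0
    # topdown=False yields the contents of a directory before the directory itself
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if test(path):
                action(path)
                matched += 1
    return matched


def demo():
    """Roughly what --test 'size > 1000' with a name-collecting action would do."""
    with tempfile.TemporaryDirectory() as root:
        for name, size in (("small.txt", 10), ("big.bin", 2000)):
            with open(os.path.join(root, name), "wb") as handle:
                handle.write(b"x" * size)
        hits = []
        walk_and_apply(
            root,
            lambda path: os.path.getsize(path) > 1000,
            lambda path: hits.append(os.path.basename(path)),
        )
        return hits
```

Keeping the test and the action as separate callables mirrors the way --test and --action are given as separate expressions on the command line.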

A KFind command has the following basic structure:

kfind [files,directories] --test '[test_expression]' 
                --action '[action_statements]'

The `test_expression' is something which evaluates to an integer, where `0' is considered `false' and anything else `true'. Possible expressions include:

--test 'size > 1000' 
(size of file is greater than 1000 bytes), and

--test 'fname("*.cpp")'
(filename matches the pattern `*.cpp').

Possible actions include:

--action rm
remove (delete) the file or directory, and

--action 'system("play " + filename)'

(ask the operating system to execute the command `play [filename]', where `filename' is the current file (or directory) name). Note that `filename' is a variable which is set to the name of the file currently being examined. A full list of variables is given later in this document.

Note that the single-quote characters around the test expression and the action have nothing to do with KFind syntax; they are there to prevent the Unix shell being confused by the brackets and double-quotes in the expressions. There are various other ways to achieve this effect, including escaping (preceding by a backslash) the difficult characters, but the single-quotes seem to work best in practice. Of course, if you want to use a single-quote in your expressions you will need to consider the use of escape characters. You will need to consult your shell (sh, or bash, or whatever) documentation for more details.
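The quoting problem is a property of the shell rather than of KFind, so it can be explored without KFind at all. The sketch below uses Python's standard `shlex` module to produce a POSIX-shell-safe form of an expression, including one that itself contains a single-quote. The expressions are only sample strings; nothing here runs kfind.

```python
import shlex


def shell_quote_expression(expr):
    """Return expr quoted so that a POSIX shell passes it through as one argument."""
    return shlex.quote(expr)


# An expression with brackets and double-quotes only needs enclosing single-quotes.
simple = 'fname("*.cpp")'

# An expression containing a single-quote needs the quote itself escaped.
tricky = 'message("it\'s done")'

quoted_simple = shell_quote_expression(simple)
quoted_tricky = shell_quote_expression(tricky)

# shlex.split applies POSIX shell word-splitting rules, so it shows what the
# program on the receiving end would see as its argument.
round_trip = shlex.split(quoted_tricky)
```

Printing `quoted_tricky` shows one way of embedding a single-quote: close the quoted string, add the quote inside double-quotes, and reopen the quoted string.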

 

OPTIONS

-a action_expression,--action=action_expression
Specifies the action that will be carried out when a file match occurs. Any legal KFind language statements can be given, and these can refer to KFind variables (e.g., filename, size, owner). KFind statements are a superset of SLang program statements; any SLang statements can be used here.

-d,--dirfirst
If this option is selected, a directory will be considered for processing before the files it contains. The default is to process files before directories.

-f,--follow
If this option is specified, symbolic links are treated as real files, provided the file at the destination end of the link exists. If it is not specified, links are not followed. For most purposes it makes no difference whether --follow is specified or not. When it does matter, it probably should be specified. The reason it is not enabled by default is that links can point to directories higher in the hierarchy than themselves, which will cause all kinds of problems. You can get around some of these problems by limiting the search depth using the --maxdepth argument.

-m depth,--maxdepth=depth
Limit the depth of directory expansion to `depth'.

-o=action_expression,--postaction=action_expression
Execute the expression after all other actions have been carried out. This is a good place to display a count of files processed.

-p,--print
Prints the name of any matching file (to standard output). This is the default, and will happen unless the command line specifies any other form of action (e.g., using --action). If another action is specified, then names are not printed by default. Use this option if you want to print the name and carry out another action as well.

-r=preaction_expression,--preaction=action_expression
Execute the expression before any files are processed.

-s,--stoponerror
If this option is selected, KFind will stop processing as soon as an error is raised. This does not affect the handling of syntax errors in the `action' and `test' expressions, as KFind will not even begin processing if there are syntax errors. However, run-time errors that are raised during the processing of `action' and `test' have to be dealt with somehow. The default action is to stop all processing of the current file, and move on to the next one. If the error is, for example, an attempt to delete a file to which the user does not have access rights, then this is probably the correct thing to do. If the error is a logical one in the expression then it probably isn't. For example, this `action' expression will raise a run-time error

message(size)

because `size' is an integer, not a string. There is little point carrying on to the next file as the error would simply occur again. It is recommended that --stoponerror be used until the logic of an expression is debugged.
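The difference between the two error policies can be shown with a small Python sketch. It is an analogy only: `process_all` and `buggy_action` are hypothetical names, and the type error comes from Python string concatenation, standing in for the message(size) mistake above.

```python
def process_all(items, action, stop_on_error=False):
    """Apply action to each item, recording errors instead of crashing.

    With stop_on_error=False a failing item is skipped and processing continues;
    with stop_on_error=True processing ends at the first failure.
    """
    done, errors = [], []
    for item in items:
        try:
            action(item)
            done.append(item)
        except Exception as exc:
            errors.append((item, exc))
            if stop_on_error:
                break
    return done, errors


def buggy_action(size):
    # Logic error: an integer is used where a string is required,
    # so this fails in exactly the same way for every item.
    return "size: " + size


sizes = [10, 2000, 35]
_, errors_default = process_all(sizes, buggy_action)
_, errors_stop = process_all(sizes, buggy_action, stop_on_error=True)
```

With the default policy the same error is reported once per item; with the stop-on-error policy it is reported once, which is the more useful behaviour while an expression is still being debugged.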

-t=test_expression,--test=test_expression
This is probably the most important command-line argument, and is required in almost all uses of kfind. If it is not specified, it defaults to `true', that is, a match is made to all files.

test_expression can be any legal KFind statement, but it must evaluate to an integer where `0' means false and anything else means true. A return of true is taken to mean that the file matched some criteria. KFind's programming language is a superset of SLang, so any valid SLang statements can be used, as well as KFind extensions.

-x,--noxdev
Don't cross devices. The default is to descend into subdirectories even if they are mounted on different devices. This means that executing ``kfind /'' will find all files on the system (or at least all the user has access to). If, for example, /home and /usr are mounted filesystems on /, then ``kfind / --noxdev'' will not expand /home and /usr.

-v,--version
Prints the version number and compilation date.

 

VARIABLES

In this section, and the section on functions that follows, the data type of each variable is shown before it. For example `string filename' means that the variable `filename' is of type `string'. SLang strings (unlike `C' strings) are dynamically resizable, so there is no need to worry about their size.

string filename

The full pathname of the file selected, relative to the directory that was specified on the command line, or the current directory if no directory was given. For example, if I invoke kfind like this:

kfind /etc

filename will contain absolute pathnames, like /etc/conf.modules, etc. If I invoke it like this:

kfind .

filename will begin with the directory indicator `.'. See also `name' and `pathname'.

string name

The short filename, i.e., the terminal part of the filename, not including any directory information. See also `filename' and `pathname'.

string pathname

The full, absolute pathname of the current file. This pathname will always start with '/'

long size

The size of the file in bytes

int mode

The file mode, e.g., the mode flags obtained from the directory or i-node. This field should _not_ normally be used to determine the

int uid

The (numeric) user ID of the file.

int gid

The (numeric) group ID of the file.

int nlink

The number of links to the file (normally 1)

long access

The time of last access of the file, in Unix format (seconds since midnight Jan 1 1970)

long modify

The time of last modification of the file, in Unix format (seconds since midnight Jan 1 1970)

long create

The time of creation of the file, in Unix format (seconds since midnight Jan 1 1970)

int totalFiles

The running totals of files and directories examined. This variable only assumes its final value when all processing is complete. To get a final count of files, print the value of totalFiles in a --postaction expression, like this:

--postaction 'message(string(totalFiles))'

int matchedFiles

The running total of files and directories that matched the --test expression. This variable only assumes its final value when all processing is complete. To get a final count of matched files, print the value of matchedFiles in a --postaction expression.

 

FUNCTIONS

Note that SLang syntax allows zero-argument functions to be written without terminating parentheses, so we can write, for example `dir()' or `dir'.

string dir()

returns true if the current selection is a directory

string reg()

returns true if the current selection is a regular file

string blk()

returns true if the current selection is a block device

string chr()

returns true if the current selection is a character device

string lnk()

returns true if the current selection is a symbolic link. Note that this function will never return true if the --follow command line switch is enabled, as the end of the link will be tested, not the link itself

string fifo()

returns true if the current selection is a FIFO pipe

int fnmatch(string text, string pattern)

returns true if the pattern matches the string, using the filename matching convention. In this convention, `*' matches any sequence of characters and `?' matches a single character. This is the same convention that the Unix shell typically uses. A much more powerful matching mechanism -- POSIX regular expression matching -- is employed by the match() function. The match is case-sensitive.

int ifnmatch(string text, string pattern)

Like fnmatch, but case-insensitive

int fname(string pattern)

A shortened form of fnmatch, defined for convenience. It tests the current value of `name' (the name part of the filename) against pattern, so it is equivalent to fnmatch(name, pattern). This function exists simply to make the command line shorter. The match is case-sensitive.

int ifname(string pattern)

Like fname, but case-insensitive
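The filename matching convention used by fnmatch, ifnmatch, fname and ifname can be tried out with Python's standard `fnmatch` module, which follows a similar shell-style convention. This is an approximation offered for experiment, not KFind's code: the Python module also accepts `[...]` character classes, and the helper names `fnmatch_cs` and `fnmatch_ci` are made up for this sketch.

```python
import fnmatch


def fnmatch_cs(text, pattern):
    """Case-sensitive shell-style match (argument order as in fnmatch(text, pattern))."""
    return fnmatch.fnmatchcase(text, pattern)


def fnmatch_ci(text, pattern):
    """Case-insensitive variant: fold both sides to lower case before matching."""
    return fnmatch.fnmatchcase(text.lower(), pattern.lower())


examples = {
    "star matches any run of characters": fnmatch_cs("main.cpp", "*.cpp"),
    "case matters by default": fnmatch_cs("MAIN.CPP", "*.cpp"),
    "case-insensitive form": fnmatch_ci("MAIN.CPP", "*.cpp"),
    "question mark matches exactly one character": fnmatch_cs("a.o", "?.o"),
    "question mark does not match two characters": fnmatch_cs("ab.o", "?.o"),
}
```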

int match(string text, string pattern)

Performs a POSIX-compliant regular expression match of `pattern' on `text'. The full regular expression syntax is supported, including branches. For example, the pattern 'Bach|Chopin|Handel' will match text containing the text `Bach' or `Chopin' or `Handel' (case sensitive; use imatch() for a case-insensitive version).

int imatch(string text, string pattern)

As `match()' but case-insensitive
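The branch (alternation) behaviour described for match() and imatch() can be illustrated with Python's `re` module. Python's regular expression dialect is not POSIX, so this is an analogy that holds for simple patterns such as the one above, not a statement about KFind's matcher. The helpers keep the (text, pattern) argument order shown in the function signatures.

```python
import re


def match(text, pattern):
    """True if pattern matches anywhere in text (case-sensitive)."""
    return re.search(pattern, text) is not None


def imatch(text, pattern):
    """As match(), but ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None


composers = "Bach|Chopin|Handel"

results = [
    match("Chopin - Nocturne Op. 9", composers),  # one branch matches
    match("Mozart - Requiem", composers),         # no branch matches
    match("chopin - nocturne", composers),        # wrong case for match()
    imatch("chopin - nocturne", composers),       # accepted by imatch()
]
```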

string gidToGroup(int groupID)

Converts the specified group ID into a group name, if possible. If not possible, returns the string `unknown'.

string uidToUser(int userID)

Converts the specified user ID into a user name, if possible. If not possible, returns the string `unknown'.

string owner()

The user name of the owner of the file

string ownergroup()

The group name (string) of the group of the file

string fileRoot(string filename)

Returns the root part (that is everything except the final file extension) of the specified filename.

string nameRoot()

Returns the root part of the current short filename

string filenameRoot()

Returns the root part of the current full filename
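For readers who want to check what "everything except the final file extension" means in practice, Python's `os.path.splitext` follows the same rule and makes a convenient stand-in. How KFind itself treats edge cases such as names beginning with a dot is not stated here, so the sketch avoids them; `file_root` is a hypothetical helper name.

```python
import os.path


def file_root(filename):
    """Return filename without its final extension (only the last one is removed)."""
    root, _extension = os.path.splitext(filename)
    return root


roots = {
    "song.mp3": file_root("song.mp3"),
    "archive.tar.gz": file_root("archive.tar.gz"),
    "./src/main.cpp": file_root("./src/main.cpp"),
    "README": file_root("README"),
}
```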

void mv(string newName)

Renames (`moves') the current file or directory. The function `rename (oldName, newName)' can be used to rename a file other than the current file.

void rm()

Deletes the current file or directory. Any other file or directory can be deleted using the `remove (name)' or `rmdir(name)' functions respectively. Note that rm() will not prompt for user intervention before it deletes. You have been warned. For safety, rm() will not delete a directory containing files; it will only delete an empty directory.

void cp(string newName)

Copies the current file to a file with the specified name. The function `copy (oldName, newName)' can be used to copy files other than the current file. Note that cp() will not copy directories or devices, only files.

 

NOTES

Access rights

KFind does not attempt to expand directories to which the user does not have access rights. These errors are reported, but they do not stop the search, even if the --stoponerror option is specified. KFind does not have to be able to read a file to get its basic properties (size, owner, group, etc). However, it does have to be able to read it to get any other information. Specifically, tests on the file's contents will fail with an error message if the file can't be read.

Note on variables and functions

When a function has no arguments, e.g., dir(), the distinction between variables and functions is somewhat technical, as both can be written the same way. SLang allows zero-argument functions to be written without the empty brackets that are essential in Java and C/C++. Thus, even though `dir' is a function and `size' a variable, I can say

kfind --test 'size > 1000'

or

kfind --test 'dir'

as if they were both variables. There is no need to place open/close parentheses for zero-argument functions.

Case-sensitivity

The functions `match' and `fnmatch' are case-sensitive. There are case-insensitive versions, `imatch' and `ifnmatch', if that's what you need.

 

OPTIMIZATION AND PERFORMANCE

A design goal has been to make KFind operate at as near as possible to the same speed as GNU find. In practice, KFind is usually slightly slower, because each test requires a call to the SLang interpreter, which has a certain overhead: most things that both programs can do are done about 10 percent faster by GNU find. Of course, there are many things that KFind can do that GNU find can't, so no comparison of speed is possible there.

There are some instances where the way in which a query is phrased can have a significant effect on performance. For example, consider testing whether some variable (call it X) contains the text 'abc' or the text 'def'. We could phrase this in two ways:

--test 'match(X, "abc") or match(X, "def")'

or

--test 'match(X, "abc|def")'

In practice the second query will execute much more quickly. There are two reasons for this.

First, the SLang run-time engine will always execute both parts of an `or' relationship, even if the first is true. This is different to some other programming languages (notably Java), where if the first branch of an `or' operation is true, the second is not executed, as there is no logical reason to do so. Of course, there may be good reason to execute both branches, as they may have side-effects on other parts of the program. SLang (and therefore KFind) allows the user to control the way in which `or' and `and' branches are executed. The standard `or' and `and' binary operators provide full evaluation. Short-circuit evaluation is provided by the `orelse' and `andelse' structures. The first --test line could be re-written as follows, to avoid an unnecessary `match' call:

--test 'orelse{match(X, "abc")}{match(X, "def")}'

Note that `orelse' is not an infix operator; its syntax is as follows:

orelse {test1}{test2}...
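The difference between full and short-circuit evaluation can be made visible by counting calls. The Python sketch below is an analogy: Python's own `or` short-circuits, so full evaluation is imitated by passing both results to a function, which forces both calls to happen first. The names `probe`, `full_or`, `run_full` and `run_short` are invented for this illustration.

```python
calls = []


def probe(label, result):
    """Stand-in for a match() call: records that it ran, then returns result."""
    calls.append(label)
    return result


def full_or(first, second):
    # Both arguments have already been evaluated by the time this body runs,
    # which imitates an `or' operator with full evaluation.
    return first or second


def run_full():
    calls.clear()
    result = full_or(probe("abc", True), probe("def", False))
    return result, list(calls)


def run_short():
    calls.clear()
    # Python's `or' stops as soon as the left-hand side is true,
    # which imitates the behaviour described for `orelse'.
    result = probe("abc", True) or probe("def", False)
    return result, list(calls)
```

Both forms give the same answer; only the number of calls differs, and that difference is paid once per file examined.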

The second reason that the second --test line will execute more quickly than the first is that the second allows caching of the regular expression `abc|def'. The regular expression test must parse the regular expression supplied, and it can only do that at run-time. Therefore we want to avoid parsing the same expression for each new file. KFind has a simple caching mechanism: it caches each regular expression, and does not re-parse if the next expression is the same. That is, there is capacity for one regular expression in its cache (of course, a deeper cache could be employed, but this would introduce its own overheads). The worst-case situation is to have two regular expressions (as in the first --test example) and parse each one alternately.
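The effect of a one-entry cache is easy to reproduce. The following Python sketch models the behaviour described above under the assumption that every pattern in the test is evaluated for every file (full evaluation); the class and function names are hypothetical and the code is not taken from KFind.

```python
import re


class OneSlotRegexCache:
    """Remembers only the most recently compiled pattern."""

    def __init__(self):
        self.pattern = None
        self.compiled = None
        self.compile_count = 0

    def get(self, pattern):
        # Re-parse only when the requested pattern differs from the cached one.
        if pattern != self.pattern:
            self.compiled = re.compile(pattern)
            self.pattern = pattern
            self.compile_count += 1
        return self.compiled


def count_compiles(patterns_per_file, n_files):
    """Count parses needed when each file is tested against every pattern in turn."""
    cache = OneSlotRegexCache()
    for _ in range(n_files):
        for pattern in patterns_per_file:
            cache.get(pattern)
    return cache.compile_count


one_pattern = count_compiles(["abc|def"], 100)      # parsed once, then reused
two_patterns = count_compiles(["abc", "def"], 100)  # each parse evicts the other
```

Over 100 files the single combined pattern is parsed once, while the two alternating patterns are parsed 200 times, which is the worst case the text describes.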

Here is another example. Suppose I wish to find all RPM package files on my computer which are compiled for the i386 processor architecture (RPM files are handled by the add-on module `rpm.kfind'). I could execute the following --test:

--test 'rpmPlatform=="i386"' 

This would give correct results, even though many files will not be RPM files, and will not give a useful value for `rpmPlatform' (it will return an empty string). However, to find the value of rpmPlatform requires that KFind open the file and inspect its header. This is not a long job, but it could add up to a long time over a large volume of files. If we assume that all RPM files have names that end in `.rpm', then we can re-write the expression

--test 'andelse{fname("*.rpm")}{rpmPlatform=="i386"}' 

Now KFind will only test the value of rpmPlatform if the filename ends in .rpm, and the file does not need to be opened to see if this is the case.
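The saving from putting the cheap name test first can be estimated by counting how often the expensive step runs. In this Python sketch `FakeFile` is a made-up stand-in whose `rpm_platform` method counts header reads instead of opening anything; it models the reasoning above and says nothing about how rpm.kfind is written.

```python
class FakeFile:
    """Minimal model of a file whose platform is costly to discover."""

    def __init__(self, name, platform=""):
        self.name = name
        self._platform = platform
        self.header_reads = 0

    def rpm_platform(self):
        # Stands in for opening the file and inspecting its header.
        self.header_reads += 1
        return self._platform


def expensive_only(f):
    return f.rpm_platform() == "i386"


def cheap_first(f):
    # The name check needs no file access; the header is read only if it passes.
    return f.name.endswith(".rpm") and f.rpm_platform() == "i386"


def run(test, files):
    matches = [f.name for f in files if test(f)]
    return matches, sum(f.header_reads for f in files)


def make_files():
    return [
        FakeFile("a.rpm", "i386"),
        FakeFile("b.rpm", "sparc"),
        FakeFile("notes.txt"),
        FakeFile("core"),
    ]
```

Both tests select the same file, but the second reads two headers instead of four; on a real filesystem, where RPM files are a small minority, the ratio would be far larger.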

KFind is designed to be extensible, and it is relatively easy to add processing modules for new file types. These modules are written in SLang (see the file `rpm.kfind' for an example of an extension module), and must be compiled each time the KFind program is executed. On most systems KFind's start-up time is negligible but, if a large number of extension modules were included, it could become a problem. The KFind system configuration file `main.kfind' (which provides the basic functions) and all the extension modules can, if required, be pre-compiled and stored in compiled format. This will significantly reduce the start-up time. KFind itself can be used to do this, but if you're ready for that step then you're ready to figure out how yourself. See the SLang reference manual for more details. KFind configuration files are not distributed in pre-compiled format because ease of tweaking was considered to be more important than start-up time.

 

DIFFERENCES FROM GNU 'FIND'

This section describes the differences between KFind and GNU find. The differences are the subject of the rest of the document.

Search depth

The --maxdepth argument specifies the depth of expansion of subdirectories, with one expansion being `1', and so on. In GNU find one expansion is specified as a depth of `0'. So if I wanted to search the current directory, but none of its subdirectories, I would use --maxdepth=1. Setting maxdepth to zero guarantees that no directory expansion will happen. This can be useful if you want to test directories without expanding them at all.
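The depth convention described above (zero means "do not expand", one means "expand the starting directory only") can be written down as a short Python sketch. The function name `list_with_maxdepth` is hypothetical, and for simplicity the sketch lists a directory before its contents, which is not KFind's default order.

```python
import os
import tempfile


def list_with_maxdepth(root, maxdepth):
    """Return root plus everything reachable with at most maxdepth expansions."""
    found = [root]

    def expand(directory, expansions_left):
        if expansions_left <= 0:
            return
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            found.append(path)
            # Symbolic links are not followed in this sketch.
            if os.path.isdir(path) and not os.path.islink(path):
                expand(path, expansions_left - 1)

    if os.path.isdir(root):
        expand(root, maxdepth)
    return found


def demo(maxdepth):
    """Build root/a.txt and root/sub/b.txt, then list relative to root."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        for rel in ("a.txt", os.path.join("sub", "b.txt")):
            open(os.path.join(root, rel), "w").close()
        return [os.path.relpath(p, root) for p in list_with_maxdepth(root, maxdepth)]
```

With a depth of 0 only the starting directory is reported; with 1 its immediate contents appear but `sub' is not opened; with 2 the file inside `sub' is reached as well.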

Search order

By default, KFind expands directories before processing the directories themselves. This is the opposite order to GNU find. This behaviour can be overridden by the --dirfirst switch.

Minimum depth

KFind has no option to limit searches to _greater_ than a certain depth of expansion. This is because there are only so many hours in the day and I couldn't think of a reason to use this. But it could easily be added if anyone wanted it.

 

EXAMPLES

1. Show all files that have been modified in the last 5 days. Note that `modify' and `now' are both given in seconds, so we need to convert 5 days to seconds.

kfind / --test 'modify > now - 5 * 24 * 60 * 60'
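The arithmetic in this test is worth spelling out: 5 days is 5 * 24 * 60 * 60 = 432000 seconds, and a file qualifies when its modification time is later than the current time minus that figure. The Python sketch below performs the same comparison using the standard library; `modified_within_days` is a hypothetical helper, and the optional `now` argument exists only to make the demonstration repeatable.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 24 * 60 * 60


def modified_within_days(path, days, now=None):
    """True if path was modified less than `days` days before `now`."""
    if now is None:
        now = time.time()
    return os.stat(path).st_mtime > now - days * SECONDS_PER_DAY


def demo():
    now = time.time()
    with tempfile.TemporaryDirectory() as root:
        fresh = os.path.join(root, "fresh")
        stale = os.path.join(root, "stale")
        for path in (fresh, stale):
            open(path, "w").close()
        # Set explicit timestamps: one file touched now, one ten days ago.
        ten_days_ago = now - 10 * SECONDS_PER_DAY
        os.utime(fresh, (now, now))
        os.utime(stale, (ten_days_ago, ten_days_ago))
        return (
            modified_within_days(fresh, 5, now + 1),
            modified_within_days(stale, 5, now + 1),
        )
```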

2. Show all files in the current directory and its subdirectories whose names end in `.o'

kfind . --test 'fname("*.o")'

3. Display the full pathname and size of all files in the current directory and its subdirectories whose names end in mp3. Note that the `string' function converts the integer `size' to a string.

kfind . --test 'fname("*.mp3")' --action 'message(filename + " " + string(size))'

4. Display the `package name' for all files that can be interpreted as RPM files on the CD-ROM drive. Note that `rpmPackage' will produce an empty string if the file is not an RPM file, but in this case the test --test '"rpm"' prevents any non-RPM files being reported.

kfind /mnt/cdrom --test '"rpm"' --follow --action 'message(rpmPackage)' --print

5. Play all MP3 audio files (using the `splay' program) in the /media directory and its subdirectories, which have the text Chopin or Bach in their title fields

kfind /media --test 'match("Chopin|Bach")' --action 'system("splay " + filename)'

6. Find all `core' files and, if the user confirms, delete them.

kfind / --test 'fname("core")' \
      --action 'if (confirm("Delete file")) rm' --print

The --print option is useful here to ensure that the name of the file that is about to be deleted is shown to the user. The function `rm' deletes the current file. When the command is executed, the user will be presented with a message like this:

/home/fred/myprogs/core
Delete file ([y]/n) ?

Note that, as it is currently defined, `confirm' defaults to `yes'.

7. Make a copy of all files in the current directory, adding `.bak' to the filename. But don't copy files that already end in `.bak'.

kfind . --test '(not fname("*.bak"))' --action 'cp(filename + ".bak")'

 

FILES

/usr/lib/kfind/*.kfind  KFind initialization files

 

AUTHOR AND COPYRIGHT

This program is copyright (C) 2000 Kevin Boone. See the file COPYING for information on how the program may be copied and distributed.

Permission is granted to any individual or institution to use, copy, or redistribute this software so long as all of the original files are included, that it is not sold for profit, and that this copyright notice is retained.

LIKE MOST FREE SOFTWARE, KFIND IS PROVIDED `AS IS' AND COMES WITH NO WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED. IN NO EVENT WILL THE COPYRIGHT HOLDER BE LIABLE FOR ANY DAMAGES RESULTING FROM THE USE OF THIS SOFTWARE.

Please send bug reports and comments by email to: k.boone@kzone.eu.org. For bug reports, please include the version of KFind (see kfind -v ), the machine and operating system in use, and as much additional information as possible.

 

BUGS

Undoubtedly, but none that are documented.

 

SEE ALSO

find, slang

 

ACKNOWLEDGEMENTS

SLang was developed by John E. Davis at MIT.


 


This document was created by man2html, using the manual pages.
Time: 21:47:36 GMT, March 30, 2000