SQLspell release notes
======================

What is SQLspell?
=================

A simple, fast spelling checker -- for single words and text files --
for Linux systems. It uses an SQL database server (not supplied!) for
all dictionaries, so that (i) different users can easily share
dictionaries, and (ii) the highly-optimized searching provided by SQL
servers allows very fast checking. Although intended for Linux, I don't
think there's anything specific to Linux in the program, so it might
work on other Unix systems with a bit of tweaking.

Although SQLspell is a client-server system, the use of this
architecture does not seem to lead to a reduced performance compared to
stand-alone products when the client and server are on the same
computer, so SQLspell is a viable alternative to programs like `ispell'
even on single-user workstations.

Features
========

* Client-server architecture for dictionary sharing
* Fast checking and word suggestions using SQL
* `Crossword' mode for finding words that match patterns
* Checks single words, and text files
* Supports any number of dictionaries in the same database
* Command-line and X-window graphical versions
* Built-in dictionary management functions with both command-line and
  graphical versions
* Configured centrally, per-user, and on command line
* Can act as a filter in Unix pipes

Pre-requisites
==============

The SQLspell client queries one or more databases on an SQL server.
This implies that such a server is accessible, and has been configured
for the intended user to obtain access to it. As far as I am aware,
only ANSI-standard SQL queries are generated, so the choice of server
ought not to be critical. The only potential compatibility issue is
that the server must support 'like' with wild-cards, and the 'having'
clause. I have only tested it with MySQL v2.0, which is available free
of charge for Linux.
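The two features named above, 'like' with wild-cards and the 'having'
clause, can be tried out by hand before committing to a server. The
following is only a sketch: it uses Python's built-in sqlite3 module as
a stand-in SQL engine, and the table name and sample words are made up
for illustration. The queries are not necessarily the ones SQLspell
itself issues.

```python
import sqlite3

# Stand-in for a real SQL server: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Illustrative dictionary table with a single key column called 'word'.
conn.execute("CREATE TABLE uk_english (word CHAR(30) PRIMARY KEY)")
words = ["aardvark", "tablet", "taproot", "tempest", "toast", "transit"]
conn.executemany("INSERT INTO uk_english VALUES (?)", [(w,) for w in words])

# 'like' with wild-cards: '_' matches exactly one character and '%'
# matches any run of characters. This pattern asks for seven-letter
# words that begin and end with 't'.
like_rows = conn.execute(
    "SELECT word FROM uk_english WHERE word LIKE 't_____t' ORDER BY word"
).fetchall()
like_matches = [row[0] for row in like_rows]

# 'having': a condition applied to the groups produced by GROUP BY.
having_rows = conn.execute(
    "SELECT word FROM uk_english GROUP BY word "
    "HAVING word LIKE 'ta%' ORDER BY word"
).fetchall()
having_matches = [row[0] for row in having_rows]

print(like_matches)
print(having_matches)
conn.close()
```

With the sample words above, the first query returns taproot, tempest
and transit, and the second returns tablet and taproot. If the
equivalent queries are rejected by the server you intend to use, it is
unlikely to be suitable.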

About the programs
==================

lookupword
----------

A command-line program to check the correctness of a word, and give
suggestions. It can also add words to and delete words from a
dictionary (using command line switches) and search for words with
wild-card patterns. For example, to find words that are seven letters
long and begin and end with `t', enter

   %lookupword t_____t

For a full list of command-line switches, enter

   %lookupword -help

(This works with all the programs in this package.)

xlookupword
-----------

Provides exactly the same functionality as `lookupword' but with a
graphical user interface. Normally start it without options, e.g.,

   %xlookupword &

xspelltext
----------

Checks the spelling of plain text from any source, using a graphical
user interface. By default it reads from standard input and writes to
standard output. For example, to check the spelling of file `doc1.txt'
and save the checked output in `doc1.checked.txt':

   %xspelltext < doc1.txt > doc1.checked.txt

Note that xspelltext does not start work until the whole file has been
read in, so it will be slow to start if you pipe data in from a slow
source. However, it will work as part of a pipe.

xspellfile
----------

Checks the spelling of a plain text file and writes the output back to
the same file. For example

   %xspellfile myfile.txt &

(NB. xspellfile is implemented as a shell script that calls xspelltext
on a temporary file.)

Setting up the dictionary database on the server
================================================

Creating databases and assigning permissions is server-dependent, so
the SQLspell client won't try to do any of this. It is the job of the
administrator to set up the databases in the format the client expects.

The dictionary database should be called 'sqlspell'. In this database
can be any number of tables, one for each main, supplementary or stop
dictionary. You can create these tables manually, or let the SQLspell
utilities do it for you.
This section describes the manual approach (which offers slightly more
flexibility); for the automatic method see `Dictionary management'
below.

Each dictionary table should have exactly one field, called `word', of
type `char' or `varchar', and large enough to fit the biggest expected
word. This field should be a primary key and index (on some systems
these are equivalent). I normally make `word' a `char(30)' field
because I am more concerned about speed; if space is more important
than speed, make it a `varchar' field instead.

The database `sqlspell' should provide at least `select' access to all
potential users, and probably higher levels of access to some. To edit
a supplementary (or main) dictionary the user will need insert, delete
and update access. To create new supplementary dictionaries the user
will need `create table' access. Since most SQL servers don't allow
permissions to be set on a per-table basis, you probably don't want to
give `drop table' permissions to anyone except the administrator.

Note also that there is probably no way to give different permissions
on `main' and `supplementary' dictionaries. This means in effect that
the choice of `main' and `supplementary' dictionaries is largely one of
convenience or policy. For example, if your main dictionary is called
`uk_english' there is nothing to stop a user from using the `-suppl'
command line option to open this as a supplementary dictionary and edit
it. This is a limitation in most SQL servers, not in this software.

Loading words into the dictionaries
===================================

Some or all of your dictionaries will need to be stuffed with words
before use. Almost certainly this will be true of the main
dictionaries. Supplied with this distribution is a (large) list of UK
English words in the file `eng-uk.txt'.
I normally stuff this into the main dictionary using an SQL query like:

   sql> load data infile 'eng-uk.txt' into table uk_english;

Not all SQL servers support the `load data' mechanism. Another
possibility is to transform the word list into a set of `insert'
queries using a simple program or shell script. This is standard Unix
stuff so I'm not going to go into details here. In outline, you need to
transform each line into a query like

   sql> insert into uk_english values ('aardvark');

For other suggestions, see `Dictionary management'.

Note that the file `eng-uk.txt' was compiled by myself from a variety
of sources, and checked for conformance with the spelling guidelines in
the Concise Oxford Dictionary, 9th edition. I used the CD-ROM version
of COD to do this check, but I did not extract words from the CD-ROM.
This is because (a) it would probably be illegal and (b) I don't know
how: I think it's encrypted. In any event, I am fairly sure the word
list is accurate, but it is not exhaustive. In particular it's short on
apostrophized words, like `they're' and `wouldn't'.

You can create as many main, supplementary and stop dictionaries as you
like, but the client program only allows one of each type to be
selected at a given time.

Dictionary management
=====================

SQLspell supports any number of supplementary dictionaries, but only
one can be in use at a time. There is no difference in structure
between `main', `supplementary' and `stop' dictionaries, except in how
they are used by the software.

The `add' and `delete' word operations apply only to the current
supplementary dictionary. However, you can add words to a main
dictionary under program control simply by opening it as a
supplementary. For example

   %xlookupword -suppl uk_english

will open the dictionary `uk_english' as a supplementary whatever its
original role. You can then add and remove words from this dictionary
exactly as with a supplementary.

Exactly the same considerations apply to `stop' dictionaries.
For example, to remove the word `wether' from the stop dictionary
`mystop':

   %lookupword -suppl mystop -delete wether

(Note that `xlookupword' could also have been used, but the graphical
user interface is not necessary for this single operation.)

If you specify a dictionary that does not exist, the program will ask
if you want to create it. If so, it will be created in the database
currently selected. Note that this operation will fail if the database
has been set up so that you don't have access rights to create new
tables.

Configuration
=============

All the SQLspell utilities read the same configuration files and accept
the same command-line options. However, not all configuration options
will have an effect with all programs. Usually it is a common-sense
issue to decide where this will be the case.

Program options are read from three places. In increasing order of
priority these are:

* the global configuration file `/etc/sqlspell.rc'
* the user's configuration file `~/.sqlspellrc'
* the command line

The global and local configuration files have the same format. The
utilities will work even if neither exists, but you will then have to
give the names of the dictionaries on the command line. Because I
always use the same main dictionary I specify its name in the
configuration file. In a simple (i.e., single-user) installation that's
probably all you need to specify. In a multi-user system, you should
probably also specify the name of the database server if it is not
local.

The configuration file options are documented in the sample
configuration file `sqlspell.rc' supplied with the distribution.

Compiling and installing SQLspell
=================================

Install executables
-------------------

If you are using Linux on an Intel-based PC, then you don't need to
compile, as executables are included. Just un-pack the archive
sqlspell-0.1.tar.gz into any convenient directory and do

   #make install

in that directory.
In fact you don't need to formally install anything; you can simply
copy the executables (listed above) to any convenient directory and run
them from there. By default the installation places the executables in
/usr/local/bin and the global configuration file in /etc.

The executables are statically linked with all the libraries that they
require except the standard X11 support library, which is dynamically
linked (because it's so enormous). As this latter library is part of
all X-window systems, and does not change very much, you should not
have any problems with library incompatibilities with this software.

Compiling
---------

Even if you have an Intel-based Linux system you may want to recompile
to change the program defaults or to make the libraries dynamically
linked (and thereby make the executables about 60% smaller). To
recompile you will need -- in addition to the obvious stuff like a C++
compiler --

* MySQL v2.0 development support files (static libraries and header
  files)
* XForms v0.86 or v0.88 development support files (static libraries and
  header files)
* X11 header files

All this stuff is included with recent distributions of Linux for (I
think) all platforms.

To prepare for compilation edit the top section of Makefile to specify
locations of files and directories on your system. Then type

   %make all

and cross your fingers.

Notes
=====

This program was developed using an evolutionary design method, and the
intermediate stages of development are retained to be used as part of a
course on object-oriented software development. Therefore there are far
more source files in the distribution than ought to be necessary, along
with an index and description of these files in
sqlspell_source_index.html. If you do `make install' only the final
versions are installed. More detailed design information may be found
at the author's Web site (address below).

Legal issues
============

This is public-domain software with no restrictions of any kind.
Please feel free to use any of it in any way you please. Needless to
say, there is no warranty of any kind, and the author will not enter
into discussion with anyone about bugs or limitations.

Author
======

Kevin Boone
k.boone@mdx.ac.uk
http://i.am/kevin_boone