Working With Affix Info in Word Lists
The munch command takes a list of words from standard input and outputs a list of possible root words and affixes. The roots may, however, be invalid, since munch does not check them against the existing dictionary. For example, the command:
echo brother | aspell -l en munch
produces
brother broth/R brothe/R
The expand command is the reverse of munch; it expands affix flags to produce a list of words. For example:
echo both/R | aspell -l en expand
produces
both bother
Its formal usage is:
aspell expand [level] [limit]
where level is the expansion level. Valid values are 1 through 3, and level 1 is the default if none is specified. Level 2 causes the original root/affix combination to be included, for example:
both/R both bother
Level 3 causes multiple lines to be printed, one for each generated word, with the original root/affix combination followed by the word it creates:
both/R both
both/R bother
Levels larger than 3 may also be supported, but they should not be used, as they may eventually be removed.
If a limit parameter is given, then only expansions that affect the first limit letters will be expanded. If a base word is not completely expanded for a given affix flag, that flag will be left on the word. Note that prefixes are always expanded.
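The output format of the three documented levels can be modeled with a small Python sketch. The apply_flag helper and its simplified R rule are hypothetical stand-ins for Aspell's affix file. The limit parameter is not modeled.

```python
# Toy model of "aspell expand" output formatting at levels 1-3.
# ASSUMPTION: apply_flag and its simplified "R" rule are hypothetical; real
# Aspell takes its affix rules from the language's affix file.


def apply_flag(root: str, flag: str) -> list[str]:
    """Return the words a single (hypothetical) affix flag generates from root."""
    if flag == "R":  # append "r" after a final "e", otherwise "er"
        return [root + "r"] if root.endswith("e") else [root + "er"]
    return []  # unknown flags generate nothing in this toy model


def expand(entry: str, level: int = 1) -> list[str]:
    """Expand one root/flags entry into output lines formatted for the given level."""
    root, _, flags = entry.partition("/")
    words = [root]  # the base word itself is part of the expansion
    for flag in flags:
        words.extend(apply_flag(root, flag))
    if level == 1:  # just the generated words
        return [" ".join(words)]
    if level == 2:  # original entry, then the generated words
        return [" ".join([entry] + words)]
    if level == 3:  # one line per generated word, prefixed by the entry
        return [f"{entry} {word}" for word in words]
    raise ValueError("this sketch only models levels 1-3")


for lvl in (1, 2, 3):
    print("\n".join(expand("both/R", lvl)))
```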
The munch-list command reduces the size of a word list via affix compression. It reduces a list of words to a minimal (or close to minimal) set of roots and affixes that will match the same list of words. The list of words is read from standard input and the result, the "munched" list, is written to standard output. Its usage is:
aspell munch-list [keep] [single|multi] [simple] < infile > outfile
where simple, single, multi, and keep are literal values.
The default algorithm should give near-optimal results. In some cases the set of words returned is provably the smallest possible; in the typical case, the number of words returned is within 1% of the optimum.
By default, Aspell removes redundant affix flags. The keep flag prevents their removal, which can be useful if you want to include all possible expansions for each base word.
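Affix compression can be pictured as a set-cover problem. Each candidate entry root/flags covers the words it expands to, and the goal is to choose few entries that together cover the whole list. The Python sketch below makes three assumptions:

- It uses two hypothetical flags: R (append "r" after a final "e", otherwise "er") and S (append "s").
- It uses a greedy set-cover heuristic, which is not necessarily what munch-list implements.
- It treats a flag as redundant when every word it generates is still produced by some other chosen entry. Its keep parameter mirrors the keep flag only under that interpretation.

```python
# Sketch of affix compression as set cover.
# ASSUMPTIONS: flags "R" and "S" are hypothetical, simplified rules, and greedy
# set cover is a stand-in heuristic, not necessarily Aspell's munch-list algorithm.


def apply_flag(root: str, flag: str) -> str:
    """Return the word a hypothetical flag generates from root."""
    if flag == "R":  # append "r" after a final "e", otherwise "er"
        return root + ("r" if root.endswith("e") else "er")
    if flag == "S":  # append "s"
        return root + "s"
    raise ValueError(f"unknown flag {flag!r}")


def covered(root: str, flags: str) -> set[str]:
    """All words an entry root/flags expands to, including the root itself."""
    return {root} | {apply_flag(root, f) for f in flags}


def munch_list(words: list[str], keep: bool = False, all_flags: str = "RS") -> list[str]:
    word_set = set(words)
    # Candidates: each listed word, with every flag whose output is also listed.
    candidates = {
        w: "".join(f for f in all_flags if apply_flag(w, f) in word_set) for w in words
    }
    chosen: dict[str, str] = {}
    uncovered = set(word_set)
    while uncovered:
        # Greedy step: take the candidate covering the most still-uncovered words
        # (ties go to the word listed first).
        root = max(words, key=lambda w: len(covered(w, candidates[w]) & uncovered))
        chosen[root] = candidates[root]
        uncovered -= covered(root, chosen[root])
    if not keep:
        # Hypothetical notion of "redundant": drop a flag when the word it
        # generates is still produced by the remaining entries and flags.
        for root in chosen:
            for flag in chosen[root]:
                rest = chosen[root].replace(flag, "")
                others = set().union(
                    *(covered(r, f) for r, f in chosen.items() if r != root)
                ) | covered(root, rest)
                if apply_flag(root, flag) in others:
                    chosen[root] = rest
    return [f"{r}/{f}" if f else r for r, f in chosen.items()]


words = ["both", "bother", "bothers", "broth", "broths", "brother", "brothers"]
print(munch_list(words, keep=True))  # ['broth/RS', 'both/R', 'bother/S', 'brother/S']
print(munch_list(words))             # ['broth/S', 'both', 'bother/S', 'brother/S']
```

Here seven words compress to four entries. With keep, both/R and broth/RS retain the R flag even though bother and brother are already base words of other entries. Without keep, those flags are dropped.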
When cross products are involved, it may be beneficial to list a base word more than once. Unfortunately, the current version of Aspell cannot correctly handle multiple base words in a dictionary. Therefore, the current default behavior is to include only the one with the most expansions. All of them can be included via the multi flag.
Once Aspell is able to handle multiple base words, the default will be to include them all. The single flag can be used to include only one of them.
The simple flag selects an alternative, faster algorithm. This algorithm is very similar to the munch command distributed with Myspell (the Open Office spell checker); however, it does not give nearly as good results. It does reasonably well for the English word list but not for some other languages, such as German: the normal algorithm reduced a list of 312,002 German words to 79,420 base words, while the simple algorithm reduced it only to 115,927 words. This algorithm may disappear in a future version of Aspell.