# \indexindy and sorting rules

```
Hello out there,

I have followed your discussion and I'd like to bring another topic
into the discussion once more.

As learned from several discussions and readings, the sorting problem
is still not solved very well. Specifying sorting rules can be a
tedious task that is error-prone in many cases. Based on the
evaluation of the ISO standard (see our Homepage) I have developed the
following concept, which I have partially implemented this weekend.

1. Letters are entities owning properties that can be used for sorting
purposes. A letter can be defined with the following declaration

(define-letter "umlaut-u with circumflex"
  (:case lower)
  (:accent circ)
  (:letter "u"))

This defines the letter "umlaut-u with circumflex" to have the
properties as defined above. Another example is

(define-letter "umlaut-U with trema"
  (:case upper)
  (:accent trema)
  (:letter "u"))

2. Sorting is done on a sequence of partial orderings that should
result in a total order. Partial orders can be defined with
definitions such as

(define-partial-order :letter
  ("a" "b" "c" ... "u" "v" ...))

(define-partial-order :case
  (upper lower))

(define-partial-order :accent
  (trema acute circ tilde hat))

The names of the partial orders directly refer to the property
names above.

3. A total order can be specified with the declaration

(define-total-order
  (:letter)
  (:accent backwards)
  (:case))

This sorts a word (a sequence of letters) first according to
the weights as given by the partial order :letter, then according
to the weights from :accent, compared backwards (this is the French
sorting order), and finally according to the :case.

As long as we have a sorting model that is based on this scheme we are
finished.

Still missing is an appropriate mapping that transforms a string (a
sequence of chars) into a sequence of letters (which have become real
objects now).

This could look like:

(define-mapping "umlaut-u" ("\~"u" "ü"))
(define-mapping "umlaut-A" ("\~"A" "Ä"))

[I hope you can see the ISO-Latin chars as well]

What I was just discussing with Gabor is the problem of markup (once
more). Often indexes contain commands such as "\index" (see for
example the LaTeX Companion), for which different index entries must
be specified: the command "\index" and the word "index" are sorted as

a) <i markup=cmd><n markup=cmd><d markup=cmd><e markup=cmd><x markup=cmd>

versus

b) <i><n><d><e><x>

Here the <...> notation indicates a letter-object with additional
properties.

A partial order

(define-partial-order :markup
  (cmd other))

can then be used to solve the remaining ambiguities. The question
remains how to define the mapping

"\index" -> a)

"index"  ->  b)

Two schemes seem to be possible:

1. A mapping is based on string or regexp-transformations (such as the
current sort-rules) but extended with mapping rules.

Informally we could say that "\index" must be written as
"\cmd{index}" and there is a mapping rule that says

(define-mapping "\cmd{(.*)}" "\1"
  :with-property (:markup cmd))

indicating that the replacement text "\1" will be further mapped
onto letters that have the additional property (:markup cmd).

This needs a flexible and dynamically configurable parser (not too
hard to implement).

2. We try to tackle the problem the other way around. This concerns
the discussion about the \indexindy command. Something like

\indexindy[markup=texttt,...]{foo}

\indexindy[...]{\texttt{foo}}

could solve the problem.

Markup is not embedded in the plain keyword. A scanner is not
necessary anymore. Markup can be done in the markup-backend with
something like

(markup-keyword :markup "texttt" :open "\texttt{" :close "}")

This would effectively yield the same results. It suffers from the
fact that no more than one markup can be associated with a
keyword, but needing more than one seems to be rare.

Which solution do you prefer?

If there are open questions, ask me. Maybe I'm so deep into this
stuff that my explanations are not understandable :)