\documentstyle[12pt,multicol]{article}
\addtolength{\textwidth}{.5cm}
\def\diatop[#1|#2]{{\setbox1=\hbox{{#1{}}}\setbox2=\hbox{{#2{}}}%
\dimen0=\ifdim\wd1>\wd2\wd1\else\wd2\fi%
\dimen1=\ht2\advance\dimen1by-1ex%
\setbox1=\hbox to1\dimen0{\hss#1\hss}%
\rlap{\raise1\dimen1\box1}%
\hbox to1\dimen0{\hss#2\hss}}}%
%e.g. of use: \diatop[\'|{\=o}] gives o macron acute
\title{Standardization of Sanskrit for Electronic Data Transfer and Screen Representation}
\author{Dominik Wujastyk}
\date{9 September 1990}
\begin{document}
\maketitle

\section*{Text Encoding Guidelines}
During the 8th World Sanskrit Conference (Vienna, 1990), a panel was held to
discuss the standardization of Sanskrit for electronic data transfer.
Participants were encouraged to acquire and study the {\em ACH-ACL-ALLC
Guidelines for the Encoding and Interchange of Machine-readable Texts},
edited by Lou BURNARD and C.~M.~SPERBERG-MCQUEEN (Chicago and Oxford, 1990).
These {\em Guidelines\/} are available free of charge in Europe from
L.~Burnard, Oxford University Computing Service, 13 Banbury Road, Oxford
OX2 6NN, England, or in the USA from C.~M.~Sperberg-McQueen, Computer Center
(MIC 135), University of Illinois at Chicago, Box 6998, Chicago, IL 60680,
USA.

\section*{7-bit coding for file transfer}
Professor H. Falk presented a program called {\tt CONVERT} that converts a
data file from any coding scheme into any other. The program was generously
made available free of charge, together with its Turbo Pascal source code.
Prof.\ Falk also presented a very useful 7-bit, multi-byte ``mediation
code'' that will be of general use for file exchange.

\section*{8-bit character set for text display}
Finally, although the above two provisions cover all essential needs, the
panel still felt that a standard assignment of graphic codes for the display
of Sanskrit transliteration would be helpful. An ad hoc committee of
interested parties was formed, and two 8-bit ``code pages'' were designed.
One, {\em Classical Sanskrit\/} (CS), is for standard use. The other,
{\em Classical Sanskrit Extended\/} (CSX), includes the former but also
provides for Vedic, Middle Indo-Aryan (MIA), Tamil and some special usages.

These code pages take as their point of departure IBM's code page 437, the
default set of character codes built into the IBM PC and its clones. Each
character listed below replaces the character in code page 437 that has the
same numerical code. For example, character number 224 in code page 437 is
the Greek letter alpha ($\alpha$), and CS redefines it as a with a macron (\=a).
All codes not specified below are assumed to be as in code page 437. For
example, character number 130 remains e acute (\'e). The codes assigned were
as follows:
\begin{multicols}{2}[\subsection*{Classical Sanskrit (CS)}]
\begin{small}
\begin{tabbing}
000 \= x underdot macron acute \= (normally German eszett, xx) \kill
166 \> l tilde \> \~ l \\
167 \> m overdot \> \.m \\
224 \> a macron \> \a=a \\
225 \> not used (normally German {\em eszett}, \ss) \\
226 \> A macron \> \a=A \\
227 \> i macron \> \a=\i \\
228 \> I macron \> \a=I \\
229 \> u macron \> \a=u \\
230 \> U macron \> \a=U \\
231 \> r underdot \> \d r \\
232 \> R underdot \> \d R \\
233 \> r underdot macron\> \diatop[\a=|\d r]\\
234 \> R underdot macron\> \diatop[\a=|\d R]\\
235 \> l underdot \> \d l \\
236 \> L underdot \> \d L \\
237 \> l underdot macron\> \diatop[\a=|\d l]\\
238 \> L underdot macron\> \diatop[\a=|\d L]\\
239 \> n overdot \> \.n \\
240 \> N overdot \> \.N \\
241 \> t underdot \> \d t \\
242 \> T underdot \> \d T \\
243 \> d underdot \> \d d \\
244 \> D underdot \> \d D \\
245 \> n underdot \> \d n \\
246 \> N underdot \> \d N \\
247 \> s acute \> \a's \\
248 \> S acute \> \a'S \\
249 \> s underdot \> \d s \\
250 \> S underdot \> \d S \\
251 \> not used (normally the root sign $\surd$) \\
252 \> m underdot \> \d m \\
253 \> M underdot \> \d M \\
254 \> h underdot \> \d h \\
255 \> H underdot \> \d H \\
\end{tabbing}
\end{small}
\end{multicols}
\newpage
\begin{multicols}{2}[\subsection*{Classical Sanskrit Extended (CSX) additions}
The following definitions are added to the above Classical Sanskrit
character set.]
\begin{small}
\begin{tabbing}
000 \= x underdot macron acute \= (normally German eszett, xx) \kill
159 \> r underbar \> \b r \\
168 \> a macron breve \> \diatop[\u|\a=a]\\
169 \> i macron breve \> \diatop[\u|\a=\i]\\
170 \> u macron breve \> \diatop[\u|\a=u]\\
173 \> n underbar \> \b n \\
181 \> a macron acute \> \diatop[\a'|\a=a]\\
182 \> a macron grave \> \diatop[\a`|\a=a] \\
183 \> i macron acute \> \diatop[\a'|\a=\i] \\
184 \> i macron grave \> \diatop[\a`|\a=\i] \\
189 \> u macron acute \> \diatop[\a'|\a=u] \\
190 \> u macron grave \> \diatop[\a`|\a=u] \\
198 \> r underdot acute\> \diatop[\a'|\d r] \\
199 \> r underdot grave\> \diatop[\a`|\d r] \\
207 \> r underdot macron acute\> \raisebox{.25ex}{\rlap{\a'{ }}}\diatop[\a=|\d r] \\
208 \> a tilde \> \~ a \\
209 \> i tilde \> \~ \i \\
210 \> u tilde \> \~ u \\
211 \> e tilde \> \~ e \\
212 \> o tilde \> \~ o \\
213 \> e breve \> \u e \\
214 \> o breve \> \u o \\
215 \> l underbar \> \b l \\
\end{tabbing}
\end{small}
\end{multicols}
\bigskip

These codes were chosen to have minimal impact on the standard IBM PC
extended ASCII character set, but they are intended for general use in
displaying Indological texts on any machine with an 8-bit (or wider)
character set. For IBM PC users, Dr.~D.~Wujastyk will make available small
programs that load the above character sets into EGA or VGA display adaptors.

The above character codings have been approved by R. E. Emmerick, H. Falk,
R. Lariviere, G. J. Meulenbeld, H. Nakatani, M. Tokunaga, D.~Wujastyk,
P. Schreiner and M. Yano.

These character codings are primarily intended for situations where the
characters must be displayed on screen, such as word processing.
They may, of course, also be used for data transfer, although a 7-bit code
(perhaps with multi-byte character codes) is still preferable for that
purpose. One such 7-bit scheme is provided by H. Falk (see the section on
7-bit coding for file transfer above).

\newpage
These character codings are currently open for discussion, and comments may
be directed to Dr.~D.~Wujastyk at the Wellcome Institute, 183 Euston Road,
London NW1 2BN, England,\\
or by email at Bitnet/Earn: {\tt dow@harvunxw} or Janet: {\tt D.Wujastyk@uk.ac.ucl}.

After a suitable lapse of time, the character sets will be sent to ECMA and
ISO for registration. They will also be sent to the Text Encoding Initiative
for registration, probably together with H. Falk's 7-bit coding scheme.
Such registration in no way enforces these schemes. It merely makes them
available centrally for reference. Other schemes may also be registered in
the future.
\end{document}