Amusing linguistics web searches

Maintaining a blog with musings on linguistics has brought a lot of search engine traffic my way, and I occasionally look at my server logs to see what searches bring up my website. Often these are not particularly interesting, as people either come for something very specific that I’ve written about, or conversely, for some reason one of my posts shows up for what would seem to be a completely unrelated non-linguistics search. However, occasionally I see very amusing search strings. Here are four of the most recent ones that made me chuckle:

  • is tocharian worth learning: it’s hard to imagine what position a person would have to be in to need to ask this;
  • are people poor in yoshkar-ola: well, I think bednost’ (‘poverty’) is an apt word for Mari El in many senses;
  • салфетки glagolitic (салфетки being Russian for ‘napkins’): is someone opening a medieval Slavic-themed restaurant?
  • navajo elders understand chinese: looks like someone has been reading Gavin Menzies.

An anachronistic paired word in Chuvash

I have written here before about the use of paired words in the Volga–Kama languages to denote an entire class of things, e.g. Chuvash yïvăś-kurăk ‘vegetation’ < yïvăś ‘tree’ + kurăk ‘grass’.

An amusing consequence of this is a jarring anachronism if one of the items in the paired word construction was discovered or invented after the event being described. Consider the following from a Chuvash children’s text on the history of the Olympic games: Хӗҫ-пӑшаллӑ ҫынна Олимпие кӗме юраман ‘[In Ancient Greece] people bearing arms were not allowed into Olympia.’

The paired word here is xĕś-păşal ‘arms, weapons’, made up of xĕś ‘sword’ and păşal ‘rifle’. Obviously there were no rifles in Ancient Greece, but apparently the paired word has become so lexicalized that an author can legitimately use it in any historical context.

Mari /ŋ/ represented by Cyrillic <н>

In attestations of the Mari language from the 18th century, Mari /ŋ/ tends to be represented with the Cyrillic letter <н>. Many manuscripts represent MariE jeŋ ‘person’ as <ен>, for instance. For more examples, see Alhoniemi’s 1979 commentary on the Mari wordlist of P. S. Pallas.

A colleague of mine found this odd, as he would have expected the sequence <нг>. Yet, denoting the sound [ŋ] in the same way as another single consonant has a long history. Consider Greek where the sequence [ŋg] is always spelled <-γγ->. Also, a samoyedologist once told me of a foreign colleague (Japanese, if I recall correctly) who kept hearing Nenets /ŋ/ as /g/; his ears simply couldn’t pick up on the nasal property of the consonant.

But if historically /ŋ/ has been mistaken by other peoples for either /n/ or /g/, the question remains why these Russian (and Russia-resident German) wordlist compilers constantly denoted Mari /ŋ/ with the symbol for /n/ and never with the one for /g/. One reason for this may be that the compilers were already using Cyrillic <г> to represent Mari /ɣ/, which is a fricative, not a stop. Since the only other voiced velar sound in the language was a fricative, the velar stop /ŋ/ was heard as the closest stop to it: /n/.

But in the neighbouring Udmurt language, where the /g/ is a stop, not a fricative, 18th-century compilers still denoted /ŋ/ with the same symbol for /n/. D. G. Messerschmidt’s wordlist, which has been reprinted with a commentary by V. V. Napolskikh, has <Gurpuhn> for Udmurt dial. gurpuŋ ‘heron, stork’. (Note, however, how Messerschmidt denotes the sequence [ŋg] in <Ning-goron> for Udmurt dial. ńiŋgoron ‘woman’.)

So what else in these Uralic languages and in the native languages of these Russian and German compilers could have motivated the choice of the letter usually denoting /n/ and not the letter for /g/? Something worth thinking about.

SSL authentication on Freenode with Emacs ERC

The Freenode IRC network allows clients to pass an SSL certificate and automatically identify their nick with NickServ upon logging in. Freenode offers instructions on creating the SSL certificate, as well as how to configure SSL authentication on several IRC clients, but it does not describe the setup for Emacs ERC.

I managed to get this working after some failed attempts following example code on the web. The problem is that an argument is missing from the gnutls-cli command described in other people’s Emacs init files that one comes across through a search. If one just runs gnutls-cli --x509certfile ~/.ssl/mynick.cert -p 6697 from the command line, one sees in the output: Successfully sent 0 certificate(s) to server. Of course, without a certificate sent to the server, the automatic NickServ identification will fail.

The correct way is to add the --x509keyfile argument, i.e. gnutls-cli --x509certfile ~/.ssl/mynick.cert --x509keyfile ~/.ssl/mynick.key -p 6697. When this is done, the output will show Successfully sent 1 certificate(s) to server. NickServ identification will then run automatically, assuming that you have followed Freenode’s instructions and told NickServ what your certificate’s SHA1 fingerprint is.

A lot of people’s Emacs init files define tls-program as a global variable and specify the certificate to pass there. This is bad for privacy: while you want to disclose your identity to Freenode, you probably don’t want to tell every other server contacted through SSL who you are. Therefore, the best thing to do is to create a function that calls ERC, and use Emacs’s let form to give tls-program a value that will only be valid for ERC:

(defun start-irc ()
  "Connect to IRC over SSL and pass a certificate for nick identification."
  (let ((tls-program '("gnutls-cli --x509certfile ~/.ssl/mynick.cert --x509keyfile ~/.ssl/mynick.key -p %p %h")))
    (erc-tls :server "" :port 6697
             :nick "mynick" :full-name "mynick")))
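
To see why the let form above keeps the certificate away from other connections, here is a minimal sketch of how such a binding behaves. It assumes only that the variable is a special (dynamically bound) one, as variables defined with defvar are. The names demo-tls-program, demo-current-command and demo-connect-with-cert are hypothetical stand-ins for illustration, not part of ERC or tls.el:

```elisp
;; Hypothetical stand-in for `tls-program'; `defvar' makes it a special
;; (dynamically scoped) variable.
(defvar demo-tls-program '("gnutls-cli -p %p %h")
  "Default command template, without any client certificate.")

(defun demo-current-command ()
  "Return the command template a connection would use right now."
  (car demo-tls-program))

(defun demo-connect-with-cert ()
  "Pretend to connect using the certificate-bearing command template."
  (let ((demo-tls-program
         '("gnutls-cli --x509certfile ~/.ssl/mynick.cert --x509keyfile ~/.ssl/mynick.key -p %p %h")))
    ;; Inside the `let', functions called from here see the rebound value.
    (demo-current-command)))

(demo-connect-with-cert) ; the template containing --x509certfile and --x509keyfile
(demo-current-command)   ; the default template again: the binding did not leak
```

One caveat, as far as I understand Emacs Lisp scoping: in a file that uses lexical binding, let only rebinds a variable dynamically if Emacs already knows it to be special, so the library defining the real variable should be loaded before the function runs.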

For Freenode, as with all SSL connections through Emacs, users may also want to consider the certificate pinning function that GnuTLS provides; see Jens Lechtenbörger’s Certificate Pinning for GNU Emacs.

Four levels of politeness in 17th-century Spanish

One of the more interesting books that I’ve read lately is Christopher J. Pountain’s A History of the Spanish Language Through Texts (London: Routledge, 2001). For the so-called Golden Age of Spanish literature, Pountain especially chooses texts by standardization-minded authors who inadvertently offer many details of the popular speech of their time. The following passage from Gonzalo de Correas’s Arte de la lengua española castellana (1625) suggests a much more complex system than the one found in Peninsular Spanish today, which is down to just tú and usted (and when I moved to Spain early in the new millennium, I was urged to use usted much more sparingly than foreigners – on the basis of learning materials from Latin America – usually feel they should).

Devese tanbien mucho notar la desorden, i discordante concordia, que á introduzido el uso, ora por modestia, ora por onrra, ò adulazion. Para lo qual es menester primero advertir, que se usan quatro diferenzias de hablar para quatro calidades de personas, que son: vuestra merzed, él, vos, tu… De merzed usamos llamar à las personas à quien rrespetamos, i debemos ò queremos dar onrra, como son: xuezes, cavalleros, eclesiasticos, damas, i xente de capa negra, i es lo mas despues de señoria. Él usan los maiores con el que no quieren darle merzed, ni tratarle de vos, que es mas baxo, i propio de amos à criados, i la xente vulgar i de aldea, que no tiene uso de hablar con merzed, llama de él al que quiere onrrar de los de su xaez. De vos tratamos à los criados i mozos grandes, i à los labradores, i à personas semexantes; i entre amigos adonde no ai gravedad, ni cunplimiento se tratan de vos, i ansi en rrazonamientos delante de rreies i dirixidos à ellos se habla de vos con devido rrespeto i uso antiguo. De tu se trata à los muchachos i menores de la familia, i à los que se quisieren bien: i quando nos enoxamos i rreñimos con alguno le tratamos de él, i de vos por desdén. Supuesto lo dicho, en las tres diferenzias primeras de hablar de merzed, él, vos, se comete solezismo en la gramatica i concordanzias contra la orden natural de las tres personas, xeneros i numeros.

The disorder and disconcordant concord which usage has introduced, whether through modesty, respect or adulation, should also be noted. For this it is necessary, first, to state that four different ways of speech are used for four qualities of person, namely: vuestra merzed, él, vos, tu … We usually call people we respect by merzed, such as judges, gentry, clergy, ladies and black cape people, and it is the highest after señoría. Él is used by older people for someone they do not wish either to call merzed or address as vos, which is lower, and typical of masters to servants; and common and village people, who are not accustomed to using merzed in their speech, address as él people to whom they want to show respect from their class. We call servants and grown up boys vos, and labourers, and such like people; and among friends where there is no gravity nor ceremony vos is used, and so in speeches made in front of kings and addressed to them vos is used with due respect and old usage. Children, younger members of the family and loved ones are called tu; and when we get angry and quarrel with someone we call them él, and vos to disparage them. Bearing in mind the foregoing, in the first three ways of speaking (merzed, él, vos) there are violations of grammar and agreement against the natural order of three persons, gender and number.

One wonders how much this system was really agreed upon by all, and how much it was an idealization of shifting norms across time and space. The Hungarian I learned from Zsuzsa Pontifex’s Teach Yourself Hungarian back in the 1990s seemed to present a straightforward four-level system too: te, maga, Ön, tetszik. However, foreign learners are told very quickly that maga has been on the way out for decades, and if used today is just as likely to be pejorative as it is to tend towards showing respect. In other descriptions, the tetszik address is either replaced by another form of address, or a fifth level is added to the system.

Similarly, of the four-level system I’ve often heard proposed for Romanian – tu, dumneata, dumneavoastră, domnul/doamna – the second is rarely heard in Transylvania and the last is only heard from waiters at high-class restaurants who are clearly aping the French experience.

Жгонский язык

While trawling back issues of the journal Sovetskoe finno-ugrovedenie for interesting reading on Mari, I came across a Russian dialect I had never heard of before, and which seems virtually unknown on the English-speaking web. As S. M. Strel’nikov writes in his 1978 article “Марийские элементы в жгонском языке” (Mari elements in zhgonsky jazyk):

Жгонским языком (от жгон ’шерстобит’) называют свой условный язык русские ремесленники Костромской области (пимокаты и портные), в недалеком прошлом занимавшиеся отхожим промыслом во многих губерниях России. Хотя численность носителей жгонского языка сокращается, его и сейчас помнят лица пожилого возраста во многих населенных пунктах Нейского, Мантуровского, Макарьевского районов Костромской области, Варнавинского и Ветлужского районов Горьковской области.

Zhgonsky jazyk (from zhgon “wool-beater”) is the name by which Russian craftsmen in the Kostroma district (felt-boot makers and tailors) refer to their secret language; these craftsmen in the not-so-distant past were engaged in seasonal labor in many parts of Russia. Although the number of speakers of zhgonsky jazyk is declining, it is still remembered by elderly people in many settlements in the Ney, Manturov, and Makaryev regions of the Kostroma district, and in the Varnavin and Vetluga regions of the Gorky district.

This language was an argot, meant to allow these craftsmen to communicate in secret when traveling about. Certainly the examples provided in this article are completely incomprehensible without glosses, e.g. Ши́до в плеха́нку пови́титься сохля́ть ‘I’ve got to head to the steam bath to wash’, Декни́ приты́лить ‘Give me a smoke’.

While zhgonsky jazyk drew on other languages such as Udmurt, German, Greek and Turkish, the Mari stock is prominent, and Strel’nikov suggests that this argot arose on the basis of interaction between Russians and speakers of Northwestern Mari. Some zhgonsky jazyk words of Mari origin concern numbers (e.g. ны́лик ‘4’ < MariNW nəl, канда́йша ‘8’ < MariNW kändäŋš) and weather (уре́ж ‘rain’ < MariNW jur, ю́кша ‘cold, winter’ < MariNW jükšem). Strel’nikov identifies altogether 44 items as derived from Mari, and some of them have undergone amusing shifts in meaning, as is common in these sorts of secret languages.

The two versions of Sergeev’s monograph on Mari manuscripts

O. A. Sergeev has dedicated much of his career to examining 18th and 19th-century manuscript word lists of the Mari language. In 2000 the Mari state press published his Mari-language overview of these treasures under the title Тошто марий мутер-влак. In 2002, from the same publisher, his Russian-language monograph entitled Истоки марийской письменности appeared.

At a brief glance, one may be inclined to view the 2002 publication as simply a Russian translation of the original Mari-language work. Indeed, the table of contents is virtually identical; the two works discuss the same manuscripts in the same order, and then contain the same overall analysis of the materials from various perspectives. However, as I discovered in the course of my own research on Pallas’s Mari word list while initially using only the 2000 publication, there are differences between the two versions that mean that anyone looking into the history of Mari ought to go through both of them.

As an example, let’s consider Sergeev’s description of the manuscript Эрм. 577 №, инв. 802 held in the Russian National Library. The 2000 publication reads (pp. 35–36):

Ты очымо рукописят XVIII курым дене кылдалтын. Чылаже тудо 54 лаштык гыч шога. Рукопись марий-влакын илыме верла гыч Санкт-Петербургышто ямдылыме “Сравительный словарь всех языков и наречий” мутерлан колтымо материалжылан шотлалтеш. Каласаш кӱлеш: мутерысе шомак-влакым Российысе кугыжа Екатерина II шкеак возен. Кажне лаштык ик мут гыч шога, вара ты руш йылмысе шомакым “инородный” йылмылаш кусарыме. Марий йылмысе мут-влак 12-шо № дене пуалтыныт. Тылеч посна моло родо-тукым да пошкудо йылме-влакат улыт. Мутлан, 9-ше № дене эстон йылме кая, 10-шо № дене — удмурт, 11-ше № дене — чуваш йылме вераҥыныт.

Руш йылмысе мут-влак кӱшнӧ ончымо тыгаяк 286 реестре дене икгаяк улыт. Тылеч посна ик ойыртем уло: ты памятникыште руш йылме деч посна церковнославянский йылмын кышаже палдырна. Мултнан: вуй ’голова, глава’, шиндза ’глаз, око’, юкше ’стужа, холод, хлад’, молат. Руш йылмысе глагол, слово, речь шомак-влак марий йылмыш ик ’шомак’ мут дене кусаралтыныт.

17 мут деч моло шомак пелен ударенийым палемдыме.

Мут шагал умылан кӧра рукописьын могай наречий (але говор) негызеш ышталтмыжым каласаш неле.

The description in the 2002 publication (pp. 43–44) doesn’t mention that the Mari words are glossed in both Russian and Church Slavonic, and the presence of a particular Mari word for ‘cold’ is left out:

В данной рукописи хранятся материалы для “Сравнительного словаря всех языков и наречий”, написанные собственноручно Екатериной II. Итого — 54 л. Каждый лист содержит по одному заглавному слову, которое переведено на “инородные” языки, напр., под № 12 даны марийские параллели, под № 9 — эстонские, под № 10 — удмуртские, под № 11 — чувашские и т. д. Реестр совпадает со словником 286 заглавных русских слов. Далее дается список 286 “черных” слов без перевода на “инородческие” языки.

Лексемы глагол, слово, речь переведены на марийский язык одним словом: шомакъ.

На лексемах, за исключением отдельных слов (17), поставлено ударение.

Скудный материал памятника не позволяет определить его диалектную основу.

For another manuscript, the 2002 publication mentions a misunderstanding between the Mari informant and the compiler of the word list with regard to the motion verb mijaš, which had not been discussed in the 2000 book. And in the general overview of the manuscripts, where Sergeev points out how some compilers offer a series of synonyms, the list in the Russian translation on page 69 cites fewer items than the 2000 original on page 69.

One might think these differences small, but one man’s small difference may be another’s vital piece of information.

The curious Mari word шнуй šnuj ‘holy’

The first time I ever came across Meadow Mari шнуй šnuj ‘holy’, I was quite struck by the word. It resembles nothing in Chuvash, Tatar or Russian, but the initial consonant cluster means that it cannot be a native Mari word.

While used in the modern literary language, the word is notably missing from the dialectal dictionaries compiled by late 19th-century or early 20th-century fieldworkers. As far as I can tell, the first dictionary to include the word is V. M. Vasil’ev’s Марий Мутэр (Moscow, 1926), where it is glossed ‘святой’ with the example sentence Шнуй пийал дэнэ йумо ий мучашы̆м шуйа маны̆т. The word is not present in the 1956 Mari–Russian dictionary, possibly on account of the anti-religious edicts of the time, but it is found in the 1991 Mari–Russian dictionary and the 10-volume Словарь марийского языка, where it seems to have long become an established feature of the literary language.

An initial cluster is also found in Eastern Mari spaj ‘beautiful, graceful’, a loan from Tatar zipa, in turn from Persian zibā (see Paasonen’s Ost-Tscheremissisches Wörterbuch pp. 112–113), so I looked for some Turkic or Perso-Arabic word of a similar shape for šnuj, but I eventually gave up. As the word has generally been overlooked in discussions of Mari etymology, there the matter rested for some time.

However, I recently saw that Raija Bartens took a look at šnuj in her paper “Marilaista Isä meidän–rukouksen käännöksistä”, a contribution to the Festschrift for Seppo Suhonen Oekeeta asijoo (Helsinki: Suomalais-Ugrilainen Seura, 1998). On coming across šnuj in her survey of Mari translations of the Lord’s Prayer through the ages, Bartens was just as baffled by the word as I was. Noting that it is used only in Christian contexts, Bartens asks if it isn’t simply a corruption of Russian священный.

Oddly, in his История марийского литературного языка (p. 137), I. G. Ivanov lists this word among terminology from the Mari pagan religion that fell out of active use during the rise of the Mari literary language. Not only was this word not used for Mari paganism, but it is the rise of the literary language in the 20th century that propelled this word from some extremely obscure origin to common use by Meadow Mari-speaking writers everywhere.

Tatar in Arabic script

Though I’ve often heard that there is a rich pre-1917 literature in Tatar that is no longer widely accessible because of the change of script, I probably wouldn’t have learned how to read Tatar in Arabic script had I not come across a couple of very useful guides. One, the more serious, is Гарәп язуы нигезендә татарча әлифба by Dž. G. Zainullin (Татарстан китап нәшрияты, 1989).

The other, a colourful children’s reader entitled الفبا (Alifba), was published by the Tatar diaspora in Berlin in 1918. I’ve scanned this and uploaded it as a PDF (18MB).

Any adaptation of the Arabic script to a Turkic language would have to indicate the frontness of the vowels in a word, but one solution for this that I wasn’t expecting is the use of certain Arabic emphatics to specify back vowel words. However, this doesn’t hold for all cases – as one works through these books, exceptions pile on exceptions. All in all, this system is so bloody complicated that it’s no surprise that Tatar activists pine instead for the Latin script of the 1930s. Still, I am hoping that a knowledge of this script will let me discover some unjustly forgotten literature from the centuries before the October Revolution.