An unexpected corpus: Russian version

Over at his blog Panchronica, Guillaume Jacques expresses his delight about The Jesus Film, that product of some American Protestant sect that has now been translated into an enormous number of languages, even ones for which written material is extremely scanty. It has certainly been of great help to me as I’ve learned Ossetian, and the existence of separate Albanian translations for Kosovo and the Republic of Albania will help foreign learners feel comfortable with both the Gheg and Tosk variants of that language.

While there is probably no other film so widely translated as The Jesus Film, for my own particular purposes I’ve been pleased to find something else, and where the story is less likely to be familiar to the viewer: the Soviet cartoon Трое из Простоквашино (“The Trio from Prostokvashino”) has been dubbed into a number of languages, mainly from Southern Russia and the Caucasus, for example:

  • Ossetian
  • Ingush
  • Lezgian
  • Karachay-Balkar (I was very surprised by how difficult this language is to understand; I thought I would be able to follow it pretty easily after learning Kipchak languages from further east)
  • Lak
  • Kumyk
  • Tatar (under the translated title Простоквашинодан өчәү)

Clicking the links in the sidebar, one can find one’s way to other cartoons in various languages of the former USSR. There’s even an entire playlist of Ossetian-dubbed cartoons.

MariE tolašem ~ talašem ‘try hard, strive’ < Tatar talaš

One of the frustrations of working with Tscheremissisches Wörterbuch is that some Mari items are labeled Tschuw. or Tat., but the exact source is not specified and sometimes one has to dig a little to determine the original Chuvash or Tatar word.

A case in point is MariE tolašem ~ talašem ‘sich bestreben, eilen, irgendwie zu tun versuchen’. This is marked as a Tatar loanword in TschWb, and the word is clearly of Turkic origin since it has a causative derivational form MariE tolaštarem ~ talaštarem. I turned to my dictionary of literary Kazan Tatar, the Татарско-русский словарь (Казань: Мәгариф, 2007), and found a phonetic match: талашу. However, the meanings ‘ссориться, скандалить, переругиваться’ of this verb and its derivational forms were not close enough to the Mari verb to satisfy.

If my Tatar dictionary doesn’t help for a Turkic loanword in Mari, the next stop is a Chuvash one. Ashmarin’s Thesaurus Linguae Tschuvaschorum contains a verb corresponding to the Tatar one and almost certainly a borrowing of it, namely tulaş, and the first meanings mentioned are the same as for the Tatar: ‘беситься, злиться, грызться’. However, buried deeper down in the entry is the meaning we’re looking for: ‘возиться, стараться’. This is an understandable extension of the Turkic root tal-, the basic meaning of which is ‘to force; to take by force’.

Thus Mari and Chuvash preserve a meaning of the Tatar word that seems to have died out among Kazan Tatars. Interestingly, Russian too borrowed this Tatar word dialectally and uses it in a similar sense, or at least it did in the 19th century: a verb талашиться ‘суетиться, толочься, метаться’ is attested from the Tambov region in the Толковый словарь Даля, compiled by Vladimir Ivanovich Dal’ and published in 1863–1866.

Incidentally, had I carefully examined the Mari–English Dictionary instead of basing myself solely on Tscheremissisches Wörterbuch, then I could have figured out this etymology more quickly, because one of the meanings of MariE lit. толашаш is ‘to quarrel, to squabble, to bicker’, and that meaning is not found in TschWb. However, the Mari–English Dictionary, being a general literary-language reference and not a dialect dictionary, does not list the origin of the item, and I wonder if the word in that meaning was found only in Eastern Mari communities under heavy Tatar influence before the rise of the literary language, and only the meaning ‘try hard, strive’ is pan-Mari.

Tatar in Arabic script

Though I’ve often heard that there is a rich pre-1917 literature in Tatar that is no longer widely accessible because of the change of script, I probably wouldn’t have learned how to read Tatar in Arabic script had I not come across a couple of very useful guides. One, the more serious, is Гарәп язуы нигезендә татарча әлифба by Dž. G. Zainullin (Татарстан китап нәшрияты, 1989).

The other, a colourful children’s reader entitled الفبا (Alifba), was published by the Tatar diaspora in Berlin in 1918. I’ve scanned this and uploaded it as a PDF (18MB).

Any adaptation of the Arabic script to a Turkic language would have to indicate the frontness of the vowels in a word, but one solution for this that I wasn’t expecting is the use of certain Arabic emphatics to specify back vowel words. However, this doesn’t hold for all cases – as one works through these books, exceptions pile on exceptions. All in all, this system is so bloody complicated that it’s no surprise that Tatar activists pine instead for the Latin script of the 1930s. Still, I am hoping that a knowledge of this script will let me discover some unjustly forgotten literature from the centuries before the October Revolution.

Mari uštə̑š ‘verst’ as a calque on Kipchak

All of the Kipchak languages except Karaim referred to the Russian verst as čaqïrïm, a derivation of the verb čaqïr- ‘to shout’, that is, a verst was seen as the distance a shout would carry.

In Mari, a word for verst is MariE W uštə̑š, for which Tscheremissisches Wörterbuch gives no etymology. One’s eye is then drawn to a verb on the same page, uštal kolten ‘I shout’, which would support deriving the Mari term in the same way as the Kipchak.

The odd thing is that this verb is attested with the meaning ‘shout’ in only one dialect in Tscheremissisches Wörterbuch, that of Krasnoufimsk in the Eastern Mari diaspora. Everywhere else, ueštaš (with an e that reduces and drops out dialectally) is met only in the meaning ‘to yawn’. In Etymologisches Wörterbuch des Tscheremissischen (Mari), Bereczki et al. reject the longstanding Uralic etymology for this word (some Ob-Ugric verbs for ‘yawn’) and instead propose a simple etymology from onomatopoeia: u, representing the sound one makes when shouting or yawning, followed by the denominal verb-forming suffix -Všt-.

The irregular correspondences in uštə̑š between the Mari dialects, along with the fact that not all dialects have the verb from which it is derived, suggest that this noun was calqued by a dialect in relatively close contact with Tatar, and then mediated to the other dialects.

Mari šaške ‘mink’ borrowed even into South Kipchak

MariE šaške, MariW šäškə ‘mink’ and Finnish dial. häähkä ibid. have some kind of old relationship with Lithuanian šẽškas ‘polecat’. Whether it’s a Baltic > Uralic loan or vice versa doesn’t matter: the match is very old, and therefore we must assume that Chuvash šaškă ‘mink’ is a loan from Mari.

Moving on to the Volga Kipchak languages, we find an irregular initial correspondence in Tatar čäške ‘mink’, but one could suppose that we are dealing with the same word. Äxmatjanov’s Tatar etymological dictionary, at any rate, accepts a Mari etymology. And then the word is also found in Bashkir, as šäške.

Now, the most interesting aspect of all this is that the word is found in Kazakh. Though standard Kazakh has suw küzeni for ‘mink’, Radloff recorded a form čäške ‘some kind of aquatic animal’, and this must have been borrowed from Bashkir. When I first began studying the Volga-Kama region, I would compare features found there to Kazakh, and if they were present in the latter, assume that they were either from Proto-Kipchak or at least from outside the Volga–Kama area. However, at least some words have been borrowed from North Kipchak to South Kipchak, and ‘mink’ is another one.

With so much language learning, how does one ever publish anything?

A couple of years ago I quoted a statement from an introductory Altaic studies textbook that the continual language learning in this field means a lifelong commitment. It’s one thing to continually learn languages over one’s scholarly career to broaden one’s horizons, but lately it seems that so much language learning is imposed that I cannot ever actually finish a journal submission.

This is how things have gone so far:

  1. When I began my studies of Finno-Ugrian linguistics, my initial concern was just Mari, which struck me as the Uralic language with the most readily assimilable grammar, and Russian so that I could use the only decent textbook of Mari available at the time. (Of course I was learning Finnish too as a foreigner in Helsinki, and Saami, Erzya and Nenets as other coursework.)
  2. After a few months it became clear that one can hardly do anything with Mari without having real proficiency in Chuvash and Tatar.
  3. A few months after that, I saw that understanding the Turkic languages of the Volga–Kama area requires some knowledge of what they were like before they arrived in that part of the world. So, numerous references on the Turkic family in general were added to my reading list, and I had to learn a couple of other Turkic languages (I chose Turkish and Kazakh) to act as a sort of control group for Volga Kipchak.
  4. As the years went by, it became clear that I had not considered enough the relationship of the Permian languages with Mari, so courses of Udmurt and Komi became obligatory before I could even dare to comment on the prehistory of Mari. The Ob-Ugrian languages are another area I should strengthen.

At the moment I’ve got a Mari-related research project that I would very much like to bring to publication, but I have the feeling that I will not have done my scholarly due diligence unless I get two more languages under my belt, namely Moksha Mordvin (Erzya Mordvin is not enough) and Ossetian. I’m very worried that the latter is going to lead to even more things to follow up on in Iranian. This could bog me down for years.

The low-hanging fruit in Uralic studies has long been taken. I think it virtually impossible now to publish a paper on Mari considering only that language and no others around it. To someone today, it seems incredible that in 1950 Thomas Sebeok was able to score another entry on his list of publications simply with a two-page article on how Mari family names or patronymics typically precede a person’s own name.

Do scholars who frequently publish simply say at some point OK, I’ve got enough data now and I am collecting no more? Are they not scared that during the peer review process some possibly more knowledgeable scholar is going to condemn them for overlooking data from another language spoken far away but nonetheless essential to the subject?

Mari koma ‘otter, beaver’ and its Chuvash and Tatar analogues

The first prayer in Paasonen’s collection of Eastern Mari texts has several paragraphs of supplications for a fruitful hunt that really test one’s familiarity with Mari animal names: swans, martens, lynxes, bears, elk, etc. etc. One animal hitherto unfamiliar to me is koma, which Paasonen’s dictionary glosses as what should be two separate animals:

выдра / Otter; aus seinem Leder verfertigten die Tscheremissen vormals Mützen; боберъ J 88, выдра Tr. koma-jol Fuss(fell) des Otters (1038). koma-upš Otterfellmütze (1283). [Tschuw. Zol. xoma выдра, tat. kama.]

Tscheremissisches Wörterbuch also says [< Tschuw. / Tat.], though the presence of the initial velar should establish that koma was borrowed from Tatar and not Chuvash; Hill Mari ama ‘beaver’ is the Chuvash loan.

In Chuvash, Ashmarin lists only xoma, so the word is limited to the Viryal zone and it is probably a borrowing from Tatar. Fedotov’s etymological dictionary also lists a form xăma, but it is not clear where he got that from, because he cites only Ashmarin.

In Tatar, kama is part of the literary language and my Tatar-Russian dictionary defines it as выдра. Äxmatjanov’s etymological dictionary draws a not very convincing comparison with Old Turkic kam ‘shaman’, but kama with the same meaning as the Tatar is found in Siberian Tatar, and Shor has kamnaɣïs.

So, the word appears to have been brought to the Volga–Kama area from elsewhere. However, that strange distribution within Turkic makes one want to look at other Siberian language families (though a cursory glance at a Ket dictionary shows nothing similar-looking under выдра and бобр).

As the definitions of this term in the various languages are very much bound up with the notion ‘fur-bearing animal’, one might link Ashmarin’s xumă ‘sable’ (from an earlier *kam-ïK?) to the Tatar word.

Tatarisms in Paasonen’s Eastern Mari texts

The Eastern Mari dialect represented in the texts gathered by Heikki Paasonen in April–July 1900 is, for the most part, not especially different from the Mari literary language. However, there are some interesting signs of contact with Tatar. Thus one naturally finds some loanwords like tarlau ‘burnt clearing’ and okaš ‘to read’.

In some places we find lack of accusative marking on the object when it is indefinite, e.g. in kajenə̑t urem dene šap ojlen ojlen koktə̑nat siɣarə̑m tul pə̑zə̑kten purlə̑nə̑t ‘they went along the street speaking loudly and with lit cigarettes in their mouths’. As tul ‘fire’ is the object here of pə̑zə̑ktaš ‘to set’, one would rather expect the accusative form tulə̑m. In fact, in Paasonen’s Eastern Mari dictionary, he cites from somewhere else the phrase pueš tulə̑m pə̑zə̑ktaš ‘to start a wood fire’ where the expected accusative appears. Similarly in pojan kupesβlak par kičken, trojka kičken koštə̑t ‘rich merchants ride with two horses hitched, with three horses hitched’ the two objects lack an accusative.

What I suppose is another copy of a Tatar model is the phrase šukume šagalme lij möŋgö ‘some time later [lit. a lot or a little time passing]’. The Mari question particle mo is a Tatar borrowing, but in this case it is used in a sense (‘or’) not often encountered in Mari non-interrogative sentences, and it doesn’t display labial harmony.

Chuvash reeds and stems

Eastern Mari has the word omə̑ž ‘reed’, which Paasonen’s dictionary notes is a borrowing from Chuvash. In the Skvortsovs’ Chuvash-Russian dictionary I found the source: Cv. xămăš ‘bulrush’. Fedotov’s etymological dictionary compares this to a wide variety of Turkic cognates such as Turkish kamış and Yakut xomus.

But a few lines above it, one finds an entry for a remarkably similar word: xămăl ‘stubble (of cereals)’. Fedotov compares this to Tatar and Bashkir qamïl ‘bulrush’.

These must be the same word, both forms going back to Proto-Turkic *kamïš ‘grass stalk (or the like)’ and showing the ‑š ~ ‑l distinction that divides the family in two. Apart from Chuvash, the ‑l variant is found only in Volga Kipchak, and thus can be regarded as a Volga Bulgarian loan into Tatar and Bashkir. The ‑š variant, on the other hand, must be a loan from Volga Kipchak into Chuvash.

An amusing bit of trivia: two distantly related languages trading cognates with different meanings.

On Chuvash śüś ‘hair’

In their 1983 paper on early Bulgarian loanwords in the Permian languages, Rédei & Róna-Tas derive Chuvash śüś ‘hair’ from Proto-Turkic by proposing an intermediate form that is not actually attested anywhere: PT yulči (Cf. Kashgari yulïč ‘goat’s hair’) > Chuv. *śevśi > śüś (p. 77).

Why should this not be considered a simple Tatar loan? A Proto-Turkic word for ‘hair’ as inherited by the Kipchak languages was *sač. Tatar now has čäč after the initial consonant was assimilated to the following č and then, because Tatar č is articulated with great palatalization, the vowel was fronted.

Early Tatar loans in Chuvash show Cv. o/u for Tatar a, and Cv. ś for Tatar č. However, Chuvash ś is just as palatalized as Tatar č (indeed, it can be argued that phonetically they are the same sound). Thus, Chuvash could have borrowed the word from Tatar in an intermediate form *čač, raised the vowel, and then fronted it on its own or within an areal context.
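For readers who like to see the steps laid out, the proposed chain can be sketched as a list of ordered rewrite rules in Python. This is only an illustration of the argument, not a phonological model: the rule lists, the names apply_rules, TATAR_RULES and CHUVASH_RULES, and the use of plain string replacement are all my own assumptions, and I have arbitrarily picked u (rather than o) as the raised vowel.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that č, ś, ä, ü compare reliably."""
    return unicodedata.normalize("NFC", text)


def apply_rules(form, rules):
    """Apply ordered (label, old, new) replacements; return every stage."""
    form = nfc(form)
    stages = [form]
    for _label, old, new in rules:
        form = form.replace(nfc(old), nfc(new))
        stages.append(form)
    return stages


# Hypothetical rule lists, written only to mirror the prose above.
TATAR_RULES = [
    ("initial s assimilates to the following č", "sač", "čač"),
    ("vowel fronted next to palatalized č", "čač", "čäč"),
]
CHUVASH_RULES = [
    ("Tatar č is rendered as Chuvash ś", "č", "ś"),
    ("a is raised (assumed here: to u, not o)", "a", "u"),
    ("the raised vowel is fronted", "u", "ü"),
]

# Kipchak *sač down to modern Tatar.
tatar = apply_rules("sač", TATAR_RULES)
# Chuvash borrows the intermediate stage *čač, i.e. tatar[1].
chuvash = apply_rules(tatar[1], CHUVASH_RULES)

print(" > ".join(tatar))
print(" > ".join(chuvash))
```

The point of feeding tatar[1] rather than tatar[-1] into the Chuvash rules is that the borrowing is assumed to predate the Tatar fronting, so Chuvash would have fronted its own vowel independently.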