An unexpected corpus: Russian version

Over at his blog Panchronica, Guillaume Jacques expresses his delight about The Jesus Film, that product of some American Protestant sect that has now been translated into an enormous number of languages, even ones for which written material is extremely scanty. It has certainly been of great help to me as I’ve learned Ossetian, and the existence of separate Albanian translations for Kosovo and the Republic of Albania will help foreign learners feel comfortable with both the Gheg and Tosk variants of that language.

While there is probably no other film so widely translated as The Jesus Film, for my own particular purposes I’ve been pleased to find something else, one whose story is less likely to be familiar to the viewer: the Soviet cartoon Трое из Простоквашино (“The Trio from Prostokvashino”) has been dubbed into a number of languages, mainly from Southern Russia and the Caucasus, for example:

  • Ossetian
  • Ingush
  • Lezgian
  • Karachay-Balkar (I was very surprised by how difficult this language is to understand; I thought I would be able to follow it pretty easily after learning Kipchak languages from further east)
  • Lak
  • Kumyk
  • Tatar (under the translated title Простоквашинодан өчәү)

Clicking the links in the sidebar, one can find one’s way to other cartoons in various languages of the former USSR. There’s even an entire playlist of Ossetian-dubbed cartoons.

MariE tolašem ~ talašem ‘try hard, strive’ < Tatar talaš

One of the frustrations of working with Tscheremissisches Wörterbuch is that some Mari items are labeled Tschuw. or Tat., but the exact source is not specified and sometimes one has to dig a little to determine the original Chuvash or Tatar word.

A case in point is MariE tolašem ~ talašem ‘sich bestreben, eilen, irgendwie zu tun versuchen’. This is marked as a Tatar loanword in TschWb, and the word is clearly of Turkic origin since it has a causative derivational form MariE tolaštarem ~ talaštarem. I turned to my dictionary of literary Kazan Tatar, the Татарско-русский словарь (Казань: Мәгариф, 2007), and found a phonetic match: талашу. However, the meanings ‘ссориться, скандалить, переругиваться’ of this verb and its derivational forms were not close enough to the Mari verb to satisfy.

If my Tatar dictionary doesn’t help for a Turkic loanword in Mari, the next stop is a Chuvash one. Ashmarin’s Thesaurus Linguae Tschuvaschorum contains a verb corresponding to the Tatar one and almost certainly a borrowing of it, namely tulaş, and the first meanings mentioned are the same as for the Tatar: ‘беситься, злиться, грызться’. However, buried deeper down in the entry is the meaning we’re looking for: ‘возиться, стараться’. This is an understandable extension of the Turkic root tal-, the basic meaning of which is ‘to force; to take by force’.

Thus Mari and Chuvash preserve a meaning of the Tatar word that seems to have died out among Kazan Tatars. Interestingly, Russian too borrowed this Tatar word dialectally and uses it in a similar sense, or at least it did in the 19th century: a verb талашиться ‘суетиться, толочься, метаться’ is attested from the Tambov region in the Толковый словарь Даля, compiled by Vladimir Ivanovich Dal’ and published in 1863–1866.

Incidentally, had I carefully examined the Mari–English Dictionary instead of basing myself solely on Tscheremissisches Wörterbuch, then I could have figured out this etymology more quickly, because one of the meanings of MariE lit. толашаш is ‘to quarrel, to squabble, to bicker’, and that meaning is not found in TschWb. However, the Mari–English Dictionary, being a general literary-language reference and not a dialect dictionary, does not list the origin of the item, and I wonder if the word in that meaning was found only in Eastern Mari communities under heavy Tatar influence before the rise of the literary language, and only the meaning ‘try hard, strive’ is pan-Mari.

Andreev’s Chuvash textbook and what’s wrong with it

I wrote this review of I. A. Andreev’s Чувашский язык. Практический курс 3rd ed. (Cheboksary: Чувашское книжное издательство, 2011) ISBN 9785767018130 for a book-rating website, but I thought I should also post it here where it is probably more likely to be read. While I do love to just rant about this and other poor learning resources, I think it would be helpful if this book’s flaws were known, so that one can avoid being too greatly disappointed. I remember how thrilled I was to discover the book nearly a decade ago, and how quickly my bubble was burst.

An anachronistic paired word in Chuvash

I have written here before about the use of paired words in the Volga–Kama languages to denote an entire class of things, e.g. Chuvash yïvăś-kurăk ‘vegetation’ < yïvăś ‘tree’ + kurăk ‘grass’.

An amusing consequence of this is a jarring anachronism if one of the items in the paired word construction was discovered or invented after the event being described. Consider the following from a Chuvash children’s text on the history of the Olympic games: Хӗҫ-пӑшаллӑ ҫынна Олимпие кӗме юраман ‘[In Ancient Greece] people bearing arms were not allowed into Olympia.’

The paired word here is xĕś-păşal ‘arms, weapons’, made up of xĕś ‘sword’ and păşal ‘rifle’. Obviously there were no rifles in Ancient Greece, but apparently the paired word has become so lexicalized that an author can legitimately use it in any historical context.

Tatar in Arabic script

Though I’ve often heard that there is a rich pre-1917 literature in Tatar that is no longer widely accessible because of the change of script, I probably wouldn’t have learned how to read Tatar in Arabic script had I not come across a couple of very useful guides. One, the more serious, is Гарәп язуы нигезендә татарча әлифба by Dž. G. Zainullin (Татарстан китап нәшрияты, 1989).

The other, a colourful children’s reader entitled الفبا (Alifba), was published by the Tatar diaspora in Berlin in 1918. I’ve scanned this and uploaded it as a PDF (18MB).

Any adaptation of the Arabic script to a Turkic language would have to indicate the frontness of the vowels in a word, but one solution for this that I wasn’t expecting is the use of certain Arabic emphatics to specify back vowel words. However, this doesn’t hold for all cases – as one works through these books, exceptions pile on exceptions. All in all, this system is so bloody complicated that it’s no surprise that Tatar activists pine instead for the Latin script of the 1930s. Still, I am hoping that a knowledge of this script will let me discover some unjustly forgotten literature from the centuries before the October Revolution.

Battle of the etymologists

The verb MariE püč́kam, W pəčkäm ‘cut off’ is funny. In the Uralisches etymologisches Wörterbuch (367) the word is derived from a supposed Proto-Uralic *pečkä‑ (päčkä‑) ‘to cut’ on the basis of North Saami bæsˈkedi‑ ‘cut hair or wool off’ and Mordvin E M pečke ‘cut off, chop off’. Bereczki upholds this etymology in his Etymologisches Wörterbuch des Tscheremissischen (Mari) without mentioning any alternatives.

On the other hand, Fedotov in his Этимологический словарь чувашского языка (I 409) etymologizes Chuvash păčkă ‘saw’ on the basis of Turkic – namely the widespread *pïčak/bičäk ‘knife’ – and claims (again without mentioning any alternative) that MariW pəčkäm is a borrowing from Chuvash. Who is right here?

There is only one Uralic etymology in Bereczki where *pe‑ gives MariE pü‑, namely püńč́ö ‘pine’ < *penčä (UEW 727). Otherwise pü- in Mari is normally from *pä‑, e.g. pükš ‘hazelnut’ < *päškз (UEW 726–7). However, if we assume that the Proto-Uralic form was *päčkä, that would conflict with the Mordvinic forms, as Moksha Mordvin usually preserves PU *ä and does not raise it to e. I suppose that is why the UEW placed a question mark before the Mordvinic forms.

Can derivational morphology settle the question? The frequentative of this verb is püč́keẟem, and a quick search of the Mari–English Dictionary shows that ‑eẟem is overwhelmingly found in inherited Uralic vocabulary (or at least pre-Chuvash borrowings), not Turkic loanwords. It is not exclusively so – note joɣeẟem ‘flow’ < Chuvash and tojeẟem ‘hide’ < Tatar – but I would think it probable that MariE püč́kam is inherited.

Ultimately, however, with the resemblance between the Proto-Turkic and Proto-Uralic forms, we might have to take the dreaded notion of “sound symbolism” into account here, something which usually makes me want to drop the question entirely, leaving it for someone else with a greater gift for linguistics.

The birth of Russian/Central Asian studies at Indiana University

A few months ago I read David C. Engerman’s Know Your Enemy: The Rise and Fall of America’s Soviet Experts (Oxford University Press, 2009), hoping it might have some details about the rise of Uralic and Altaic studies at Indiana University in Bloomington. As I wrote here, Engerman’s book was something of a disappointment, but a scholar at IU has drawn my attention to a recent paper by Blake Puckett, “Central Eurasian Studies at IU (the pre-Department Years)”. Here’s the abstract:

The Department of Central Eurasian Studies at Indiana University dates its origins to the Army Specialized Training Program conducted at IU starting in 1943. But the history of the Department from that beginning to its official emergence as a Department in 1966 is less well known. This paper follows the development of Central Eurasian Studies during this first twenty year period, tracing its interactions with both internal and external events. Relations between departments, the influence of individual personalities, governmental funding and world events all factor into the rise of a unique department at Indiana University one that traces its roots primarily neither to a geographic region nor to an academic discipline, but largely to an [imagined] family of languages. Particularly interesting are the connections between Linguistics as a field of study and broader efforts to promote language training and the understanding of various cultures and regions. The history also provides grounds to reflect on current concerns over the influence of DoD funding in the academy and the recurrent tensions within academia between the (practical) preparation of professionals and the advancement of (theoretical) knowledge.

There are many interesting details here of the sources of funding for these studies, how European-born linguists like Thomas Sebeok, Alo Raun and Felix Oinas ended up in the United States, and just a touch of academic scandal and intrigue.

More Chuvash and Mari at OpenStreetMap

I am drawing up a table of placename abbreviations from Ashmarin’s Chuvash dictionary along with their geographical coordinates, e.g. Урас-к. = д. Ураз-касы, Янтиковского района ЧАССР = 55.571, 47.7352. This will allow me to more easily map the distribution of some isoglosses that have interested me. For the most part, it has been very easy to link Ashmarin’s villages with contemporary ones, though there are a small number of villages which either no longer exist, or which were drastically renamed after the October Revolution.
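
A table of this kind could be kept as a small lookup structure. What follows is only a sketch in Python, not the format I actually use: the Place class, its field names, and the helper points_for are hypothetical, and reading the pair 55.571, 47.7352 as latitude followed by longitude is an assumption. The single entry is the example given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """One row of the abbreviation table (hypothetical layout)."""
    abbreviation: str  # abbreviation as printed in Ashmarin
    full_name: str     # expanded village name and district
    lat: float         # assumed: first number is latitude
    lon: float         # assumed: second number is longitude


# Table keyed by abbreviation; only the example from the text is filled in.
PLACES = {
    p.abbreviation: p
    for p in [
        Place("Урас-к.", "д. Ураз-касы, Янтиковского района ЧАССР", 55.571, 47.7352),
    ]
}


def points_for(abbreviations):
    """Return (lat, lon) pairs for every abbreviation found in the table.

    Abbreviations missing from the table are skipped rather than raising,
    since some of Ashmarin's villages cannot be identified today.
    """
    return [
        (PLACES[a].lat, PLACES[a].lon)
        for a in abbreviations
        if a in PLACES
    ]
```

Passing the list of abbreviations under which a given form is attested to points_for would give the coordinate pairs needed to plot that form on a map.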

In the course of doing this research, I’ve added the Chuvash names for several hundred villages in Chuvashia and in the Chuvash diaspora to OpenStreetMap (a project I am passionate about, as I described here). One of the strange things I’ve discovered is that Tatars and Bashkirs are more likely to recognize Chuvash than editors from Chuvashia. Very, very few villages in Chuvashia were marked with a Chuvash name on OSM when I began this project, but villages in Tatarstan and Bashkiria that historically had a Chuvash population were often marked with the Chuvash name alongside the Russian, Tatar or Bashkir name.

In two instances for Chuvash villages within Chuvashia, someone had specified the Chuvash name not with the name:cv tag but with the old_name tag, which just breaks my heart.

Many of the Chuvash placenames floating around the internet were drawn from the Chuvash Encyclopedia, an authoritative reference source. However, the Chuvash Encyclopedia was digitized at some early time when Chuvash fonts weren’t thought widely available. Thus, for the Chuvash letters ҫ, ӗ, ӑ, ӳ, the Chuvash Encyclopedia actually uses the similar-looking codepoints from the Latin blocks of Unicode (Latin-1 Supplement and Latin Extended-A), not the Cyrillic block. Because the names were copied and pasted elsewhere, this error persists in the Tatar-language Wikipedia and some OpenStreetMap points. I suppose I’ll have to write a script to automate correcting these on OSM.
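
The core of such a correction script is only a character mapping. Here is a minimal sketch in Python, assuming that the Latin stand-ins are ç, ĕ, ă and ÿ (and their capitals); that set is my guess at what the copied names contain and should be checked against real samples. The function name fix_chuvash_letters is hypothetical, and the sketch only repairs strings; it does not touch OSM itself.

```python
import unicodedata

# Assumed Latin lookalikes mapped to the proper Cyrillic Chuvash letters.
LATIN_TO_CYRILLIC = {
    "\u00e7": "\u04ab",  # ç -> ҫ
    "\u00c7": "\u04aa",  # Ç -> Ҫ
    "\u0115": "\u04d7",  # ĕ -> ӗ
    "\u0114": "\u04d6",  # Ĕ -> Ӗ
    "\u0103": "\u04d1",  # ă -> ӑ
    "\u0102": "\u04d0",  # Ă -> Ӑ
    "\u00ff": "\u04f3",  # ÿ -> ӳ
    "\u0178": "\u04f2",  # Ÿ -> Ӳ
}
_TABLE = str.maketrans(LATIN_TO_CYRILLIC)


def fix_chuvash_letters(name: str) -> str:
    """Replace Latin lookalike letters in a Chuvash name with Cyrillic ones.

    The name is first normalized to NFC so that a base letter followed by a
    combining diacritic is composed into a single codepoint before mapping.
    """
    return unicodedata.normalize("NFC", name).translate(_TABLE)
```

Because the mapping would also rewrite a genuinely Latin-script name containing ç or ă, it should only be applied to values that are known to be Chuvash, such as those under name:cv.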

For the moment I am not so enthusiastic about adding Mari placenames, because existing Meadow Mari/Eastern Mari placenames are marked up variously with name:chm and name:mhr. I’ve never thought about the existence of three ISO 639-3 codes for Mari (chm for Mari in general and mhr for Meadow Mari/Eastern Mari, plus mrj for Hill Mari) as a problem before, but because OSM generates map tiles based on one and only one ISO 639 code, some Mari-language names will not be visible whichever code one chooses. I suppose this too will have to be automated with a script, however redundant it might seem to add both name:chm and name:mhr to every single point.
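
The duplication itself is simple to express. The following Python sketch works on a plain dictionary of tags for one point and leaves fetching and uploading aside; the function name mirror_mari_names is hypothetical, and the decision to leave a point alone when the two tags already disagree is my own assumption about what would be safest.

```python
def mirror_mari_names(tags):
    """Return a copy of an OSM tag dictionary with name:chm and name:mhr
    filled in from each other when only one of the two is present.

    If both are present (whether equal or different), or neither is,
    the tags are returned unchanged.
    """
    out = dict(tags)
    chm = tags.get("name:chm")
    mhr = tags.get("name:mhr")
    if chm and not mhr:
        out["name:mhr"] = chm
    elif mhr and not chm:
        out["name:chm"] = mhr
    return out
```

Points where the two values differ would still need to be looked at by hand.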

Mari uštə̑š ‘verst’ as a calque on Kipchak

All of the Kipchak languages except Karaim referred to the Russian verst as čaqïrïm, a derivation of the verb čaqïr- ‘to shout’, that is, a verst was seen as the distance a shout would carry.

In Mari, a word for verst is MariE W uštə̑š, for which Tscheremissisches Wörterbuch gives no etymology. One’s eye is then drawn to a verb on the same page, uštal kolten ‘I shout’, which would support deriving the Mari term in the same way as the Kipchak.

The odd thing is that this verb is attested with the meaning ‘shout’ in only one dialect in Tscheremissisches Wörterbuch, that of Krasnoufimsk in the Eastern Mari diaspora. Everywhere else, ueštaš (with an e that reduces and drops out dialectally) is met only in the meaning ‘to yawn’. In Etymologisches Wörterbuch des Tscheremissischen (Mari), Bereczki et al. reject the longstanding Uralic etymology for this word (some Ob-Ugric verbs for ‘yawn’) and instead propose a simple etymology from onomatopoeia: u, representing the sound one makes when shouting or yawning, followed by the denominal verb-forming suffix -Všt-.

The irregular correspondences in uštə̑š between the Mari dialects, along with the fact that not all dialects have the verb from which it is derived, suggest that this noun was calqued by a dialect in relatively close contact with Tatar, and then mediated to the other dialects.

Uralic linguistics data on OpenStreetMap

I have used Openstreetmap.org a great deal while travelling, and being a GPS anorak, I’ve added a great many previously unrecorded streets, shops and other points of interest. It has become my usual map reference, superior to Google Maps in its libre nature and its surprisingly richer coverage of certain areas.

While reading Paasonen’s Tscheremissische Texte collected among the Mari of Bashkiria, I was curious where exactly the villages in question were. Paasonen describes them as centered around the small town of Čurajevo, 25 versts north of Birsk. That is roughly this map view.

Zoom in, and one will find that almost all of the villages that Paasonen lists still exist, at least nominally. There is also a village there named Oktyabr which, with some research, could probably be identified with one of Paasonen’s pre-revolutionary village names. The Mari names for these villages and one of the local rivers were missing, so I added them (OpenStreetMap allows one to specify different-language names for points by appending to the name tag a colon followed by the ISO 639 code, so name:chm for Mari).
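
To make the tagging convention concrete, here is a small Python sketch that reads the language-specific names out of one point in OSM’s XML form, where each tag is an element with a k (key) and a v (value) attribute. The sample node, its id, its coordinates and its placeholder names are all invented for illustration, and the function name localized_names is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample node; the names are placeholders, not real data.
SAMPLE_NODE = """
<node id="1" lat="0.0" lon="0.0">
  <tag k="name" v="PLACEHOLDER-DEFAULT"/>
  <tag k="name:ru" v="PLACEHOLDER-RU"/>
  <tag k="name:chm" v="PLACEHOLDER-CHM"/>
  <tag k="place" v="village"/>
</node>
"""


def localized_names(node_xml):
    """Return {language code: name} for every name:<code> tag on a node."""
    node = ET.fromstring(node_xml)
    names = {}
    for tag in node.iter("tag"):
        key = tag.get("k", "")
        if key.startswith("name:"):
            # Everything after the first colon is taken as the language code.
            names[key.split(":", 1)[1]] = tag.get("v", "")
    return names
```

Calling localized_names(SAMPLE_NODE) returns the two placeholder names keyed by ru and chm, and ignores both the plain name tag and the unrelated place tag.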

I’d like to see the Uralic/Altaic/etc. linguistics community add more of these details, not so much as a source of reliable toponymic data for scholarship – one still needs to mine archives – but at least to make it convenient for linguists to pull up the placenames they encounter in the old text collections and dictionaries. Just being able to see these Mari villages on the map makes the texts more enjoyable, and it elucidates some of Paasonen’s comments on inter-village communication.