An unexpected corpus: Russian version

Over at his blog Panchronica, Guillaume Jacques expresses his delight in The Jesus Film, that product of an American Protestant sect which has now been translated into an enormous number of languages, even ones for which written material is extremely scanty. It has certainly been of great help to me as I’ve learned Ossetian, and the existence of separate Albanian translations for Kosovo and the Republic of Albania will help foreign learners feel comfortable with both the Gheg and Tosk variants of that language.

While there is probably no other film as widely translated as The Jesus Film, for my own particular purposes I’ve been pleased to find something else, one whose story is less likely to be familiar to the viewer: the Soviet cartoon Трое из Простоквашино (“The Trio from Prostokvashino”), which has been dubbed into a number of languages, mainly from southern Russia and the Caucasus, for example:

  • Ossetian
  • Ingush
  • Lezgian
  • Karachay-Balkar (I was very surprised by how difficult this language is to understand; I had thought I would be able to follow it fairly easily after learning Kipchak languages from further east)
  • Lak
  • Kumyk
  • Tatar (under the translated title Простоквашинодан өчәү)

Clicking the links in the sidebar, one can find one’s way to other cartoons in various languages of the former USSR. There’s even an entire playlist of Ossetian-dubbed cartoons.

PIE roots as a mnemonic device in Farsi spelling

Persian roots in which a silent vāv must be written after an initial khe are often considered the bane of foreign learners of Farsi. I myself resented having to learn this silly spelling rule, having first encountered Persian in the wonderfully clear Cyrillic script used for Tajiki. However, I had one of those little eureka moments that historical linguistics affords when I realized that these words can be traced back to Proto-Indo-European roots with initial *sw-, e.g.:

  • خواهر ‘sister’ < PIE *swésōr;
  • خوابیدن ‘to sleep’ < PIE *swep‑;
  • خویش ‘himself’ < PIE *swe‑ (I guess, but even if I guess wrong, it still helps to remember).

Thus, a little knowledge of PIE can serve as an instant mnemonic for a tricky aspect of a language that arose millennia later.

Increasing age may make it more challenging to reach real conversational proficiency in a language and lose that accent, but lately I’ve been very encouraged by how a decade and more of learning, sometimes focused and deliberate but just as often casual and absent-minded, makes it remarkably easy to reach a middling level. Another example: I recently picked up an intermediate-level reference grammar of Japanese (a language I’ve never formally studied) and realized that I knew most of the words used in the example sentences, absorbed purely by some kind of osmosis over the years. It is wonderful how everything out there ties together somehow. Now if I could just keep the fruits of a decade’s experience and have that decade itself back…

Mari words in Cheung’s Studies in the Historical Development of the Ossetic Vocalism

J. L. Cheung’s Studies in the Historical Development of the Ossetic Vocalism (Wiesbaden: Reichert Verlag, 2002), which goes well beyond what its title suggests, is in many respects an update and refinement of Abaev’s Ossetian etymological dictionary. Cheung’s monograph also has an index for each of the languages, Iranian or otherwise, drawn on in the work. Unlike Abaev, who drew heavily (and mostly wrongly) on Mari, Cheung limits his etymologies to just four Mari words: βerɣe ‘kidney’, kutkə̑ž ‘eagle’, ož(o) ‘stallion’ and pire ‘wolf’.

Thus we are on much firmer ground than in Abaev’s dictionary, although Cheung repeats Abaev’s misrepresentation of the Mari word for ‘wolf’ as pirägy, and that word is probably a borrowing from Tatar anyway.

Mari words in Abaev’s etymological dictionary of Ossetian

V. I. Abaev’s Историко-этимологический словарь осетинского языка (“Historical-Etymological Dictionary of the Ossetian Language”, published in four volumes in 1958–1989) is quite famous, and I was happy to discover a PDF on everyone’s favourite filesharing community for linguistics books. You can also order a paper version from some Russian online bookstores as print-on-demand. However, it wasn’t until I browsed the shelves of the Helsinki library that I discovered there was an index for it as well: the Указатель (“Index”) volume, published in Moscow in 1995.

(Abaev also published 22 pages of addenda and corrections to the dictionary as his contribution to the Festschrift for Ladislav Zgusta, Historical, Indo-European and Lexicographical Studies, ed. Hans H. Hock, Berlin: Mouton de Gruyter, 1997.)

The index contains sections for all the various languages Abaev dealt with, including individual Finno-Ugrian languages. As I am very interested in late East Iranian loanwords in Mari, I looked at which Mari words Abaev had mentioned. Below I present a list, with Abaev’s representation of the Mari (a jumble of transcriptions and dialect forms) replaced by the headwords of the Tscheremissisches Wörterbuch (TschWb). Unfortunately, most of these are better explained as Chuvash or Tatar loanwords, inherited Uralic vocabulary or coincidental resemblances, and certainly not as the result of direct Iranian–Mari contact. Clearly the field has moved on since Abaev’s heyday.

| Mari | Ossetian | Page in Abaev | Better etymology |
|---|---|---|---|
| alaša ‘gelding’ | alasa id. | I 44 | < Tatar |
| čəgət | cyxt ‘cheese’ | I 328 | Not in TschWb, but if Mari it would be < Chuvash |
| kə̑ńe ‘hemp’ | kättag ‘cloth’ | I 590 | |
| keńe | gän id., kättag ‘cloth’ | I 513, I 590 | |
| kerde ‘sword’ | kard id. | I 571 | |
| kož ‘spruce’ | k’ozä ‘conifer shoot’ | I 638 | < PU *kose |
| kukšo | xysk’ id. | IV 270 | |
| mör ‘berry’ | myrtkä id. | II 141 | < PU *mïrja |
| naməs | namys id. | II 155 | < Tatar |
| pire ‘wolf’ | biräğ id. | I 263 | < Tatar |
| pursa ‘pea’ | pysyra ‘nettle’ | II 248 | < Chuvash |
| rüzem ‘to shake (trans.)’ | rizyn ‘to shake (intrans.)’ | II 418 | |
| rə̑βə̑ž ‘fox’ | ruvas id. | II 434 | |
| sokə̑r ‘blind’ | soqqyr id. | III 138 | < Tatar |
| šu ‘bristle, fishbone’ | syg ‘barb’ | III 186 | |
| šüĺö ‘oats’ | syl ‘rye’ | III 194 | < Chuvash |
| šur ‘horn’ | sy id. | III 181 | < Proto-Iranian |
| toβar ‘axe’ | färät id. | I 451 | |
| tomaša ‘strange thing; commotion’ | tamaša id. | III 228 | < Chuvash or Tatar |
| tul ‘stormwind’ | tyfyl ‘whirlwind’ | III 328 | < Chuvash tăvăl or Tatar tawïl |
| tumna ‘owl’ | tojmon id. | III 298 | < Chuvash |
| tə̑rke ‘young pine’ | tägär ‘maple’ | III 252 | TschWb says < Tatar/FU? |
| umla ‘hops’ | xymlläg id. | IV 262 | < Chuvash |
| uža ‘sells’ | wäj id. | IV 67 | < PU *wosa, borrowed from PIE |
| βaraš ‘hawk’ | wari ‘falcon’ | IV 50 | |
| βürɣeńe ‘copper’ | ärxy id. | I 186 | |
| [eŋer-]βaze ‘fishing rod’ | wis ‘rod, pole’ | IV 111 | |
| βerɣe ‘kidney’ | wyrg | IV 123 | |

It’s worth mentioning that Abaev’s supposed Mari word for ‘wolf’ is pirägy, evidently based on MariE pire but itself erroneous. Abaev’s ghost word was later perpetuated in J. L. Cheung’s Studies in the Historical Development of the Ossetic Vocalism, p. 173, about which more later.

Pashto has an unusual 1 sg./pl. proclitic ra

Within the Indo-European languages, one becomes so used to seeing first-person singular pronouns starting with (V)m- that I was very struck to learn (from a book on clitics I mentioned earlier) that Pashto has a first-person singular/plural proclitic ra. Where could such an odd form come from? Robson & Tegey’s article on Pashto in Routledge’s The Iranian Languages (ed. Gernot Windfuhr) provides no explanation, though it does show that Pashto’s 1 sg. and 1 pl. enclitic pronouns are me and molam, the shape one would expect.

I found an answer in A New Etymological Vocabulary of Pashto (Wiesbaden: Dr Ludwig Reichert Verlag, 2003), an update of Georg Morgenstierne’s classic 1927 work. The entry for ra reads:

, Kh , Afr (Km) ər-, pronominal adv. of the first person ‘to/for me, us; here, hither’ — Darm. < Av aθrā̆ ‘here’ ( < *aθrā́, ər < *áθra); Av iθra ‘here’ is also a possible etymon; cf. Orm K hir, ; L īr, ar.

(By “Av” Morgenstierne in fact means “from an Old Iranian form whose Avestan equivalent is…”.)

Diglossia in Pamiri music and poetry

I was excited to finally get Badakhshan Ensemble: Song and Dance from the Pamir Mountains, the fifth album in Smithsonian Folkways’ series Music of Central Asia (volumes 1–7 are of great ethnographic interest; after that the series descends into World Music crossover gimmickry), but I must say that I’m disappointed that of the nine songs on this compilation, only one is in a Pamiri language (Shughni).

The rest are in Persian. I knew that Tajiki was widely known in the area since it is the official language of Tajikistan, but I guess I wasn’t expecting the inhabitants of the Pamirs to be so passionate about the same Persian classical poetry as the lowlanders. The songs here are attributed to Rumi and Hafiz (such attributions are sometimes spurious, the liner notes say), and in the DVD that accompanies the set, an old man speaks of the wisdom that Persian classical poetry contains.

It’s not just this one CD+DVD set. The texts in Gabrielle van den Berg’s Minstrel Poetry from the Pamir Mountains: A Study on the Songs and Poems of the Ismailis of Tajik Badakhshan (Reichert Verlag, 2004) are mainly, the author tells me, in Tajiki with a lesser number in Shughni.

On YouTube as well, a great many Pamiri cultural videos are in Tajiki. If you are patient, you can eventually find something in Shughni, though this doesn’t especially repay the effort, since the audio quality is usually painfully low.

Is this stable and unremarkable diglossia, like Westerners’ use of Latin in classical and liturgical music, or shame about one’s native language? Eventually I’ll travel to Badakhshan and try to figure out the language relations firsthand.

Perso-Arabic vocabulary in Tatar

The great thing about learning Tatar vocabulary is that, with a little effort at finding out the different spellings, you often get Farsi and Tajik vocabulary (and Arabic, Turkish, a lot of Caucasian languages…) for free. Here’s a list of just a few words I’ve recently acquired:

| Tatar | Farsi | Tajik |
|---|---|---|
| игътибар ‘attention’ | اعتبار | |
| хөрмәт ‘respect’ | حرمت | ҳурмат |
| һөнәр ‘specialization, focus’ | هنر | ҳунар |
| дәрәҗә ‘rank, authority’ | درجه | дараҷа |
| табигать ‘nature’ | طبیعت | табиат |
| дәвам ‘duration’ | دوام ‘durability, endurance’ | давом ‘duration’ |
| шигар ‘slogan’ | شعار | |

There may well be Tajik cognates for the two missing items, but unfortunately I never managed to buy a Tajik-Russian dictionary, and I can’t figure these out with my Russian-Tajik dictionary.

Inscriptions in Nepal

On my recent trip to Nepal I came across two inscriptions of linguistic interest.

The first is an unusual inscription in Kathmandu’s Durbar Square, placed there by King Pratap Malla in the 17th century. The king was a linguaphile, and this poem to the goddess Kali includes words from 15 scripts and languages. According to an article in the Nepali newspaper Republica, these are Persian, Arabic, Maithili, Kiranti, Newari, Kayathinagar (the script then used in western Nepal), Devanagari, Gaudiya, Kashmiri, Sanskrit, two different Tibetan scripts, English and French.

You can clearly make out French l’hiver ‘winter’ and automne ‘autumn’ as well as English winter.

Sadly, a significant part of this inscription has already been effaced. Indeed, the same is happening to most of the inscriptions in Durbar Square, and in spite of the square’s UNESCO World Heritage Site status, nothing is being done to protect them.

The second interesting inscription is on the pillar that the Emperor Ashoka set up in the 3rd century BC in Lumbini, the birthplace of the Buddha. This Prakrit-language proclamation releasing Lumbini from tax obligations is written in the Brahmi script. The plaque standing in front of the pillar gives a transliteration in Latin script and translations into English and Nepali.

Iranian from quincunx and back again

When I first became acquainted with Persian some years ago, two grammatical features seemed unusual to me from an Indo-European perspective. One was the ezafe construction, which I eventually learned was the product of contact with Caucasian languages. The other was the formation of the present tense with a prefix me‑ (indicative) or be‑ (subjunctive) followed by the verb stem and personal endings. In his chapter ‘Dialectology and Topics’ in Routledge’s The Iranian Languages, pp. 24–25, Gernot Windfuhr offers a fine summary of the changes that produced the modern Persian system of tenses, which not only clarifies the origin of me‑ and be‑ but also shows that Persian has returned to the same five-member tense/aspect system that Iranian (like Greek) started off with.

The history of the parameters and axes of the verb systems from Old Iranian to Modern Iranian shows a cycle from a five-member quincunx to varying Middle Iranian systems back to a quincunx. The development is shown here with the example of Persian.

The inherited fundamental and primary verbal parameter of the Early Old Iranian system is triple aspect which intersects with the binary tense parameter of present and past (marked by the augment a‑). It is centered on the perfective aorist:

Early Old Iranian
Present Past
Imperfective PR a-PR “Present system”
Perfective AOR “Aorist system”
Resultive-stative PF (a-PF) “Perfect system”

In time, this triple aspect system was reduced to forms of the “present” system, i.e. imperfective present and imperfective past, leaving only a few forms of the aorist and the perfect. With their loss, the highly complex inherited system was reduced to a single imperfective stem, distinguishing present vs. augmented imperfect: PR vs. a-PR.

Concomitantly, however, the vacated aorist and perfect ranges of the system were partially filled by the innovation of a new perfective system based on the adjectival completive participle in -tá plus the present and past copula, with both intransitive and transitive verbs.

In Middle Persian, the resulting four-member system of two imperfective and two perfective forms was extended by replacing the copula with the stative verb ēst‑ ‘to stand’. The outcome was a six-member system with a triple aspect axis and a binary tense axis:

Middle Persian
Present Past
Imperfective raw‑ (a-raw‑) present imperfect (later lost)
Perfective raft COP raft būd COP preterit past preterit
Resultive-stative raft ēst‑ raft ēstād COP perfect pluperfect

In addition, the adverb hamē, lit. ‘forever’, expressed ongoing and progressive action as well as continuing state, while its pendant bē (homophonous with the adverb ‘out, away’) expressed the singularity of an event in present and past and assumed inchoative or future connotation with the present stem.

In Early New Persian, (ha)mē‑ and bē‑ were continued, but the periphrastic resultative ēst‑ forms were replaced by extended forms based on the verbal adjective in -tag (< *-taka). bi and mē could still occur with these verb forms, and neither was obligatory. The core system in terms of frequency was the following:

Early New Persian
Present Past
Imperfective mē-raw‑ mē-raft‑
Perfective bi-raw‑ bi-raft‑ inchoat.-fut. singularity
Unmarked raw‑ raft‑ gen. present gen. past
Resultive-stative raft-a COP raft-a bud‑

Subsequently the system was restructured by the coalescence of the unmarked forms with the perfective forms by the fifteenth century.

  1. In the present, the perfective bi-form assumed distinct subjunctive function, alternating with the unmarked general present form, now opposed to the indicative present-future mē-form.
  2. In the past, the general unmarked form subsumed the function of the bi-form to express both general and perfective events, now opposed to the imperfective mē-past form. It thereby assumed the central role of an aorist in the resulting five-member system.

The core of the system became thus as follows, and has not changed since:

Pre-Modern, Indicative
Present Past
Imperfective mē-rav‑ mē-raft‑
Perfective raft‑
Resultive-stative raft-a COP raft-a bud‑

The non-indicative sub-system developed in parallel to the indicative core, using the imperfect and past-perfect forms for irreal function, and using the present subjunctive of ‘to be’ for the perfect subjunctive:

Pre-Modern, Non-Indicative
Present Past
Imperfective bi-rav‑ mē-raft‑
Perfective raft‑
Resultive-stative raft-a bāš raft-a bud‑