Andreev’s Chuvash textbook and what’s wrong with it

I wrote this review of I. A. Andreev’s Чувашский язык. Практический курс 3rd ed. (Cheboksary: Чувашское книжное издательство, 2011) ISBN 9785767018130 for a book-rating website, but I thought I should also post it here, where it is probably more likely to be read. While I do love to just rant about this and other poor learning resources, I think it would be helpful if this book’s flaws were known, so that one can avoid being too greatly disappointed. I remember how thrilled I was to discover the book nearly a decade ago, and how quickly my bubble was burst.


An anachronistic paired word in Chuvash

I have written here before about the use of paired words in the Volga–Kama languages to denote an entire class of things, e.g. Chuvash yïvăś-kurăk ‘vegetation’ < yïvăś ‘tree’ + kurăk ‘grass’.

An amusing consequence of this is a jarring anachronism if one of the items in the paired word construction was discovered or invented after the event being described. Consider the following from a Chuvash children’s text on the history of the Olympic games: Хӗҫ-пӑшаллӑ ҫынна Олимпие кӗме юраман ‘[In Ancient Greece] people bearing arms were not allowed into Olympia.’

The paired word here is xĕś-păşal ‘arms, weapons’, made up of xĕś ‘sword’ and păşal ‘rifle’. Obviously there were no rifles in Ancient Greece, but apparently the paired word has become so lexicalized that an author can legitimately use it in any historical context.

Battle of the etymologists

The verb MariE püč́kam, MariW pəčkäm ‘cut off’ is funny. In the Uralisches etymologisches Wörterbuch (367) the word is derived from a supposed Proto-Uralic *pečkä‑ (päčkä‑) ‘to cut’ on the basis of North Saami bæsˈkedi‑ ‘cut hair or wool off’ and Mordvin E M pečke ‘cut off, chop off’. Bereczki upholds this etymology in his Etymologisches Wörterbuch des Tscheremissischen (Mari) without mentioning any alternatives.

On the other hand, Fedotov in his Этимологический словарь чувашского языка (I 409) etymologizes Chuvash păčkă ‘saw’ on the basis of Turkic – namely the widespread *pïčak/bičäk ‘knife’ – and claims (again without mentioning any alternative) that MariW pəčkäm is a borrowing from Chuvash. Who is right here?

There is only one Uralic etymology in Bereczki where *pe‑ gives MariE pü‑, namely püńč́ö ‘pine’ < *penčä (UEW 727). Otherwise pü- in Mari is normally from *pä‑, e.g. pükš ‘hazelnut’ < *päškɜ (UEW 726–7). However, if we assume that the Proto-Uralic form was *päčkä, that would conflict with the Mordvinic forms, as Moksha Mordvin usually preserves PU *ä and does not raise it to e. I suppose that is why the UEW placed a question mark before the Mordvinic forms.

Can derivational morphology settle the question? The frequentative of this verb is püč́keẟem, and a quick search of the Mari–English Dictionary shows that ‑eẟem is overwhelmingly found in inherited Uralic vocabulary (or at least pre-Chuvash borrowings), not Turkic loanwords. It is not exclusively so – note joɣeẟem ‘flow’ < Chuvash and tojeẟem ‘hide’ < Tatar – but I would think it probable that MariE püč́kem is inherited.

Ultimately, however, with the resemblance between the Proto-Turkic and Proto-Uralic forms, we might have to take the dreaded notion of “sound symbolism” into account here, something which usually makes me want to drop the question entirely, leaving it for someone else with a greater gift for linguistics.

More Chuvash and Mari at OpenStreetMap

I am drawing up a table of placename abbreviations from Ashmarin’s Chuvash dictionary along with their geographical coordinates, e.g. Урас-к. = д. Ураз-касы, Янтиковского района ЧАССР = 55.571, 47.7352. This will allow me to more easily map the distribution of some isoglosses that have interested me. For the most part, it has been very easy to link Ashmarin’s villages with contemporary ones, though there are a small number of villages which either no longer exist, or which were drastically renamed after the October Revolution.
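A table in that shape is easy to read into a script for mapping. The following is only a minimal Python sketch, assuming each entry sits on one line with the three fields separated by " = " and the coordinates given as latitude, then longitude; the names `Place` and `parse_entry` are hypothetical and not part of any existing tool.

```python
from typing import NamedTuple


class Place(NamedTuple):
    abbrev: str        # Ashmarin's abbreviation, e.g. "Урас-к."
    description: str   # the expanded village name and district
    lat: float
    lon: float


def parse_entry(line: str) -> Place:
    """Parse one 'abbrev = description = lat, lon' line (assumed layout)."""
    abbrev, description, coords = (part.strip() for part in line.split(" = "))
    lat, lon = (float(value) for value in coords.split(","))
    return Place(abbrev, description, lat, lon)


entry = parse_entry("Урас-к. = д. Ураз-касы, Янтиковского района ЧАССР = 55.571, 47.7352")
print(entry.abbrev, entry.lat, entry.lon)
```

A line that does not have exactly three fields raises a `ValueError`, which is probably what one wants while the table is still being cleaned up by hand.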

In the course of doing this research, I’ve added the Chuvash names for several hundred villages in Chuvashia and in the Chuvash diaspora to OpenStreetMap (a project I am passionate about, as I described here). One of the strange things I’ve discovered is that Tatars and Bashkirs are more likely to recognize Chuvash than editors from Chuvashia. Very, very few villages in Chuvashia were marked with a Chuvash name on OSM when I began this project, but villages in Tatarstan and Bashkiria that historically had a Chuvash population were often marked with the Chuvash name alongside the Russian, Tatar or Bashkir name.

In two instances for Chuvash villages within Chuvashia, someone had specified the Chuvash name not with the name:cv tag but with the old_name tag, which just breaks my heart.

Many of the Chuvash placenames floating around the internet were drawn from the Chuvash Encyclopedia, an authoritative reference source. However, the Chuvash Encyclopedia was digitized at some early time when Chuvash fonts weren’t thought widely available. Thus, for the Chuvash letters ҫ,ӗ,ӑ,ӳ, the Chuvash Encyclopedia actually uses the similar-looking codepoints from the Latin-1 block of Unicode, not the Cyrillic block. Because the names were copied and pasted elsewhere, this error persists in the Tatar-language Wikipedia and some OpenStreetMap points. I suppose I’ll have to write a script to automate correcting these on OSM.
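The core of such a correction script might look like the Python sketch below. It is only an illustration under stated assumptions: the exact Latin lookalikes used in any given source are a guess on my part (ç, ĕ, ă, ÿ and their capitals) and should be checked against the actual data, and the function name `fix_chuvash_letters` is hypothetical. Nothing here talks to OSM itself; it only repairs a single name string.

```python
import unicodedata

# Assumed Latin lookalikes -> proper Cyrillic Chuvash letters.
# Written as escapes because the pairs are visually almost identical.
LATIN_TO_CYRILLIC = {
    "\u00e7": "\u04ab",  # ç -> ҫ
    "\u00c7": "\u04aa",  # Ç -> Ҫ
    "\u0115": "\u04d7",  # ĕ -> ӗ
    "\u0114": "\u04d6",  # Ĕ -> Ӗ
    "\u0103": "\u04d1",  # ă -> ӑ
    "\u0102": "\u04d0",  # Ă -> Ӑ
    "\u00ff": "\u04f3",  # ÿ -> ӳ
    "\u0178": "\u04f2",  # Ÿ -> Ӳ
}
_TABLE = str.maketrans(LATIN_TO_CYRILLIC)


def has_cyrillic(text: str) -> bool:
    """True if the string contains at least one character from the Cyrillic block."""
    return any("\u0400" <= ch <= "\u04ff" for ch in text)


def fix_chuvash_letters(name: str) -> str:
    """Swap Latin lookalikes for Cyrillic letters, but only in Cyrillic-script names."""
    if not has_cyrillic(name):
        return name  # leave genuinely Latin-script names (French, Turkish...) alone
    # NFC composes 'base letter + combining mark' sequences before translating.
    return unicodedata.normalize("NFC", name).translate(_TABLE)


broken = "\u00c7\u0115\u043c\u0115\u0440\u043b\u0435"  # looks right, but Ç and ĕ are Latin
print(fix_chuvash_letters(broken))
```

The Cyrillic check is a deliberately crude guard so that a name written entirely in Latin script is never touched; a name consisting only of lookalike characters would slip past it and need manual review.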

For the moment I am not so enthusiastic about adding Mari placenames, because existing Meadow Mari/Eastern Mari placenames are marked up variously with name:chm and name:mhr. I’ve never thought about the existence of three ISO 639-3 codes for Mari (chm for Mari in general and mhr for Meadow Mari/Eastern Mari, plus mrj for Hill Mari) as a problem before, but because OSM generates map tiles based on one and only one ISO 639 code, some Mari-language names will not be visible whichever code one chooses. I suppose this too will have to be automated with a script, however redundant it might seem to add both name:chm and name:mhr to every single point.
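The tag-mirroring step could be sketched roughly as follows in Python. This is an assumption-laden illustration, not an OSM editing tool: it treats a point's tags as a plain dictionary, the function name `mirror_mari_names` is hypothetical, and fetching or uploading data is left out entirely.

```python
MARI_KEYS = ("name:chm", "name:mhr")


def mirror_mari_names(tags: dict[str, str]) -> dict[str, str]:
    """Return a copy of the tags with name:chm and name:mhr both filled in.

    If neither key is present, or the two disagree, the tags are returned
    unchanged so that a human can look at the conflict.
    """
    result = dict(tags)
    present = [result[key] for key in MARI_KEYS if key in result]
    if not present or len(set(present)) > 1:
        return result
    for key in MARI_KEYS:
        result.setdefault(key, present[0])
    return result


point = {"name": "Йошкар-Ола", "name:mhr": "Йошкар-Ола"}
print(mirror_mari_names(point))
```

Hill Mari names under name:mrj are deliberately left alone here, since they belong to a different literary language.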

The tangled etymology of tvorog

Chuvash turăx ‘sour milk’, according to Fedotov’s etymological dictionary, has a secure Turkic etymology, cf. Old Turkic tar ‘buttermilk’. This word in Chuvash or some other early Turkic language is probably the source of Common Slavonic *tvarogŭ (which was even further borrowed into German as Quark) as well as Hungarian túró. It is also claimed by Fedotov to be the origin of MariE torə̑k, MariW tarə̑k.

There is, however, another very similar Chuvash word both in meaning and in phonology that Fedotov does not connect with turăx, namely tăvara (Viryal tora) ‘small cheese’. (This is the source of MariE tuara, MariW tara ‘curd cake, cheese pastry, curd pancake’, Tatar tura ‘homemade cheese’.)

Looking at these two words together, I wondered if Cv. tăvara is the inherited Turkic word, while turăx is a reborrowing from Russian. Loss of a final voiced velar is a normal occurrence in Chuvash, cf. ura ‘leg’ < *aẟaɣ < *aẟak, so tăvara is to be expected.

While any connection goes unmentioned by Fedotov, both Chuvash words are treated together in Róna-Tas & Berta’s West Old Turkic under Hu. túró. In the course of their discussion, they write:

The WOT form toraɣ may be a denominal derivation from tor(ï) with the denominal suffix +rAk, which originally served as an intensifier and later a marker of the comparative degree. The voicing of the final -k is problematic here, because the suffix exists in Chuvash and is +rAx. It is, however, possible that the -x is a second intensifier in Chuvash, +rAk > +rAg > +rA > +rA+x.

Assuming reborrowing from Russian, could the unvoiced final velar in Chuvash turăx reflect the devoicing of final consonants in Russian after the loss of the yers? How old is that feature anyway?

The vocalism of the Turkic roots is problematic. Cv. tăvara assumes original first-syllable *-o/-u, but the word has been compared to Old Turkic tar with its original *a.

The ancient Indo-European comparanda τυρός ‘cheese’ and Avestan tūiri ‘cheeselike milk, whey’ (see Beekes Etymological Dictionary of Greek) make me wonder if this is a steppe Wanderwort.

Mari šaške ‘mink’ borrowed even into South Kipchak

MariE šaške, MariW šäškə ‘mink’ and Finnish dial. häähkä id. have some kind of old relationship with Lithuanian šẽškas ‘polecat’. Whether it’s a Baltic > Uralic loan or vice versa doesn’t matter: the match is very old, and therefore we must assume that Chuvash šaškă ‘mink’ is a loan from Mari.

Moving on to the Volga Kipchak languages, we find an irregular initial correspondence in Tatar čäške ‘mink’, but one could suppose that we are dealing with the same word. Äxmatjanov’s Tatar etymological dictionary, at any rate, accepts a Mari etymology. And then the word is also found in Bashkir, as šäške.

Now, the most interesting aspect of all this is that the word is found in Kazakh. Though standard Kazakh has suw küzeni for ‘mink’, Radloff recorded a form čäške ‘some kind of aquatic animal’, and this must have been borrowed from Bashkir. When I first began studying the Volga–Kama region, I would compare features found there to Kazakh, and if they were present in the latter, assume that they were either from Proto-Kipchak or at least from outside the Volga–Kama area. However, at least some words have been borrowed from North Kipchak to South Kipchak, and ‘mink’ is another one.

With so much language learning, how does one ever publish anything?

A couple of years ago I quoted a statement from an introductory Altaic studies textbook that the continual language learning in this field means a lifelong commitment. It’s one thing to continually learn languages over one’s scholarly career to broaden one’s horizons, but lately it seems that so much language learning is imposed that I cannot ever actually finish a journal submission.

This is how things have gone so far:

  1. When I began my studies of Finno-Ugrian linguistics, my initial concern was just Mari, which struck me as the Uralic language with the most readily assimilable grammar, and Russian so that I could use the only decent textbook of Mari available at the time. (Of course I was learning Finnish too as a foreigner in Helsinki, and Saami, Erzya and Nenets as other coursework.)
  2. After a few months it became clear that one can hardly do anything with Mari without having real proficiency in Chuvash and Tatar.
  3. A few months after that, I saw that understanding the Turkic languages of the Volga–Kama area requires some knowledge of what they were like before they arrived in that part of the world. So, numerous references on the Turkic family in general were added to my reading list, and I had to learn a couple of other Turkic languages (I chose Turkish and Kazakh) to act as a sort of control group for Volga Kipchak.
  4. As the years went by, it became clear that I had not considered enough the relationship of the Permian languages with Mari, so courses of Udmurt and Komi became obligatory before I could even dare to comment on the prehistory of Mari. The Ob-Ugrian languages are another area I should strengthen.

At the moment I’ve got a Mari-related research project that I would very much like to bring to publication, but I have the feeling that I will not have done my scholarly due diligence unless I get two more languages under my belt, namely Moksha Mordvin (Erzya Mordvin is not enough) and Ossetian. I’m very worried that the latter is going to lead to even more things to follow up on in Iranian. This could bog me down for years.

The low-hanging fruit in Uralic studies has long been taken. I think it virtually impossible now to publish a paper on Mari considering only that language and no others around it. To someone today, it seems incredible that in 1950 Thomas Sebeok was able to score another entry on his list of publications simply with a two-page article on how Mari family names or patronymics typically precede a person’s own name.

Do scholars who frequently publish simply say at some point, “OK, I’ve got enough data now and I am collecting no more”? Are they not scared that during the peer review process some possibly more knowledgeable scholar is going to condemn them for overlooking data from another language spoken far away but nonetheless essential to the subject?

Gemination in Viryal Chuvash as evidence of a substrate

The conference collection Volgan alueen kielikontaktit (Turku, 2002) has a paper by A. V. Emel’janova titled “Геминация — черта субстратной лексики языка-предшественника” that I’ve read a number of times over the years and have struggled with.

Emel’janova’s thesis is essentially that occurrences of gemination in the Malokaračin variety of the Viryal dialect provide evidence that the Chuvash there absorbed a Mari population, e.g. ačča ‘child’ (literary language ača), vălčča ‘roe’ (lit. vălča), śamkka ‘forehead’ (lit. śamka), etc. That is, under the influence of the unvoiced medial consonants of the local Mari dialect, Chuvash medials became unvoiced, and Chuvash has a rule that unvoiced stops be pronounced as geminates.

What is especially interesting about Emel’janova’s data is that several Chuvash words don’t show gemination: kalča ‘shoots, young growth’, molča ‘bath’, xănča ‘when’, on čońa ‘at that time’, kĕnčele ‘bundle of fabric prepared for spinning’, śınsančan ‘from people’, măntra ‘ball’.

Of these words without gemination, molča < баня and kĕnčele < кудель are early loanwords from Russian, and we know that there was a voiced d in the latter. Agyagási has identified măntra as a borrowing from the Late Gorodets population, so this may help to establish the presence of voiced stops in that language. Conversely, the word kukkăl’ ‘pie’ (lit. kukăl’), also in Agyagási’s Late Gorodets wordlist, shows gemination, so can we reconstruct a system of both voiced and unvoiced consonants for the Late Gorodets language?

But what confuses me about all this is the chronology. Bereczki’s view (Grundzüge der tscheremissischen Sprachgeschichte) was that the Volga Bulgars did not settle in the north of Chuvashia and absorb its Mari population until just before the Mongol invasion in the 13th century. Emel’janova on the other hand claims that the Bulgars took the north of Chuvashia and absorbed some of its Mari population already in the 9th and 10th centuries.

Furthermore, the word кудель was borrowed into Volga Bulgarian early, before the loss of nasal vowels in Old Russian (which probably occurred by the 11th century on the basis of the Ostromir Gospels). If the substrate population for this Chuvash dialect pronounced medial -č- as unvoiced in the Chuvash vocabulary they took on, and thus caused gemination, they should have done the same with kĕnčele, that is, unless the word was borrowed after Chuvash and the substrate came into contact. Therefore, if the substrate were Mari, then intensive Chuvash and Mari contact would have to be traced back to at least two centuries earlier. Do we want to do that?

Mari koma ‘otter, beaver’ and its Chuvash and Tatar analogues

The first prayer in Paasonen’s collection of Eastern Mari texts has several paragraphs of supplications for a fruitful hunt that really test one’s familiarity with Mari animal names: swans, martens, lynxes, bears, elk, etc. One animal hitherto unfamiliar to me is koma, which Paasonen’s dictionary glosses as what should be two separate animals:

выдра / Otter; aus seinem Leder verfertigten die Tscheremissen vormals Mützen; боберъ J 88, выдра Tr. koma-jol Fuss(fell) des Otters (1038). koma-upš Otterfellmütze (1283). [Tschuw. Zol. xoma выдра, tat. kama.]

Tscheremissisches Wörterbuch also says [< Tschuw. / Tat.], though the presence of the initial velar should establish that koma was borrowed from Tatar and not Chuvash; Hill Mari ama ‘beaver’ is the Chuvash loan.

In Chuvash, Ashmarin lists only xoma, so the word is limited to the Viryal zone and it is probably a borrowing from Tatar. Fedotov’s etymological dictionary also lists a form xăma, but it is not clear where he got that from, because he cites only Ashmarin.

In Tatar, kama is part of the literary language and my Tatar-Russian dictionary defines it as выдра. Äxmatjanov’s etymological dictionary draws a not very convincing comparison with Old Turkic kam ‘shaman’, but kama with the same meaning as the Tatar is found in Siberian Tatar, and Shor has kamnaɣïs.

So, the word appears to have been brought to the Volga–Kama area from elsewhere. However, that strange distribution within Turkic makes one want to look at other Siberian language families (though a cursory glance at a Ket dictionary shows nothing similar-looking under выдра and бобр).

As the definitions of this term in the various languages are very much bound up with the notion ‘fur-bearing animal’, one might link Ashmarin’s xumă ‘sable’ (from an earlier *kam-ïK?) to the Tatar word.

Obscure paired words in Mari and Chuvash

Mari, Chuvash, Tatar and Udmurt occasionally employ two paired words to denote an entire class of objects or people, often transcending the two specific items named in the paired word expression, e.g. Chuvash yïvăś-kurăk ‘vegetation’ < yïvăś ‘tree’ + kurăk ‘grass’.

Frustrating, though shedding light on a prehistoric stage of the language, are situations where one of the items in the paired word expression (typically the second) is no longer used. I’ve been collecting a few of these recently. Paasonen’s Eastern Mari texts present iɣe-šuβo ‘child’, where the first element is the still common iɣe ‘offspring, animal young’. The second element, however, which is also represented dialectally as šə̑βe, has no clear meaning and only survives in this expression. Tscheremissisches Wörterbuch only marks the entry for this with a W to note that it is a “Volgaic” item. My Mordvinic dictionary is presently a continent away, so perhaps someone could leave a comment if they know what Erzya or Moksha word is proposed as a cognate. (Note that this Mari paired word was ultimately combined into a discrete and unanalysable lexical item in икшыве, which is firmly part of the literary language.)

Chuvash too has such mysterious expressions. tïră-pulă ‘grain’ (< tïră ‘bread, grain’) is mentioned in Ashmarin’s dictionary as a compound (with the variant form tïră-pul), but the second item is not explained. Perhaps it is not entirely unreasonable to connect this to the verb *bol ‘to be’ with the sense ‘sustenance’, not an unusual way of referring to food. Other meanings attested in Turkic for a root reconstructable with the shape *bVlïK, namely ‘fish’, ‘wound’ or ‘honey’, don’t seem to belong here.

Chuvash küršĕ-aršă ‘neighbours’ (< küršĕ ‘neighbour’) is also mentioned in Ashmarin without any explanation of its second element. Ditto for ača-păča ‘children’ (< ača ‘child’), which may simply be a reduplication for expressive purposes but nonetheless demands investigation.