18 May 2005

language and star wars

The impending release of Episode 3 has generated some amount of talk over the use of language varieties throughout the two Star Wars trilogies: notably with respect to Yoda's syntax and to the use of varieties for non-humans. Funnily enough I used Star Wars as an example of the manipulation of language in film several weeks ago in class (but I left Yoda out of it).

Serious research on language in these films is hard to find. "Star wars" registers some hits in LLBA, with respect to argumentation surrounding Reagan's SDI, and to a so-called whale call. No film-related work appears. Likewise, "Yoda" only shows up as an author's name. I know only of this unpublished syntactic analysis:

Botma, E., E.J. van der Torre, and M. Zimmerman. 2000. With You the Force May Be: Explorations into the Syntax of a Jedi Master. Paper presented at TiN-Dag 2000, Utrecht.

They cite David Crystal's take on the matter:

"The rarity of OSV constructions and languages perhaps explains the impact of this strange speech style used by the Jedi Master, Yoda, in the film Return of the Jedi (1983)." (Crystal 1987:98)

Botma et al.'s basic finding was that, in addition to heavy topicalization, Yoda uses a fair number of V2 structures and lacks do-support, features found in Old English among other languages. (So a possible interpretation of Yoda's speech is that it reflects his age - he's so old he speaks an earlier version of the language.)

I think one thing to keep in mind with Yoda is that his newer scripts may differ somewhat in structure from his older ones. (This is also true of his recent Diet Coke commercial - Yoda has sold out! - in which he topicalizes an imperative: "That Diet Coke, give Yoda").

As for other varieties, Eric Bakovic discusses human/alien interactions in the original trilogy. I spent some time searching blogs and messageboards about the representation of alien speech in these films, and a contrast between the two trilogies emerges. In episodes IV-VI, aliens speak other languages, usually understand humans, but are understood only by some humans. Eric shows this to be true of the droid R2-D2 as well. In Episode I, aliens speak foreign-accented English. For example, the crafty Neimoidians speak with Japanese accents, while a shady alien merchant has a Jewish accent. Meanwhile, the faithful but dumb Jar Jar speaks with a poor rendition of some kind of islander creole. Likewise, Queen Amidala's body double speaks stiffly British, while Padmé (the Queen undercover), keepin it real, has an American accent.

Much online discussion revolves around whether these manipulations are racist. It's not hard to argue that the linguistic stereotypes invoke unfair cultural stereotypes. Skeptics, however, claim that George Lucas shouldn't be labeled racist, since he has an ethnically diverse cast of humans. I think it's reasonable to say that the casting is not racist but the linguistic manipulation is.

16 May 2005

a host of shifting sports metaphors

I recently discussed the "get untracked" construction, noting its use by King Clancy and speculating about when the phrase may have been adopted from the baseball subculture by the hockey subculture. I also labeled it as a peripheral construction that marks the discourse of sports media.

A bunch of examples have popped up in the last 24 hours of similar constructions passing in and out of sport, as well as across sports, and I outline them here.

Final Four/Drop the gloves: The winner of Survivor Palau mentioned in one of the final tribal councils that his plan for the game had been to help a select few teammates along to the final four, at which point they would drop the gloves and duke it out. "Drop the gloves" is a hockey-specific phrase that refers to part of the ritual that precedes an on-ice fight. Of course it was not meant literally in Survivor's context, but the example indicates a metaphorical non-sport non-fighting usage for the construction. Meanwhile, "Final four" (apparently of a March Madness origin) seems to be a pervasive way of saying almost-last-man-standing, even in a format (like Survivor) that does not use a 2 by 2 semifinal elimination.

Hat Trick/Triple Crown: The Czech win in the ice hockey world championships has given several players a world championship to add to their Olympic gold and Stanley Cup victories. This story uses both "Triple Crown" and "Hat Trick" to describe this rare collocation.

"Hat Trick" evidently has its origins in cricket, and in its extension to soccer and hockey, it has come to mean an achievement in which a player scores 3 goals in one game. Hockey also has the rare "natural hat trick" (three unanswered goals) and the even rarer "team hat trick", a series of three consecutive championships. Such a string of wins is known elsewhere as a three-peat; I have been able to locate lots of discussions of -peats up to nine-peat. Above that, the search gets sidetracked with discussions of "ten peat samples", but I found a seventeen-peat:

Repeat is no problem here; three-peat certainly makes sense; four-peat begins pushing it; seventeen-peat begins to knock on absurdity's door.

As for "triple crown" in hockey, I'm not sure if I've heard it applied this way before, but I know the list of people with those three pieces of hardware is very short. (I also find it absurd that Jagr would be called the 15th player to achieve it while Slegr would be 16th!)

Unlike the newish application of "Triple Crown" to hockey, other uses of it require the wins to occur in the same season. A same-year triple crown in hockey is logically impossible, given that the IHWC normally coincides with the first several rounds of the Stanley Cup playoffs. (And even if the scheduling were different, it would still be physiologically highly unlikely, given the combined grueling effects of a regular season, 2 international tournaments, and 4 best-of-7 playoff rounds that a player would need to complete for a same-year triple crown.)

I'm guessing "Triple Crown" started with a combination of wins at the Kentucky Derby, the Preakness, and the Belmont. I've also heard it applied (oddly) to golf, in reference to a win in the US Open, British Open, and Canadian Open.

The multiple-winner phrase more familiar in golf is the Grand Slam, a combination of the US Open, Masters, British Open, and (ack, I forget ... PGA championship?). The same phrase applies in tennis, as a combination of wins at the US Open, French Open, Australian Open, and Wimbledon. Both extend the "four-at-once" notion of a baseball grand slam (home run with the bases loaded; so 1 h and 4 rbi), and both require the same-year restriction for their use. [Update July 8 2006: Little did I know that Grand Slam has older roots in bridge. See here for speculation that its trajectory took it from bridge through baseball to golf and tennis.]

A same-year championship in English soccer is a league double: a team needs to win both the Premiership and the FA Cup. Manchester United once won a triple: a double plus a victory in the pan-European Champions League. We'll see if either usage makes it into another sport.

12 May 2005

public embarrassment

In the midst of a momentary conniption, I bungled my last post enough to disable its comment function and render the whole thing uneditable, permanently. [and in the meantime, this post seems to have overwritten it!]

A potential commenter alerted me to this fact, while pointing out the apparent oddity that it would take hours to alphabetize fewer than a million lines. I described my code to this person, who kindly replied:

[acw]: Sorting is so universally useful that there is a sorting utility built into Java. Create an array of strings and pass it to java.util.Arrays.sort(array, 0, length). It returns void, and side-effects the array to put the elements into alphabetical order.
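For anyone following along, here is a minimal sketch of what acw describes. The class name, helper method, and word list are made up for illustration; the only library call is java.util.Arrays.sort with a start and end index.

```java
import java.util.Arrays;

final class BuiltInSortSketch {
    // Sorts the first `length` entries of `words` in place and hands back the same array.
    static String[] sortWords(String[] words, int length) {
        Arrays.sort(words, 0, length); // returns void; the array itself is reordered
        return words;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "banana"};
        sortWords(words, words.length);
        System.out.println(Arrays.toString(words)); // prints [apple, banana, fig, pear]
    }
}
```

One caveat: the default ordering compares character codes, so every capitalized word sorts ahead of every lowercase one ("Zebra" before "apple"). Passing String.CASE_INSENSITIVE_ORDER as an extra last argument to Arrays.sort is one way around that.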

In short, this means I didn't need to write the code to the extent that I did. I'm blushing at the prospect of all the Java-savvy blog browsers witnessing my clunky coding skills in such a public venue. And I can't even delete the post! (Although I wouldn't if I could.)

I took acw's advice and wrote a new program that sorted my test file in 5 seconds rather than 5 hours. I emailed to thank acw, adding the following observation:

mind you, my old code lets you decide your own alphabetic ordering, and probably could be tweaked to alphabetize from word-ends rather than word beginnings. By the way, I had looked in vain for a sorting utility on the java website, which is why I embarked on making my own bubble-sorter in the first place.

Not that either of those applications is necessarily useful, but they could be if you're working with a language in which characters have a different alphabetic precedence.
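As it turns out, neither application requires a hand-rolled sorter: the built-in sort accepts a comparator. What follows is only a sketch of that idea, not my old code. The class, the method names, and the vowels-first alphabet are all invented for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

final class CustomOrderSketch {
    // Rank of a character in the given alphabet; unlisted characters sort after listed ones.
    static int rank(String alphabet, char c) {
        int i = alphabet.indexOf(c);
        return i >= 0 ? i : alphabet.length() + c;
    }

    // Comparator that orders words by a caller-supplied alphabet.
    static Comparator<String> byAlphabet(String alphabet) {
        return (a, b) -> {
            int n = Math.min(a.length(), b.length());
            for (int i = 0; i < n; i++) {
                int diff = rank(alphabet, a.charAt(i)) - rank(alphabet, b.charAt(i));
                if (diff != 0) return diff;
            }
            return a.length() - b.length(); // a prefix sorts before the longer word
        };
    }

    // Comparator that applies `order` to the reversed words, i.e. sorts from word-ends.
    static Comparator<String> byWordEnd(Comparator<String> order) {
        return Comparator.comparing((String s) -> new StringBuilder(s).reverse().toString(), order);
    }

    public static void main(String[] args) {
        String[] words = {"kiwi", "apple", "uva", "banana"};

        // Invented alphabet in which the vowels precede the consonants.
        Arrays.sort(words, byAlphabet("aeioubcdfghjklmnpqrstvwxyz"));
        System.out.println(Arrays.toString(words)); // prints [apple, uva, banana, kiwi]

        // Ordinary alphabet, but compared from the last letter backwards.
        Arrays.sort(words, byWordEnd(byAlphabet("abcdefghijklmnopqrstuvwxyz")));
        System.out.println(Arrays.toString(words)); // prints [banana, uva, apple, kiwi]
    }
}
```

For the orderings of actual languages, java.text.Collator supplies locale-aware comparators, which is probably a better starting point than a hand-typed alphabet string.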

11 May 2005

I broke my Word

MS Word that is.

Actually, I didn't break it, but I pushed it very very far, enough to put its word-count function off by 70,000. I've got this project that requires alphabetizing huge lists of words, and I challenged myself to write a Java program to do it for me (and it works).

Before I knew the actual word count of the test file, I opened it with Word, which told me there were 251,092 words in it. I ran the alphabetizer, which ran for hours, long enough that I got frustrated and stopped it. To be sure it wasn't simply seizing up, I added a feature that prints onscreen the number of words alphabetized so far. Indeed, it took several hours before the number approached 250K. The rate of alphabetization slows down as the list of items to compare grows.
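For a rough sense of why the counter crawls, here is a sketch. The alphabetizer's source isn't reproduced in this post, so the code below assumes a plain bubble sort with no early exit, which makes n(n-1)/2 comparisons on n words. The class and method names are invented, and n log2 n is only the usual growth rate quoted for library comparison sorts, not a measurement.

```java
final class SortCostSketch {
    // Plain bubble sort (an assumed stand-in for the alphabetizer); sorts `a` in place
    // and returns how many string comparisons it made.
    static long bubbleSort(String[] a) {
        long comparisons = 0;
        for (int end = a.length - 1; end > 0; end--) {
            for (int i = 0; i < end; i++) {
                comparisons++;
                if (a[i].compareTo(a[i + 1]) > 0) {
                    String tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    // Closed form for the count above: n(n-1)/2.
    static long bubbleComparisons(long n) {
        return n * (n - 1) / 2;
    }

    // Rough growth rate of an n log n sort, for comparison only.
    static long nLogN(long n) {
        return Math.round(n * (Math.log(n) / Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 251_092; // Word's count for the test file
        System.out.println("bubble sort comparisons: " + bubbleComparisons(n));
        System.out.println("n log2 n, roughly:       " + nLogN(n));
    }
}
```

The quadratic count is the point: doubling the list roughly quadruples the work, which fits a run measured in hours.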

But then it kept going, far beyond 250K, and finally stopped at 319,604 words. I briefly entertained the notion that I had reached the upper limit of Word's word-counting capacity. Maybe, but Word correctly counted the number of pages - 5608, which it took several minutes to tally. At 57 lines per page and 1 word per line, this page count seems to be pretty accurate.

In fact, the counting tool is pretty precise. I added two words to the file, and the word count reflected that. The issue is what counts as a word - a single apostrophe does, but anything in all caps does not. I therefore presume my file has around 70K such words.
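That presumption could be checked directly rather than guessed at. Here is a sketch that counts the all-caps lines in a one-word-per-line file. The class name, the sample words, and the definition of "all caps" (at least one uppercase letter and no lowercase ones) are my own assumptions, not anything Word documents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class AllCapsCountSketch {
    // Assumed definition: at least one uppercase letter and no lowercase letters.
    static boolean isAllCaps(String word) {
        boolean sawUpper = false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isLowerCase(c)) return false;
            if (Character.isUpperCase(c)) sawUpper = true;
        }
        return sawUpper;
    }

    // Number of lines that count as all caps under the definition above.
    static long countAllCaps(List<String> lines) {
        return lines.stream().filter(AllCapsCountSketch::isAllCaps).count();
    }

    public static void main(String[] args) throws IOException {
        // With a file name argument, read one word per line; otherwise use a tiny sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("apple", "NATO", "'", "USA", "Word");
        System.out.println(lines.size() + " lines, " + countAllCaps(lines) + " in all caps");
    }
}
```

If the all-caps tally for the real file landed near 70K, that would support the guess about what Word leaves out.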

Just out of curiosity, I decided to see how long it would take Word to alphabetize the lines in the same file. But Word says "The document is too big for word to handle".

the king gets untracked

A bit of a hub-bub today at Language Log regarding one of my pet interests, the linguistics of sports. Much of the discussion revolves around the usage of "getting untracked" to indicate ending a slump or spell of poor play.

Lila Gleitman mentions it in a hockey context, possibly involving the Flyers in the 60s. Mark Liberman uncovers quite a few baseball usages. I know I've heard it myself quite a bit - I googled {untracked hockey} and got about 4400 hits. Most of them are not that interesting, but they show a consistent usage of "get untracked" meaning "pull out of a slump" (except for one or two mentions of untracked snow).

Google is not a good place to look for historical data, but I found an overview of the career of King Clancy, a Senators phenom in the 20s who later would help Toronto to its first Stanley Cup. After retiring, Clancy became a coach; in this passage he describes one of his first coaching jobs:

Clancy was hired to coach the Montreal Maroons in 1937-38. "The team never got untracked in the one month I was there, and before I knew it, I was out of a job," King shrugged. The Maroons won six, lost eleven and tied one in the 18 games Clancy was employed by the Maroons.

It's not clear when this passage (part of an interview) was written, so it's not really evidence of anything other than "untracked" being used before 1986 (when Clancy died). But it does make me wonder whether he'd have used the phrase in 1937, or whether it entered his lexicon decades later.

I actually think a model of the sporting world's lexicon is in some sense warranted, but it's got to include a distinction between core concepts and peripheral constructions. The core concepts include the names of positions, equipment, and elements of rules, while the peripheral constructions mark the discourse surrounding a game or sport. "Get untracked" is one of these - if you see it or hear it, you can be fairly sure the topic is a team or player in a slump. The interest here is lexicographic - where the terminology or phrase comes from, whether it applies metaphorically outside the sport, and whether it transfers across sports (as "get untracked" probably did, from baseball to hockey, sometime before the 1960s). Whether a model of this kind of lexicon enriches our knowledge of linguistics is another thing, but it seems like a unique way of tracking the diffusion of sound/meaning/function triplets across time and (social) space.

02 May 2005

I know, eh?

So there's been all kinds of talk at Language Log regarding the Canadian tag particle eh?. It begins with Mark Liberman discussing a query he received from a reader regarding the particle in relation to modal and affective tags. In trying to find an answer he found a paper by Elaine Gold documenting sociolinguistic attitudes and native-speaker judgements about different functions of the particle.

For the record, tag eh can have both modal and affective readings. That's all I can offer for this issue -- whether they have slightly different intonational contours would require lots more data than I have access to.

Mark suggests that it may not be appropriate to rely on native speaker judgements for this kind of discourse element, at least for anything but the documentation of sociolinguistic attitudes about it. Surveys, though very useful, are not by themselves an adequate way to study such patterns of usage. Instead, actual discourse is a better place to look for data on these patterns.

I have to agree with this point, because of the risk that speakers would under- or over-report how much they've heard it and used it in various functions. (Gold acknowledges this issue in her paper). One issue I would have brought up had I attended that CLA was the interrogative usage, in which the particle is added to a sentence which is already a question with auxiliaries inverted. Gold's example is What are they trying to do, eh? Ignoring for a moment Mark's advice about native speaker judgements, I say this is an impossible usage of the word, unless the question is rhetorical.

Mark follows up, posting about electronically available transcripts of discussions from the Ontario provincial parliament, including instantiations of eh. Then, in Part 3, he comments upon an apparent "filled pause" from one of these transcripts, in which the particle seems to function like "um":

Mr Murdoch: [...] A couple of other ones: the stockyards, the money you talk about, is that the province's money? It is, eh, the money that you're -- who owns them?

I just find it hard to believe that the eh in this utterance is [both a pause-filler and] the same discourse item as the tag and narrative eh. There are at least two alternatives: it could be that the addressee nodded in response to the first question. Murdoch acknowledges the nod by saying "It is, eh?" and continues to the next sentence (starting at The money). In that scenario, it's a tag [and not a filler]. Or, it could be that the filled pause is actually a lax [ɛ], which is difficult to spell any other way, but which is distinct from the tense vowel of the tag (it would also have different intonation). Really, the only way to be sure is to have an audio recording (or very narrow transcription) of the exchange.

Part 4 discusses a parallel with Japanese ne, which has many similar functions, including a stigmatized narrative use. The narrative usage for eh is an interesting one, and I have read in other sources besides Gold's paper that it is one of the innovative Canadian functions of the marker. I haven't thought much about its relationship with discourse structure, but I'm going to hypothesize that it acts as a focuser, but with post-focus position. That is, while focuser-like sets off and highlights the following phrase, focuser-eh sets off the preceding phrase. E.g.: "Not an easy thing to talk about, eh, but you might get the drift". ( = "Not an easy thing to like, talk about, but you might get the drift").

I'm interested now in the intonational contours of eh, especially regarding (probably minute) differences across functions. There is always a drop in tone on the word before eh, and eh then has a rising contour - but the degree of the drop or the end point of the rising contour might differ slightly across modal, affective, imperative, and narrative functions.

Unlike ne (or huh), eh cannot ever have falling intonation, even in its narrative use. Again, I say this with the caveat that I'm weighing in with native speaker judgements rather than with data from recorded discourse. Which I can't get right now, as my usage drops to near zero when I'm not in the company of Canadians or close friends. (Simply because using it triggers an amused response that distracts from the intended conversation).

[Update (15 minutes later): Heidi Harley has already posted some similar thoughts on the eh tokens from the Hansard transcripts.]