Archive for the ‘Language Technology’ Category

Wordnik Smartwords: E-books just got schooled

Tuesday, June 15th, 2010

If you’re reading books as E-books on E-readers or iPads, chances are you’d like to exploit the new platform by making reading more interactive. E-readers already have built-in dictionaries, but now the Smartwords open standard from the Wordnik online dictionary (and all-around word information source) will make words “smarter.”


In the following video from The Wall Street Journal’s D: All Things Digital conference in June 2010, lexicographer and Wordnik CEO Erin McKean demonstrates how Smartwords lets a reader get lengthy definitions for technical terms, buy books on searched concepts, and get quizzed on words for the college entrance exam (hat tip to VentureBeat).

Link to video

The video below from O’Reilly’s TOC Conference (Tools of Change for Publishing Conference) in February 2010 is disappointingly vague, but the main point is that the Smartwords platform lets you learn (about words):

  • where they are and
  • where they came from
  • when they are
  • how they relate to other words
  • who created them and
  • who they’re with now

I take this to mean the contexts, connotations, collocations (words that co-occur), and other connections among words. I would dub this “Word Con 4,” but one is a col- and it might also sound like a word conference or a lexical DEFense CONdition for shooting language-maven missiles (after eating and before leaving) at people who misuse too many words.

Link to video

These are exciting times for how we access words and information. Once we reach the immersive hologram phase I suppose tagged words will have avatars to come by and explain themselves to us. “Wrestling with” a new concept could cause injuries without proper safety protocols, and “wrapping your head around” an idea might make for an unflattering online video of you.

Side note:

Erin McKean (her Twitter) uses delightful analogies. Below are two talks she has given about dictionaries.

2007 TED Talk on redefining the dictionary

Link to video

2007 talk at Google on what one should know about dictionaries (almost 55 minutes)

Link to video

Lingro: Free Web page translation by word and more

Friday, November 30th, 2007

You can get whole Web pages translated (Google Language Tools, Babel Fish, etc.), but here’s a free tool to translate any individual word on a Web page you want: Lingro (via Education Week).

Lingro works for English-English and both to and from English for:

  • Spanish
  • French
  • Italian
  • German and
  • Polish.

It uses the free Wiktionary dictionaries and also has an input-translator dictionary, file translator, and vocabulary study tools.

Go forth and click multiple unknown words.

Follow-up: Help test internationalized domain names

Tuesday, October 16th, 2007

As I reported last week (Help test internationalized domain names 2007-10-15), the time has come to see how non-Latin characters will work in domain names. The Internet administrators at ICANN have set up test domains (http://example.test) in eleven writing systems. You can type them into different browsers, put them in E-mails, and create a wiki page with your name in non-Latin characters at each domain.

Here’re the domains:

Under the hood it’s still ASCII: each non-Latin domain label is converted to an ASCII-compatible (Punycode) form that starts with an “xn--” prefix.

Here’re the actual addresses:

When you get into Web pages, the encoding gets rather long. For example, this would be the wiki listing for Japanese filmmaker Akira Kurosawa:

http://例え.テスト/黒澤明

Here’s what you may see in your browser address bar:

http://xn--r8jz45g.xn--zckzah/%E9%BB%92%E6%BE%A4%E6%98%8E
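Two separate encodings are at work in that address: the host name is converted label by label to its “xn--” form, while the page name is turned into UTF-8 bytes and percent-encoded. Here is a minimal sketch in Python, assuming the built-in `idna` codec (which follows the older IDNA 2003 rules) is a fair stand-in for what a browser does. The helper name `to_ascii_url` is made up for illustration.

```python
from urllib.parse import quote


def to_ascii_url(host: str, path: str) -> str:
    """Return an all-ASCII form of an internationalized http address.

    Hypothetical helper for illustration only; real browsers apply
    more elaborate rules than this.
    """
    # Host: each dot-separated label is Punycode-encoded and given
    # the "xn--" prefix by Python's built-in "idna" codec.
    ascii_host = host.encode("idna").decode("ascii")
    # Path: characters become UTF-8 bytes, then each byte is written as %XX.
    ascii_path = quote(path)
    return f"http://{ascii_host}/{ascii_path}"


print(to_ascii_url("例え.テスト", "黒澤明"))
```

Run on the Kurosawa example, this should print the same long address-bar form shown above.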

They worked in my browser and E-mail. I was also pleasantly surprised to see that the Chinese and Japanese domains work with both the ASCII period/full stop “dot” (.) and with the Chinese/Japanese open-circle period (。).
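A likely reason both dots work: the IDNA rules say the ideographic full stop (U+3002) must be recognized as a label separator, just like the ASCII period. The short sketch below uses Python's built-in `idna` codec, which follows those rules. Whether the browsers of the day used the same logic is my assumption.

```python
# The Japanese test domain, written with two different "dots".
with_ascii_dot = "例え.テスト"
with_ideographic_dot = "例え\u3002テスト"  # U+3002 is the open-circle period

# Python's built-in "idna" codec splits labels on either separator
# and always joins the encoded labels with an ASCII period.
encoded_a = with_ascii_dot.encode("idna")
encoded_b = with_ideographic_dot.encode("idna")

print(encoded_a == encoded_b)
```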

More information at:

My Name, My Language, My Internet: IDN Test Goes Live

IDNwiki

Help test internationalized domain names 2007-10-15

Friday, October 12th, 2007

As I mentioned over a year ago (More writing systems for Internet addresses), Internet domains are biased towards English speakers. You have to use only the Latin alphabet (without accents), Arabic numerals, and hyphens (all English ASCII characters) for Web site and E-mail domains. After seven years of research and workshops, ICANN (the Internet Corporation for Assigned Names and Numbers) is ready to start testing internationalized domain names (IDNs).

You can help on Monday, October 15, 2007

Visit the links on their page to the test domains in:

  • Arabic
  • Chinese (simplified and traditional characters)
  • Greek
  • Hindi
  • Japanese
  • Korean
  • Persian
  • Russian
  • Tamil and
  • Yiddish.

Within their own country codes (.cn, .jp, etc.), some countries have been using their own writing systems for domain names since 2003, but they’ve still been stuck with the top-level .cn, etc. Even accented letters are a tricky thing. Just this week Spain, within .es, added accented vowels, tilde-n, and more.

China has gone further and added Chinese-character top-level domains: .公司 (.gongsi, .com; “company,” “corporation”) .网络 (.wangluo, .net; “network”), and .中国 (.zhongguo, .cn; “China”).

Now ICANN wants to make those sorts of non-ASCII domains accessible for Web surfers who aren’t in China (and avert the surfing errors that could happen with multiple, separate internets).

Why the delays?

Aside from the technical difficulties and possible indifference to other languages by the American company ICANN (moneyweb.co.za/mw/view/mw/en/page94?oid=165049&sn=Detail) [EDIT (6/6/10): dead link], there are also the fears of trademark holders and opportunities for spoof domains using similar-looking letters from languages like Greek and Russian.
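To see why look-alike letters worry trademark holders, here is a rough sketch in Python. The label “example” and the helper `scripts_in` are hypothetical, and guessing a character's script from the first word of its Unicode name is only a crude heuristic, not how any registry actually screens names.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Guess the scripts in a label from the first word of each
    character's Unicode name (e.g. "LATIN", "CYRILLIC", "GREEK")."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


genuine = "example"
spoof = "ex\u0430mple"  # third letter is CYRILLIC SMALL LETTER A, not Latin "a"

print(genuine == spoof)           # the strings differ even though they look alike
print(sorted(scripts_in(spoof)))  # a label mixing scripts is a warning sign
print(spoof.encode("idna"))       # the ASCII "xn--" form exposes the difference
```

The two names can look identical on screen, yet they are different strings and resolve to different domains.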

But let’s look to the future. Onward to an Internet for all writing systems!

Picture this: Zlango linguistic icons for cellphones and Web

Friday, October 12th, 2007

You can liven up your cellphone/mobile phone messages and send Web messages and E-cards with Zlango icons (zlango.com/icons) [EDIT (6/6/10): dead link] (note: includes a few off-color terms) (via TechCrunch).

Icons

Zlango currently has over 200 icons (219, including at least four repeats I noticed) on their site in eight categories:

  • People: 25 icons
  • Actions: 43
  • Places: 21
  • Feel: 26
  • Time: 22
  • Language: 20
  • Fun: 25
  • Descriptions: 37

Zlango is not an international language, of course. It has no grammar and favors languages like English that mostly use word order instead of morphology (word changes and affixes) for grammar.

Observations

  • Why are “beach” and “army” considered types of buildings like “cafe” and “pub”?
  • The hungry bird chick looks more like “hungry” than “want.”
  • Flying Superman doesn’t make me think “can.”
  • “Please” looks like “pray.”
  • A timer bomb about to explode doesn’t really work for “soon” when it’s a good thing about to happen.
  • Would everyone get “beautiful” from a picture of a swan (presumably from the Hans Christian Andersen fairy tale “The Ugly Duckling,” whose title character was really a swan-to-be)?
  • Amusingly, it’s an Israeli company, but “food” is a hot dog and “restaurant” is a hot-dog place (perhaps it’s a kosher all-beef hot dog).

Videos on Zlango’s YouTube page

You might also want to look into psychologist David Premack’s work with chimpanzees and plastic tokens to create simple communication.

IBM brings animated avatars to sign-language translation

Friday, September 21st, 2007

Imagine you’re a signing Deaf person and you find out about a spoken lecture. You need a human interpreter, but there’s no time. What do you do?

In the UK, soon you’ll be able to use IBM UK’s Say It Sign It (SiSi) translation system.

SiSi converts spoken English to British Sign Language (BSL) via speech recognition, translation, and a computer-animated avatar character.

Demonstration video on YouTube (14 seconds)

Link to video

This is a wonderful invention for simple communication and last-minute occasions. I hope, however, that people don’t try to replace human interpreters with speech-recognition avatars. The nuances of the face, which can carry important grammatical information in sign languages, can’t be perfectly rendered in animation in all their complexity and subtlety.

I also wonder how good the speech-recognition and translation programs are. I suspect the device is really translating into a pidgin sign language that includes elements of BSL and English. The way information is presented in natural sign languages and spoken languages can be quite different. Sign languages use a lot of simultaneity/overlapping of information in space and what are called size and shape specifiers, or classifiers (such as for flat objects or round objects or vehicles or people). Translation is not easy.

Still, I hope IBM can make Say It Sign It better and better.

Folktales as digital picture-books in many languages

Saturday, September 15th, 2007

If you like traditional folktales, Digital EHON is the site for you. You can read folktales from Japan and elsewhere in the original language and translations into various other languages. Not only that, you can also download and listen to some of them in Japanese and English with pictures and text.

The audio picture-book players are only about 3MB each and download in seconds via DSL, but I get gibberish for the Japanese text. However, the Japanese text renders fine on the Web site itself.

Right now they have (at least in Japanese) tales with paintings from:

  • Japan (including some original stories)
  • Taiwan
  • Korea
  • Papua New Guinea
  • Mongolia
  • Mexico
  • Peru
  • Brazil
  • Bolivia
  • Sweden
  • Indonesia and
  • China.

The site has an index and at least some content in:

  • English
  • Japanese
  • Indonesian
  • Spanish
  • Norwegian
  • Swedish
  • Chinese
  • Korean
  • German
  • Italian
  • French and
  • Ami (a language of Taiwan’s indigenous Ami people).

EHON is a great place for multicultural/multilingual/translation fun.

Language note:
EHON is probably a play on words. Hon 本 is the Japanese word for “book.” Ehon 絵本 is “picture book” but could also be “E-book” (electronic book) with a change in pronunciation from “eh” to English “ee.”

Free audio phrase books for phones

Friday, July 6th, 2007

Just under a year ago (Mobile words) I wrote about Coolgorilla’s free audio phrase books for iPods. Those are now free with a purchase from the travel/tourism site lastminute.com. But the new version for cellphones/mobile phones, Talking Mobile Phrase Books, is completely free, at least for now (and is also sponsored by lastminute.com).

You can download French, German, Greek, Italian, Portuguese, and Spanish. They’re planning on adding Chinese, Japanese, Polish, Swedish, and other languages in the future.

You don’t have to be able to pronounce the foreign language; just find the expression you want and let the phrase book say it.

Press release

Thanks to Roy Forsdick, Managing Director, iDev Entertainment (apparently Coolgorilla is their brand) for informing me about this.

MS Office to embrace regional Britishisms

Tuesday, May 1st, 2007

Coming soon, from the people who brought you sickie (“sick day”; see my November post: MS Office embraces Australianisms):

British regional dialect words for Microsoft Office 2007 products.

You can offer your suggestions to judge Jonathan Robinson, curator of English accents and dialects at The British Library (site) via E-mail: dialect at microsoft dot com.

Speaking of The British Library, you can listen to British dialects (with Windows Media Player) from the archive:

Sounds Familiar?


British Sign Language mobile dictionary

Monday, March 19th, 2007

Now Brits who want to communicate with the signing Deaf can get British Sign Language (BSL) signs downloaded to their cellphones/mobile phones. The Centre for Deaf Studies at the University of Bristol (site) has developed a free BSL dictionary called Mobilesign, which should download fast enough to avoid large phone-service charges.

From the press release linked above:

Jim Kyle, Harry Crook Professor of Deaf Studies at the Centre for Deaf Studies said: “This is a first step to providing support to hearing people’s communication with Deaf people — anywhere and at any time. From our research, we have identified this point of contact as a major issue for Deaf people in shops and daily life. The next step for us will be to construct a phrasebook in order that more extensive interaction can be supported.”

Mobilesign

Signstation: BSL and Deaf awareness workplace materials

It’s nice to see technology working for Deaf people instead of against them (radio, movies going from readable intertitles to talkies, telegrams going to telephones). I hope we get something like this for American Sign Language (ASL), especially with the current work on sign-friendly video compression (see my post Better cellphone video for Deaf signers?).