Orphans of the Arabian Nights

The Arabian Nights is a tremendously rich document containing centuries of cultural thought, ideas and history. The Nights contains stories from an enormous geographical region: from Morocco to Ethiopia to modern Iraq to India. In this context it is no surprise to find an overwhelming number of different topics and subjects in the stories, yet they form a remarkably coherent collection.

In her wonderful Stranger Magic (go read it, if you haven’t), Marina Warner (2012) notes that a number of very famous stories in the Nights (‘Aladdin and the Wonderful Lamp’ and ‘Ali Baba and the Forty Thieves’) were not originally part of the Nights but are ‘orphan’ tales, probably to be attributed to the French translator Galland. Evidence for this claim is found in the first Arabic version of Aladdin, which can be traced back quite directly to Galland’s writings in French. Another sign of his “bricolage”, as Warner calls it, is that the story of Aladdin

“pieces and patches many elements from different tales in the book, especially from ‘the true Aladdin’ (‘Aladdin of the Beautiful Moles’) and ‘Hasan of Basra’. […] Yet the plot of ‘Aladdin’, which upholds the rise of a worthless orphan boy to princely fortune, fame and power, oddly replicates the fate of the book itself, as does the story of Morgiana the plucky slave girl in ‘Ali Baba’, for she too marries up; it is as if Galland were unconsciously confessing his own craft and luck.” (Warner 2012, p. 58).

In this post I want to explore the Nights from a distance. More specifically, using a technique called Topic Modeling, I want to investigate this idea that the “bricolage” of Galland can be observed from the many pieces and patches, or as I will call them, topical connections, between the orphan stories and the rest of the collection.

Over the last 10 years, Topic Modeling has gained a lot of attention in Machine Learning and Information Retrieval. This technique allows researchers to browse a collection, not on the basis of single words, but on the basis of topics, such as ‘love’, ‘despair’, ‘war’ or ‘magic’. Scholars from the Humanities increasingly show interest in using these techniques, although they also show a healthy skepticism towards the meaningfulness of applying these methods. A common objection is that topic models generally provide information that most scholars are already aware of, because the topics are often of a very general nature. Although I wholeheartedly agree with these objections, I do find that a distant view on a sufficiently large collection can provide insights about the data, and sometimes even evidence for certain hypotheses, that are otherwise hard to obtain.

Constructing the Topic Model

There are quite a few Topic Modeling toolkits available. One of the best, and also quite easy to use, is the one included in Mallet (MAchine Learning for LanguagE Toolkit). I used the modern English translation of the Nights by Malcolm Lyons (2009) from the Penguin Classics series. Contrary to popular belief, the 1001 nights do not contain 1001 stories. There are about 260 different stories told over 1001 nights. Interestingly, night 261 seems to be missing from the Lyons edition. I constructed a corpus from Lyons’ (2009) edition that consists of 1000 documents (one for each night) plus the orphan stories of ‘Aladdin’ and ‘Ali Baba’.

I then ran the Mallet Topic Model using 300 topics. Choosing the number of topics is always somewhat of a black art, and you need to experiment with a number of settings. The general idea is that if you use only a few topics, the model will provide a very general view of the data. If you choose many topics, you will obtain many fine-grained topical differences. The risk of having too many topics is that you lose generalization. For most corpora, 200 to 400 topics seems to be a good number.

Here are some of the topics learned by the Topic Model:

  • Topic 222: god, night, pray, men, prayer, grant, pious, blessing, granted, prayers;
  • Topic 209: ship, sea, captain, island, board, shore, sailed, water, city, wind;
  • Topic 155: fish, fisherman, sea, net, baker, water, cast, bread, give, day;
  • Topic 114: men, thousand, muslims, fight, army, riders, battle, killed, swords, infidels.

These topics seem to deal with religion, seafaring, fishing and war. Some topics are rather general and appear throughout the Nights. The following plot visualizes the ten most common topics in the Nights as an area chart. The nights are on the x-axis and the probability of a topic occurring in a particular night is on the y-axis:

  • Topic 59: god, heard, asked, told, replied, don’t, hand, morning, made, night;
  • Topic 143: back, left, put, find, happened, leave, afraid, heard, good, thought;
  • Topic 111: night, morning, hundred, told, continued, heard, broke, king, fortunate, allowed;
  • Topic 121: sight, found, left, started, time, clothes, day, fell, walked, looked;
  • Topic 114: men, thousand, muslims, fight, army, riders, battle, killed, swords, infidels;
  • Topic 48: man, asked, told, back, gave, replied, shop, bring, home, don’t;
  • Topic 72: king, palace, ground, state, city, kissed, ordered, emirs, son, throne;
  • Topic 30: gharib, ajib, sahim, mirdas, brother, hundred, friend, al-kailajan, abraham, replied;
  • Topic 5: great, gave, time, men, brought, filled, honour, taking, joy, provided;
  • Topic 154: tears, left, time, god, recited, life, heart, lines, back, wept.
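A ranking like the one above can be derived from the per-night topic probabilities. The following is a minimal sketch, not the code used for the plot: it assumes the topic model's output has already been loaded into a list of rows (one row per night, one probability per topic), and it assumes that "most common" means "highest mean probability over all nights". The function name and the toy numbers are hypothetical.

```python
from statistics import mean


def most_common_topics(doc_topics, k=10):
    """Return the ids of the k topics with the highest mean probability.

    doc_topics: list of rows, one per night; each row holds one
    probability per topic (an assumed in-memory layout, not Mallet's
    own output file format).
    """
    n_topics = len(doc_topics[0])
    # Average each topic's probability over all nights.
    avg = [mean(row[t] for row in doc_topics) for t in range(n_topics)]
    # Sort topic ids from most to least probable on average.
    ranked = sorted(range(n_topics), key=lambda t: avg[t], reverse=True)
    return ranked[:k]


# Toy data: 3 nights, 4 topics (made-up numbers for illustration only).
toy = [
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.50, 0.10, 0.30, 0.10],
]
print(most_common_topics(toy, k=2))
```

Other definitions of "common" are possible, for instance counting the nights in which a topic exceeds some probability threshold; the mean is simply the easiest one to sketch.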

The most general and common topic in the Nights is about communication. No surprise. Other topics, such as Topic 114, deal with war. The plot nicely visualizes where in the Nights wars are fought. These general topics don’t tell us much, but they do function as a sanity check that our model is capable of finding common topics. Let’s now have a look at some relatively common topics that display a higher degree of granularity:

This mountainous landscape shows some interesting peaks of topic usage in certain nights. For example, around nights 357-370, Topic 204 shows a big burst. This topic has the following top words: prince, king, horse, princess, father, city, persian, palace, sorcerer, roof. In these nights Shahrazad tells the Sultan the story of the Ebony Horse. This story tells of a Persian sage who brings a flying ebony horse to King Sabur. In return the king promises him one of his daughters, but the princess is reluctant to marry this ugly old man. A lot of adventures follow, but the point here is that, as you can see, many of the key words of the story are present in the topic. Topics such as Topic 204 are story-specific topics. Topic 2 is another example, showing a burst around nights 550-600. During these nights Shahrazad tells the story of Sindbad. The words in Topic 2 provide somewhat of a summary of this story: sindbad, goods, island, voyage, god, large, friends, baghdad, sailor, merchants.
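Such bursts can also be located automatically instead of reading them off the plot. The sketch below is one simple way to do so, under assumptions of my own: a burst is taken to be a run of consecutive nights in which a topic's probability stays at or above a threshold, and both the threshold and the minimum run length are arbitrary, hypothetical settings.

```python
def find_bursts(series, threshold=0.2, min_length=3):
    """Return (start, end) index pairs of runs where values stay >= threshold.

    series: one topic's probability for each night, in order.
    Indices are 0-based, so add 1 to obtain night numbers.
    """
    bursts = []
    start = None  # index where the current run began, if any
    for i, p in enumerate(series):
        if p >= threshold and start is None:
            start = i  # a new run begins
        elif p < threshold and start is not None:
            if i - start >= min_length:
                bursts.append((start, i - 1))
            start = None  # the run has ended
    # Close a run that lasts until the final night.
    if start is not None and len(series) - start >= min_length:
        bursts.append((start, len(series) - 1))
    return bursts


# Made-up probabilities for a single topic over 11 nights.
toy = [0.0, 0.01, 0.4, 0.5, 0.45, 0.02, 0.3, 0.0, 0.25, 0.3, 0.35]
print(find_bursts(toy))
```

In the toy series the single high value at index 6 is ignored because it is shorter than the minimum run length, which is the kind of isolated spike one would not want to call a story-specific burst.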

Aladdin & Ali Baba

According to Warner (2012), the stories of Aladdin and Ali Baba are in a way a reflection of the Nights and share many topics from many stories. Let’s have a look at some of the topics present in these two stories:

The most dominant topic in Ali Baba is Topic 201 which is represented by the following words: ali baba, captain, qasim, oil, wife, gold, jar, city, husain, abdullah, coin. These words beautifully summarize the story. Aladdin’s most probable topic is Topic 266 which contains the following words: aladdin, princess, palace, magician, sultan, mother, aladdin’s, grand, don’t, sultan’s, majesty. Again, the words seem to capture (some of) the essence of the story.

Reflections of the Nights?

How are Aladdin and Ali Baba related in terms of their topics to the rest of the Nights? To find out, we can take the topic distributions of all nights and the alleged orphans and compute the pairwise distances between them. This results in a matrix of distances between all documents. One downside is that it is not straightforward to inspect such a large table. Therefore, I make use of another technique called t-SNE (developed by Van der Maaten & Hinton). This technique allows us to visualize the distances between the stories in a two-dimensional plot. To make a long story short, here it is (right-click and open the image in a new tab if it’s too small):

The plot displays a number of clusters. Far to the right we have a cluster containing the nights in which the story of ‘Hasan of Basra’ is told. The blue cluster at the top contains the nights in which Shahrazad tells the story of ‘‘Ajib and Gharib’. Exactly why these nights are so far away from the other stories is something I would like to look into another time. For now it is intriguing to see that ‘Aladdin’ and ‘Ali Baba’ are placed right next to each other. What is even more striking is that they occupy the center of the plot, suggesting that the distances between them and the other stories in the Nights are relatively small and that they have many topical connections with these other stories. Although my analysis is in no way conclusive, the central position of the orphan stories does lend support to Warner’s idea that the “bricolage” of Galland can be observed in the many pieces and patches taken from the rest of the Nights.
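The pairwise-distance step, and the idea of a document being "central", can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure behind the plot: the post does not say which distance measure was used, so the Jensen-Shannon distance is my own choice (it is a common one for probability distributions), and the mean distance to all other documents is only a rough numerical stand-in for "lying in the middle" of a t-SNE layout. All names and numbers are hypothetical.

```python
from math import log2, sqrt


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two topic distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    # max() guards against tiny negative values caused by rounding.
    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def distance_matrix(dists):
    """Pairwise distances between all documents."""
    n = len(dists)
    return [[js_distance(dists[i], dists[j]) for j in range(n)]
            for i in range(n)]


def rank_by_mean_distance(labels, dists):
    """Sort documents by their mean distance to all other documents."""
    matrix = distance_matrix(dists)
    n = len(labels)
    scores = []
    for i in range(n):
        others = [matrix[i][j] for j in range(n) if j != i]
        scores.append((labels[i], sum(others) / len(others)))
    return sorted(scores, key=lambda pair: pair[1])


# Toy data: three documents over three topics (made-up numbers).
toy_labels = ["doc_a", "doc_b", "doc_c"]
toy_dists = [
    [0.40, 0.30, 0.30],  # mixes all three topics
    [0.90, 0.05, 0.05],  # dominated by topic 0
    [0.05, 0.05, 0.90],  # dominated by topic 2
]
for label, score in rank_by_mean_distance(toy_labels, toy_dists):
    print(label, round(score, 3))
```

A document that mixes many topics, like the first toy row, ends up with the smallest mean distance. That is the pattern one would expect for a story patched together from many others, though a low mean distance can also simply reflect a document dominated by very general topics.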

How long is an Arabian Night?

When the next night came, Dinarazad said to her sister Shahrazad: ‘In God’s name, sister, if you are not asleep, then tell us one of your stories!’ Shahrazad answered: ‘With great pleasure! I have heard tell, honoured King, that…’

I’m planning to do a series of blog posts about Alf Laylah Wa Laylah, the Stories of One Thousand and One Nights. This collection of folk tales, collected over many centuries by various authors, translators, and scholars across West, Central and South Asia and North Africa, forms a huge narrative wheel with an overarching plot, created by the frame story of Shahrazad.

The stories begin with the tale of King Shahryar and his brother, who, both deceived by their respective Sultanas, leave their kingdom, only to return when they have found someone who, in their view, was wronged even more. On their journey the two brothers encounter a huge jinn who carries a glass box containing a beautiful young woman. The two brothers hide as quickly as they can in a tree. The jinn lays his head on the girl’s lap and, as soon as he is asleep, the girl demands that the two kings make love to her or else she will wake her ‘husband’. They reluctantly give in, and the brothers soon discover that the girl has already betrayed the jinn ninety-eight times before. This exemplar of lust and treachery strengthens the Sultan’s opinion that all women are wicked and not to be trusted.

When King Shahryar returns home, his wrath against women has grown to an unprecedented level. To temper his anger, each night the king sleeps with a virgin only to execute her the next morning. In order to put an end to this cruelty and save womanhood from a “virgin scarcity”, Shahrazad offers herself as the king’s next bride. On the first night, Shahrazad begins to tell the king a story, but she does not end it. The king’s curiosity to know how the story ends prevents him from executing Shahrazad. The next night Shahrazad finishes her story, and begins a new one. The king, eager to know the ending of this tale as well, postpones her execution once more. Using this strategy for One Thousand and One Nights in a labyrinth of stories-within-stories-within-stories, Shahrazad attempts to gradually move the king’s cynical stance against women towards a politics of love and justice (see Marina Warner’s Stranger Magic (2012)).

The first European version of the Nights was translated into French by Antoine Galland. Many translations (in different languages) followed, such as the (heavily criticized) English translation by Sir Richard Francis Burton entitled The Book of the Thousand Nights and a Night (1885). This version is freely available from Project Gutenberg, and it is the one I will explore here.

I am intrigued by the suspense created by Shahrazad’s story-telling skills, especially the “cliff-hanger” ending each night, which she uses to avert her own execution (and possibly that of womanhood). Every night she tells the Sultan a story only to stop at dawn, and she picks up the thread the next night. But does it really take the whole night to tell a particular story?

I am not aware of any exact numbers about how many words people speak per minute. Averages seem to fluctuate between 100 and 200 words per minute. Narrators are advised to use approximately 150 words per minute in audiobooks. I suspect that this number is a little lower for live storytelling and assume it lies around 130 words per minute (including pauses). Using this information, we can compute the time it takes to tell a particular story as follows:

$$ST(t) = \frac{\textrm{number of words in t}}{\textrm{words per minute}}$$

So, a story of 4000 words would take approximately 4000 / 130 ≈ 31 minutes. (To be honest, this actually seems quite fast to me.) I took Burton’s translation of Alf Laylah Wa Laylah and computed for each night how long it would take to tell that night’s part of the story. The plot below visualizes this for each night. I added a smoothing curve (in blue) for interpretability. On average, each night lasts only nine minutes. Remarkably short!
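The computation itself is tiny. Here is a minimal sketch of the formula above, assuming that a "word" is simply any whitespace-separated token and using the assumed rate of 130 words per minute; the function name is hypothetical, and a real run would of course feed in the text of each night rather than a toy string.

```python
import re

WORDS_PER_MINUTE = 130  # assumed live-storytelling rate, see above


def telling_time_minutes(text, wpm=WORDS_PER_MINUTE):
    """Estimate how many minutes it takes to tell `text` aloud."""
    # Count whitespace-separated tokens as words (a simplification).
    n_words = len(re.findall(r"\S+", text))
    return n_words / wpm


# A stand-in "story" of exactly 4000 words.
toy_story = "word " * 4000
print(round(telling_time_minutes(toy_story), 1), "minutes")
```

Read the other way around, an average of nine minutes per night corresponds to roughly 9 × 130 = 1170 words per night at this speaking rate.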

Then Shahrazad reached the morning, and fell silent in the telling of her tale…