Friendship, death, and writing in Michel de Montaigne's Essays

Introduction ¶

In the center of the first book of Michel de Montaigne’s (1533-1592) Essais we find his famous essay on friendship. We should not ascribe the central location of this essay to coincidence, in particular not when we take the introduction of this famous essay on friendship into consideration. In that introduction Montaigne compares his Essais with the work of a painter he had employed. This painter meticulously placed his paintings on the wall. In the middle of each wall he placed his best paintings, which showed all of his capabilities as a painter, and he filled the surrounding space with so-called grotesques, paintings that display fantastic and strange figures that are only enjoyable in that strange capacity. Montaigne’s analogy follows immediately:

And what are these things of mine, in truth, but grotesques and monstrous bodies, pieced together of divers members, without definite shape, having no order, sequence, or proportion other than accidental? (Montaigne 2010, 187).

However, Montaigne stresses that the analogy is not complete. Although his Essais are like these grotesques, Montaigne deems himself incapable of producing the well-rounded central artwork. That is why he announces that he will instead give the place of the 28th essay to a work by his dear friend, Etienne de La Boétie, written in his youth, which Montaigne deems fit to take this prominent position. In addition, sonnets by the same friend will make up the 29th essay. As the 28th and 29th of the 57 essays of the first book of the Essais, the work of his friend is thus placed at the center of Montaigne’s labyrinth of grotesque writings; a labyrinth that, void of any intrinsic necessity, more or less accidentally and without order, floats around a focal point: around the friend, around their friendship. And in particular, as occurs so often in the history of writing on friendship: around the deceased friend, the friend who has passed away.

In this essay I ask why the writing on friendship is so often connected to the death of the friend. In particular, what does this say about writing, and what does it say about the friendship? How do the motives of friendship, death and writing coincide in Montaigne’s essays?

Derrida on the love that mourns ¶

The testamentary character of friendship is emphasized by Jacques Derrida in his book The Politics of Friendship. Although Derrida distinguishes historical periods in the writing and thinking on friendship - namely: the Greek-Roman model, the Christian model, and “Nietzschean” thought on friendship (Berns 2013, 218) - he observes that the relation between friendship and the death of the friend is a recurrent theme, parallel to or throughout those periods. Derrida remarks that already in Aristotle the relation between friendship and survival is present - and with that the mourning of the other, the friend (even though in the Greek-Roman model the friend still circulates in the economy of the self). That connection between friendship and death concerns the durability and stability (bébaios: ‘stable, established, certain, assured’ (Derrida 2005, 15)) of the friendship, and in particular Aristotle’s preference for the activity of loving over the passivity of being loved. What else is friendship but a particular form of loving - as an activity?

According to Aristotle friendship does not primarily exist in a particular event that is passively endured, but instead in the activity of loving even before any situation of being loved arises. Derrida summarizes this position on the passivity of being loved:

It says nothing of friendship itself which implies in itself, properly, essentially, the act and the activity: someone must love in order to know what loving means; then, and only then, can one know what being loved means. (Derrida 2005, 8)

This privileging of the act has everything to do with knowledge. According to Aristotle the highest friendship is for the sake of what is good, relative to which friendship for utility and pleasure are mere derivatives, and is thus always characterized by durability, bébaios, through the presupposition that reason is being used in making decisions for the sake of the good. Now, the particular type of loving specific to friendship is as an activity accompanied by knowledge, whereas being the passive object of love can remain a secret to the one being loved. Conversely, loving considered as an activity is never strictly secret: even if the love¹ is not proclaimed aloud, it is always already at least proclaimed to the lover himself. In Aristotle’s view, the phenomenon of friendship can thus not, in its essence, primarily be understood as a passive and potentially unknowing undergoing of a type of love.

As a result, Derrida argues that Aristotle’s view on friendship is embedded in a rational system of contrapositions and preferences for one over the other. Indeed, these preferences are traditional preferences of philosophy itself:

Loving will always be preferable to being-loved, as acting is preferable to suffering, act to potentiality, essence to accident, knowledge to non-knowledge. It is the reference, the preference itself. (Derrida 2005, 11).

Aristotle would accordingly claim that if a friend had to choose between knowing and being known, he would choose knowing, precisely because knowing characterizes true friendship. In that context Aristotle claims that “we” (i.e. the Ancient Greeks) praise the friend who keeps loving a dead friend, because in that scenario that friend knows, without the reciprocity of being known. The object of knowing can thus potentially be a dead friend or a lifeless object, whereas the knowing subject is necessarily alive in the act of friendship. For any object of knowing, there is thus always already the possibility of that object being dead. This holds accordingly for the act of friendship, which is necessarily accompanied by the (self-)knowledge of this love:

Friendship for the deceased thus carries this philía to the limit of its possibility. But at the same time, it uncovers the ultimate spring of its possibility: I could not love friendship without projecting its impetus towards the horizon of this death. (Derrida 2005, 12).

The limit case of loving the dead friend thus shows that in Aristotle’s view the object of love does not essentially have to exist. In other words, true friendship anticipates the death of the other. The possibility of loving a friend and the possibility of the death of the loved one emerge from the very same origin:

I could not love friendship without engaging myself, without feeling myself in advance engaged to love the other beyond death. Therefore, beyond life. I feel myself – and in advance, before any contract – borne to love the dead other. (Derrida 2005, 12).

We thus see that, together with the heterogeneity between activity and passivity, act and potentiality, knowing and not knowing etc., an invisible line between life and death is drawn at the same time.
It is inherent to every friendship that one friend survives the other. Derrida will proceed to play out contradictions in Aristotle’s view on friendship in the usual fashion of deconstruction, transforming Aristotle’s position into new insights along the way. Tracing the steps of that deconstruction is not the aim of this essay. Instead, the goal was to introduce the connection between friendship and death, because this thread returns throughout history in the thinking about friendship, for example in Cicero, Seneca, Augustine, and also in Derrida himself, who published several memorials about friends. Another interesting author in whom this connection can be examined is Michel de Montaigne.

The essay as Grotesque ¶

What Montaigne adds to the aforementioned connection between friendship and death is, I argue, that in his essay on friendship textuality itself enters into relation with death as well. I mean that in a more concrete sense than for example Derrida, when he argues that writing is the principle of death² in a history of logocentrism, because writing breaks the ideality of the voice. The voice is there understood as an auto-affection that is seemingly not stained by the materiality of any signifier, fully present and self-present. This theme could perhaps shed light on how Montaigne’s essay on friendship expresses the desire for the presence of a lost friend.³

But here I want to focus on the structure and status of the 28th essay itself, as well as its special position in the Essais as a whole, regarded in light of the remarks Montaigne makes about his own texts at the beginning of the essay on friendship. I already partly summarized those remarks in the introduction. But what we should add now is that the announced centerpiece dedicated to his friend Etienne de La Boétie - the center of essays 28 and 29, and as such also the center of the Essais as a whole - is remarkably absent. The “grotesques” of Montaigne thus circulate around an absent center, a central absence that we can understand in a negative fashion as a vanishing point. In the very heart of the Essais the disappearance of the friend is materially inscribed.

Moreover, in the rather bizarre introduction to his Essais Montaigne explains that he intended the essays as a self-portrait:

This, reader, is an honest book. It warns you at the outset that my sole purpose in writing it has been a private and domestic one. I have had no thought of serving you or of my own fame; such a plan would be beyond my powers. I have intended it solely for the pleasure of my relatives and friends so that, when they have lost me - which they soon must - they may recover some features of my character and disposition, and thus keep the memory they have of me more completely and vividly alive. Had it been my purpose to seek the world’s favour, I should have put on finer clothes, and have presented myself in a studied attitude. But I want to appear in my simple, natural, and everyday dress, without strain or artifice; for it is myself that I portray. My imperfections may be read to the life, and my natural form will be here in so far as respect for the public allows. Had my lot been cast among those peoples who are said still to live under the kindly liberty of nature’s primal laws, I should, I assure you, most gladly have painted myself complete and in all my nakedness. So, reader, I am myself the substance of my book, and there is no reason why you should waste your leisure on so frivolous and unrewarding a subject. (Montaigne 1993, 23).

We see first of all that the text anticipates the death of Montaigne himself, and is intended to function as a necrology for family and friends (here in the plural), in which Montaigne’s self becomes readable and recoverable. He adds immediately that, even though he is being honest in representing himself, he cannot offer a full-on nude portrait of himself, but only a portrait ‘in so far as respect for the public allows’. The self-portrait tries to conjure up the self in a lively fashion, but cannot do so completely. Montaigne’s remark reaffirms how such a narrative of the self is a construction, a simulation, which could be understood paradoxically as an illusion produced by the dissimulation of the self. He writes his self into the text as something that will essentially be absent. Whoever reads the Essais indeed has the impression, even if it is an illusion, that the voice of Montaigne speaks from the text itself, even though it is clear that structurally this particular presence signifies the absence, the death of Montaigne himself.
And Montaigne makes this insight even more explicit by anticipating his own death at the very moment of writing.

Besides this subtle textual self-renunciation in the self-portrait, it is also remarkable that in the very display and literary creation of his ‘self’ Montaigne circumscribes his own subjectivity with the words of others. In this light the comparison of his own essays with grotesques is not innocent at all, because Montaigne thus places himself in a position of marginality and eccentricity with respect to the central theme of death.

As Brad Epps puts it:

What arises is a self-portraiture beside itself, or better yet, a self-portraiture which consists of an elaborate cir-cumlocution, or en-framing, of the words and images of others: La Boétie, of course, but also Horace, Catullus, Ariosto, Cicero, Terence, and so on (Epps 1995, 41).

Montaigne’s autobiography is thus simultaneously a biography of his dead friends, and in particular of his dead friend (in the singular), Etienne de La Boétie. In his article Grotesque Identities Brad Epps considers the grotesque as a method of self-portraiture. The strangeness and even the monstrosity (a term that Montaigne himself uses with emphasis in essay 28) of the grotesque thus becomes more intelligible:

For if the grotesque is strange, even monstrous, it is in part because it styles the self as twisted round and shot through with otherness. (Epps 1995, 41).

This fascinates me because the blending of the other into the self - an otherness so radical that it becomes a monstrosity - is a crucial element of how Montaigne describes true friendship:

In the friendship I speak of, our souls mingle and blend with each other so completely that they efface the seam that joined them, and cannot find it again. If you press me to tell why I loved him, I feel that this cannot be expressed, except by answering: Because it was he, because it was I. (Montaigne 2010, 192).

So according to Montaigne, he and La Boétie were such good friends that one could no longer discern a rigid difference between ‘I’ and ‘you’, between ‘you’ and ‘I’. This is different from the reciprocity that is central to Greek-Roman thinking on friendship, in which the otherness of the friend is from the outset considered to be a mirror of the self, on equal footing, and in which the otherness of the friend is thus immediately assimilated into the economy of the self. In Montaigne, the equality and reciprocity of the Greek-Roman model of friendship are replaced by the ‘heteronomy, transcendence and infinity’ (Berns 2013, 220, my translation) that is so typical of the Christian idea of friendship⁴. That transcendence and infinity are illustrated clearly when Montaigne says about his friend:

he surpassed me infinitely in every other ability and virtue, so he did in the duty of friendship. (Montaigne 2010, 198).

By speaking of this infinite transcendence Montaigne praises his friend quite literally into heaven; almost as if the friend here takes the place of God. What thus takes shape is some sort of negative theology through which Montaigne strictly distinguishes his friendship with La Boétie from ‘normal’ friendships, family ties, sexual relations with women (and from the ‘Greek love’). What is left after the negation of these expressions is beyond words. The only utterance that is left for Montaigne to express this friendship is “Because it was he, because it was I.” In this mystical experience - I think you can call it that - he can find no reason for the friendship outside of the singularity of the other. The ontological question of what friendship is, as for example Aristotle asked it, is rendered inoperable and of no use for structuring and articulating the friendship (Berns 2013, 220).

This point is important for my overall argument because I want to show how the structure of the Essais incorporates, as it were, Montaigne’s thinking on friendship. That the essay is a grotesque means that the written self-display of Montaigne encircles the heterogeneous other, the infinite transcendence of the friend. There is thus a connection between Montaigne’s writing and his experience of friendship. But there are more connections to point out. Both friendship and the grotesques relate to death. I already highlighted the connection between friendship and death with the help of Derrida’s interpretation. With respect to Montaigne I would like to add that friendship does not only anticipate the death of the other, as we saw in Aristotle, but that friendship as Montaigne envisions it is so ideal, so mystical, that perhaps it can only take place if the friend is dead.

The suspicion that gave rise to this essay is that the ideal friendship of Montaigne first takes place in the text, in the writing about friendship, in the grotesque writing of a self-portrait in which the dead friend is being remembered and the ideal friend is being born.

My main claim thus is that Montaigne’s friendship is fundamentally written.

And perhaps the grotesque nature of Montaigne’s writing thus structures his friendship. His writings are grotesque insofar as they are scribbles in the margin of a central absence, an emptiness that is inscribed in essay 28 as an empty space at the place where La Boétie’s La Servitude Volontaire should have been. It is this central emptiness, the death of the friend, that perhaps inspired the writing of the Essais and the writing on friendship. Kuisma Korhonen states:

One important story about the Essais (…) is the one where Montaigne starts to write his book after the death of his friend Etienne de la Boétie – the story of an ideal friendship, with the text serving as its memorial. (Korhonen 2006, 78).

Montaigne’s friendship has a testamentary character: it exists by the grace of an epitaph, a series of testamentary signs that summon a ‘living’ image of the friend, while inevitably it is the very death of this friend that is its possibility and inspiration⁵. The form of the grotesque is appropriate for this structure of friendship. Epps states about this form:

The ornamental flourish of figures neither fish nor fowl, the reticular profusion of cryptic signs and images, is the most visible stuff of the grotesque, but so too are death, burial, emptiness, creativity, excess, and exuberance: an entire thematics of mortality and vitality that heightens, and is heightened by, the significance of form (Epps 1995, 44).

In this regard the form of the 28th essay is itself highly meaningful: its profusion of signs results from a central lack and emptiness, namely the strange and plural absence of the announced text by La Boétie. I say plural, because at the end of the 28th essay Montaigne first of all excuses himself for not including the text of his friend that he promised, due to the controversial and unintended role it had started to play in its use by Protestants under the name Le Contre Un⁶. But secondly, the piece that he promised to publish instead, ‘produced in that same season of his life, gayer and more lusty’ (Montaigne 2010, 199), he did not publish either.

In the line of my argument this, however, makes sense: which text could possibly live up to his image of the infinite transcendence of his friend? Both Montaigne’s grotesque writing and his notion of friendship encircle a void left by a central death, a void that is too ideal to be filled materially.

About this essay:

This essay is a translation and edit of a Dutch essay I wrote about five years ago. You can contact me if you are interested in the Dutch version.

Bibliography ¶

Berns, Gido. 2013. “De tijd van de vriendschap. Vriendschap, broederschap en democratie bij Derrida.” Tijdschrift voor Filosofie 75: 215-46.

Derrida, Jacques. 2005. Politics of Friendship. Translated by George Collins. London: Verso.

Derrida, Jacques. 1974. Of Grammatology. Translated by Gayatri Chakravorty Spivak. Baltimore: The Johns Hopkins University Press.

Epps, Brad. 1995. “Grotesque Identities: Writing, Death, and the Space of the Subject (Between Michel de Montaigne and Reinaldo Arenas).” The Journal of the Midwest Modern Language Association 28: 38-55.

Korhonen, Kuisma. 2006. Textual Friendship. New York: Humanity Books.

Kurz, Harry. 1950. “Montaigne and la Boétie in the Chapter on Friendship.” PMLA 65: 483-530.

Montaigne, Michel de. 2010. “On Friendship.” In Other Selves. Philosophers on Friendship, edited by Michael Pakaluk, 185-99. Indianapolis: Hackett Publishing Company.

Montaigne, Michel de. 1993. Essays. Translated and introduced by J.M. Cohen. London: Penguin Books.

Schlossman, Beryl. 1983. “From La Boétie to Montaigne: The Place of the Text.” MLN 98: 891-909.

1. I consider friendship here as a specific form of love. This essay does not go into further detail about the relation between concepts of friendship and love.
2. ‘What writing itself, in its nonphonetic moment, betrays, is life. It menaces at once the breath, the spirit, and history as the spirit’s relationship with itself. (…) Cutting breath short, sterilizing or immobilizing spiritual creation in the repetition of the letter, (…) it is the principle of death and of difference in the becoming of being.’ (Derrida 1974, 25).
3. For an exposition of the Essais from this more psychoanalytic perspective, consider ‘From La Boétie to Montaigne: The Place of the Text’ by Beryl Schlossman. Schlossman argues that Montaigne’s love for the friend cannot be seen apart from a homosexual desire, a possibility that Montaigne himself explicitly excludes in his essay.
4. Berns does emphasize, together with Derrida, that Montaigne’s position cannot be seen as a radical departure from the Greek-Roman model of reciprocity. That is not of direct concern for us here, though.
5. To clarify: it is death that gives cause to place a tombstone to remember the deceased person, and to praise his friendship. Placing a tombstone for a living person is nonsensical. But even if the person for whom the tombstone is intended is still alive, the tombstone as such still presupposes his death.
6. For a history of this piece and its Protestant renaming see the article ‘Montaigne and La Boétie in the Chapter on Friendship’ by Harry Kurz.

Two methods for exporting EPUB annotations (.annot)

See here for a follow-up.

My personal goal for this summer break was to read more, as I really enjoy it but do not schedule enough time for it during the many hectic days throughout the year. I always enjoy reading a book, but somehow the threshold for working on some project behind my pc is lower than for simply sitting down in a chair with a good book. A complication for reaching my goal, however, was that I would go backpacking through Europe for three weeks. I needed to pack very lightly, and bringing even a single book would be a major compromise. This is where, despite my being a bit of a chauvinistic philosopher who prefers the touch of “real” books, the e-reader comes into play. I purchased a Kobo Clara HD, and I have to say that the experience has been great. During my travels I finished “Crime and Punishment” by Dostoevsky, read “Slaughterhouse-Five” by Kurt Vonnegut, and read half of the uncomfortably thick “The Brothers Karamazov”, also by Dostoevsky. And even now that I am home I notice how much easier it is to pick up the e-reader than a book.

While reading, I made many annotations and notes on my Kobo. Now that I am home, I was wondering how to export these notes to my pc, because that would save me the trouble of manually looking up citations on the Kobo itself, which is slow, and perhaps retyping them by hand, which is even slower. To my surprise, there was no default export option for annotations.

Method 1: adjusting the Kobo configuration file ¶

A reddit user however found a solution. This solution was suggested for another Kobo version, but also works for my Clara HD. I summarize the solution here for completeness:

1. Connect your Kobo to your computer.
2. Find and open “Kobo eReader.config” in the Kobo drive. Mine is at /.kobo/Kobo/, relative to the root of your Kobo e-reader.
3. Add the following code, including the newline. This section is brand new, so it’s probably easiest to just add it at the bottom of the file:

[FeatureSettings]
ExportHighlights=true

4. Eject the Kobo and boot it up.
This adds another option in the menu that is available when reading books, namely to “Export highlights” under the “Notes” tab. After entering a filename the annotations will be saved to the root directory of the Kobo.
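The manual edit above can also be scripted. The following is a minimal sketch and not part of the original Reddit solution: the helper names `enable_export_highlights` and `patch_config` are my own, and it assumes the config file is plain UTF-8 text in the INI-like layout shown above. It edits the text line by line instead of going through a config parser, so the rest of the file is left as it is.

```python
from pathlib import Path

SECTION = "[FeatureSettings]"
SETTING = "ExportHighlights=true"


def enable_export_highlights(config_text):
    """Return the config text with ExportHighlights=true set (hypothetical helper)."""
    lines = config_text.splitlines()
    stripped = [line.strip() for line in lines]
    for index, line in enumerate(stripped):
        if line.startswith("ExportHighlights="):
            # The key already exists: overwrite its value.
            lines[index] = SETTING
            break
    else:
        if SECTION in stripped:
            # The section exists: add the key right below its header.
            lines.insert(stripped.index(SECTION) + 1, SETTING)
        else:
            # The section is new: append it at the bottom of the file,
            # separated from the previous section by a blank line.
            if lines and stripped[-1]:
                lines.append("")
            lines += [SECTION, SETTING]
    return "\n".join(lines) + "\n"


def patch_config(config_path):
    """Rewrite the config file in place. Keep a backup copy before trying this."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8")
    path.write_text(enable_export_highlights(text), encoding="utf-8")
```

Calling `patch_config` with the path to “Kobo eReader.config” on the mounted e-reader would apply the change; where the Kobo is mounted depends on your system.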

The export function produces a plain text file, starting with the title of the book, followed by a separate paragraph for each annotation. Notes are displayed in a similar manner, as such:

The original citation goes here
Note: this is my smart comment

And voila! With this method you have fast access to all your annotations in an open text format, so you can directly use it in an editor of your choice.
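If you want to process the exported text file further, it is easy to read back in. Below is a minimal sketch that assumes the layout described above: the title in the first paragraph, one blank-line-separated paragraph per annotation, and an optional note on its own line starting with “Note: ”. The function name `parse_kobo_export` is hypothetical, and the sketch assumes Unix-style line endings.

```python
def parse_kobo_export(text):
    """Split an exported highlights file into a title and (citation, note) pairs."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if not paragraphs:
        return "", []
    title = paragraphs[0]
    entries = []
    for paragraph in paragraphs[1:]:
        citation_lines = []
        note = None
        for line in paragraph.splitlines():
            if line.startswith("Note: "):
                # Everything after the prefix is the reader's own comment.
                note = line[len("Note: "):]
            else:
                citation_lines.append(line)
        entries.append((" ".join(citation_lines), note))
    return title, entries
```

Each entry is a tuple of the highlighted citation and the note, where the note is `None` for a highlight without a comment.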

Method 2: customize the exporting to your own needs by parsing the annotation files ¶

However, if for some reason you want to export your annotations in a different manner, you can always find the full XHTML markup with all annotations at “/Digital Editions/Annotations/books/”. If we inspect it, we see that the XHTML does not contain much more relevant information than we already exported. Per annotation, we additionally have the date on which we made the annotation, as well as some non-human-readable identifiers. Having the date of an annotation is not essential, but if you intend to archive your notes, dates give insight into what you were reading, for example, a few years back, and add some flexibility. One could for example later sort the notes by date to distinguish notes from a first and a second reading.
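To illustrate the idea of sorting by date, here is a minimal sketch. It assumes each annotation has already been extracted as a (citation, timestamp) pair and that timestamps look like 2019-08-26T11:46:10Z, which is the format that appears in my own exports. The function names and the cutoff parameter are my own invention.

```python
from datetime import datetime, timezone


def parse_timestamp(stamp):
    """Parse a timestamp such as '2019-08-26T11:46:10Z' as a UTC datetime."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def sort_by_date(annotations):
    """Return (citation, timestamp) pairs ordered from oldest to newest."""
    return sorted(annotations, key=lambda item: parse_timestamp(item[1]))


def split_readings(annotations, cutoff):
    """Separate notes made before the cutoff from notes made on or after it."""
    cutoff_time = parse_timestamp(cutoff)
    ordered = sort_by_date(annotations)
    first = [a for a in ordered if parse_timestamp(a[1]) < cutoff_time]
    second = [a for a in ordered if parse_timestamp(a[1]) >= cutoff_time]
    return first, second
```

Picking a cutoff somewhere between two readings of the same book then gives you the notes of each reading as a separate, chronologically ordered list.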

What I would have liked to include in my export was some more structure, for example grouping notes by chapter. What I also find weird about the default export is that it never lists the author of the book, nor the publisher, both of which are handy for later reference. Another argument for writing our own “export function” is the possibility of immediately using a specific output format of choice. For example, I currently store my notes in Markdown on GitHub, so we could export the notes immediately using Markdown syntax. Another idea is to at least number the annotations, given the absence of an ordering in chapters and the unavailability of a meaningful page numbering with the epub format.

If someone knows how to parse chapters and pagination from .annot files, please hit me up!

Solution with a Python script ¶

The annotation files with the .annot extension are written in XHTML. For parsing XHTML we can use the lxml XML parser. Consider this remark on their site:

Note that XHTML is best parsed as XML, parsing it with the HTML parser can lead to unexpected results.

I like using Python, and luckily Python has a nice package called BeautifulSoup that offers a simple interface for using the lxml parser.

The Python script I wrote extracts the title, author, publisher and writes them to a file in the YAML format, which can be used within Markdown files and is supported both by Github Markdown and Pandoc Markdown (the two dialects I use). Pandoc’s default LaTeX engine for producing pdf files actually knows how to read the YAML entries and display them as a default LaTeX titlepage, which allows you to directly create a smooth pdf without writing any LaTeX.
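One caveat with writing YAML by plain string formatting is that a title containing a colon (say, a book with a subtitle) can produce invalid YAML. A possible workaround, sketched below, is to quote each value with `json.dumps`: a JSON string is also a valid double-quoted YAML string. This is not what my script does; the helper name `yaml_front_matter` is made up for illustration.

```python
import json


def yaml_front_matter(title, author, publisher):
    """Build a YAML metadata block with safely quoted values (hypothetical helper)."""
    fields = {"title": title, "author": author, "publisher": publisher}
    lines = ["---"]
    for key, value in fields.items():
        # json.dumps adds double quotes and escapes quotes and backslashes;
        # ensure_ascii=False keeps accented characters readable.
        lines.append("{}: {}".format(key, json.dumps(value or "", ensure_ascii=False)))
    lines.append("---")
    return "\n".join(lines) + "\n\n"
```

The returned string can be written at the top of the Markdown file in the same way as an unquoted metadata block.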

The script also distinguishes between annotations and notes, and displays them differently. All annotations are displayed in a numbered list. Notes are indented as block quotes, directly below the annotation to which they belong. Because the list itself is also already indented, I double the indentation like this: “> > ”. In Pandoc Markdown this adds extra indentation; in GitHub Markdown the extra “>” does not do anything, but it is also not necessary there since block quotes receive a different color on GitHub.

This is the script:

import sys

from bs4 import BeautifulSoup

args = sys.argv[1:]

if not args:
    print('usage: kobo_export.py filename')
    sys.exit(1)

filename = args[0]

# Parse the .annot file as XML (this requires the lxml package).
try:
    with open(filename, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f, "lxml-xml")
except FileNotFoundError:
    print("The annotation file was not found")
    sys.exit(1)  # stop here, otherwise soup is undefined below

# Book metadata
title = soup.find('title').get_text()
author = soup.find('creator').get_text()
publisher = soup.find('publisher').get_text()
annotations = soup.find_all('annotation')

# YAML metadata
metadata = """---
title: {}
author: {}
publisher: {}
---

""".format(title, author, publisher)

export = [metadata]

for i, annotation in enumerate(annotations):
    date = annotation.date.get_text()
    citation = annotation.target.find('text').get_text()
    # Each highlight becomes one item of a numbered list (counting from 0).
    export.append('{}. "{}" ({})\n\n'.format(i, citation, date))
    # A note, if present, goes below its highlight as a nested block quote.
    note = annotation.content.find('text')
    if note:
        export.append('> > ' + note.get_text() + "\n\n")

with open(filename + ".md", "w", encoding="utf-8") as output:
    output.writelines(export)

The result looks good in plain text, on GitHub, and as a pdf produced from the Markdown with pandoc. Consider these extracted annotations from Emil Cioran’s very gloomy early work:

Plain text: ¶

---
title: On the Heights of Despair
author: E. M. Cioran
publisher: 
---

0. "In illness, death is always already in life. Genuine ailment links us to
metaphysical realities which the healthy, average man cannot understand. Young
people talk of death as external to life. But when an illness hits them with
full power, all the illusions and seductions of youth disappear. In this world,
the only genuine agonies are those sprung from illness. " (2019-08-26T11:46:10Z)

...

6. "The vulgar interpretation of universality calls it a phenomenon of quantitative
expansion rather than a qualitatively rich containment." (2019-08-23T10:19:09Z)

7. "Each subjective existence is absolute to itself. For this reason each man lives 
as if he were the center of the universe or the center of history. Then how could
his suffering fail to be absolute? I cannot understand another's suffering in
order to diminish my own. " (2019-08-24T08:21:54Z)

8. "One of the greatest delusions
of the average man is to forget that life is death's prisoner." (2019-08-26T11:38:32Z)

... 

36. "The melancholy look is expressionless, without
perspective. " (2019-08-31T07:28:00Z)

> > De afwezige blik in het oneindige externaliseert de ruimtelijkheid 
die volgens Cioran intern bij de melancholie hoort

37. "The sharper our consciousness of the world's infinity,
the more acute our awareness of our own finitude" (2019-08-31T07:29:48Z)

Github ¶

See this gist.

Pdf through LaTeX ¶

Dynamic BibTeX bibliography paths with spaces

Although LaTeX is amazing in many aspects, I often encounter relatively small issues that somehow take way too long to fix. Today I encountered a very specific use case that gave me a headache, and I want to write up my solution so I never have to think about it again.

The scenario ¶

I’m currently working on my bachelor thesis for Artificial Intelligence, which is due in a week, so I have no time to waste. My thesis lives in a GitHub repo, so that I always have my latest work available regardless of whether I work from my laptop running Arch Linux or from my desktop running MS Windows. My bibliography file is also saved in that repo, so loading the bibliography file from LaTeX is as trivial as \bibliography{thesis}, which loads the file called thesis.bib.

However…

I’m using Mendeley as my reference manager, and in the past exported a group of references manually to a bib file. However, currently I’m updating my references very frequently so that manual copying becomes an annoyance. It turns out that Mendeley has a BibTeX synchronization option that keeps bib files up to date automatically. You can either synchronize one bib file for your whole bibliography, or create a bib file per group of references. The latter option is appropriate for me, because I grouped together all references for my thesis. Unfortunately, you cannot choose an export folder per group. Instead, all bib files will be exported to a single directory. It does not make any sense to store all my bib files in the repository for my thesis, so I had to put the folder somewhere else on my system.

This is where the trouble starts. This situation created two issues for me.

1. Because the bibliography file now lives outside the repository on my desktop, I would not have access to it on my Linux laptop without manually copying files again.
2. I now have to provide a path, but both my Windows path and the Mendeley export files contain spaces.

Solutions ¶

In order to solve the first issue, I loaded the ifplatform package with \usepackage{ifplatform}. This allows LaTeX to check which operating system it is running on. To do so, however, you need to give the compiler explicit access to your shell by enabling shell escape. I did so with the following command: pdflatex -shell-escape -jobname="thesis" master.tex

The idea is that I will specify the bibliography path both for my Windows and Linux system within a conditional, so that I can work on my TeX files from both systems without having to adjust anything.

Solving the second issue was a pain. I had a lot of trouble making LaTeX deal with spaces in my Windows path. This issue never occurred before because I could straightforwardly use relative paths that are completely contained within my repo and thus do not point to different directories on different systems. Ultimately I found a solution that worked. If you want to get around spaces in LaTeX on Windows, either 1) rename whatever contains the space, or 2) use a legacy DOS path.

In order to get the DOS variant of your path, you have to open your command prompt (not PowerShell, it seems), and run dir /x. Do this for every folder whose name contains a space; the short names that dir /x lists contain no spaces. These short names do however contain ‘~’, which you need to escape with \string.

Combining these two fixes produced the following solution:

\ifwindows
\bibliography{C:/Users/EDWINW\string~1/Bib/ARTIFI\string~2}
\fi
\iflinux
% the bib command for linux
\fi

The corresponding Windows-style path was C:\Users\Edwin Wenink\Bib\Artificial Intelligence-Bachelor Thesis. (Note how that also uses ‘\’ instead of ‘/’, which is annoying because the backslash is the escape character in LaTeX.)
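The two rewrites described above (forward slashes instead of backslashes, and \string in front of every ‘~’) are mechanical, so they could be scripted. Below is a minimal Python sketch of that string transformation. It is only an illustration: the function name latex_safe_path is made up, and it assumes you have already looked up the short path yourself with dir /x.

```python
def latex_safe_path(short_path: str) -> str:
    """Turn a Windows short (DOS) path into a form usable inside \\bibliography{}.

    Assumes short_path no longer contains spaces.
    """
    # Backslashes would be read as LaTeX control sequences, so use '/'.
    with_slashes = short_path.replace("\\", "/")
    # '~' is an active character in LaTeX; \string makes it a literal tilde.
    return with_slashes.replace("~", r"\string~")


print(latex_safe_path(r"C:\Users\EDWINW~1\Bib\ARTIFI~2"))
```

The printed path can then be pasted into the \bibliography command inside the \ifwindows branch.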

Okay, granted, the easier solution would have been to go for option 1 by making Mendeley not export any spaces and then still use relative paths… But that is a statement by Captain Hindsight. What I would ideally have done is simply refer to my home folder with ~, like you would on Unix-based systems, but LaTeX doesn’t support that feature and I could not find a quick hack. Let me know if you do!

This domain joined the IndieWeb!

I joined the IndieWeb!

What does that mean? For the long version, I recommend reading An Introduction to the IndieWeb. Here is a super short version:

This web domain is now my main online identity, and I can use my domain as a way of authentication with a) “rel=me” links and b) my domain name via IndieAuth.

Examples:

1a. I was invited to the Mastodon instance of @arjen and verified my identity as “Edwin Wenink” by linking from Mastodon to my domain, and then from my domain to Mastodon as such: <a rel="me" href="https://idf.social/@edwin">Mastodon</a>. IndieWeb applications look at these “rel=me” links as an identity claim, and can confirm that these two domains point to each other. As a result, you can see my domain with a green check mark on my profile.

1b. I logged in with my domain name on webmention.io using IndieAuth. Because my domain and GitHub were linked through “rel=me” links, I could authenticate using GitHub, while using my domain name instead of GitHub credentials.

My content now follows the microformats2 format, which allows other members of the IndieWeb and related applications to find and parse my content and my online identity in a unified manner.
I can POSSE content to other sites if I want to, and feed responses back into my own website using webmention.io. POSSE simply means that you publish everything on your own website, and “syndicate” a linked copy to other places. This can be done in such a way that the responses to your copied post on that other website are fed back into your own website again through “webmentions”. This thus facilitates all kinds of interaction with social platforms or other blogs without leaving my own website. Most importantly, contra usual social networks, all data of this interaction is controlled through my own domain, collected in one place, belonging to and shaping a sensible online identity.
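The mutual “rel=me” check from example 1a can be illustrated with a small Python sketch. This is not how Mastodon or IndieAuth actually implement it (real verifiers also fetch the pages and handle redirects); it only shows the idea of two pages pointing at each other. The helper names and the example.org domain are placeholders made up for this example.

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collects the href of every <a> or <link> tag whose rel contains "me"."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if tag in ("a", "link") and "me" in rel_values and href:
            # Ignore a trailing slash so both spellings of a URL compare equal.
            self.links.add(href.rstrip("/"))


def rel_me_links(html):
    parser = RelMeParser()
    parser.feed(html)
    return parser.links


def mutually_linked(url_a, html_a, url_b, html_b):
    """True if page A has a rel=me link to B and page B has one back to A."""
    return (url_b.rstrip("/") in rel_me_links(html_a)
            and url_a.rstrip("/") in rel_me_links(html_b))


# Hypothetical page contents; example.org stands in for a personal domain.
site = '<a rel="me" href="https://idf.social/@edwin">Mastodon</a>'
profile = '<a rel="me nofollow" href="https://example.org/">Website</a>'
print(mutually_linked("https://example.org", site,
                      "https://idf.social/@edwin", profile))
```

A one-directional link is not enough: if either page lacks the “rel=me” link back, the check fails, which is what makes the link usable as an identity claim.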

In principle this interaction requires that other services also follow IndieWeb standards, but luckily there are services such as bridgy that are able to translate e.g. tweets into “webmentions” following the microformats2 format. You can either handle these webmentions yourself in order to display them on your website, or let another service handle the webmentions. I do the latter since I have a static website.

So for example, let’s assume there exists a possible world in which I would tweet. Then I could post tweets from my own website by POSSEing the tweets, feeding back the responses to webmention.io with bridgy, and maintain all my tweets including responses, even if Twitter goes bankrupt or becomes super evil. I could for example also post comments on GitHub pull requests on my own website, and then syndicate them to the appropriate place on GitHub. There is even a bridgy for federated networks.

To set everything up, I simply followed the steps of indiewebify.me, the sole purpose of which is to help you make the transition easily. Most of it is relatively straightforward if you read up on the underlying principles, but I have to admit I got lost in the IndieWeb wiki at least four times before getting the point and finding the right links. So I hope this post provides some pointers if you are interested.

To interact with the webmention.io API in order to show webmentions under this post, I used this JavaScript gist. What’s also very nice is that I can subscribe to an RSS feed of incoming webmentions, so I stay up to date about responses to my website in real time.

There’s still much for me to learn, but my website now fulfills the minimum requirements to be part of the IndieWeb. A more interesting question is perhaps: what does this all mean for you?

It means that you can now, in addition to my not-so-regular “regular” comment system, react here to my posts through your own service. As long as your reaction follows microformats2, it can be displayed directly under this post. In contrast to my normal comments, which I store on my own domain, the reply will thus live on your own social network/site/domain. I could of course decide to maintain a repository of copies of responses, but nevertheless you maintain your authority over your data on your own domain. What you see now under my post is merely a link pointing to your response, without any reference to a central repository. In this way a decentralized interaction between individual personal websites takes shape, which can be the basis of a network of federated conversations.

Isn’t that how the web was supposed to be? Really a web.

If you make your website IndieWeb compatible, let me know below through a “webmention”. You can submit your reaction to be displayed by filling in the URL of your reaction (again: look at microformats2). You can see an example reaction here, which is linked below in the brand new “Webmentions” section. To conclude, I added some useful links on my blogroll.

Vacancy Recommender Hackathon with Spark

BigData Republic organized a small hackathon for the Big Data course I currently follow at university. The challenge was to build a job recommendation system using real data from one of their clients, RandStad, which is a big employment agency. To my surprise, I ended up with the highest score and went home with a nice book as a prize. I was fully convinced that the score I achieved was very low, and I know for a fact that the road to victory had way less to do with intelligence than with strategic pragmatism. I will not share the Spark notebook itself, as the data we worked with is not open and much of the code was already provided by BigData Republic. Nevertheless I did gain some insights that I would like to share.

The challenge ¶

Employment agencies such as RandStad want to show customers looking for a job the most relevant vacancies, given their preferences. The challenge for this hackathon was to build a recommender system that predicts a top 15 of vacancies that can be shown to the user.

Data ¶

All data was anonymized.

A dataset containing information about the behavior of clients in the web interface of RandStad. It stores whether users opened a particular vacancy, started an application or finished an application, alongside further information about that vacancy, such as the number of hours per week, the wage per hour etc.
A dataset of user profiles storing user preferences, such as the desired wage, minimum and maximum working hours, and maximum travel distance.
A dataset of vacancies, of which we will make a selection for recommendation.

Architecture of the solution ¶

The basic model used for recommendation is Collaborative filtering using alternating least squares.

There are two basic ingredients for this type of recommendation system:

We have some data of users using some items, e.g. buying products in a supermarket. We can represent this in a user-item matrix. However, most users do not buy all items, and most items are not bought by all users, so this matrix is sparse, i.e. mostly filled with zero-entries.
We thus need some way to associate users with products they didn’t buy yet so we can potentially recommend those products, based on the knowledge we already have of user preferences for particular products. In other words, zero-entries need to be filled in with a preference estimation. Collaborative filtering with ALS does this by factorizing the user-item matrix into two lower-dimensional matrices: one maps users onto a number of latent factors (a “user profile”), and the other maps these latent factors back onto the items (an “item profile”). ALS tries to find two such matrices whose product approximates the bigger input matrix. From these smaller estimated matrices it is possible to re-compute the user-item association matrix, which now has preference scores for items that previously had zero-entries.
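To make the alternating idea concrete, here is a deliberately tiny Python sketch. It is a simplification and not what Spark does internally: it uses a single latent factor (rank 1), explicit ratings, and only the observed entries, whereas the model used in the hackathon has more latent factors and treats the data as implicit feedback. The function name als_rank1 and the toy ratings are made up for illustration.

```python
def als_rank1(ratings, n_users, n_items, reg=0.001, iters=50):
    """Rank-1 alternating least squares on observed entries only.

    ratings: dict mapping (user, item) -> observed value.
    Returns one factor per user and one factor per item.
    """
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iters):
        # Keep the item factors fixed and solve for each user factor.
        for user in range(n_users):
            seen = [(j, r) for (i, j), r in ratings.items() if i == user]
            num = sum(r * item_f[j] for j, r in seen)
            den = sum(item_f[j] ** 2 for j, r in seen)
            user_f[user] = num / (den + reg)
        # Keep the user factors fixed and solve for each item factor.
        for item in range(n_items):
            seen = [(i, r) for (i, j), r in ratings.items() if j == item]
            num = sum(r * user_f[i] for i, r in seen)
            den = sum(user_f[i] ** 2 for i, r in seen)
            item_f[item] = num / (den + reg)
    return user_f, item_f


# Toy data: user 1 never rated item 1, so that entry is missing.
ratings = {(0, 0): 2.0, (0, 1): 4.0, (1, 0): 1.0}
user_f, item_f = als_rank1(ratings, n_users=2, n_items=2)
predicted = user_f[1] * item_f[1]  # estimate for the missing entry
print(round(predicted, 2))
```

User 0 likes item 1 twice as much as item 0, so the factorization estimates that user 1 does too and fills the empty cell with a value close to 2.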

To implement this model in Spark, there are two major things to take into consideration:

Implicit versus explicit feedback ¶

Preferences of users for particular products can be explicit, for example when you ask users to rate the products they buy on a scale from 1 to 10 in a questionnaire. However, one can also have an implicit measure of preferences. If for example a particular customer very often buys cucumbers, we can infer that this user has a preference for cucumbers, even though we do not have an explicit normalized rating of cucumbers.

When it comes to Big Data, it is more likely that you have implicit preference data at your disposal. In the case of this hackathon, the indirect information we have about customer preferences is a log of which vacancies users click on in the vacancy search engine of RandStad. If a user clicks more on a particular type of vacancy, e.g. for management functions, we can infer that this user prefers management functions over, for example, being a cashier in a supermarket.

Cold-start problem ¶

Another challenge for this setup is the so-called cold-start problem. Computing a user-item association matrix for a given set of users and items is computationally quite expensive. But in the case of a big employment agency, new job vacancies come in continuously. Unless you retrain the whole model, you then cannot recommend these new vacancies, which obviously is very undesirable. At the same time, it is prohibitively expensive to continuously redo all your work to include these vacancies in real time.

The workaround suggested by the people from BigData Republic and used in this hackathon, is to not train the recommendation model on user-vacancy preferences, but instead on user-function preferences. This is a good solution because function titles are not as volatile as individual vacancy descriptions. In other words, if a new vacancy comes in, we already know the preference of a user for that function title, because the ALS model is trained on many other vacancies with the same function description.

We thus end up with a model like this (written in Scala):

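import org.apache.spark.ml.recommendation.ALS

// Configure ALS on click data grouped per function title.
val als = new ALS()
  .setMaxIter(20)                  // number of alternating optimization rounds
  .setRegParam(0.001)              // regularization strength
  .setRank(10)                     // number of latent factors
  .setUserCol("candidate_number")
  .setItemCol("function_index")
  .setRatingCol("rating")
  .setImplicitPrefs(true)          // treat ratings as implicit feedback
val model = als.fit(grouped_train)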

grouped_train is the data of user clicks where vacancies are grouped under their function name.

Recommending vacancies ¶

But given that basic model, we have a recommendation score for functions, and not vacancies. If we take the top 3 preferred functions for a user, and then join all vacancies on these function descriptions, then we end up with a very large list of recommended vacancies for a user.

Therefore the rest of the work in the hackathon was to come up with a good way of selecting a top 15 in this long list of vacancies. This is done by joining in profile data containing further user preferences such as the desired wage, working times, and maximum traveling distance. Based on that information you can either filter out vacancies, or integrate these preferences in a final weighted recommendation score.

The end result of this whole process is a top 15 of vacancies to first display to the end user.
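The first step of this section, expanding a user’s top functions into a list of candidate vacancies, can be sketched in plain Python. This is only an illustration with made-up data and a hypothetical helper name (candidate_vacancies); the actual hackathon code did this with Spark joins.

```python
from collections import defaultdict


def candidate_vacancies(function_scores, vacancies, top_n=3):
    """Return (vacancy_id, function, score) for the user's top_n functions.

    function_scores: {function_title: recommendation score} for one user.
    vacancies: iterable of (vacancy_id, function_title) pairs.
    """
    top_functions = sorted(function_scores, key=function_scores.get,
                           reverse=True)[:top_n]
    by_function = defaultdict(list)
    for vacancy_id, function in vacancies:
        by_function[function].append(vacancy_id)
    return [(vacancy_id, function, function_scores[function])
            for function in top_functions
            for vacancy_id in by_function[function]]


# Made-up scores for one user and a handful of open vacancies.
scores = {"manager": 0.9, "cashier": 0.1, "driver": 0.5, "cook": 0.4}
vacancies = [("v1", "manager"), ("v2", "cashier"), ("v3", "cook"),
             ("v4", "manager"), ("v5", "driver")]
for row in candidate_vacancies(scores, vacancies):
    print(row)
```

Because the scores are attached to function titles, a vacancy that was posted after training only has to be added to the vacancies list to become recommendable, which is the point of the cold-start workaround.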

Parameter optimization, weighing factors for a final prediction ¶

Everyone used the same general approach with the ALS model, so what distinguished my solution from others were 1) model parameters and 2) further scoring and processing of vacancies based on profile data.

This is where the hackathon really started feeling “hacky” to me.

A major practical limitation was that I was running a Spark notebook on a real-life data problem, within Docker, on an old ThinkPad with limited computing power and memory. This effectively resulted in the Spark notebook kernel dying on me regularly, so running the whole data pipeline even once was quite a hassle. Using fancy techniques to search for optimal parameter settings was thus out of the question for me, and I had to resort to playing around with parameters manually.

Especially because running the whole process took a while, I really wanted to be smart about what parameter combinations I tried out. But the somewhat disappointing answer (not a bad answer though) I got from one of the BigData Republic people was that there were no very specific rules of thumb, for example for choosing the number of latent factors in the ALS model. Normally, instead of having 12 GB of working memory, similar Spark code would be run on a cluster with 1 TB of working memory… which allows automated search for the best parameter settings.

From there on pragmatism took over. With respect to model parameters, the adage “higher is better” did not hold for me, first of all because it made my PC crash, and secondly because the risk of overfitting on the training data became larger. So with respect to the default ALS parameters, I actually only lowered them: fewer iterations and fewer latent factors in the matrix factorization.

The largest improvement in my final score was achieved by using profile data and weighing various factors differently. We computed a score for whether the vacancy matched the preferred working hours or not, and a normalized score for how far away the job is from the candidate. These factors, together with the recommendation score for the function title of a particular vacancy, were weighed together to produce a final score per vacancy. It turned out that people care a lot about how far the job is, and I gave this factor a very big weight of 10:1 compared to the recommendation score for the actual function title (but note that only vacancies for the top 3 function descriptions were taken into account, so the ALS model already fulfilled its purpose).
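The weighing described above can be sketched as a simple weighted sum followed by a sort. This is an assumption-laden illustration, not the notebook code: the linear form, the field names, and the weight for working hours are all made up, and distance_score is assumed to be normalized so that 1 means very close and 0 means at the maximum travel distance. Only the 10:1 ratio between distance and function score comes from the text.

```python
def final_score(rec_score, distance_score, hours_match,
                w_rec=1.0, w_distance=10.0, w_hours=1.0):
    """Weighted sum of the three factors; w_hours is a placeholder value."""
    return (w_rec * rec_score
            + w_distance * distance_score
            + w_hours * hours_match)


def top_k(vacancies, k=15):
    """Rank candidate vacancies by final score and keep the best k ids."""
    ranked = sorted(
        vacancies,
        key=lambda v: final_score(v["rec_score"], v["distance_score"],
                                  v["hours_match"]),
        reverse=True,
    )
    return [v["id"] for v in ranked[:k]]


# Made-up candidates: a far-away job in the favourite function versus
# a nearby job in a less preferred function.
candidates = [
    {"id": "far_top_function", "rec_score": 0.9,
     "distance_score": 0.1, "hours_match": 1.0},
    {"id": "near_lower_function", "rec_score": 0.4,
     "distance_score": 0.9, "hours_match": 1.0},
]
print(top_k(candidates))
```

With the heavy distance weight the nearby job is ranked first, even though its function title scores lower in the ALS model.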

Result and reflection ¶

The final score for the competition was a very simple recall measure, i.e. what percentage of the vacancies candidates actually applied for (which can be extracted from the dataset of browsing behavior) was recommended in the top 15 vacancies by the recommendation model. My final recall score on a test set was 16.8% (19.8% on the validation set). For comparison, a baseline performance of 2.9% was calculated by always predicting the 15 most popular vacancies.
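A recall measure of this kind can be written down in a few lines of Python. The sketch below assumes that recall is pooled over all applications of all users, which may differ from how the organizers averaged it; the function name recall_at_k and the toy data are made up.

```python
def recall_at_k(applied, recommended, k=15):
    """Fraction of applied-for vacancies that appear in the user's top k.

    applied: {user: set of vacancy ids the user applied for}.
    recommended: {user: ranked list of recommended vacancy ids}.
    """
    hits = 0
    total = 0
    for user, applied_ids in applied.items():
        top = set(recommended.get(user, [])[:k])
        hits += len(applied_ids & top)
        total += len(applied_ids)
    return hits / total if total else 0.0


# Toy example: three applications in total, one of them was recommended.
applied = {"u1": {"a", "b"}, "u2": {"c"}}
recommended = {"u1": ["a", "x", "y"], "u2": ["z"]}
print(recall_at_k(applied, recommended))
```

The popularity baseline fits the same function: pass every user the same list of the 15 most applied-for vacancies as their recommendations.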

I thought my score was pretty low (and I’m sure it is) so I was very surprised to win, but given that all competitors were beginners and faced similar hardware issues as I did, the recall scores were all more or less between 13% and 17%. People with more interesting ideas about parameter optimization were probably not successful in their efforts due to serious hardware limitations. Perhaps people also put more effort into optimizing their ALS model, only to see it overfit on the training data and really drop in score on the test data. The overall impression I am left with is that real data science is extremely hard to do properly. For the mortals not designing the algorithms and data structures themselves, the most intelligence is required for choosing the right methods for the problem at hand, and making smart design decisions on what information to exploit. But apart from that, I have the feeling that the average attitude is: please don’t ask too much about the internals of the algorithms or the meaning of a parameter setting. I suspect that for many people in the data business “data science/engineering” is mostly slapping together pre-existing models and making computers crunch a lot on optimizing them.

Tools used ¶

Docker
Scala
Spark ML
Spark Dataframes
Spark SQL
My poor old ThinkPad