Many Vim users, if not most, are programmers. Many of the blogs you will find online about Vim are geared towards programming, along the lines of "what's the best plugin for turning Vim into an IDE?". On this website I've instead focused more on Vim as a tool for writers and note-takers. But, having a background in programming and Unix as well, I carried some typical Unix conventions over to my writing in Vim (although these conventions are not specific to Vim). I thought I'd share them with you briefly and explain their rationale.
One of the beauties of Vim, however, is that it's a tool you can make fit your own needs. A philosopher and friend, Boris van Meurs, uses Vim for his daily note-taking, and I thought it would be great if he offered a counterpoint to my own considerations from the perspective of a writer.
Vim is a power tool in itself, but it is even more powerful when integrated into a rich environment of Unix tools. To offer better integration with common utilities in Unix-y ecosystems, I stick to the following convention:
One line, one sentence.
I'm quite a firm believer in this convention and usually go to lengths to convince collaborators to stick to it as well.
Of course, this convention does assume you are writing in a markup language (such as LaTeX) or in Markdown, where single line breaks do not end up in the final text.
A first argument for this convention is that most command line utilities operate on lines.
grep is a classic example.
If you grep a file for a particular search pattern, this tool will return all matching lines (not sentences).
Now, if you write a long paragraph on a single line (so without any hard breaks between sentences), grep will return a large blob of text for each search hit, not the actual sentence in which the search pattern was found.
Especially when grep is just the beginning of your pipeline and you want to perform further actions on the search results, this may be undesired behavior.
Secondly, having one sentence per line also makes collaboration easier. When I have ten sentences of a paragraph all on the same line, I end up saying things like "the sixth sentence on line X, the one that starts with Y". When instead I have one sentence per line, I can just mention the line number without any ambiguity about which sentence I'm talking about. If I want to perform some operation on that sentence, I can directly apply it to that line as well.
Thirdly, a more practical point is that navigation in Vim is easier with one sentence per line.
For example, again consider the case that you have a paragraph of ten sentences on a single line.
Now try to navigate efficiently to the sixth sentence.
If you have one sentence per line, this can be done with 5j.
If not, you have some options but they are all a bit more laborious.
Some people may be tempted to scroll through the line to the location of interest.
When soft-wrapping lines, you can also traverse visual lines with gj.
You can of course also use forward search with / to jump to the sentence of interest.
Besides, Vim is smart and still knows what a sentence is, so you can skip five sentences with 5).
But all these things take a bit more cognitive load (for me at least).
For one, it is hard to see how many sentences you have to skip ahead, whereas with one sentence per line that's easy to see, especially when line numbering is enabled with set number.
In general, sticking to the convention will just lead to behavior that is semantically a bit more consistent.
Common shortcuts like dd will delete a single sentence instead of the whole paragraph.
When you actually want to delete a paragraph, I think it's semantically clearer and just as easy to leverage Vim's understanding of what a paragraph is and do dip (delete inner paragraph) or dap (delete around paragraph).
Similarly, it is more consistent when j actually moves you to the next sentence and not the next paragraph, for which Vim uses the curly bracket }.
Fourthly, an even more practical point is that Vim tends to slow down for very long lines, amongst other things due to syntax highlighting. I enjoy that Vim feels snappy and lightweight and I like to keep it that way. Boris will mention some workarounds for this particular issue though.
Unlike Edwin, I do not use VIM for writing code but for writing text. I write my philosophy dissertation in a TeX file that I edit with VIM (and vimtex). About a year and a half ago, I was still using MS Word for this, as so many unenlightened folks are in their dimly lit caverns of Untruth, unaware that my salvation was close.
My epiphany came when Edwin converted me to VIM as a useful tool for writing my philosophy dissertation. After figuring out how to efficiently write LaTeX in VIM with vimtex, I now use it for every step of the research process: from taking notes to painstakingly constructing the body of my arguments. For this, I always stick to the following convention:
One line, one paragraph.
This means that I just keep typing away on a single line until I think I am ready to wrap up my paragraph. Why do I choose to do so, from a writer's perspective?
Well, first, in the way that I use VIM, I do not gain much from Edwin's paradigm when it comes to collaboration, navigation, and semantics.
So, given that the advantages of Edwin's method do not really apply to me, what are the advantages for me of sticking to one line per paragraph?
In short: using VIM as a text editor, I prefer to write entire paragraphs per line to help me visualize the end result of the text. Writing is an exercise in empathy with the reader. This exercise is easier when the text is already displayed in the form of soft-wrapped paragraphs rather than choppy single lines.
There is a recurring issue when writing long lines with vimtex. It gets slooooooow. Like, really slow. Especially when navigating, the screen is laggy when you try to jump through lines and paragraphs.
Of course, this is unacceptable, as Edwin also mentions, because many of us use VIM for its light-weight nature. Actually, I myself moved my PhD writing away from LibreOffice (even before my MS Word days) to avoid lag: it took ages to load the large number of references I had included using the Mendeley plugin.
If the cure for slowness is more slowness, one has not proceeded much.
Luckily, the solution is simple in this case. The issue turned out to be vimtex's highlighting of matching delimiters (its match-paren feature), which can slow things down considerably. It is easy enough to fix this.
Just add:
let g:vimtex_matchparen_enabled = 0
to your .vimrc.
That fixes it. Quite simple, right?
Enjoy using VIM!
In a previous post I outlined two methods for extracting annotation files from Kobo e-readers. One method was to enable the Kobo export function. Personally, I wasn’t very happy with the default export format, so I also wrote up a quick and dirty code snippet that hinted how to write your own custom export script. Personally, I want my notes to be formatted in Markdown so that I can easily convert them to pdf, html, or you name it. My preferred tool for that is pandoc.
There has been some interest in that script, so I decided to expand on it a bit.
You can download the latest version of the script in this github repository.
You can either download the code as a zip, or clone the repository if you know how to use git, with git clone https://github.com/EdwinWenink/kobo-notes.git.
DISCLAIMER: the script works fine for my annotation files (Kobo Clara HD), but please note it is not extensively tested.
The script is written in Python 3, so you need to have that installed on your system. You can download Python 3 here.
The script mostly uses default Python modules, but you'll need to install the BeautifulSoup module. To do that, open a terminal with access to your Python environment and run pip install beautifulsoup4.
To learn how the script works, open the aforementioned terminal and run python ./kobo_export.py --help (this assumes the script is in your terminal's current working directory; on Windows the forward slashes are replaced by backslashes). This outputs instructions on the usage of the script:
usage: kobo_export.py [-h] [-f FILE | -d DIRECTORY] [-o OUTPUT]
Extract KOBO annotations as Markdown files
optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  path to annotation file to process
  -d DIRECTORY, --directory DIRECTORY
                        root directory of all annotations
  -o OUTPUT, --output OUTPUT
                        location of output folder (default: current folder)
As you can see, all flags are optional.
You can select a single annotation file to be processed with the --file flag; you need to provide a valid path to that annotation file.
Alternatively, if you want to process all annotation files in a directory, you can specify that directory instead.
In both cases, you can also specify a directory where you want the extracted markdown files to be placed.
If you do not provide a file nor a directory, the script will recursively look for annotation files in the current folder and its subfolders. If you do not provide an output directory, all files will be written to the current directory.
Combining the options, running the script looks like this:
python ./kobo_export.py --directory "./Dostoyevsky, Fyodor/" --output ./markdown/
This reads all annotation files from Dostoyevsky and puts the extracted notes in a folder called 'markdown'.
On Windows I encountered an annoying situation. If you have a folder name with spaces (Kobo does this), then the backslash separator actually escapes the closing quote… If this happens to you on Windows, you can solve this as follows:
python .\kobo_export.py -d '.\Dostoyevsky, Fyodor\\' -o .\markdown\
The extracted notes will have a YAML header with meta information.
---
title: On the Heights of Despair
author: E. M. Cioran
publisher: Unspecified
---
Compared to the previous post, I changed several things.
Extracted notes are numbered and sorted in order of occurrence.
For some reason they weren’t sorted before.
0.280 is the progress indicator: the note was made at 28% progress in the book.
I now only display the date, without the timestamp, to avoid clutter.
By default there are a lot of weird line breaks in the quotes, so I sanitized that a bit.
Highlights without annotations look like this:
46. "Solitude is the proper milieu for madness." --- *0.280, 2019-12-31*
47. "In comparison with despair, skepticism is characterized by a certain amount of dilettantism and superficiality. I can doubt everything, I may very well smile contemptuously at the world, but this will not prevent me from eating, from sleeping peacefully, and from marrying." --- *0.288, 2019-12-31*
48. "On the heights of despair, nobody has the right to sleep." --- *0.288, 2019-12-31*
And annotations are displayed like this:
7. "X insults me. I am about to hit him. Thinking it over, I refrain. Who am I? which is my real self: the self of the retort or that of the refraining? My first reaction is always energetic; the second one, flabby. What is known as “wisdom” is ultimately only a perpetual “thinking it over,” i.e., non-action as first impulse." --- *0.107, 2020-07-28*
> > Wisdom as non-action as first impulse
Because the extracted notes are valid Markdown, you can easily convert them to whatever text format you like using pandoc. Pandoc is very simple to use. The following command is an example of how to convert one of your notes to pdf.
pandoc mythoughts.epub.md -o mythoughts.pdf
Let me know below if there are any issues etc.!
15/10/2020 Extended with an explanation of MAP; minor fixes and changed the title
Take Bob. Bob is not feeling so great and has a runny nose. This is an observation that may depend on various other conditions. Perhaps Bob has a cold, perhaps he has allergies, or perhaps he unfortunately picked up COVID-19. Given that we know Bob has a runny nose, which one of these potential explanations is more likely (assuming for the sake of simplicity that these are the three options)?
Bayes’ theorem allows us to formulate an answer to that question.
Let R=True stand for the observation that Bob has a runny nose, and let H be the variable indicating the three hypotheses. Then Bayes' theorem looks as follows:
$$P(H|R)=\frac{P(R|H) \times P(H)}{P(R)}$$
Where e.g. P(H=cold|R=True) should be read as "the probability that Bob has a cold, given that we know he has a runny nose".
Our confidence in each of the three hypotheses depends on several factors.
To start with, it depends on how likely that hypothesis is in the first place.
For example, even during the current pandemic, if you look at the whole population it is still more likely you have a regular cold than COVID-19.
This is called the “prior” probability of the hypothesis.
Our confidence in the hypotheses also depends on the conditional probability P(R|H), which answers questions like: "Assuming I have COVID-19, how likely is it then that I develop a runny nose?"
This probability is commonly called the “likelihood” of the hypotheses.
Finally, Bayes’ theorem normalizes the whole bunch into a proper probability by taking into account the “marginal” probability of someone (a random individual from the overall population) developing a runny nose.
Bringing it all together, Bob is for example more likely to have a regular cold than COVID-19, if 1) a cold occurs more commonly across the population and 2) almost all people who have a cold have a runny nose (i.e. a cold would be a good “explanation” of the symptoms).
In this terminology we can informally rewrite Bayes’ theorem as:
$$posterior\ probability = \frac{likelihood \times prior}{marginal\ probability}$$
The resulting probability of Bayes' theorem is usually called the "posterior" probability, because it expresses how much our "prior" confidence in H={h1, h2, h3} has changed after we learn that Bob has a runny nose.
Of course, these probabilities change again once Bob takes a test for any of these ailments. And even if Bob takes a test, Bayes' theorem allows us to take into account the probabilities of false positives and false negatives.
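To make this concrete, here is a small sketch in Python with made-up numbers; the priors and likelihoods below are purely illustrative assumptions, not real medical figures:

# Hypothetical prior probabilities P(H) and likelihoods P(R=True|H)
priors = {'cold': 0.70, 'allergy': 0.25, 'covid': 0.05}
likelihoods = {'cold': 0.90, 'allergy': 0.60, 'covid': 0.30}

# Marginal probability of a runny nose: sum over all hypotheses
p_runny_nose = sum(priors[h] * likelihoods[h] for h in priors)

# Posterior P(H|R=True) via Bayes' theorem
posteriors = {h: priors[h] * likelihoods[h] / p_runny_nose for h in priors}
print(posteriors)  # with these numbers, the cold hypothesis clearly dominates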
In short, Bayes’ theorem is pretty awesome because it can be used to express how the probability of one event depends on related possible events. The application of Bayes’ theorem is not necessarily Bayesian though. If you are a typical Bayesian, you would also interpret the involved probabilities as “credences” or “degrees of belief”, and then apply the process of conditionalization (the diachronic application of Bayes’ theorem over an “old” and “new” moment in time) in order to express how our (subjective) beliefs change when we learn new information.
Now, there’s one term in Bayes’ theorem that caused some confusion when I first considered it more closely.
That's the so-called "likelihood" P(R|H).
Bayes’ theorem is derived from the definition of conditional probability.
As a conditional probability, we would for example read P(R=True|H=Covid) as follows: "How likely is it that Bob has a runny nose, assuming he has COVID-19?"
In that case, we talk of the likelihood of the data, in this case the observed symptom of Bob.
In the literature, however, P(R=True|H=Covid) is also sometimes called the likelihood of the hypothesis.
Personally I found it helpful to have a look at the terminology of maximum likelihood estimation. In this case our "hypotheses" are parameters of some parametric model that we are using to describe our data. We are then trying to find the parameters such that this model best describes the data (cf. finding the best hypothesis). I'll get technical for one minute and then recap in more understandable language.
Assume we have a parametric model with parameters $\theta$, e.g. a probability density function $p_{\theta}(x)$. Then the *likelihood* of this parametric model can be written as $$L(\theta|X) = \prod_{x \in X} p_{\theta}(x)$$ As an aside, we usually work with the log-likelihood $l(\theta|X) = \log \prod_{x \in X} p_{\theta}(x)$ instead.[^1]
What we are doing with maximum likelihood estimation is finding the best parametric model given the observed data, which means that we want to choose the parameters of our model such that the data is most likely under the assumptions of this parametrized model. This is, by definition, the task of finding the maximum likelihood:
$$ \hat{\theta} = \underset{\theta}{argmax\ }l(\theta|X)$$
where $\hat{\theta}$ is known as the maximum likelihood estimate (MLE). In other words, the MLE is the parameter (cf. "hypothesis") for which the data is most likely. Since this would be the "best" or "most likely" model, we understand the likelihood of our hypotheses in terms of how probable it is that we observed our data, assuming the hypothesis were true. So in plain language: a good hypothesis for some model assigns a high probability to the observed data.
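As an illustrative sketch (not something from the derivation above), the snippet below estimates the mean of a Gaussian with known variance by maximizing the log-likelihood over a grid of candidate parameters; the maximizer coincides, up to grid resolution, with the sample mean, which is the closed-form MLE in this case:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)  # observed data X

def log_likelihood(mu, x, sigma=1.0):
    # Log-likelihood l(mu|X) of a Gaussian model with known standard deviation sigma
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

candidates = np.linspace(0.0, 4.0, 401)
mle = candidates[np.argmax([log_likelihood(mu, data) for mu in candidates])]
print(mle, data.mean())  # the grid-based MLE is (approximately) the sample mean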
The Bayesian approach differs from standard maximum likelihood estimation in that it does not straightforwardly assume there is a “true” parameter $\theta$. Instead, we allow uncertainty over our parameters and incorporate this by defining a prior distribution over $\theta$. When taking into account our prior beliefs, the matter of finding the most likely parameter/hypothesis is then to find the posterior distribution. This is the so-called Bayesian MAP problem, namely finding the maximum a posteriori probability. When we write out the probability of our parameters/hypotheses in terms of Bayes’ rule, we get:
$$p(\theta|X) = \frac{ p(X|\theta)q(\theta) }{ \int_{\theta \in \Theta} p(X|\theta)q(\theta) d\theta }$$
Where $q(\theta)$ is the distribution of our prior beliefs over the possible parameters. The above formula assumes that $\theta$ is continuous, but this is not important for now.
MAP is then defined as finding the most likely parameter, so $\underset{\theta}{argmax\ } p(\theta|X)$. Because everything in the denominator is only there for normalization and does not depend on our current hypothesis, we can ignore it in the maximization: $\underset{\theta}{argmax\ } p(\theta|X) = \underset{\theta}{argmax\ } p(X|\theta)q(\theta)$. So again we see that finding the most likely parameter/hypothesis is a matter of finding the parameters that make the data most likely (but now also taking into account the prior credence of the hypothesis itself). If we have equal prior belief in all our hypotheses, this is the same as maximum likelihood estimation.
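Again as a purely hypothetical sketch, MAP only changes the quantity being maximized: we add the log of the prior $q(\theta)$ to the log-likelihood. With few observations the prior visibly pulls the estimate, and with a flat prior MAP reduces to the MLE:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=10)  # few observations, so the prior matters

candidates = np.linspace(-1.0, 4.0, 501)

# Log-likelihood (up to an additive constant) of each candidate mean (Gaussian model, sigma=1)
log_lik = np.array([np.sum(-0.5 * (data - mu)**2) for mu in candidates])

# Hypothetical Gaussian prior q(mu) centered at 0 with standard deviation tau
tau = 0.5
log_prior = -candidates**2 / (2 * tau**2)

mle = candidates[np.argmax(log_lik)]                        # flat prior: likelihood only
map_estimate = candidates[np.argmax(log_lik + log_prior)]   # the prior pulls the estimate towards 0
print(mle, map_estimate)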
So when P(R=True|H=cold) was called the likelihood of the hypothesis H=cold, this is because that hypothesis assigns a very high probability to the symptom of a runny nose and is thus a likely "explanation" of that symptom (where we understand explanation in a bare-bones way, purely in terms of probability).
I still think this way of talking can be slightly confusing and it seems to ignore the prior. But the bottom line is that a hypothesis is more likely when it makes the data, that we observed and know to be the case, more likely.
[^1]: The logarithm over a product is the sum over the components' logarithms, i.e. $\log \prod_{x \in X} p_{\theta}(x) = \sum_{x \in X} \log p_{\theta}(x)$. The logarithmic function is monotonically increasing, which guarantees that the parameter that maximizes the log-likelihood also maximizes the regular likelihood. A sum is easier to work with, and this way we also avoid numerical underflow due to the joint probabilities becoming extremely small.
Let's say we want to compute the mode of a series of numbers, meaning that we pick the value that occurs most often. This is easy enough: we sort on the number of occurrences, assuming we have some datatype that tracks the number of occurrences per value. However, we need to deal with the edge case of two values occurring the same number of times. In other words, after having sorted on occurrences, we need to sort on the value to break the tie.
If we pick the largest value, both the primary and the secondary sorting use the same sorting order.
Python's sorted and sort (the in-place variant) accept tuples as sorting keys, so you can straightforwardly perform the secondary sorting in one line.
First, we get (value, count) tuples:
from collections import Counter
values = [1, 2, 2, 5, 5, 7, 10]
counter = Counter(values)
counts = counter.items()
counts looks like this:
dict_items([(1, 1), (2, 2), (5, 2), (7, 1), (10, 1)])
To reiterate, we want the numbers with the largest count first (2 in this case) and then either pick the smallest or the largest number as a tie breaker.
We start by picking the largest value, for the sake of argument.
For each tuple x, which looks like (value, count), we first sort on the count (x[1]) and then on the value (x[0]).
We can provide these sorting keys as a tuple.
Because we want the biggest counts (primary) and biggest values (secondary) at the beginning of the list, we use a descending sorting order with reverse=True.
values_by_count = sorted(counts, reverse=True, key=lambda x: (x[1], x[0]))
This outputs:
[(5, 2), (2, 2), (10, 1), (7, 1), (1, 1)]
We see that the tuples with the highest counts are in the beginning of the list, and that for the ties with count 2, the highest value (5>2) is listed first.
But what if we want to have the biggest counts first (descending sorting order), while instead picking the smallest value in case of a tie (ascending sorting order)? The handy one-liner above assumes that we use the same sorting order for both the primary and the secondary key!
So how can we maintain this ease of syntax while using different sorting orders?
Because we work with numerical data, we can use a little hack.
We can call sorted using the default ascending sort order, but nevertheless sort on the counts in a descending fashion by sorting on the negative of the counts.
So we write:
sorted_counts = sorted(counts, key=lambda x: (-x[1], x[0]))
mode = int(sorted_counts[0][0])
Whereas in the former example the outcome was 5, the outcome now is 2. If you do not have numerical data (e.g. counts), you would have to make multiple sorting calls; a sketch of that approach follows below.
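For illustration, here is that multi-pass approach applied to the counts from above. It relies on Python's sorts being stable: the last sort (on the count) determines the primary order, while ties keep the order established by the earlier sort.

# First sort on the secondary key (the value) in ascending order,
# then on the primary key (the count) in descending order.
# Because sorted() is stable, ties in the count preserve the value order.
by_value = sorted(counts, key=lambda x: x[0])
by_count = sorted(by_value, key=lambda x: x[1], reverse=True)
print(by_count)  # [(2, 2), (5, 2), (1, 1), (7, 1), (10, 1)], so the mode is 2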
Update 24-09-2022: you can install this functionality as a plugin by writing Plug 'EdwinWenink/vim-quiz' in your vimrc.
It contains updated Vimscript and offers two commands and handy default bindings that you can override.
See the README.
I haven’t written much for this website recently because it’s that time of the year again: exams and project deadlines. Here’s a taster of what’s going on in my studies: experimenting with gradient boosting decision-trees for forecasting; writing about Bayesian approaches to inference to the best explanation; developing and testing a podcast app; programming autoencoders, a GAN, and learning about generative modeling; writing an ethics policy brief on an AI-related issue.
This is also the first real stress test of my note-taking system and a chance for me to evaluate if I can pick the fruits of my labour. Did consistently putting effort into my note taking system promote comprehension? Retention? Does it help me study more effectively? Especially now that the Covid situation resulted in an abundance of take-home exams, perhaps my notes will share in some of the heavy lifting.
What I can already report on in any case, is that taking notes in Vim makes studying more enjoyable. I’m not just studying for this one exam, to get this one grade. Instead, I care about adding interesting thoughts and connections to my beehive of notes. They may come in handy many years into the future.
However, when taking notes on lecture material I write my notes in a different style than usual. Even though I still isolate interesting concepts and solidify them into their own note (the principle of atomicity), my lecture notes are generally way longer than Zettelkasten-style notes.
Another thing I like to do - and this is the topic of this post - is to leave questions for myself throughout my notes. I consistently do that using an extremely simple convention: I leave a Q in bold in my study notes, followed by the question.
I make all my notes in Markdown, so in plain text that looks like **Q**.
You will never write **Q** in a normal sentence, so this term is really easy to search for without getting "false positives".
To review these questions, just do a quick vimgrep on the current file.
You don’t need fancy search tools if you just search the current file.
I just make a quick mapping \nq ('\' is the default leader key) with the mnemonic "note quiz".
nnoremap <leader>nq :vimgrep /\*\*Q\*\*/ %<CR>
It looks a bit awkward because the asterisks need to be escaped in order not to be interpreted as regex wildcards (update: the plugin version of this code uses a proper escape function for this).
This just searches for our pattern in the current file (indicated in Vim with %) and does not need our confirmation (<CR> "presses" Enter for us).
By default, vimgrep populates the quickfix window with the search results and opens it automatically.
I like that the question is previewed at the bottom of your screen like this:
(1 of 8): **Q**: Why is the triangle mesh so useful for real-time rasterization?
You can navigate the study questions with :cnext (:cn for short) and :cprevious (:cp for short). If you accidentally close the window, you can reopen it with :cwindow (:cw for short).
Okay, now forget this again. Just install vim-unimpaired, which maps :cprevious and :cnext to [q and ]q.
This is perfect for our purpose, because we can read these mappings as “previous question” and “next question”.
The square brackets do not look intuitive, but you get used to them in a few minutes.
Notice that the opening bracket is on the left of the closing bracket on an English keyboard, so that “left” corresponds to “previous”, and “right” to “next”.
If you want to have all questions in a separate file, that's easy enough: just focus on the quickfix window and save it as a file, e.g. :w study_questions.txt.
By the way, I often also answer my own study questions below the question. This is not a problem, because jumping to a question will usually display it at the very bottom of the screen, so I can't accidentally get a sneak peek at the answer.
Of course you can do this better, and you can extend it easily to also search other notes (see the sketch below). I currently have no need for that, and I really wanted this to be a ten-second hack. After all, I'm supposed to be studying now!
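As a sketch of such an extension (assuming your notes are Markdown files below the current directory), a variant mapping could search all of them at once; the mapping name is just a suggestion:

" Hypothetical variant: collect questions from every Markdown note below the current directory
nnoremap <leader>nQ :vimgrep /\*\*Q\*\*/ **/*.md<CR>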
You can of course also write a standalone script to collect the questions and dump them in a file. On Unix-like systems, you could do something like this:
# Usage: pass the notes file to process as the first argument
if [[ -e $1 ]]
then
    filename="Questions $1"
    count=$(grep -Fc '**Q**' "$1")
    report="$count questions extracted from $1"
    # Write a Markdown header with the report to the output file
    printf '# %s \n \n' "$report" > "./$filename"
    # Collect all lines containing **Q**, strip the marker, and format them as a list
    grep -Fn '**Q**' "$1" | sed -e 's/\*\*Q:*\*\*:*//g' | sed -e 's/^/- /' >> "$filename"
    echo "$report"
else
    echo "Provide a file as an argument"
fi
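Assuming you save the script as, say, extract_questions.sh (a name I'm making up here) and make it executable, usage would look like this:

chmod +x extract_questions.sh
./extract_questions.sh lecture_notes.md
# writes the extracted questions to a file called "Questions lecture_notes.md"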
I wrote that script maybe two years ago and didn’t bother to check it again, so use at your own discretion.
ADDITION 2021-03-16:
You can also create a function to paste the contents of the quickfix window in your current buffer.
" Paste from quickfix list (handy to collect the questions somewhere)
nnoremap <leader>pq :execute PasteQuickfix()<CR>
:function! PasteQuickfix()
: for q in getqflist()
: put =q.text
: endfor
:endfunction
This works well. One annoying little detail in my original version was that the mapping sent you to the top of the file. The culprit was calling the function with :execute instead of :call: the function returns 0, and :execute then runs that 0 as an Ex command, which jumps to the first line of the buffer. Using :call, as above, avoids this.