In chapter 4 of his book “Human Compatible”, Stuart Russell discusses various harmful applications of Artificial Intelligence (AI) technology, including the ways AI increasingly makes surveillance more effective, to the point where the “Stasi will look like amateurs by comparison.” We should point out that surveillance is not just “observing,” but is a method for controlling the behavior of people. Okay, so what’s new? Well, AI technology introduces new ways of controlling people. Because we live such a big part of our lives online nowadays (including our consumption of facts, news, our communication with peers etc.), “AI systems can track an individual’s online reading habits, preferences, and likely state of knowledge, they can tailor specific messages to maximize impact on that individual while minimizing the risk that the information will be disbelieved.” Forms of coercion can be made more effective by using personalized strategies (e.g. Russell mentions blackmail bots that use your online profile). A more subtle way to alter people’s behavior is to “modify their information environment so that they believe different things and make different decisions” (my emphasis). In addition to more accurate personalized profiles, AI systems can constantly adjust themselves to be more effective, based on feedback mechanisms such as mouse clicks or time spent reading.
After discussing new phenomena such as deep fakes, Russell makes a bit of a leap and contemplates the various ways in which governments (on the less friendly side of the spectrum) may use AI-supported surveillance to directly control their citizens by implementing “rewards and punishments based on behavior”. In a European context this section may sound slightly unrealistic, but it is less far-fetched if you consider a certain… Asian country using surveillance to experiment with evaluating civilians using point systems. If you are not worried about governments in particular, you may think of similar, but less severe and less obvious strategies used by big internet corporations, or the upcoming era of health-related apps. One example I’m thinking of is insurance companies giving you a lower health insurance premium when you use a smart watch to show you are moving enough per day (exercise is certainly important, but this is a very specific, limited, unilateral concept of health).
In any case, what these examples share is that “such a system treats people as reinforcement learning algorithms, training them to optimize the objective set by the state” (my emphasis, and again: you could perhaps substitute “state” by your favorite big evil corporation). So not only does the technology build up profiles based on behavioral feedback, the humans themselves, Russell suggests, will be evaluated with a scoring function, as if they themselves work like these algorithms.
What I appreciate about the upcoming argument is that Russell specifically targets those states/companies with a “top-down, engineering mind-set”, which at first sight seems quite reasonable and which I suspect to be pervasive. It can also be the mindset of techno-optimists with good intentions. But then again, however extreme the surveillance, it’s always for “the greater good”, so good intentions only go so far. Technocracy comes into the picture when we ask exactly who decides what is good, and how we measure it.
Russell reconstructs this engineering-style reasoning as follows, with respect to governments:
Or you can come up with some alternative version, like: it is healthier not to drink alcohol and better to have less alcohol-related violence; we have studied and understood the effects of alcohol; therefore it is better if we monitor everyone, punish those who drink alcohol, and reward superfood-munching yoga hipsters (is that a thing?).
Then Russell provides three arguments against this top-down engineering technocracy mindset:
“First, it ignores the psychic cost of living under a system of intrusive monitoring and coercion; outward harmony masking inner misery is hardly an ideal state. Every act of kindness ceases to be an act of kindness and becomes instead an act of personal score maximization and is perceived as such by the recipient. Or worse, the very concept of a voluntary act of kindness gradually becomes just a fading memory of something people used to do. Visiting an ailing friend in hospital will, under such a system, have no more moral significance and emotional value than stopping at a red light.
To stick with Russell’s example: I think the issue here is not that there would no longer be acts of kindness, but I wonder how one would recognize them as such in this (hypothetical?) state. Under this regime, the only way to “prove” to its recipient that something is in fact an act of kindness under the all-seeing eye of a scoring metric would be to show 1) that you understand how your actions are evaluated, and 2) that you then act against the maxim of score maximization. It would not be sufficient to be altruistic and selfless; a kind act would have to be self-destructive (or one might say, extremely altruistic). Perhaps only then would the receiver see rationally, without relying on empathy and good faith, that some intrinsic moral and human value is the best explanation for the shown behavior, rather than score optimization. That is of course a hypothetical reflection under the assumption of all-pervasive surveillance; otherwise another option would be to communicate covertly.
The paradox of this extreme example is that by trying to optimize desirable human behavior (e.g. visiting that ailing friend) and by quantifying the human values that they promote (e.g. kindness), you lose the quality of what is sought after, similar to the capitalist perversion of friendship or love when they become part of an economy of (monetary) exchange. Friendship or love cannot be part of a contract, because that would imply you can demand some utility from the other and enforce this demand. (I am not denying that friendships and love relationships can have and in fact do have utility. I am rather saying that only the cynical and the sociopathological would argue they are about utility and, as a consequence, can be quantified. But hey, perhaps I’m too romantic.)
In other words, the deeper issue that Russell addresses with this extreme example is that the true objective would not be captured by any explicitly stated objective. This is a fundamental problem of what Russell calls the “standard model” of AI, where “intelligent” machines are optimizing a “purpose put in the machine”:
Second, the scheme falls victim to the same failure mode as the standard model of AI, in that it assumes that the stated objective is in fact the true, underlying objective. Inevitably, Goodhart’s law will take over, whereby individuals optimize the official measure of outward behavior, just as universities have learned to optimize the “objective” measures of “quality” used by university ranking systems instead of improving their real (but unmeasured) quality.
I was unfamiliar with Goodhart’s law, but Russell’s explanation is exceptionally clear (and the university example is awfully accurate). Marilyn Strathern’s paraphrase is elucidating: “When a measure becomes a target, it ceases to be a good measure.” Russell points out that this problem applies when humans design intelligent machines and algorithms to minimize some loss function (the standard model of AI), and he provides many examples where AI optimizing towards some objective has adverse effects. This problem is just as bad when we give humans the algorithmic treatment, so to say. In economics and game theory the issue is people gaming the measure, which may in fact be to the detriment of what you are trying to promote.
Finally, the imposition of a uniform measure of behavioral virtue misses the point that a successful society may comprise a wide variety of individuals, each contributing in their own way.”
The latter point is relatively self-evident and adds to the earlier reasoning: can you capture the quality you are trying to promote by subjecting people to a relatively uniform quantitative measure?
Russell’s book deals in particular with the problem of specifying objectives in AI technology designed to optimize towards a given goal. But you could perhaps generalize his argument to technocracy in general: technocracy firstly assumes you have an (engineer) class of people who know what is best, which is both optimistic and paternalistic; but secondly, even when they do know what is best, it is quite a fundamental question whether you can translate that knowledge into procedures that truly achieve the intended result.
N.B. I’m reading this book in EPUB format so I can’t quite add page references. All citations can be found in Chapter 4, Surveillance, Persuasion, and Control, section “Controlling your behavior”
This is the third post in a series of sorts about note-taking in Vim. I have silently kept playing around with the system outlined in the previous posts (part 1, part 2). Some things I have abandoned, some are improved and some are changed. I have inserted several updates (marked as “UPDATE”) in the previous posts in case you are curious. If we have reached some sort of equilibrium at the end of this series I’ll make sure to create a place where people can easily download all relevant configuration and scripts, but for now everything is a matter of “voortschrijdend inzicht,” a beautiful Dutch phrase (roughly: “progressive insight”) that’s hard to translate and certainly hard to pronounce for most of my readers. Given that my previous posts on Vim were well received and several people are trying the setup out, it’s time to pick up writing again and start chipping away at the backlog.
To give you a taster of what’s to come:
In this post, I want to discuss a seemingly minor issue that will nevertheless potentially have a big impact on your workflow. It concerns the quick creation of new timestamped notes in your note directory or Zettelkasten, and then easily creating a correctly formatted Markdown link to it from another note. If you are impatient, you can have a look at the screencast below. If not, let me give a brief introduction to show you where the potential workflow issue is.
The authors of this nice Zettelkasten blog argue that you should give up trying to categorize your notes in hierarchical folders and instead should throw everything into one big flat Zettelkasten. This is scary, because notes that do not have many interconnections with other notes may be forgotten when the Zettelkasten gets big (forgotten by you for sure, but also “forgotten” by the Zettelkasten itself if they lack links). Nevertheless, I’m making the transition because I want to commit to the idea of my note collection being dynamic, organic, an entity of its own, rather than a static dump. To make this transition, you have to fully rely on your tools. Since I’m hacking together my own tools, some issues came up, in this case with using timestamps in filenames.
The O.G. Zettelkasten of Luhmann had an extensive naming convention for organizing notes, but it was more of a necessary evil because computers were not in the picture yet.
Given that we now have digital means of naming, searching, and linking notes, a strict naming convention for the notes is an unnecessary complication that blindly applies an “analogue” mindset to a digital solution.
The authors from Zettelkasten.de are however strong proponents of the more superficial organization of notes by their time of creation, which they do by inserting a unique timestamp at the beginning of the filename.
For me, the best argument for this approach is that unique timestamps are a good way of recovering links through potential filename changes.
The main reason I did not use timestamps, however, was that I used Vim’s default filename/path completion (C-x C-f in insert mode) when making Markdown links. This works fine as long as filenames are meaningful, but it just doesn’t cut it anymore when all filenames start with a timestamp, as you would have to manually start typing the timestamp.
An early adopter of my Vim experiment, Boris, did however use long complex timestamps and noticed interlinking was getting in the way of his workflow.
Since he now makes all his notes for his PhD in Vim, I certainly do not want to be responsible for trouble!
So here it goes …
Before we solve the bigger issue, let’s add some convenience. When using timestamps, manually typing out the date and time is a pain in the ass. Each timestamp needs to be a unique identifier, so this means you at least also want to include the time of day, potentially down to the second if you regularly make multiple notes within a minute. I don’t personally, but the code below is very easy to adjust to your own needs.
First, we declare a variable that holds the location of our Zettelkasten, so we may use it in multiple places without having to retype the whole path.
let g:zettelkasten = "/home/edwin/Notes/Zettelkasten/"
Second, we want to define our own custom command that 1) pre-fills all the stuff we don’t want to type, namely the timestamp and the extension (I always use markdown), and 2) that prompts you for the name of your note:
command! -nargs=1 NewZettel :execute ":e" zettelkasten . strftime("%Y%m%d%H%M") . "-<args>.md"
This will produce a filename like “201704051731-my_awesome_note.md”.
Don’t bother with understanding this; writing it certainly gave me a headache because I’m new to Vimscript. What is interesting for you is “%Y%m%d%H%M”, because it indicates how you want to format your datetime. You can read about this by typing :help strftime, and otherwise this is a good resource.
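Incidentally, you can try out such a format string outside Vim: the shell’s date command understands the same strftime-style % directives, so this is a quick way to preview what your timestamps will look like.

```shell
# Preview the timestamp format used in the note filenames;
# date(1) accepts the same strftime-style directives.
date +%Y%m%d%H%M
```

This prints a twelve-digit timestamp like the prefix in the example filename above.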
Now all we have to do is declare a mapping to call our command. I use the following mapping (note that it ends with a space, so you can immediately type the note title and press Enter):
nnoremap <leader>nz :NewZettel 
Done! Now let’s solve the real problem of effortlessly linking to this note. Warning: it gets pretty sexy ahead.
The main issue was that we never want to type timestamps just to reap the benefits of path completion for getting a Markdown link to the file we want. And while we are at it: having to format a Markdown link like [description](link) also takes time, so let’s automate that as well.
My new solution is to rely on my fuzzy file finder to find a file and automatically create a Markdown link to it. I use CtrlP with ripgrep, but fzf is also a great choice; see fernando’s comment for an fzf solution. This is a great solution because its fuzzy nature allows you to ignore the timestamp altogether, but it also allows you to search on a partial fragment of the time plus a part of the note title. I can imagine that you, for example, remember making a note about Zettelkasten somewhere in 2020, but you don’t quite remember the exact date (unless you are Rain Man) nor the precise name of the file. No problemo! Boot up CtrlP and search on “2020Zettelkasten”. We can extend CtrlP to then automatically create a Markdown link to the matching file with Ctrl-X. Have a look at the short screencast I made.
I started with code provided in this StackExchange post and adjusted it to create valid Markdown links:
" CtrlP function for inserting a markdown link with Ctrl-X
function! CtrlPOpenFunc(action, line)
  if a:action =~ '^h$'
    " Get the filename, with and without the timestamp prefix
    let filename = fnameescape(fnamemodify(a:line, ':t'))
    let l:filename_wo_timestamp = fnameescape(fnamemodify(a:line, ':t:s/\(^\d\+-\)\?\(.*\)\..\{1,3\}/\2/'))
    let l:filename_wo_timestamp = substitute(l:filename_wo_timestamp, "_", " ", "g")
    " Close CtrlP
    call ctrlp#exit()
    call ctrlp#mrufiles#add(filename)
    " Insert the markdown link to the file in the current buffer
    let mdlink = "[".filename_wo_timestamp."]( ".filename." )"
    put=mdlink
  else
    " Use CtrlP's default file opening function
    call call('ctrlp#acceptfile', [a:action, a:line])
  endif
endfunction
let g:ctrlp_open_func = {
\ 'files': 'CtrlPOpenFunc',
\ 'mru files': 'CtrlPOpenFunc'
\ }
I just love it. Regardless of whether I will keep using timestamps in my filenames, this will greatly speed up interlinking notes in my Zettelkasten.
EDIT 09/02/2021: a previous version of the post did not escape the + regex modifier, which is necessary in the Vim regex dialect. As a result, the timestamp was not correctly removed in the created link descriptions. The screencast below uses the old incorrect version.
EDIT 22/09/2022: I updated the regular expression and additionally remove underscores from file paths. The behavior of the regex is as follows: 20210340-note.md becomes [note]( 20210340-note.md ); note.md becomes [note]( note.md ); and index_notes.md becomes [index notes]( index_notes.md ).
When I’m busy I am usually very motivated to do side projects, but paradoxically I find it harder to stay motivated and productive when I have more time on my hands. My guess is that many people noticed a similar effect during this Corona crisis, thinking something along the lines of “well, at least I now have time to do project X that I’ve been meaning to do for a while”, only to find that it’s not that easy to stay motivated after being locked in the house for several days without external structure. I felt the vague need to do a project, but more specifically I wanted it to be a project that would be collaborative and social, to make up for the social interaction lost due to Covid-19. Well, here’s the perfect project: setting up your own “tilde club” along the lines of tilde.club or tilde.town (see here for my tilde.club account). There’s something in this project for people of different interests and skill sets. My main interest was to gain some experience with setting up a web server and being a system admin of a UNIX-like system. For my non-technical friends, it was a fun project because it provided a safe environment to get to know the command line and learn how to write web pages. As for the content of those web pages: your own imagination is the limit. Okay, fair enough: your skill level is also a limit, but you can make fun wonky webpages with basic HTML and CSS. Here’s a quick write-up of the steps to get you started.
Our server is hosted on Google Cloud Platform. They have an “always free” tier for which very specific restrictions apply, both for the hardware and for the location of the virtual machine (it has to be located in a specific part of the US). A virtual machine (VM) is basically a computer program that emulates a computer system, on which you can run an operating system as if it were a regular computer like you have at home. In this quick guide I’m assuming you are running Linux. Usually, providers offer ready-made VMs with the operating system pre-installed, so you won’t have to do this manually.
If you opt for the always free tier, your resources will be limited (580 MB of RAM). That’s okay; in fact, it’s a fun challenge in itself. It makes us think about how we use our shared computer and also (hopefully) introduces some sense of responsibility: you share the computer with other users, and if you consume all resources they are left out. 580 MB should be more than enough for your friends and for running a lightweight web server hosting some web pages.
For the setup instructions you’ll need a user account with root privileges (“administrator rights”, if you’re coming from Windows). You can make one from the start (skip ahead) or just use the web interface that the provider offers.
When people login they are greeted with some information about the system and a welcome message. You can customize this welcome screen entirely. Maybe come up with a nice logo for your server and display some useful instructions for new users.
To edit the message of the day, just edit the /etc/motd file, for example with nano or vim (so vim /etc/motd). This is just a text file and everything you place in there will be printed verbatim to the welcome screen.
There are also a number of scripts whose output is displayed. You can list all of them with ls /etc/update-motd.d. You’ll see that there are several scripts prefixed with a number, and the number in the filename indicates the order of execution. For example, on Ubuntu you will find a script called 10-help-text and another called 50-motd-news, and the first will be executed before the latter. So if we want a custom script to run before these scripts, we could create a bash script called 01-custom.
For example, on login, I want to show which other users are online:
#!/bin/sh
echo "Who is logged in?\n"
users | tr ' ' \\n | uniq
You can enable or disable these scripts by toggling whether they are executable, so there is no need to delete them if you do not want them to run. To disable all scripts in this folder, run sudo chmod -x /etc/update-motd.d/*. Here sudo means you “do” this action with “super user (su)” rights, chmod changes the file permissions, and the * expands to all files in the folder. To then enable your own script, run sudo chmod +x /etc/update-motd.d/01-custom.
Now that we have a welcome message, we should add some users to our system. Creating a new user is easy enough: you simply run adduser [name], so for example adduser edwin, and then fill in the details in the prompt. However, you’ll likely want to have stuff ready for new users on their first login. The adduser command reads a “skeleton” directory called /etc/skel. Everything you put in there will be copied to the home folder of the new user.
If you want to set your server up like a “tilde” club where users can, for example, publish a website under myurl/~edwin, then it is handy to already provide a public_html folder with a default index.html file. You could also leave some further instructions for new users in a README file and provide some minimal configuration files, for example a .vimrc for Vim.
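As a sketch of that idea (the file contents are placeholders; on the real server you would target /etc/skel itself as root, here a demo directory is used so you can try it without root rights):

```shell
# Populate a skeleton directory with defaults for new users.
# Using a demo directory here; as root you would set SKEL=/etc/skel.
SKEL="skel-demo"
mkdir -p "$SKEL/public_html"
cat > "$SKEL/public_html/index.html" <<'EOF'
<html><body><p>This page is under construction.</p></body></html>
EOF
cat > "$SKEL/README" <<'EOF'
Welcome! Everything you put in ~/public_html will be published
on the web under your ~username URL.
EOF
```

Every account created with adduser afterwards starts with these files in its home folder.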
The first user we will make is our own admin account with sudo rights, so we can proceed using that account instead of the root user that the web interface offers. I’m assuming you already made your account with the command from above. Then set a password and add your account to the “sudoers” group. Run groups to check whether there is indeed a sudo group; interestingly, on Google Cloud Platform the “sudo” group is replaced with a group called “google-sudoers”. Set the user’s password: passwd [name]. Add the user to the sudo group: usermod -aG sudo edwin.
We do not have a graphical environment, so users will have to connect with a shell session on our server. We do that with a tool called ssh, for “secure shell”. For setting up ssh as a user, I can refer you to the tilde.club tutorial on ssh. The tricky part with ssh is making sure all file permissions are correct. On the server side, we need to ensure the “ssh daemon” (sshd) is running and that people can connect through the port we specified (22 by default). Check the status of the ssh daemon: systemctl status sshd. Open port 22 for ssh traffic: sudo ufw allow 22 or sudo ufw allow ssh. Enable the firewall: sudo ufw enable.
Before enabling the ssh daemon to allow incoming ssh connections, we should decide whether to allow people to log in with their passwords or only with an ssh key. The latter is the more secure option and is strongly recommended, but requires more effort from users (again, see the tilde.club guide for ssh). The sshd configuration can be found at /etc/ssh/sshd_config. Google Cloud Platform already created this file for me with sensible defaults. If not, you should read a tutorial on this file, because setting it up wrongly is likely to make your system vulnerable. Assuming everything else is set up correctly, just choose whether you want to allow logging in with passwords or only with keys, by setting PasswordAuthentication to either “yes” or “no”. FYI, you can check information about ssh login attempts in /var/log/auth.log, e.g. run grep ssh /var/log/auth.log | less.
Now we’ll allow our users to publish everything in their public_html under [yourdomainurl]/~[user]. The “~” is why this type of server is now called a “tilde” community; it indicates that you’ve reached a particular user space on the shared domain. You still see it sometimes on old-school sites of academics, see for example this great page on Prof. Dr. Style. We’ll use Nginx because it is very lightweight and easy to set up. This guide from DigitalOcean is great and straightforward. It also instructs you to set up the firewall ufw correctly, like we did before for ssh.
Assuming you keep the default settings, you can edit your homepage with:
vim /var/www/html/index.nginx-debian.html
The next thing we want to do is set up Nginx to publish websites “userdir”-style, meaning that we allow each user to publish a website from a public_html folder in their home directory. I found here how to do that in Nginx. Open the default configuration file at /etc/nginx/sites-available/default and make sure it matches the following:
# Default server configuration
#
server {
listen 80 default_server;
listen [::]:80 default_server;
# SSL configuration
# listen 443 ssl default_server;
# listen [::]:443 ssl default_server;
root /var/www/html;
# Add index.php to the list if you are using PHP
index index.html index.htm index.nginx-debian.html;
server_name example.com www.example.com;
location ~ ^/~(.+?)(/.*)?$ {
alias /home/$1/public_html$2;
index index.html index.htm;
autoindex on;
}
}
These are the defaults, except for the block starting with “location”.
Now we just start or restart Nginx and everything should work!
systemctl restart nginx
All users can now make their own website, which will be live immediately.
Linux systems have this interestingly named tool called finger that shows you the personal information of other users, as well as their current plans and the projects they are working on. To set your own details, “change finger”: chfn. Additionally, you can notify others of what you are up to by writing a plan in ~/.plan. You can also tell others about a project you are working on by creating ~/.project. All this information will be included when people “finger” you.
Output of finger ejw:
Login: ejw Name: Edwin Wenin
Directory: /home/ejw Shell: /bin/bash
Office: Netherlands
On since Wed Feb 26 08:40 (MST) on pts/18 from 145.116.163.127
3 seconds idle
No mail.
Project:
Experimenting with old school Unix stuff
Plan:
Think about fun projects to add to my page
Your ~/.plan file can be arbitrarily long, so this is basically the first blogging tool ever. You could “finger” someone and read their latest “blogpost” there.
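Setting up your own entries is a matter of writing two plain files in your home directory (the texts here are just the ones from the sample output above):

```shell
# Create the files finger reads from your home directory.
echo "Think about fun projects to add to my page" > "$HOME/.plan"
echo "Experimenting with old school Unix stuff" > "$HOME/.project"
# finger runs as other users, so the files must be world-readable.
chmod 644 "$HOME/.plan" "$HOME/.project"
```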
You can change your shell with chsh. You can also set the server’s timezone, for example:
timedatectl set-timezone Europe/Amsterdam
Now that people can log in, we can try to communicate with other users. See who’s online with who. You can write to all with wall. This will dump your message on the screen of all logged-in users, interrupting their work. This is fun or annoying, depending on who you ask and how often you do it. Others can clear their screen and continue working with Ctrl-l. You can also write to individual users; simply run write [user]. It can happen that a user is logged in multiple times; in that case you need to be more specific, for example write edwin pts/0. You can find all that info in the output of who. Also handy to know: to quit writing a message, type Ctrl-D.
This does not allow us to write messages to people that are offline. For this however, we can use mail!
When you “finger” yourself, it will say “No mail”. Let’s change that.
Setting up mail can be a hassle, but using mail only locally luckily works out of the box.
Find out if a mail client is installed:
dpkg -S /usr/sbin/sendmail
If not, install the postfix MTA (Mail Transfer Agent):
sudo apt install postfix
This will spawn a prompt asking you how you want to use postfix. Indicate that you only want to use it locally.
Luckily for us, postfix contains a default config for local use only. Now we can send mail between users, great!
Use the sendmail command to send mail. Local mail arrives in /var/mail/[username]. Now we need a way to read mail. For this we’ll use mutt. Install mutt with sudo apt install mutt, then open your local mailbox with mutt -f /var/mail/[username].
Try sending yourself an email, and then finger yourself again to see if your “Mail” entry is updated. Done!
If you want to make your club more social with instant messaging, consider installing an IRC daemon for IRC chat.
Don’t forget to have some fun making your own stuff for your little community.
In my friend group we have this quite random beaver theme going on.
So I extended the cowsay command with an ASCII beaver (source) and made it spit out random beaver-related quotes:
----------------------------------------
< Save a tree, eat a beaver - Zac Hanson >
----------------------------------------
\ .---.
\ @ @ )
\ ^ |
[|] | ##
/ |####
( |####
| / |#BP#
/ |.' |###
_ `` )##
/,,_/,,____#
If you place this command in /usr/local/bin and make it executable, other people can also use it.
Because all users are on the same machine, you can really create some community based content.
For example, I asked my friends to put a ~/.digest file in their home directory where they can give a daily tip for music, culture, etc.
Then I made a script that reads those files and compiles them into a webpage with all daily tips.
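A minimal sketch of such a compiler script could look like this (the HOMES and OUT variables and the page layout are just one possible choice, not a fixed convention):

```shell
# Compile every user's ~/.digest into one HTML page of daily tips.
# HOMES is the directory holding the home folders (/home on the server);
# OUT is where the generated page is written.
HOMES="${HOMES:-/home}"
OUT="${OUT:-digest.html}"
{
  echo "<html><body><h1>Daily tips</h1>"
  for digest in "$HOMES"/*/.digest; do
    [ -f "$digest" ] || continue
    # The user name is the name of the home folder.
    user=$(basename "$(dirname "$digest")")
    echo "<h2>$user</h2><pre>"
    cat "$digest"
    echo "</pre>"
  done
  echo "</body></html>"
} > "$OUT"
```

Pointing OUT at a file under /var/www/html (or a public_html folder) makes the compiled digest immediately available on the web.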
Fun aside, your little community could probably use some handy scripts.
Another script I enjoyed making was one that finds the most recently updated web pages from all public_html directories. I then make a webpage out of those as well, so it’s easy to keep track of what others are doing.
This might be one of the most convoluted one-liners I ever wrote, but it works:
find /home/ -path "*public_html*html" -type f -printf "%T@ %p\n" | sort -k 1nr | head -n 15 | cut -d " " -f 2 | sed "s/\/home\///" | sed "s/\/public_html//"
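Spelled out as a small function, the pipeline is easier to follow (recent_pages is a hypothetical name; it takes the directory holding the home folders as an argument):

```shell
# Print the 15 most recently modified pages under any public_html,
# newest first, with the home prefix and public_html stripped.
recent_pages() {
  find "$1" -path "*public_html*html" -type f -printf "%T@ %p\n" \
    | sort -k 1nr \
    | head -n 15 \
    | cut -d " " -f 2- \
    | sed "s|^$1/||" \
    | sed "s|/public_html||"
}
```

Calling recent_pages /home then gives the same overview as the one-liner.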
You can automate the running of such a script using crontab -e. By the way, once you have scheduled your script for timed repetition using cron, the cron daemon happens to send its output to you on the local mail you have just configured!
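For example, a crontab entry to regenerate such an overview page every hour could look like this (the script path and output location are hypothetical):

```
# m h dom mon dow  command
0 * * * * /usr/local/bin/recent-pages > /var/www/html/recent.html
```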
If you follow these steps and play around with them, you’ve created your own little corner of the internet. I’d like to think that’s also a small form of resistance against the modern internet dominated by tech giants, bots and ads. Besides, sometimes old and simple technology is just the coolest.
Published in Turning Magazine, AI & Health edition, February.
Artificial Intelligence (AI) promises great benefits for health care.1 The use of machine learning techniques can speed up diagnoses, in some cases increase their accuracy, and ideally would play a role in prevention. But these promising techniques can challenge our privacy because they require large amounts of sensitive patient data to be gathered and analyzed. Additionally, we can expect a surge in tech companies taking interest in the health sector because hospitals themselves are not likely to have the infrastructure and expertise for handling big data. But if the modern-day equivalent of sapientia est potentia (“knowledge is power”) is data est potentia, then we should think through how much power we want to give to the tech giants. If health research is included in the ongoing “datafication” of the world, are we moving towards a situation where academic hospitals need to buy expensive licenses from Google or Microsoft to access large databases for health research? And to what extent do we want tech companies with flexible morals to be in control of our health data?
Imagine that you overhear someone talking about “Project Nightingale.” You might think this is a reference to a new James Bond movie or some CIA operation, but it is not. It is a medical data sharing project of Google and Ascension, the second-largest healthcare provider in the U.S., which manages 2600 healthcare locations, including 150 hospitals2. Somewhere in 2018, Google and Ascension made a deal, without consulting doctors and patients or providing them an opt-out, under which the electronic health records of up to 50 million patients would be transferred to Google, making them available for the development of AI applications3. Last month (November 2019) a whistleblower working on Project Nightingale reported concerns that this immense data transfer - the biggest in health care so far - might be in breach of the relevant rules on data privacy (HIPAA)4, which has spawned extensive media coverage567 and a federal inquiry.8
The transferred records are reported to include names, addresses, family relations, allergies, radiology scans, medication use and medical conditions. The lack of appropriate anonymization raises the fear that Google employees working on the project might look into these files. A previous data transfer, between Google’s DeepMind and the Royal Free hospital in 2017, had been criticised for resting on an inappropriate legal basis, though for a different reason9. The UK’s national data guardian pointed out that the deal was justified as directly benefiting patient care, whereas it seemed to be used primarily for testing DeepMind’s “Streams” app.10
Helen Nissenbaum’s concept of “contextual integrity” is useful for understanding privacy concerns about these developments.11 Consider how we disclose personal and confidential information to a doctor so that the doctor can adequately take care of us. Even though confidential information is shared and you give up some control over your data - the doctor perhaps has to get a second opinion from a colleague - you would almost certainly not think of this as a privacy intrusion. But there is an intrusion if your doctor shares that same information with your employer. In other words, the norm for what counts as an appropriate flow of information depends heavily on the context and cannot simply be extrapolated. AI students in particular should know about relevant notions of data privacy, because there is a good chance they will be confronted with personal data in their future jobs. We have to realize that alongside AI’s promised benefits for society, AI techniques can also be used to track our behavior and build personal profiles, for example for undesirable surveillance or political microtargeting1213.
Contextual integrity is upheld in the European General Data Protection Regulation (GDPR) through a purpose limitation principle: personal data can only be collected for a legitimate purpose that is stated in advance and cannot be further processed for other purposes (cf. GDPR art 5(1)(b))1415. The ethical concerns about Project Nightingale are thus not only about confidentiality, but also about whether patient consent can validly be extended to the use of data for future machine learning applications. Doctors connected to the Ascension network surely did not explain to their patients: “Anything you say can and will be used (against you?) by Google. You have the right to remain silent.”
Philosopher Tamar Sharon has dubbed the push of tech companies into the health sector the “Googlization of health research”16. This catchy phrase expresses how tech companies are reshaping health research by crowdsourcing data collection, for example by delegating it to users who gladly track their own health with running apps and smart watches. Apple facilitates medical research through the iPhone and Apple Watch. This month Google announced it will buy FitBit for 2.1 billion dollars17. This is a clever move. Whereas Google’s storing of traditional health data from hospitals is under huge scrutiny, we see no riots over Google buying FitBit and potentially using it to track your health. This type of technology could enable a form of liquid surveillance that we would find disconcerting if it were performed by a centralized authority.
We can avoid paternalism by noting that consumers may share personal data to get some utility in return, as long as they are aware of this trade-off. They can read the terms and conditions and decide to opt out. But one should at least realize that most people either have no time to read all the conditions, do not understand their legal language, or otherwise feel pressured into accepting them because they need the service. Is this type of consent really informed and explicit? People with health problems, in particular, are more easily nudged into sharing health data. Consequently, the idea of a voluntary trade-off might be a fallacy18. As Sharon points out, this makes the use of apps to gather sensitive health data “morally dubious” (p.5).
Sharon points out an additional dynamic specific to apps that collect or require health data, namely that they almost without exception promote their service as altruistic: if we all crowdsource our health data for research we can solve nasty diseases together. The cynical but fair point Sharon makes is that people are most willing to share their sensitive data “when altruistic modes of behavior and financial profit-seeking overlap; and this in ways that are often not transparent.” (p.6). Tech companies know how to utilize this psychological mechanism.[*]
For this reason, Morozov argues that if we really care about privacy we should not just come up with stricter privacy laws, but should also offer a robust intellectual critique to battle and limit the “information consumerism” by which consumers sell themselves out for free apps and material benefits19. Consumers should realize they are still paying, but with a new currency: their personal data. Consider the Dutch “a.s.r. Vitality program,” which offers up to 8% cashback on health insurance if you show with a FitBit that you move enough20. Massive use of similar applications will lead to a normalization of self-tracking, to the point where not participating brings serious social and financial disadvantages. Consider an analogy: cashless payment is handy for consumers, but its normalization leads more services to refuse cash, thus forcing us to pay cashless if we need those services.
Such normalization would leave those who are aware that they pay with their privacy with a sense of resignation that the battle over personal data has already been lost. Alongside laws protecting personal data from being used without consent, we thus need intellectual activism about the “Googlization” of health research, to show people what they are selling before we lose control over our health data.
[*] This paragraph is edited in the printed version in a way that I do not fully approve of
See for example https://www.aidence.com/blogs/early-lung-cancer-detection-with-ai-a-guide-for-patients/ ↩︎
https://www.theguardian.com/technology/2019/nov/12/google-medical-data-project-nightingale-secret-transfer-us-health-information ↩︎
https://www.dhcs.ca.gov/formsandpubs/laws/hipaa/Pages/1.00WhatisHIPAA.aspx ↩︎
https://www.wsj.com/articles/google-s-secret-project-nightingale-gathers-personal-health-data-on-millions-of-americans-11573496790 ↩︎
https://www.nytimes.com/2019/11/11/business/google-ascension-health-data.html ↩︎
https://www.nationalreview.com/news/google-gathering-health-care-data-on-millions-of-americans-with-secret-project-nightingale/ ↩︎
https://edition.cnn.com/2019/11/12/tech/google-project-nightingale-federal-inquiry/index.html ↩︎
https://www.theguardian.com/technology/2017/may/16/google-deepmind-16m-patient-record-deal-inappropriate-data-guardian-royal-free ↩︎
Nissenbaum, Helen. (2010). Privacy in Context - Technology, Policy, and the Integrity of Social Life. ↩︎
Zuiderveen Borgesius, F. J., Möller, J., Kruikemeier, S., Ó Fathaigh, R., Irion, K., Dobber, T., … de Vreese, C. (2018). Online Political Microtargeting: Promises and Threats for Democracy. Utrecht Law Review, 14(1), 82–96. DOI: http://doi.org/10.18352/ulr.420 ↩︎
https://www.spiegel.de/netzwelt/netzpolitik/donald-trump-und-die-daten-ingenieure-endlich-eine-erklaerung-mit-der-alles-sinn-ergibt-a-1124439.html ↩︎
See page 77 of Chris Jay Hoofnagle, Bart van der Sloot & Frederik Zuiderveen Borgesius (2019) The European Union general data protection regulation: what it is and what it means, Information & Communications Technology Law, 28:1, 65-98, DOI: 10.1080/13600834.2019.1573501 ↩︎
Sharon, T. (2016). The Googlization of health research: From disruptive innovation to disruptive ethics. Personalized Medicine, 13(6), 563-574. DOI: https://doi.org/10.2217/pme-2016-0057 ↩︎
https://www.theverge.com/2019/11/1/20943318/google-fitbit-acquisition-fitness-tracker-announcement ↩︎
Turow, Joseph & Hennessy, Michael. (2015). The Tradeoff Fallacy: How Marketers are Misrepresenting American Consumers and Opening Them Up to Exploitation. SSRN Electronic Journal. 10.2139/ssrn.2820060. ↩︎
https://www.faz.net/aktuell/feuilleton/debatten/ueberwachung/information-consumerism-the-price-of-hypocrisy-12292374-p1.html ↩︎
The Raven paradox, formulated by Carl Gustav Hempel in the 1940s, poses an interesting problem for inductive inference, more specifically enumerative induction.
Inductive inference is a type of reasoning where you infer a hypothesis or proposition after observing a series of data. This movement from observation to hypothesis is central to reasoning in the empirical sciences.
For example, if you observe a thousand black ravens, you may reasonably conclude that all ravens are black. However, you would probably not arrive at this conclusion after only seeing a few black ravens. In other words, the more black ravens you see, the more confidence you gain in your proposition that all ravens are black.
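As an aside, this growing confidence can be given a simple quantitative form. One classical (and itself philosophically contested) formalisation is Laplace’s rule of succession, which is not part of Hempel’s discussion but illustrates the intuition: after observing n black ravens and no counterexamples, the probability that the next raven is black is (n+1)/(n+2).

```python
from fractions import Fraction

def rule_of_succession(black: int, non_black: int = 0) -> Fraction:
    """Laplace's rule of succession: the probability that the next raven
    is black, after observing `black` black ravens and `non_black`
    counterexamples (assuming a uniform prior over the black-raven rate)."""
    return Fraction(black + 1, black + non_black + 2)

# Confidence grows with each confirming instance:
for n in (1, 10, 1000):
    print(n, float(rule_of_succession(n)))
```

With a handful of ravens the probability is still modest; after a thousand it is very close to 1, matching the intuition that more observations yield more confidence.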
The question is, of course, how we can justify this increase in confidence. A well-known problem with inductive reasoning is that, unlike deduction, it is not logically valid: the fact that you have seen a thousand black ravens does not logically entail that the 1001st raven will also be black. In other words, the correctness of inductive inference cannot be shown deductively (but also not by induction, because that would indefinitely shift the burden of proof to having to prove your proving method… ).
The Raven paradox poses a second logical challenge, one that concerns the usefulness of induction as a description of how we become more confident after more observations. The claim that all ravens are black can be cast in the form of a logical implication:
for all x: Rx -> Bx
where Rx means “x is a raven” and Bx means “x is black”. Remember, this proposition is inferred from the many observations of black ravens that are, supposedly, instances of this rule.
But Hempel points out that when we rewrite this statement to the following logically equivalent proposition, things get counter-intuitive:
for all x: ~Bx -> ~Rx
For reference, we can see that these two formulas are indeed logically equivalent with a simple truth table:
| p | q | ~q | ~p | p -> q | ~q -> ~p |
|---|---|----|----|--------|----------|
| 0 | 0 | 1  | 1  | 1      | 1        |
| 1 | 0 | 1  | 0  | 0      | 0        |
| 0 | 1 | 0  | 1  | 1      | 1        |
| 1 | 1 | 0  | 0  | 1      | 1        |
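The same check can be done mechanically. This small sketch enumerates all four truth assignments and confirms that the implication and its contrapositive never disagree:

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    # Material implication: a -> b is false only when a is true and b is false.
    return (not a) or b

# Enumerate all four truth assignments, as in the truth table above.
rows = [(p, q, implies(p, q), implies(not q, not p))
        for p, q in product([False, True], repeat=2)]
for p, q, impl, contra in rows:
    print(int(p), int(q), int(impl), int(contra))

# The last two columns agree on every row: the formulas are equivalent.
assert all(impl == contra for _, _, impl, contra in rows)
```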
We already knew that seeing more black ravens increases our confidence that all ravens are black. But what the rewritten proposition suggests is that we should then also become more confident that all ravens are black when we encounter any object that is neither black nor a raven. Although logically equivalent, this statement suddenly no longer accurately describes how we become more sure that all ravens are black. Does my confidence that all ravens are black increase when I see a yellow banana? Seeing a yellow banana seems completely irrelevant to the proposition that all ravens are black, but the logical formalization of the inferred rule on black ravens does not express this.
Peter Lipton points out that this issue of relevance similarly plagues existing models of scientific explanation, particularly the so-called Deductive-Nomological model. In the summary of Lipton, this model states that “an event is explained when its description can be deduced from a set of premises that essentially includes at least one law”. So to cast this into the raven example, our hypothesis may be that “All ravens are black,” from which we can deduce that if we encounter a raven, we predict it to be black. If this indeed turns out to be the case, this would be support for our hypothesis.
But Lipton points out that a successful prediction could be construed as support for any hypothesis, logically speaking.
If our hypothesis is B (all ravens are black) and we indeed observe many black ravens, then our confidence in B increases. But logically speaking, so would our confidence in the hypothesis B \/ P, because adding a disjunct is truth-preserving: if B is true, then B \/ P is necessarily also true. And P might mean anything here, for example that “Cows can fly.”
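That adding a disjunct is truth-preserving (while the converse entailment fails) is again easy to verify by brute force over all truth assignments:

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    # Material implication: a -> b is false only when a is true and b is false.
    return (not a) or b

# B entails B \/ P for every truth value of P ("cows can fly" included):
assert all(implies(b, b or p) for b, p in product([False, True], repeat=2))

# The converse fails: B \/ P does not entail B (take B false, P true).
assert not implies(False or True, False)

print("B entails B \\/ P, but not the other way around")
```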
Granted, the hypothesis “All ravens are black or cows can fly” is a really shitty hypothesis. But the main point is the same as before: how do we convincingly show and formalize that our hypothesis is relevant to our observations?
One interesting concept that does not disqualify inductive reasoning altogether is the idea of “inference to the best explanation,” which goes back to Peirce’s conception of abduction and fits well with Bayesian formalisations. The phrase “best explanation” suggests at least that we compare two explanations, rather than taking observations directly as support for any one given hypothesis. How confident we are that we live in a world where all ravens are black (i.e. that this is a good explanation of the fact that so far we have only seen black ravens) does not only depend on how many black ravens we have seen, but should be expressed in relation to our confidence in other possible explanations. If my alternative explanation is that cows can fly, I am certainly more sure about all ravens being black. Of course, in empirical science a good explanation should rest on a meaningful contrast with alternative explanations (so not some alternative hypothesis about bananas or flying cows). Whether you have enough confidence to reject a null hypothesis depends on its formulation in contrast to the experimental hypothesis.
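The contrastive flavour of this idea can be sketched with a toy Bayesian comparison (my own illustration, not from Hempel, Lipton or Peirce; the priors and the rival’s 0.9 rate are arbitrary assumptions): pit H1 = “all ravens are black” against a rival H2 = “each raven is black with probability 0.9” and update on n observed black ravens.

```python
def posterior_all_black(n: int, prior_h1: float = 0.5,
                        p_black_h2: float = 0.9) -> float:
    """Posterior probability of H1 ("all ravens are black") after seeing
    n black ravens, against the rival H2 ("each raven is black with
    probability 0.9"). Priors and H2's rate are illustrative assumptions."""
    like_h1 = 1.0                 # H1 predicts a black raven with certainty
    like_h2 = p_black_h2 ** n     # H2's probability of n black ravens in a row
    prior_h2 = 1.0 - prior_h1
    evidence = prior_h1 * like_h1 + prior_h2 * like_h2
    return prior_h1 * like_h1 / evidence

for n in (0, 10, 50):
    print(n, round(posterior_all_black(n), 3))
```

Notice that the posterior depends on the rival hypothesis: against a weaker rival (say, flying cows) H1 would win immediately, which is exactly the point that confidence in an explanation is relative to the alternatives we contrast it with. A yellow banana, which both hypotheses say nothing about, leaves the posterior unchanged.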
In any case, the Raven paradox raises interesting questions about what counts as a relevant or good explanation in inductive reasoning.