Building a Memex by Andrew Louis

The implicit lie of the blank page

Recording the truth was paramount for ten-year-old Alison Bechdel as she started keeping a diary: “I obsessed with making sure my diary entries bore no false witness,” she writes. But doubts crept in — how could she be absolutely sure of anything that happened? To maintain honesty, she started obsessively inserting the phrase “I think” after facts she couldn’t be sure about:

It was a sort of epistemological crisis. How did I know that the things I was writing were absolutely, objectively true? My simple, declarative sentences began to strike me as hubristic at best, utter lies at worst. All I could speak for was my own perceptions, and perhaps not even those. The most sturdy nouns faded to faint approximations under my pen. My I thinks were gossamer sutures in that gaping rift between signifier and signified. To fortify them, I perseverated until they were blots.

This is told in Fun Home: A Family Tragicomic by Alison Bechdel. It’s a graphic memoir about her father’s suicide, his homosexuality, and the author’s own coming out story.

(“Bechdel, Bechdel, hmm, that name sounds familiar — does she have anything to do with that famous test about women characters in movies?” Yes!)

To trace through the events of her childhood, she relies heavily on her journal entries to understand how she felt at the time. Unfortunately:

But as I aged, hard facts gave way to vagaries of emotion and opinion. False humility, overwrought penmanship, and self-disgust began to cloud my testimony until in this momentous entry, the truth is barely perceptible behind a hedge of qualifiers, encryption, and stray punctuation.

Inspired by the algebra she was beginning to learn in school, she started substituting particularly sensitive words with letters. As Bechdel tries to match up memories with journal entries from this period, a lot is only hinted at with ellipses or left out entirely:

My earnest daily entries had given way to the implicit lie of the blank page, and weeks at a time are left unrecorded.

But despite all the complexities and omissions, having “ground truth” from this era of her life was more useful than not:

There was a lot going on that summer. I’m glad I was taking notes. Otherwise I’d find the degree of synchronicity implausible.

I’ve definitely experienced the same thing with my journals and records, now available in my Memex. It’s useful to have so much personal history accessible, but sometimes it’s difficult to get enough clarity about a feeling to commit it to permanent record. It’s even harder if we suspect that these records might at some point be viewed by others.

I’d love it if this Memex project could help people create archives free of vague sentences, ellipses, and omissions. This means getting the security and product design right: this is not a “move fast and break things” app. Designing the data storage in a non-centralized way is clumsy in many ways, but it’s important that users never feel an admin might read their data. Likewise, social graphs and sharing functionality are anti-features for a product like this. If users worry that they’re one accidental tap away from revealing something sensitive, they revert to performing the same online personas they have on other parts of the internet, full of ellipses and omissions.

Archiving a digital world

Archiving digital history is a challenging problem. Even the professionals are having a hard time:

Over the last 40 years, archivists have begun to gather more digital objects—web pages, PDFs, databases, kinds of software. There is more data about more people than ever before, however, the cultural institutions dedicated to preserving the memory of what it was to be alive in our time, including our hours on the internet, may actually be capturing less usable information than in previous eras.

What’s the canonical version of a digital artifact when content is personalized to our online personas and our feeds are shaped by algorithms which are inscrutable to even their creators?

This problem is sketched out in a great article by Alexis Madrigal: “Future historians probably won’t understand our internet, and that’s okay” (via Stacey — thanks for the link!). One idea:

Lynch’s suggestion is radical for the archival community. Archivists generally allow other people to document the world, and then they preserve, index, and make these records available. Lynch contends that when it comes to the current social media, that just doesn’t work. If they want to accurately capture what it was like to live online today, archivists, and other memory organizations, will have to actively build technical tools and cultural infrastructure to understand the “performances” of these algorithmic systems. But, at least right now, this is not going to happen.

But there are even simpler challenges. In 2010, the Library of Congress announced that it would be archiving every single tweet, but years later nothing has been released, and the library is struggling with basic problems like how to deal with the volume (500M tweets per day) or what policies to apply to accounts that become private.

The Memex was conceived in 1945 by Vannevar Bush because he felt our species couldn’t stay on top of the output of the information age (or as he put it: “we’re being buried in our own product.”) We now have orders of magnitude more data to make sense of.

If you’re into this topic, this new podcast by the Kitchen Sisters looks promising:

We are about to embark on a new NPR and podcast series, The Keepers — stories of activist archivists, rogue librarians, curators, collectors and historians. Keepers of the culture and the cultures and collections they keep.

Beta program update

It’s not just the Library of Congress that’s struggling with archiving Twitter content. Ages ago, I started collecting my tweets and favs. Because Twitter isn’t strict about the case of usernames (for example, @realdonaldtrump is the same account as @realDonaldTrump), I decided to automatically downcase all usernames to help with de-duplication. This has turned out to be a bad decision, because casing generally matters on most other types of URLs and the original casing was lost once the usernames were stored in lowercase. To fix this, I’ve had to rehydrate all the Twitter objects in my dataset via the API, and it’s been complicated, especially since so much content is no longer even available.
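One way to sidestep this kind of problem is to keep the URL exactly as collected and derive a separate, case-insensitive key that is used only for de-duplication. The Python below is a minimal sketch of that idea, not the Memex's actual code: the `TweetRef` class, the `dedupe` helper, and the assumed URL shape (`https://twitter.com/<username>/status/<id>`) are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class TweetRef:
    """Hypothetical record type: the URL is stored exactly as collected."""

    url: str

    @property
    def dedup_key(self) -> Tuple[str, str]:
        # Assumption: URLs look like https://twitter.com/<username>/status/<id>.
        parts = urlsplit(self.url).path.strip("/").split("/")
        username, tweet_id = parts[0], parts[-1]
        # Case-fold only the comparison key; the stored URL keeps its casing.
        return (username.casefold(), tweet_id)


def dedupe(refs: Iterable[TweetRef]) -> List[TweetRef]:
    """Keep the first record seen for each case-insensitive key."""
    seen: Dict[Tuple[str, str], TweetRef] = {}
    for ref in refs:
        seen.setdefault(ref.dedup_key, ref)
    return list(seen.values())


refs = [
    TweetRef("https://twitter.com/realDonaldTrump/status/1"),
    TweetRef("https://twitter.com/realdonaldtrump/status/1"),  # same tweet
    TweetRef("https://twitter.com/realdonaldtrump/status/2"),
]
unique = dedupe(refs)
```

Because the normalized form lives only in the derived key, the stored record still carries whatever casing it arrived with, so the normalization rule could be changed later without re-fetching anything from the API.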

Dealing with this silly issue is an example of the type of work I’ve been doing over the last few weeks to prepare for more people using the app. I have to finalize as many of these schema decisions as possible because they’re hard to fix retroactively, especially since I don’t have a single production database across all users. I don’t want to end up with a combinatorial explosion of configurations across different installs.

Technical debt is usually talked about as a problem for big corporations, but it’s also tricky to adjust schemas and importers when I already have over 10M objects in my personal database and a lot of the data can’t simply be re-initialized. There’s a funny tension between keeping my primary customer happy (i.e. me!) and my other goal of bringing this project to more people.

I’m doing an install of the Memex for a friend later this week and if it goes well, I’ll be more comfortable ramping up some more users in the new year.

Demoing the Memex

Last week, I got to demo my project to a friendly audience of developers. It was a good excuse to do a quick round of interface tweaks and improvements. Some examples:

I also did some example queries based on the StrangeLoop road trip I did with Max this fall. Here’s a query for photographs taken during long drives with Max:

The Memex on stage

Demoing my project last week was fun. I’m pretty excited that I’ll have the opportunity to do the same thing again in March, but for 50x the number of people: I got invited to speak at EmberConf in Portland!

I’ll also be speaking at Starcon in Waterloo in two weeks (as mentioned previously) and I’ll try to share some of the research that I do for the talk in a future edition.

If you come across any upcoming events that are lacking in Memex-ey content, let me know!