Building a Memex by Andrew Louis

Pass by reference; pass by copy

Ted Nelson came up with the term “hyperlink” in the 60s and imagined them working quite differently from today’s. Here’s how researcher Belinda Barnet describes how they would have worked in Nelson’s Project Xanadu:

[…] no links would ever be broken, no documents would ever be lost, and copyright and ownership would be scrupulously preserved. The Magical Place of Literary Memory: Xanadu. In this place, users would be able to mark and annotate any document, see and intercompare versions of documents side by side, follow visible hyperlinks from both ends (‘two-way links’) and reuse content pieces that stay connected to their original source document.

Unfortunately, Project Xanadu always ran a bit better in Ted Nelson’s head than as working code.

Tim Berners-Lee took an implementable subset of Nelson’s ideas and the World Wide Web was born. Links on the web are one-way and a linked object doesn’t know what links to it. The linked object’s server is also the only place where its data is stored, which means that if that server goes down, the data goes with it.

In programming terms, the web uses pass by reference instead of pass by value.
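To make the analogy concrete, here is a toy sketch. The `server` dictionary, the `/essay` path, and the `follow` helper are all made up for illustration; they only model the difference between keeping an address and keeping a copy.

```python
import copy

# Hypothetical "server": the only place the original content lives.
server = {"/essay": {"title": "As We May Think", "body": "(full text)"}}

# Pass by reference: keep only the address of the content.
link = "/essay"

# Pass by value: keep a full copy of the content itself.
snapshot = copy.deepcopy(server["/essay"])


def follow(url):
    """Resolve a link against the server; None means the link is dead."""
    return server.get(url)


# The server goes down and takes its data with it.
server.clear()

link_result = follow(link)        # the reference no longer resolves
copy_result = snapshot["title"]   # the copy is unaffected
```

The link costs almost nothing to store but depends on the server staying up; the copy costs storage but survives on its own.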

Bret Victor talks about how the Library of Alexandria used pass by reference and things didn’t work out too well for it. On the other hand, the original Memex used pass by value. Same with nature:

It’s interesting that life itself chose Bush’s approach. Every cell of every organism has a full copy of the genome. That works pretty well — DNA gets damaged, cells die, organisms die, the genome lives on. It’s been working pretty well for about 4 billion years.

Obviously we shouldn’t oversimplify and declare one to be better than the other, but it’s a good exercise to imagine these alternative histories of how the internet could have gone.

In last week’s newsletter, I talked about importing my high school MSN logs. One frustrating thing about reading through these old conversations is that the majority of the links don’t work anymore. Not having easy access to the content of the links makes a lot of conversations hard to follow (“thoughts on this? [link]”). In the short term, I think I’m going to generate archive.org links based on the time period of the conversation and add them to the Memex interface.
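A minimal sketch of how such a link could be generated, assuming the Wayback Machine’s usual URL layout of a 14-digit timestamp followed by the original address. The function name and the example URL and date are placeholders, not part of the Memex code. As far as I know, the Wayback Machine redirects a request like this to the capture closest to the requested time, and it treats the timestamp as UTC.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, seen_at: datetime) -> str:
    """Build a Wayback Machine URL for a link as it looked around `seen_at`.

    `seen_at` is the time of the chat message that contained the link,
    ideally already converted to UTC.
    """
    stamp = seen_at.strftime("%Y%m%d%H%M%S")  # YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


# Placeholder link and message time, for illustration only.
example = wayback_url("http://example.com/page", datetime(2005, 3, 14, 21, 30, 0))
```

This only builds the URL; whether a capture actually exists near that date still has to be checked when the link is followed.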

But the more I work on this project, the more I play around with the idea of saving an archive of every link visited. It’s neither technically hard nor expensive to store. I’m pretty convinced I could save every link I look at, as well as my full photo collection and the rest of the Memex data, for less than $5/mo in Amazon S3 charges. 99% of these links will never be looked at again, but for the times an archived copy is useful, it’s very useful.

When was this photo taken??

Photos are a big part of our personal data archives. In addition to the memories they capture, we also use them to record whiteboard notes or remember items on a shelf of a store. They should be one of the most important datasets in a Memex.

But there’s one technical problem that’s made me hesitate to start importing all my photos: it’s very hard to tell exactly when a photo was taken.

A JPEG photo generated by your phone or camera has three different types of timestamps available. Here is why each of them can’t be relied on:

My general solution is using the EXIF timestamp (without timezone) and then using GPS latitude/longitude to determine which timezone the photo was taken in. For photos without geo data, we can either use a default timezone or try to figure it out based on earlier/later photos.
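A rough sketch of that approach, not the actual importer code. It assumes the EXIF timestamp arrives in the standard `YYYY:MM:DD HH:MM:SS` layout; `resolve_capture_time` and `toy_lookup` are names invented here. The toy lookup just guesses a fixed offset from longitude, which ignores political boundaries and daylight saving time. A real version would swap in a lookup backed by timezone boundary data that returns a proper zone object.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable, Optional, Tuple

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # layout of the EXIF DateTimeOriginal field


def resolve_capture_time(
    exif_timestamp: str,
    gps: Optional[Tuple[float, float]],
    lookup_timezone: Callable[[float, float], Optional[tzinfo]],
    default_tz: tzinfo,
) -> datetime:
    """Attach a timezone to a naive EXIF timestamp.

    Uses the GPS position when present, otherwise falls back to `default_tz`.
    """
    naive = datetime.strptime(exif_timestamp, EXIF_FORMAT)
    zone = lookup_timezone(*gps) if gps is not None else None
    if zone is None:
        zone = default_tz
    return naive.replace(tzinfo=zone)


def toy_lookup(lat: float, lon: float) -> tzinfo:
    """Stand-in lookup: roughly one hour of offset per 15 degrees of longitude."""
    return timezone(timedelta(hours=round(lon / 15)))


# Made-up photo metadata, for illustration only.
with_gps = resolve_capture_time("2016:02:20 14:05:09", (43.65, -79.38), toy_lookup, timezone.utc)
without_gps = resolve_capture_time("2016:02:20 14:05:09", None, toy_lookup, timezone.utc)
```

Passing the lookup in as a function keeps the fallback logic separate from whatever timezone data source ends up being used. Guessing the zone from earlier or later photos could be added as another fallback before the default.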

“Who cares about such extreme accuracy?” you might ask. Being able to build an accurate timeline of sequential events from diverse data sources is at the core of the magic of this Memex. It would still be possible to view photos in approximate order otherwise, but being able to collate them with other data sources lets me answer queries like “find photos taken while with Michal” or “find photos of notes taken inside a train.”
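To illustrate why the timestamps need to line up, here is a small sketch of that kind of query. The `Photo` and `Interval` types, the labels, and the sample data are all hypothetical; the assumption is simply that other data sources can be reduced to labeled time ranges in the same timezone-aware form as the photo timestamps.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class Photo(NamedTuple):
    path: str
    taken_at: datetime


class Interval(NamedTuple):
    label: str       # e.g. derived from calendar, chat, or location data
    start: datetime
    end: datetime


def photos_during(photos: List[Photo], intervals: List[Interval], label: str) -> List[Photo]:
    """Return photos whose timestamp falls inside any interval with this label."""
    wanted = [i for i in intervals if i.label == label]
    return [p for p in photos if any(i.start <= p.taken_at <= i.end for i in wanted)]


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


# Made-up data, for illustration only.
photos = [
    Photo("a.jpg", utc(2016, 2, 20, 18, 30)),
    Photo("b.jpg", utc(2016, 2, 20, 22, 0)),
]
intervals = [
    Interval("with Michal", utc(2016, 2, 20, 18, 0), utc(2016, 2, 20, 19, 0)),
    Interval("on a train", utc(2016, 2, 20, 21, 30), utc(2016, 2, 20, 23, 0)),
]
matches = photos_during(photos, intervals, "with Michal")
```

If a photo’s timestamp is off by a few hours because of a wrong timezone, it lands in the wrong interval, which is why approximate ordering isn’t enough for queries like these.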

I haven’t started working with video files from my phone yet but apparently they’re even more complicated.

If you’re a developer working on an API, for the sake of all future obsessive archivists, please make sure to add accurate timestamps!

Importer system update

An update on the project to turn this Memex into an installable app: the data importing system can now be managed through the Electron app.

Here’s a recap of how data gets into the Memex:

Here’s a screenshot from the dashboard:

If you’re interested in beta testing the Memex and are willing to get your hands dirty a bit, please shoot me an email!