The Memex learns to read
My memex learned how to read this week. Through the magic of the Google Cloud Vision API, I’m now running OCR and automatic label generation on each image that I import. This’ll let me do things like find all my photos of beaches, search for phrases in screenshots posted to Twitter, or find name matches in a photo I’ve taken of a book page. I’m very excited!
Here’s how the algorithm sees a page of a book I took a photo of:
It also recognized that Max is having a conversation with a black mammal:
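For the curious, here’s roughly what a call to the Vision API looks like. This is a minimal sketch using the `images:annotate` REST endpoint rather than whatever client library my importer actually uses; the helper names (`build_request`, `parse_response`, `annotate`) are my own.

```python
import json
import urllib.request

# Real endpoint for the Vision API's batch annotate call.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"

def build_request(image_b64):
    # One request per image, asking for both OCR and label detection.
    return {
        "requests": [{
            "image": {"content": image_b64},  # base64-encoded image bytes
            "features": [
                {"type": "TEXT_DETECTION"},
                {"type": "LABEL_DETECTION", "maxResults": 10},
            ],
        }]
    }

def parse_response(payload):
    """Pull the full OCR transcript and the label descriptions out of a
    v1 annotate response."""
    result = payload["responses"][0]
    text = result.get("fullTextAnnotation", {}).get("text", "")
    labels = [l["description"] for l in result.get("labelAnnotations", [])]
    return text, labels

def annotate(image_b64, api_key):
    # Requires a valid API key; not runnable without one.
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=json.dumps(build_request(image_b64)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_response(json.load(resp))
```

The transcript and labels then get stored alongside the image so the search index can pick them up.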
I ingest about 5000 images per month on my personal account, and the API pricing works out to $5/mo. I can probably bring the cost down if I’m smarter about which types of images I run the OCR/labelling on.
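The billing math is simple enough to sketch. Vision pricing charges per feature per image after a monthly free tier; the specific numbers below ($1.50 per 1000 units, first 1000 units per feature free) are my assumption of the current tier, and the result won’t exactly reproduce my bill since it depends on which features run on which images.

```python
def monthly_cost(images, features=2, free_units=1000, per_1000=1.50):
    """Estimate the monthly Vision API bill.

    Each feature (OCR, labelling, ...) run on each image is one billable
    "unit"; the first `free_units` per feature each month are free.
    Tier numbers are assumed, not official.
    """
    billable = max(images - free_units, 0) * features
    return billable * per_1000 / 1000
```

Under these assumptions, dropping label detection on text-heavy screenshots (one feature instead of two) roughly halves the billable units.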
In the original 1945 Memex essay, the author Vannevar Bush talks about being able to create links between different documents and using them to find information later. Most people consider this idea to be the genius of the Memex, but Gordon Bell (mentioned in last week’s newsletter) has a different take: he thinks Bush’s links were a “failure of imagination” because he couldn’t conceive of computers powerful enough to build fulltext search indexes. I enjoy thinking about how Bush’s mind would be blown if I showed him how I could search across the millions of items in my memex with automatically-generated labels and transcriptions.
I made progress toward an easily-installable version of the app: last week’s Docker setup didn’t have working image processing, but I’ve got that sorted out. This week, I’m going to focus on integrating the importers (both those that work on data from the local file system and those that call out to external APIs).
Twitter and API frustrations
My current importers catch follows/unfollows, favourites, and tweets that I publish, but not mentions, retweets, or likes of my tweets. I looked into expanding the importers to keep better records of interactions, but it turns out the Twitter REST API doesn’t even have an endpoint for listing the users who’ve liked a tweet, and they seem to have no interest in fixing this. Tracking retweets isn’t simple either. Worst. API. Ever.
It was another frustrating reminder to me that the hardest part of this project is extracting high-quality data from the services we use.
A recap of some API frustrations
- Twitter’s API is a mess of inconsistencies and omissions and has extremely low rate limits that prevent any sophisticated integration
- YouTube removed the ability to pull your viewing history
- Instagram removed API access almost entirely
- Facebook is removing the messaging history API next month
Few of the platforms we use regularly have any incentive to expose user data through an API, and this problem will only get harder if this project ever becomes widely used.
As part of the OCR work mentioned earlier, I built a generalizable hook system that can run different processors on data as it gets added to the database.
Examples of some hooks I’m working on:
- Look for links in text and associate the containing entity with the entity of the link. This would help with the “what’s that article about x that person y sent me?” use case.
- A hook that watches for follow/unfollow activities from a service like Twitter and updates the database to mirror the social graph.
- When a visit to a tweet or Reddit thread gets logged, trigger another importer to fetch the details of that third-party object.
The system is designed to be user-configurable so you could create your own workflows and data processes.
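The shape of such a hook system can be sketched in a few lines. This is a toy version under my own naming (`HookRegistry`, `on`, `dispatch`), not the project’s actual code, with the link-extraction hook from the first bullet as the worked example.

```python
import re
from collections import defaultdict

class HookRegistry:
    """Run registered processors whenever an entity of a matching
    type is added to the store."""

    def __init__(self):
        self.hooks = defaultdict(list)

    def on(self, entity_type):
        # Decorator: register `fn` as a processor for this entity type.
        def register(fn):
            self.hooks[entity_type].append(fn)
            return fn
        return register

    def dispatch(self, entity_type, entity):
        # Called by the ingestion path after an entity is written;
        # returns each hook's output for downstream use.
        return [fn(entity) for fn in self.hooks[entity_type]]

registry = HookRegistry()

URL_RE = re.compile(r"https?://\S+")

@registry.on("note")
def extract_links(entity):
    # Associate the containing entity with each entity it links to.
    return [("links_to", url) for url in URL_RE.findall(entity["text"])]
```

The follow/unfollow mirror and the “fetch details of a visited tweet” trigger would just be more functions registered against other entity types.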
Another edition of Andrew’s Memex Book Club
This week, I read a book (my 21st of the year!) by local cyborg Steve Mann, entitled Cyborg. For most of his life, he’s had computers and cameras attached to his body, recording his life and broadcasting it to his web site or a private system. Here’s a picture of his rigs over the years:
I never took classes with Steve Mann at UofT and I didn’t know much about him beyond him being the weird guy with head-mounted cameras.
The book is an interesting positioning of his life’s work as one long performance art piece in which he challenges society to think through its relationship with technology. It’s also very political in its insistence that cyborg technology must empower individuals rather than hand institutions and corporations power over our lives through “smart devices” and “smart homes”, which end up as tools of surveillance and soft power: digital versions of Foucault’s Panopticon.
Some other interesting things:
- He experimented with wearing surveillance paraphernalia as clothing to make statements through fashion.
- In the 90s, Steve Mann got tired of being asked to remove his head-mounted camera gear, so he grew his hair through a mesh headcap with the camera attached, sealed it to his head by knotting his hair behind the mesh, and then told anyone who gave him trouble that he could remove the camera, but they’d be responsible for any adverse health effects of ripping his hair out.
- Much of wearable computing’s history has been driven by the desire to beat casinos, including an early attempt by Claude Shannon.
Excited by my new OCR ability, I took photos of the endnote pages, knowing I’ll now be able to search for the names and titles they mention. For those who say “phooey, you can already use Google to search books”: looking for something you know you’ve read before is a very different use case from doing broad research on a topic, and no single provider like Google will ever have a good dataset for the first one.
Thanks as always for reading!