Using my Memex to chat with my past
I got to showcase my Memex at EmberConf in Portland earlier this month! For those who haven’t seen a demo yet, it’s a pretty good overview of the state of the project. You can watch it as part of the archived conference livestream here.
Some of the queries I showed:
- finding all links I’ve looked at that are referenced in podcast episodes (5:05:32)
- showing my complete history of a search term like “ember” including early conversations and photo results via OCR (4:59:01)
- a query for long walks on the beach (5:08:05)
- using a burrito to find a quote from a book I was reading (I’m conflicted about using these types of examples — I don’t want people to remember me as the guy who catalogs his burritos but it also makes the project memorable). (5:11:48)
People seemed to like it!
- It made some people feel less weird about their own tracking habits.
- A few friends said “I finally get what you’re doing” (perhaps an indictment of my communication skills over the last few years)
- Someone else wondered “This man CAN NOT be married. Or in a relationship.” (I am)
- My dad wondered if I was still upset about the birthday party incident (5:06:43)
For those who are reading this email because they were in attendance: welcome to the newsletter! I’d love to chat. Send me a note ([email protected]) about what parts of the project appeal to you, whether you’re doing any tracking right now, and if you’d be interested in trying a beta version.
A week in San Francisco
After EmberConf wrapped up in Portland, I made the tech pilgrimage to San Francisco. Here are my geographic trails from the week:
A few highlights from the trip:
A visit to the Internet Archive
For more than two decades, the Internet Archive has worked to preserve digital history. Every Friday, they host community lunches (just email them in advance) and I visited last week.
They’re housed in a former Christian Science Church. The Archive’s founder, Brewster Kahle, was driving by the building one day and thought the front of the church looked like their logo and bought it when it went up for sale. The staff work on the basement level and the sanctuary level is used for events. Also in the sanctuary: hard drive racks mounted on the walls and pews containing 120 ceramic figures of people who’ve contributed to the Archive for more than three years.
Ted Nelson, a recurring character in these newsletters, shows up to these lunches frequently, but he wasn’t in attendance this time. Maybe I’ll catch him at another one. He recently donated all his computer junk mail to the Archive, and a lot of it is scanned and available.
Alternative techniques for recording history (a trip to SFMOMA)
My project and newsletters deal with themes of remembrance and personal history. I visited the recently-renovated San Francisco Museum of Modern Art and saw a few art pieces that explored different ways of capturing history.
Emily Jacir — linz diary (2003)
Emily Jacir stood by the same public fountain in Linz at the same time every day for 26 days and let herself be captured by a public webcam. She journalled what happened around each photo and added these as captions to the images.
Jorge Otero-Pailos - The Ethics of Dust, Old United States Mint, San Francisco (2016)
Jorge Otero-Pailos preserved part of the history of San Francisco’s old mint building using a unique method: he cured liquid latex on the walls of the building and peeled it off, capturing the layers of grit, brick, and debris from decades of history.
Hans Haacke - News (1969/2008)
Hans Haacke brought messy politics into the supposedly neutral space of the gallery by continuously printing the day’s news. The history spills out and gets preserved on the gallery floor in beautiful piles. The first version in 1969 used a telex machine that printed off news from a press agency. The updated version at SFMOMA used a dot matrix printer connected to the internet. The artwork’s label specified the medium was “RSS newsfeed, paper, and printer”.
Organ concert at St. Mary’s
St. Mary’s Cathedral hosted a concert in celebration of Bach’s 333rd birthday. It was great to hear my favourite composer being played on my favourite instrument in my favourite SF architecture.
For SF residents: the church does weekly organ concerts on Sunday afternoons.
(Fine, this has nothing to do with the Memex. But I had a great time.)
Training a chatbot to speak like me
The good news: I got accepted to speak at !!con, the conference which requires each talk have at least one exclamation mark in the title!! The bad news: I now have less than two months to actually learn and implement what’s promised in the talk description.
The title of my talk: “Talking to my past self (without introducing temporal paradoxes!)”. Based on an idea from someone at last year’s conference, I’m going to train a chatbot on my high school chat logs and get it to talk to another chatbot trained on newer logs.
There are two general approaches to building chatbots. The retrieval-based model tries to pick the best response out of an existing pool of messages. The generative model uses more sophisticated techniques to create new responses. There’s also the dimension of conversation breadth: closed domain chatbots can deal with only a narrow topic like the weather; open domain chatbots can converse about a wide range.
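To make the retrieval-based idea concrete, here is a minimal sketch using only Python's standard library. It is not the package I used: the `PAIRS` list, the `similarity` helper, and this `get_response` function are all hypothetical names for illustration, and string similarity from `difflib` is just one simple way to decide which stored message is the "best" match.

```python
from difflib import SequenceMatcher

# Toy pool of (message, reply) pairs. This is illustrative data, not a real corpus.
PAIRS = [
    ("there?", "ya"),
    ("hows it going", "Good"),
    ("what are you up to tonight", "not much, you?"),
]


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def get_response(message: str, pairs=PAIRS) -> str:
    """Retrieval step: return the stored reply whose message best matches the input."""
    _, best_reply = max(pairs, key=lambda pair: similarity(message, pair[0]))
    return best_reply


print(get_response("Hows it going?"))
```

A bot like this can never say anything that isn't already in the pool, which is exactly the limitation a generative model is meant to lift.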
The dream version of this project would result in a brash teenage chatbot that argues with a wiser, more mature chatbot about religion and politics. Unfortunately, this would require a generative open-domain chatbot, which is years and years away. Solving this problem is basically equivalent to passing the Turing Test or achieving Artificial General Intelligence (i.e. HAL 9000 from 2001).
There are many more open research problems in this space, like getting a bot to maintain a stable personality or making a bot not always pick the easiest answer (a bot made by Google learned to respond with “I love you” to almost every input). Beyond the theoretical problems, there are lots of practical challenges. For example, I have a few million of my own messages to train these bots on, which sounds like a lot of data but isn’t actually that much.
As I start actually working on this, I suspect the project will result in bots that produce mostly gibberish with the occasional awkward, funny exchange. But that’s ok! The audience is friendly and I think the talk will end up being more about the adventure and less about the result. A subtext of the talk is definitely “here’s why we all still hate interacting with chatbots.”
I got started this weekend by setting up a script to build the training corpus (basically, a giant array of pairs of messages from my past). I fed some data into a basic retrieval-based chatbot package which produced some plausible output in simple situations:
>>> chatbot.get_response('there?')
<Statement text:ya>
>>> chatbot.get_response('hows it going')
<Statement text:Good>
>>> chatbot.get_response('watsup')
<Statement text:well I was going to ask about tonight but now there isn't that much time>
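For the curious, the corpus-building step (the giant array of message pairs) can be sketched roughly like this. This is not my actual script: the `(sender, text)` log format, the `build_pairs` name, and the sample log are assumptions made up for illustration, and real chat logs would need more care (timestamps, conversation boundaries, several messages sent in a row).

```python
from typing import List, Tuple

# Assumed log format: chronological (sender, text) tuples. Hypothetical, for illustration.
Message = Tuple[str, str]


def build_pairs(log: List[Message], me: str) -> List[Tuple[str, str]]:
    """Pair each message from someone else with my reply that directly follows it."""
    pairs = []
    for (prev_sender, prev_text), (sender, text) in zip(log, log[1:]):
        # Keep only exchanges where someone else spoke and I answered next.
        if prev_sender != me and sender == me:
            pairs.append((prev_text, text))
    return pairs


# Made-up sample log, not real data.
sample_log = [
    ("friend", "there?"),
    ("me", "ya"),
    ("friend", "hows it going"),
    ("me", "Good"),
    ("me", "you?"),
]

print(build_pairs(sample_log, me="me"))
```

Only the first reply after someone else's message is kept here, so follow-up messages like "you?" are dropped. That is a simplification, and one of many choices that affect how the bot ends up sounding.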
The next step will be getting started in TensorFlow and using seq2seq to do more sophisticated modelling.
If you have any ideas on how I should approach this project, send them over!