Modelling my brain in the Memex
I aspire to make my Memex a memory augmentation system that holds an external version of what’s in my brain. Below is an explanation of how I model the data.
(This week’s newsletter topic was inspired by attending the Toronto Elixir meetup, where Ben Moss gave a talk on Event Sourcing.)
Imagine we’re building a simple todo app. Typically, we’d build it as a CRUD app with API endpoints to do things like create new todos, view all todos, or mark an item as completed. To redesign the app using Event Sourcing, we’d reorient everything around a log of events (“user created todo at 12:12pm”, “user marked item abc as completed at 12:20pm”). The state of the system (the list of all todos and their statuses) would be derived from this log and would never be manipulated directly.
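The todo example above can be sketched in a few lines of Ruby. This is a minimal illustration of the idea, not code from any real app; all the names are made up:

```ruby
# Minimal event-sourcing sketch: commands only append to a log,
# and the current state is derived by replaying that log.
Event = Struct.new(:type, :payload, :at)

class TodoLog
  def initialize
    @events = []
  end

  # Commands never mutate state directly; they record what happened.
  def record(type, payload)
    @events << Event.new(type, payload, Time.now)
  end

  # The list of todos is a pure function of the event log.
  def todos
    @events.each_with_object({}) do |event, state|
      case event.type
      when :created
        state[event.payload[:id]] = { title: event.payload[:title], done: false }
      when :completed
        state[event.payload[:id]][:done] = true
      end
    end
  end
end

log = TodoLog.new
log.record(:created, id: "abc", title: "Write newsletter")
log.record(:completed, id: "abc")
log.todos # => { "abc" => { title: "Write newsletter", done: true } }
```

Because state is always rebuilt from the log, you can replay history at any point to debug or to answer new questions you didn’t anticipate when the events were recorded.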
For my Memex, I’ve borrowed a lot of the philosophy of Event Sourcing and applied it to modelling my personal history. My Memex stores things I’ve encountered, like people, places, articles, or photos, but I never add them directly. Instead, everything comes in through an activity like “read Wikipedia Article X at 5:30pm.” The graph of things I know and want to save in my database is always tied back to activities on the timeline of my life, so I never have the problem of getting a search result and wondering “how did this get in here?”
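One way to picture this model: every entity in the graph carries a pointer back to the activity that introduced it. A rough sketch, with entirely hypothetical names and structure:

```ruby
# Sketch: things never enter the graph directly; they are created as a
# side effect of recording an activity, so every node remembers its origin.
class Memex
  attr_reader :timeline, :things

  def initialize
    @timeline = []   # the log of activities (the source of truth)
    @things = {}     # derived graph of entities, keyed by id
  end

  def record(verb, thing_id, attrs = {})
    activity = { verb: verb, thing: thing_id, at: Time.now }
    @timeline << activity
    # Upsert the entity, keeping a reference to the activity that added it.
    @things[thing_id] ||= attrs.merge(first_seen_via: activity)
    activity
  end

  # Provenance: answers "how did this get in here?"
  def origin_of(thing_id)
    @things.fetch(thing_id)[:first_seen_via]
  end
end

memex = Memex.new
memex.record("read", "wiki/Memex", title: "Memex (Wikipedia)")
memex.origin_of("wiki/Memex")[:verb] # => "read"
```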
Event Sourcing is closely related to the CQRS (Command Query Responsibility Segregation) design pattern, which is simply about separating the part of your system that handles new data from the part that retrieves it. In my Memex, I use a version of this pattern: all data comes in through a single endpoint that receives activities (“met Alice at 5:30pm”), while a separate system handles queries (“find all people I know who match ‘Alice’”). A benefit of splitting things up this way is that I can give the importers limited, write-only access, and if there’s a bug or security issue, it’s easy to go through the activity log and remove the activities from a certain time period without the overall state of the database being compromised.
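The write/read split, plus the recovery trick of dropping a suspect time window, can be sketched like this (again, illustrative names only, assuming an in-memory log):

```ruby
# CQRS-style split: importers only touch the write side; queries run
# against state derived from whatever remains in the log.
class ActivityStore
  def initialize
    @activities = []
  end

  # Write side: the single entry point importers are allowed to call.
  def ingest(verb:, object:, at:)
    @activities << { verb: verb, object: object, at: at }
  end

  # Recovery: remove activities from a suspect time window. The derived
  # state simply reflects the smaller log afterwards.
  def purge(from, to)
    @activities.reject! { |a| a[:at] >= from && a[:at] <= to }
  end

  # Read side: a query like "find all people I know who match 'Alice'".
  def people_matching(name)
    @activities
      .select { |a| a[:verb] == "met" && a[:object].include?(name) }
      .map { |a| a[:object] }
      .uniq
  end
end
```

Because the read side never writes, a compromised importer can at worst append bad activities, which `purge` can later excise wholesale.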
Part of the process of turning my personal project into an app for others is adding interfaces for features that were only accessible through the console. One example: viewing a graph of data from a query. I’ve had this functionality almost from the start but this week, I gave it a proper interface.
Here’s an example chart of GitHub activity over the last year, plotted by number of activities per week and further segmented by verb (“tasked”, “completed”, “liked”, etc.):
We can select how we’d like to aggregate (count, sum, average) and on what field (duration, quantity), as well as the size of the time buckets (week, month, etc.). We can also choose whether to further segment the results by a secondary category (like verb or provider).
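Under the hood, that kind of chart query boils down to grouping activities into time buckets, optionally sub-grouping by a segment, then applying the aggregate. Here’s one way it could look; this is a sketch of the general technique, not the Memex’s actual query code:

```ruby
require "date"

# activities: array of hashes like { at: Date, verb: "completed", duration: 30 }
# Returns { bucket_start => { segment_value => aggregated_number } }
def aggregate(activities, bucket: :week, op: :count, field: nil, segment: nil)
  activities.group_by { |a| bucket_key(a[:at], bucket) }.map do |key, group|
    grouped = segment ? group.group_by { |a| a[segment] } : { all: group }
    [key, grouped.transform_values { |as| apply(op, as, field) }]
  end.to_h
end

def bucket_key(date, bucket)
  case bucket
  when :week  then date - date.cwday + 1            # Monday of that ISO week
  when :month then Date.new(date.year, date.month, 1)
  end
end

def apply(op, activities, field)
  case op
  when :count   then activities.size
  when :sum     then activities.sum { |a| a[field] || 0 }
  when :average then activities.sum { |a| a[field] || 0 }.fdiv(activities.size)
  end
end
```

For example, `aggregate(acts, bucket: :week, op: :count, segment: :verb)` yields one row per week with per-verb counts, which maps directly onto a stacked bar chart.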
I have a few beta users that I’ll be getting up and running this week, and I’m excited to start getting feedback and bug reports from others. If you want to give the Memex a shot, send me an email.
I was hoping I’d have a few users by now, but some scheduling issues got in the way last week. That did give me enough time to address some of the last-minute issues that came up:
The Ninety-Ninety Rule
The process of developing software — much to the frustration of project managers — follows the Ninety-Ninety Rule:
The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.
A lot of stuff from the second ninety percent came up this week:
- I was originally planning for the Electron app to load the frontend Ember app from an external CDN in a
- In development mode, Electron gets the user’s full shell environment; I found out that in production it doesn’t. This led to the installer scripts failing because of silly things like the `docker` command not being available in the `$PATH` that Electron was using.
- To update the API/importer code, I have a script that `git pull`s the latest code and runs `rails db:migrate` to update the databases if necessary. This causes the schema file to be updated. In theory, the generated schema file should be the same coming from my development environment as from Docker, but lots of weird small differences (like the order that Postgres extensions are listed in) kept causing uncommitted changes to the schema file. I’m now just doing a `git reset --hard` as part of the update script, but it feels a bit like a bandaid solution.
The lesson: always get to a production environment as early as possible and iterate from there. This is especially true now that this Memex project is transitioning from a personal project that worked for me into software that has to run smoothly for other people.