Credit: she founded the project and provided the first version of the scraper.
A while ago my partner in the organization started message-analyzer because she thought it would be interesting to analyze the message data between us. She managed to scrape text messages out of both iMessage and Messenger (the two chat softwares that we use), put them together, built something that could decide which one of us a messaging is coming from. I believe the highest success rate she got to was 86%.
I was looking around in the project after she got most of it done and noticed this file called app.py that runs a Flask application and serves the text messages on a web server. Since I’m pretty much a frontend developer now (no), I came up with the idea of displaying all of our messages on a web page, hopefully merging contents on both Apple and Facebook platforms.
I started with iMessage. It wasn’t too hard to simply take the output of the function that she wrote and serve it over the api endpoint. For the frontend I decided to try out Vue.
It wasn’t long before I got to the following:
The main component simply requests all messages and pass each data to a Message component. I added pagination for some convenience.
Message component looks like this:
It just displays the message content. If hovered, the delivered time is shown as a tooltip.
It all looked good, but how about attachments? There were hundreds of interesting images, stickers and files that we sent each other. It would not be as interesting if those were lost for the web page.
To show attachments, I dug deeper into how Apple stores messages.
Inspired by my partner, Apple stores messages in a sqlite database located in ~/Library/Messages/chat.db, so I took the liberty of looking at the schema.
Three tables caught my attention: attachment, message_attachment_join, and message.
attachment: filename message: ROWID message_attachment_join message_id attachment_id
The message_id matches with the ROWID on the message table. filename is actually a path to the attachment file on the local machine. With these information at hand, I revised the sqlite query to
After the messages and attachments are selected, I served the attachments over the api endpoint ‘/attachments’, and voila pictures on the page!
I later also displayed reactions to messages but I’d like to get to scraping Messenger soon.
Scraping Messenger is a little more tricky: my partner did it by scrolling up all the way to the top, saving the html file and extracting information from there. However, since the data is parsed once already by the Messenger frontend, it was a little difficult to get the dates and attachments as well as the messages.
I went into Chrome devtools and saw that the juicy request was to the url facebook.com/graphqlbatch. Ah so they use their own product. What’s frustrating is that each request at most retrieves ~200 messages, and Chrome doesn’t let me copy multiple request responses at a time.
I tried to reverse engineer how the requests are formatted, but was stuck at figuring out how the message count offset was sent. I came to the idea of writing a Chrome extension to capture the web requests.
The only API that allows you access to response bodies is devtools. Creating an extension is also easy – just need to have a manifest.json file that specifies the extension and some js scripts to be run by the browser, so I did this:
and used pyauthogui from my partner’s code to automatically scroll up like an idiot. I was able to get all messages in the devtools window of the devtools window (no typo). The repository is here.
All that was left was parsing the data retrieved and making sure both message sources end up having the same format when returned by the Flask server. Messenger had more attachment types and multiple attachments so it took me longer.
Due to privacy reasons, I can’t do a demo here :/ well mostly it’s just that I’m too lazy to put up a page with fake message data.
For future features I plan to do searching, improve pagination, style the Messenger system messages (“you waved at each other”), and make the UI prettier and easier to use.