Scraping iMessage and Messenger Messages and Displaying with Vue Frontend

Credit: she founded the project and provided the first version of the scraper.

A while ago my partner in the organization started message-analyzer because she thought it would be interesting to analyze the message data between us. She managed to scrape text messages out of both iMessage and Messenger (the two chat softwares that we use), put them together, built something that could decide which one of us a messaging is coming from. I believe the highest success rate she got to was 86%.

I was looking around in the project after she got most of it done and noticed this file called app.py that runs a Flask application and serves the text messages on a web server. Since I’m pretty much a frontend developer now (no), I came up with the idea of displaying all of our messages on a web page, hopefully merging contents on both Apple and Facebook platforms.

iMessage

I started with iMessage. It wasn’t too hard to simply take the output of the function that she wrote and serve it over the api endpoint. For the frontend I decided to try out Vue.

It wasn’t long before I got to the following:

The main component simply requests all messages and pass each data to a Message component. I added pagination for some convenience.

Message component looks like this:

It just displays the message content. If hovered, the delivered time is shown as a tooltip.

It all looked good, but how about attachments? There were hundreds of interesting images, stickers and files that we sent each other. It would not be as interesting if those were lost for the web page.

To show attachments, I dug deeper into how Apple stores messages.

Inspired by my partner, Apple stores messages in a sqlite database located in ~/Library/Messages/chat.db, so I took the liberty of looking at the schema.

Three tables caught my attention: attachment, message_attachment_join, and message.

attachment:
    filename
message:
    ROWID
message_attachment_join
    message_id
    attachment_id

The message_id matches with the ROWID on the message table. filename is actually a path to the attachment file on the local machine. With these information at hand, I revised the sqlite query to

After the messages and attachments are selected, I served the attachments over the api endpoint ‘/attachments’, and voila pictures on the page!

I later also displayed reactions to messages but I’d like to get to scraping Messenger soon.

Messenger

Scraping Messenger is a little more tricky: my partner did it by scrolling up all the way to the top, saving the html file and extracting information from there. However, since the data is parsed once already by the Messenger frontend, it was a little difficult to get the dates and attachments as well as the messages.

I went into Chrome devtools and saw that the juicy request was to the url facebook.com/graphqlbatch. Ah so they use their own product. What’s frustrating is that each request at most retrieves ~200 messages, and Chrome doesn’t let me copy multiple request responses at a time.

I tried to reverse engineer how the requests are formatted, but was stuck at figuring out how the message count offset was sent. I came to the idea of writing a Chrome extension to capture the web requests.

The only API that allows you access to response bodies is devtools. Creating an extension is also easy – just need to have a manifest.json file that specifies the extension and some js scripts to be run by the browser, so I did this:

and used pyauthogui from my partner’s code to automatically scroll up like an idiot. I was able to get all messages in the devtools window of the devtools window (no typo). The repository is here.

All that was left was parsing the data retrieved and making sure both message sources end up having the same format when returned by the Flask server. Messenger had more attachment types and multiple attachments so it took me longer.

Due to privacy reasons, I can’t do a demo here :/ well mostly it’s just that I’m too lazy to put up a page with fake message data.

For future features I plan to do searching, improve pagination, style the Messenger system messages (“you waved at each other”), and make the UI prettier and easier to use.

Center a div with unknown width – CSS trick

To center a div, the typical  margin: 0 auto;  css attribute requires a known width for it to work, either a percentage or a fixed number of pixels. I came across this StackOverflow answer the other day that solves this perfectly. Sadly it’s not even the accepted answer 😕

#wrapper {
   position: relative;
   left: 50%;
   float: left;
}
#page {
   position: relative;
   left: -50%;
   float: left;
}

Note that in css, position: relative;  means relative to the block’s “normal” position, so it’s an offset to its original position.

Percentage values for left are interpreted as the percentages of the parent width. In this case, page div should be wrapper‘s only child, so their widths would be the same, hence the left: -50% for page moves the block to the left by half of the width of itself, along with the 50% offset of its parent, it’s centered! 😋

Nginx reverse proxy + Docker mount volume

Today I did two things for my blog project: added a proxy on my Nginx server for the api connection and mounted /data/db directory from the host to the docker container to achieve data persistency.

First, Nginx proxy.

The idea/goal isn’t that complicated. There are three docker containers running on my production machine for the blog, by service names:

  1. blog-api, the api server that listens to :1717
  2. web, the frontend nginx server that listens to :80
  3. mongodb, the mongodb database that listens to :27017

Before today, I had to configure the port in the frontend code, so that the frontend calls the api endpoints with the base url and the port number. If this didn’t bother me enough, for the image uploading, all of the urls for embedding the images in the posts have the port numbers in them, so they look like this:

vcm-3422.vm.duke.edu:1717/uploads/image.png

It is against intuition for the port number to be shown to users, so I began looking for a solution. Nginx turns out to have this reverse proxy configuration that allows you to proxy requests to some location to some other port number, or even a remote server. It’s called “reverse” proxy because unlike “normal” proxies, the nginx server is the “first server” that the client connects to, whereas in other proxies the “proxy server” is the first and the nginx server would be behind the proxy.

With some trial and error, I came to this:

http {
	upstream docker-api {
		server blog-api:1717;
	}

	server {
		listen 80;
		root /app/dist;
		location / {
			try_files $uri /index.html;
		}

		location /api {
  			rewrite /api/(.*) /$1  break;
			proxy_pass http://docker-api;
			proxy_set_header X-Forwarded-Host $server_name;
			proxy_set_header X-Real-IP $remote_addr;
		}
	}
	
	# other configs
}

The upstream server name following the upstream keyword can be arbitrary, but the server name blog-api matches with my docker service name and will serve as the host name for the second hop as shown in the proxy_pass field below.

The location /api block does the proxying. The first line rewrites the request so that the /api part of the url is stripped since “api” is only used for triaging. The next few lines are pretty standard. They basically send along the original headers.

Voila! When I built and up’d my docker-compose services, I can see the blog posts showing up just like before. However, when I randomly tested image upload, I got a Request Entity too Large 413 error in the browser console. Apparently this is caused by the new nginx config, but how?

After some Googling, it turns out that nginx has a setting in HTTP server called client_max_body_size, which defaults to 1M. What’s more, from the documentation it says any request with body larger than this limit will get a 413 error, and this error cannot be properly displayed by browsers! Okay… so in the server block I added a

client_max_body_size 8m;

and everything works just fine 👌.

 

For the docker volume configuration, I came up with this requirement for myself because my mongodb data had been stored in the docker container – it gets lost when the container is removed. To back up the data easier and to have more confidence in the data persistency, I wanted to mount the database content from the host machine instead.

So in my docker-compose.yml,I added this simple thing to the mongodb service:

volumes:
      - /data/db:/data/db

Yes two lines and I included in my post. What can you do about it 🤨

Now I can just log in my production machine and copy away the /data/db directory to back up my precious blog posts data 🙂

 

Reference:

Use NGINX As A Reverse Proxy To Your Containerized Docker Applications

Daily bothering with launchd and AppleScript 

Credit: All of the following “someone” is this one.

This morning I noticed this repo was forked into my GitHub organization. I’m still not sure what the original intent was but I interpreted as a permission/offer to contribute. Since the repo’s name is “simplifyLifeScripts”, I spent some time pondering upon what kind of scripts would simplify lives, or more specifically, my life. I then came up with this brilliant idea of automating iMessage sending so that my Mac can send someone this picture of Violet Evergarden on a daily basis:

Violet

In the past I had to do this manually by dragging the picture into the small iMessage text box, which was simply too painful to do (I blame Apple). How cool and fulfilling would it be to sit back and let the Apple’s product annoy an Apple employee!

After some GOOGLing I came across this snippet of AppleScript that lets you send an iMessage with your account:

on run {targetBuddyPhone, targetMessage}
    tell application "Messages"
        set targetService to 1st service whose service type = iMessage
        set targetBuddy to buddy targetBuddyPhone of targetService
        send targetMessage to targetBuddy
    end tell
end run

Basically it takes iMessage service from the system and tell it to send the message to a person given a phone number.

Since I also have to send an image as attachment, I added to this piece so it became:

on run {targetBuddyPhone, targetMessage, targetFile}
    tell application "Messages"
        set targetService to 1st service whose service type = iMessage
        set targetBuddy to buddy targetBuddyPhone of targetService

        set filenameLength to the length of targetFile
        if filenameLength > 0 then
            set attachment1 to (targetFile as POSIX file)
            send attachment1 to targetBuddy
        end if

        set messageLength to the length of targetMessage
        if messageLength > 0 then
            send targetMessage to targetBuddy
        end if
    end tell
end run

It now takes one more parameter that’s the file name. The script converts the file name to a POSIX file and send as attachment. I also added two simple checks so that I can send text and/or file.

The next step would be to automate the process. Just when I was ready to Google one more time someone pointed me to Apple’s launchd, which is similar to unix’s cron. launchd lets you daemonize pretty much any process. One needs to compose a plist (a special form of XML) file and put it under /Library/LaunchDaemons/, then the daemon would start as one of the system start up items.

Following the official guide, I made the following plist file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.billyu.botherlucy</string>
    <key>ProgramArguments</key>
    <array>
        <string>osascript</string>
          <string>/Users/billyu/dev/simplifyLifeScripts/sendMessage.applescript</string>
        <string>9999999999</string>
        <string>Daily Lucy appreciation :p</string>
        <string>/Users/billyu/dev/simplifyLifeScripts/assets/violet.png</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>

The ProgramArguments key is mapped to an array of arguments used to execute the process wrapped in the daemon. In my case, I just run osascript to execute the AppleScript at the absolute path, with the phone number, text message, and the image absolute path as parameters. The phone number is obviously censored.

The other key, StartCalendarInterval, is a handy way to run the job periodically. Any missing key will be filled with “*” wildcard. In this case, the process would be run every day at 00:00. I later changed it to 22:00 after realizing my computer might be shut down at midnight. Can’t miss the bother window.

To avoid restarting my laptop, after copying the file to the launchd directory I did sudo launchctl load {plist file path} so the daemon would start right away.

I did some testing with sending the message every minute and it worked perfectly. It’s worth noting that this is one of the few things that just worked the first try.

Excited for 10pm tonight! Although someone else might not be.