Scraping iMessage and Messenger Messages and Displaying with Vue Frontend

Credit: she founded the project and provided the first version of the scraper.

A while ago my partner in the organization started message-analyzer because she thought it would be interesting to analyze the message data between us. She managed to scrape text messages out of both iMessage and Messenger (the two chat apps we use), put them together, and built something that could decide which one of us a given message came from. I believe the highest accuracy she got was 86%.

I was looking around in the project after she got most of it done and noticed a file that runs a Flask application and serves the text messages on a web server. Since I'm pretty much a frontend developer now (no), I came up with the idea of displaying all of our messages on a web page, hopefully merging the contents from both the Apple and Facebook platforms.


I started with iMessage. It wasn't too hard to simply take the output of the function she wrote and serve it over an API endpoint. For the frontend I decided to try out Vue.

It wasn't long before I had a first version working: the main component simply requests all messages and passes each one down to a Message component. I added pagination for some convenience.

The Message component just displays the message content; when hovered, the delivered time is shown as a tooltip.

It all looked good, but what about attachments? There were hundreds of interesting images, stickers and files that we had sent each other. The page would not be nearly as interesting if those were left out.

To show attachments, I dug deeper into how Apple stores messages.

Following my partner's lead, I learned that Apple stores messages in a SQLite database located at ~/Library/Messages/chat.db, so I took the liberty of looking at the schema.

Three tables caught my attention: attachment, message_attachment_join, and message.


The message_id matches the ROWID in the message table, and filename is actually a path to the attachment file on the local machine. With this information at hand, I revised the SQLite query to join the three tables and pull each message's attachment paths along with it.
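As a sketch of what that join looks like, here is the idea in Python's built-in sqlite3 (the column names follow the schema described above; treat the exact names as assumptions about Apple's undocumented chat.db schema):

```python
import sqlite3

def messages_with_attachments(conn):
    # conn: an open sqlite3 connection to chat.db.
    # LEFT JOINs keep messages that have no attachment (filename is NULL).
    return conn.execute(
        """
        SELECT m.ROWID, m.text, a.filename
        FROM message m
        LEFT JOIN message_attachment_join j ON j.message_id = m.ROWID
        LEFT JOIN attachment a ON a.ROWID = j.attachment_id
        ORDER BY m.date
        """
    ).fetchall()
```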

After the messages and attachments are selected, I served the attachment files over the API endpoint '/attachments', and voila: pictures on the page!

I later also displayed reactions to messages but I’d like to get to scraping Messenger soon.


Scraping Messenger is a little trickier: my partner did it by scrolling up all the way to the top of the conversation, saving the HTML file, and extracting information from there. However, since that data had already been parsed once by the Messenger frontend, it was difficult to get the dates and attachments as well as the messages.

I went into Chrome devtools and saw where the juicy request was going. Ah, so they use their own product. What's frustrating is that each request retrieves at most ~200 messages, and Chrome doesn't let me copy multiple request responses at a time.

I tried to reverse engineer how the requests are formatted, but was stuck at figuring out how the message count offset was sent. I came to the idea of writing a Chrome extension to capture the web requests.

The only extension API that gives you access to response bodies is devtools. Creating an extension is also easy: you just need a manifest.json file that describes the extension and some JS scripts for the browser to run, so I put one together.
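For reference, a devtools-based extension needs roughly this shape of manifest (the names here are my own placeholders, not the real project's); the devtools_page is an HTML file whose script can register a chrome.devtools.network.onRequestFinished listener and read each response body via request.getContent:

```json
{
  "manifest_version": 2,
  "name": "messenger-capture",
  "version": "0.1",
  "description": "Capture Messenger API responses from the devtools network panel",
  "devtools_page": "devtools.html"
}
```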

and used pyautogui from my partner's code to automatically scroll up like an idiot. I was able to get all messages in the devtools window of the devtools window (no typo). The repository is here.

All that was left was parsing the retrieved data and making sure both message sources end up in the same format when returned by the Flask server. Messenger has more attachment types and allows multiple attachments per message, so that part took longer.
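The normalization step boils down to mapping each scraper's output onto one shared schema. A minimal sketch in Python, with field names that are my own invention rather than the project's:

```python
def normalize_imessage(row):
    # row: (sender, text, unix_timestamp, attachment_paths) as pulled
    # from chat.db; this tuple layout is an assumption.
    sender, text, timestamp, attachments = row
    return {
        "source": "imessage",
        "sender": sender,
        "text": text or "",
        "timestamp": timestamp,
        "attachments": list(attachments),
    }

def normalize_messenger(msg):
    # msg: one message object captured from the Messenger responses;
    # the key names here are assumptions about that payload.
    return {
        "source": "messenger",
        "sender": msg["author"],
        "text": msg.get("body", ""),
        "timestamp": msg["timestamp_ms"] // 1000,
        "attachments": [a["url"] for a in msg.get("attachments", [])],
    }
```

With both sources reduced to the same keys, the Flask server can return one merged, chronologically sorted list.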

Due to privacy reasons, I can't do a demo here :/ well, mostly it's just that I'm too lazy to put up a page with fake message data.

For future features I plan to do searching, improve pagination, style the Messenger system messages (“you waved at each other”), and make the UI prettier and easier to use.

Daily bothering with launchd and AppleScript 

Credit: All of the following “someone” is this one.

This morning I noticed this repo was forked into my GitHub organization. I'm still not sure what the original intent was, but I interpreted it as permission (or an offer) to contribute. Since the repo's name is "simplifyLifeScripts", I spent some time pondering what kind of scripts would simplify lives, or more specifically, my life. I then came up with the brilliant idea of automating iMessage sending so that my Mac can send someone this picture of Violet Evergarden on a daily basis:


In the past I had to do this manually by dragging the picture into the small iMessage text box, which was simply too painful (I blame Apple). How cool and fulfilling would it be to sit back and let Apple's own product annoy an Apple employee!

After some GOOGLing I came across this snippet of AppleScript that lets you send an iMessage with your account:

on run {targetBuddyPhone, targetMessage}
    tell application "Messages"
        set targetService to 1st service whose service type = iMessage
        set targetBuddy to buddy targetBuddyPhone of targetService
        send targetMessage to targetBuddy
    end tell
end run

Basically it grabs the iMessage service from the system and tells it to send the message to the person with the given phone number.

Since I also wanted to send an image as an attachment, I extended the script so it became:

on run {targetBuddyPhone, targetMessage, targetFile}
    tell application "Messages"
        set targetService to 1st service whose service type = iMessage
        set targetBuddy to buddy targetBuddyPhone of targetService

        set filenameLength to the length of targetFile
        if filenameLength > 0 then
            set attachment1 to (targetFile as POSIX file)
            send attachment1 to targetBuddy
        end if

        set messageLength to the length of targetMessage
        if messageLength > 0 then
            send targetMessage to targetBuddy
        end if
    end tell
end run

It now takes one more parameter, the file name. The script converts the file name to a POSIX file and sends it as an attachment. I also added two simple length checks so that I can send text and/or a file.

The next step was to automate the process. Just when I was ready to Google one more time, someone pointed me to Apple's launchd, which is similar to unix's cron. launchd lets you daemonize pretty much any process: you compose a plist (a special form of XML) file and put it under /Library/LaunchDaemons/, and the daemon starts as one of the system startup items.

Following the official guide, I made the following plist file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.daily-appreciation</string>
    <key>ProgramArguments</key>
    <array>
        <string>osascript</string>
        <string>/path/to/sendMessage.scpt</string>
        <string>(censored phone number)</string>
        <string>Daily Lucy appreciation :p</string>
        <string>/path/to/violet.jpg</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>0</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>

The ProgramArguments key maps to the array of arguments used to launch the wrapped process. In my case, I just run osascript to execute the AppleScript at its absolute path, with the phone number, text message, and the image's absolute path as parameters. The phone number is obviously censored.

The other key, StartCalendarInterval, is a handy way to run the job periodically. Any missing key is filled with the "*" wildcard. In this case, the process runs every day at 00:00. I later changed it to 22:00 after realizing my computer might be shut down at midnight. Can't miss the bother window.

To avoid restarting my laptop, after copying the file to the launchd directory I ran sudo launchctl load {plist file path} so the daemon would start right away.

I did some testing with the message being sent every minute and it worked perfectly. It's worth noting that this is one of the few things that have ever worked on the first try.

Excited for 10pm tonight! Although someone else might not be.

Progress on blog rewrite

All right, this is where I say I can actually get something done.

Achievements for the blog project include:

  • APIs for log in/out, posts CRUD, comments CRUD, like/dislike
  • 93% coverage on APIs mentioned above
  • Using React-Redux to maximize data reuse and minimize the number of API calls
  • Using universal-cookie to store the logged-in state (okay, this might not deserve a standalone bullet point)
  • Using Docker (Dockerfile and docker-compose) to automate the deployment process.

Today, lucky for you, I’ve decided to talk about how docker-compose in this project works.

Docker is the company driving the container movement and the only container platform provider to address every application across the hybrid cloud.

^ From Docker's self-introduction. What that means for me is that, used properly, I don't have to set up every production machine with all of my project's dependencies whenever I want to deploy. Ideally, all I have to do is write the Dockerfiles and docker-compose.yml, install Docker, and let Docker handle the rest.

In this blog project, with the backend and the frontend separated, the environment dependencies (as opposed to the npm ones) are:

  • backend:
    • MongoDB
    • Node/npm
  • frontend:
    • Node/npm (for building)
    • Nginx (for serving)

With these in mind, I was able to write a Dockerfile and a docker-compose.yml for the backend, following the documentation and random StackOverflow answers online:


FROM node:carbon

WORKDIR /app

COPY package*.json ./

RUN npm install

COPY . .

RUN npm run build-server

RUN ["chmod", "+x", "/app/wait-for-it.sh"]

CMD ["node", "build/server.js"]


version: '3'

services:
  blog-api:
    build:
      context: ./
      dockerfile: Dockerfile
    restart: always
    depends_on:
      - mongodb
    environment:
      MONGO_URL: mongodb://mongodb:27017/blog
    ports:
      - "1717:1717"
    command: bash /app/wait-for-it.sh mongodb:27017 -- node build/server.js

  mongodb:
    image: mongo:latest
    restart: always

The Dockerfile specifies the config for the blog-api container, while the docker-compose.yml tells Docker how my blog-api container relates to the mongodb service container.

Several things to notice:

  • Each Docker container is like a VM by itself, so WORKDIR sets the working directory inside the container, and when I do a 'COPY . .' it naturally copies from the current directory on the host to the current directory in the container.
  • Notice how I copy the package.json files and npm install before copying anything else. The reason is that Docker uses a layered cache system that reuses previously built layers as long as nothing up to that step has changed. Therefore, if I only change some API route file, I don't have to sit through the long npm install again.
  • wait-for-it is a tool that waits until a host starts listening on a port before running a command; its automatic retries are very useful here. I could just let blog-api restart always as is, but this tool has less overhead.

Later I added another Dockerfile for the frontend, which looks like this:

FROM nginx

RUN apt-get update

RUN apt-get install -y curl wget gnupg

RUN curl -sL | bash

RUN apt-get install -y nodejs

WORKDIR /app

COPY package*.json ./

RUN npm install

COPY . .

RUN npm run build

RUN cp -a /app/dist/* /usr/share/nginx/html

RUN cp /app/nginx.conf /etc/nginx/

This image extends nginx, so the default CMD starts up the nginx server. I need Node.js to build the static files, so I added those few lines. The last two lines copy the static files to nginx's serving directory and my config file to nginx's config directory.

With the frontend added, I added one more service to docker-compose.yml:

  blog-frontend:
    build:
      context: ./
      dockerfile: Dockerfile-frontend
    restart: always
    ports:
      - "80:80"

This simply hooks the web frontend container into docker-compose so I don't have to start up every container manually. Instead, all I have to do is docker-compose build and docker-compose up -d.

I also added automatic seeding for the MongoDB database but I’m too lazy to paste the steps here again so screw you.

The following point is unrelated to Docker, but I spent some time on it and felt it would be interesting to include here: my nginx.conf file. Since I'm building the frontend with React's single-page-serves-it-all pattern, I have to make sure the nginx server returns the index.html file no matter what the sub-URL path is. The only exception is when the client requests some JS or resource file. With this in mind:

server {
    listen 80;
    root /usr/share/nginx/html;
    location / {
        try_files $uri /index.html;
    }
}

It tries to find the file specified in the URI first, and falls back to index.html otherwise. 404s are handled on the frontend by my React application.

For the next step, I’ll be working on attachments to posts as a feature request from this person.

Rewrite(?) of this blog

I will be working on building a prettier blog (?) soon. The current tech stack selection is:

  • MongoDB
  • React
  • Express
  • Bootstrap
  • Travis
  • Docker
  • Mocha
  • Webpack

Tentative Feature List:

  • posts CRUD
  • users CRUD with admin
  • comments CRUD
  • posts like/share
  • categories and highlights of posts
  • hmmm I think that’s it, not too shabby

I'll try to extract the string literals specific to me, in case some bored person on GitHub ever wants to reuse it (no).

Dumping this database and converting it into MongoDB is going to be a PAIN IN THE ASS, so wish me luck LMAO.

If anyone's reading this, hit me with some pretty frontend framework so I don't have to freaking tweak the CSS and want to kill myself all the time.

Two blog posts in a day I’m on fire.


UPDATE ON April 28, 2018:

Repository is up here. Let’s see how much I can get done before my finals.

My Projects

I've decided to make a summary of my past projects here. I have spent most of my free time on iOS development, and have also explored some web development using PHP and JavaScript. This past summer I used JavaScript to work on educational software for a CS professor here at Duke.

I've ordered them according to my personal preference :)

  1. DukeCSA

DukeCSA (on GitHub) is an iOS app started by Jay Wang (currently a senior at Duke) to fit the needs of the Duke Chinese Student Association. I joined the team around Christmas 2015. It combines many useful functionalities:

  • events post – users can view upcoming and past events hosted by DukeCSA. They can sign up or comment on the events in the app.
  • Q&A – students can ask their peers about life at Duke. This section is like Quora for Duke.
  • Class Database – users can view a massive (1000+) collection of comments on courses offered here at Duke to help them make choices.
  • Crush – users can express their secret admiration to others. If there is a match, both users will get notifications.
  • Web event poster – a web interface for the CSA committee to post a new event. The event will then be saved to our database and all users will be notified. The user does not need to write any code.

short demos:
notification indication

web interface

Read more about iOS projects


2. JFLAP web

JFLAP (Java Formal Languages and Automata Package) is educational software about finite state machines, Moore and Mealy machines, Turing machines, etc. I worked on building the online version of JFLAP and integrating JFLAP into the OpenDSA (Data Structures and Algorithms) project.

The job included designing and implementing the user interface, optimizing and implementing the algorithms, and migrating the Java version to JavaScript. I learned about formal languages and automata as well as software development.

short demo:

more about JFLAP · more about OpenDSA · development blog · web demo


3. 3D iOS games

I also learned about 3D iOS game development. Below are demo videos of them:

Marble Maze – gravity-controlled



4. Tank Battle

This is a homework project from my software development class, but I treat it as more than that. The game features elements such as stone, brick, grass and water. The player needs to protect the base and eliminate enemies. The game also uses permanent storage to present a leaderboard.


The design comes from the classic video game Battle City.


5. Blog Post System

A blog post system written mainly in PHP, responsive on both desktop and mobile devices. Users can view all posts without logging in, and post articles or comments when logged in. Data is stored in a MySQL database. APIs are also built for possible iOS app development in the future.

demo: (It’ll probably be more fun if you could read Chinese)


6. Wheeshare

(My first iOS app!) This is an iOS app that promotes sharing among Duke students. I completed this project with a grant from Duke CoLab, my current employer.
On the platform, students are able to post their belongings to lend, or browse the available items and request to borrow with one click. Students can also easily manage their posts.


Memo on C Language Programming

Starting yesterday I have been reading the official manual of the C programming language from GNU. I have finished 17 of the manual's 20-something chapters. I've realized that the best way to learn an open-source tool is indeed to read its documentation and help manual.

Yesterday I covered the basics of the language, such as types, expressions, functions, and pointers. It didn't take much time because I have read other materials on C before. I did spend some time understanding pointers, since Java has no explicit notion of a pointer.

Today's new material includes IO, string operations, and "making" a program from multiple C source files. I'm going to write down the things that are new to me.


  1. File IO:

#include <stdio.h>
#include <stdlib.h>

int main() {
	FILE *stream;
	int my_array[2][2] = {{1, 2}, {3, 4}};   /* example values; the original data was lost */
	size_t object_size = sizeof(int);
	size_t object_count = 4;

	stream = fopen("shit.dat", "w");
	if (stream == NULL) {
		printf("shit.dat could not be created\n");
		exit(EXIT_FAILURE);
	}
	printf("file opened for writing\n");
	fwrite(&my_array, object_size, object_count, stream);
	fclose(stream);

	stream = fopen("shit.dat", "r");
	if (stream == NULL) {
		printf("shit.dat could not be read\n");
		exit(EXIT_FAILURE);
	}
	printf("file opened for reading\n");
	fread(&my_array, object_size, object_count, stream);
	fclose(stream);

	for (int i = 0; i < 2; i++) {
		for (int j = 0; j < 2; j++) {
			printf("%d ", my_array[i][j]);
		}
	}
	printf("\n");
	return 0;
}
The most important functions for input/output with files are fopen, fclose, fread, fwrite, getline and fprintf. According to the manual, it is suggested to use fread, getline and fwrite, since they are safer than the alternatives, some of which are already deprecated. It's worth noting that the second and third parameters of fwrite and fread are of type size_t. Other than that, this part is pretty easy.


2. Combination of getline and sscanf

getline is a safe method: if you pass in an uninitialized string pointer, it will allocate a buffer of the proper size for you and populate the variable. If you use methods like scanf instead, you may run into buffer overflow errors, which are very common. getline returns a line of text up to a linebreak from a stream, which can be either stdin or a file.

Then, sscanf is used to read stuff of a specific type or a format from the string. This combination, according to the manual, is much better than using scanf alone, since it avoids many errors.

Example code:

#include <stdlib.h>
#include <stdio.h>

int main() {
	int args_assigned = 0;
	size_t nbytes = 2;
	char *my_string;
	int int1, int2, int3;

	while (args_assigned != 3) {
		puts("Please enter three integers separated by whitespace.");
		my_string = (char *) malloc(nbytes + 1);
		getline(&my_string, &nbytes, stdin);
		args_assigned = sscanf(my_string, "%d %d %d", &int1, &int2, &int3);
		if (args_assigned != 3)
			puts("Invalid input!");
		free(my_string);   /* getline may have realloc'd the buffer; free it each round */
	}
	printf("Three integers: %d %d %d\n", int1, int2, int3);
	return 0;
}

It doesn’t matter that my_string is initialized with a very small size: getline will take care of that.



3. ARGP

ARGP is such a powerful tool!! With it, it's very easy to parse parameters passed to the program and to provide users with usage explanations and documentation.

The boss function is argp_parse, and the argp struct you hand it has four fields: 1. the options, an array of structs describing each flag; 2. a parser function that handles the option and argument fields; 3. a string describing the argument format; 4. a string that documents the program.

There are so many options available for customization. Although it’s hard to remember all of the parameter types and requirements, in actual development process I can just copy the old example piece of code and continue happily from there.

Example code:

#include <stdio.h>
#include <argp.h>

const char *argp_program_version = "argex 1.0";
const char *argp_program_bug_address = "<>";

/* This structure is used by main to communicate with parse_opt. */
struct arguments
{
	char *args[2];
	int verbose;
	char *outfile;
	char *string1, *string2;
};

/*
 * OPTIONS. Field 1 in ARGP.
 * Order of fields: {NAME, KEY, ARG, FLAGS, DOC}.
 */
static struct argp_option options[] =
{
	{"verbose", 'v', 0, 0, "Produce verbose output"},
	{"alpha", 'a', "STRING1", 0, "Do something with STRING1 related to the letter A"},
	{"bravo", 'b', "STRING2", 0, "Do something with STRING2 related to the letter B"},
	{"output", 'o', "OUTFILE", 0, "Output to OUTFILE instead of to standard output"},
	{0}
};

/*
 * PARSER. Field 2 in ARGP.
 * Order of parameters: KEY, ARG, STATE.
 */
static error_t parse_opt (int key, char *arg, struct argp_state *state)
{
	struct arguments *arguments = state->input;
	switch (key)
	{
		case 'v':
			arguments->verbose = 1;
			break;
		case 'a':
			arguments->string1 = arg;
			break;
		case 'b':
			arguments->string2 = arg;
			break;
		case 'o':
			arguments->outfile = arg;
			break;
		case ARGP_KEY_ARG:
			if (state->arg_num >= 2)
				argp_usage(state);
			arguments->args[state->arg_num] = arg;
			break;
		case ARGP_KEY_END:
			if (state->arg_num < 2)
				argp_usage(state);
			break;
		default:
			return ARGP_ERR_UNKNOWN;
	}
	return 0;
}

/*
 * ARGS_DOC. Field 3 in ARGP.
 * A description of the non-option command-line arguments that we accept.
 */
static char args_doc[] = "ARG1 ARG2";

/*
 * DOC. Field 4 in ARGP.
 * Program documentation.
 */
static char doc[] =
"argex -- A program to demonstrate how to code command-line options and arguments.\vFrom the GNU C Tutorial.";

/* The ARGP structure itself. */
static struct argp argp = {options, parse_opt, args_doc, doc};

/*
 * The main function.
 * Notice how now the only function call needed to process all command-line options and arguments nicely is argp_parse.
 */
int main (int argc, char **argv)
{
	struct arguments arguments;
	FILE *outstream;
	char waters[] = "Some long sentence";

	/* Set argument defaults */
	arguments.outfile = NULL;
	arguments.string1 = "";
	arguments.string2 = "";
	arguments.verbose = 0;

	/* Where the magic happens */
	argp_parse(&argp, argc, argv, 0, 0, &arguments);

	/* Where do we send output? */
	if (arguments.outfile)
		outstream = fopen(arguments.outfile, "w");
	else
		outstream = stdout;

	/* Print argument values */
	fprintf(outstream, "alpha = %s\nbravo = %s\n\n", arguments.string1, arguments.string2);
	fprintf(outstream, "ARG1 = %s\nARG2 = %s\n\n", arguments.args[0], arguments.args[1]);

	/* If in verbose mode, print song stanza */
	if (arguments.verbose)
		fprintf(outstream, "%s", waters);

	return 0;
}
When it runs it really behaves like a “legit” GNU open source software!


I also read about makefiles: their rules, targets, and the variables that can simplify them. I guess tomorrow I'll read more about C. If I finish this manual I'll take a look at the GNU Make manual.
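For my own record, the core ideas fit in a toy makefile like this (file names hypothetical): each rule has a target, its prerequisites, and a tab-indented command, and variables like CC avoid repetition.

```make
CC = gcc
CFLAGS = -Wall -O2

# target: prerequisites
#     (tab-indented) command
argex: argex.o
	$(CC) $(CFLAGS) -o argex argex.o

argex.o: argex.c
	$(CC) $(CFLAGS) -c argex.c

clean:
	rm -f argex argex.o
```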

Anyway, it’s cool that a book originally written 30 years ago is still not outdated at all.


First day on JFLAP project

Today is my first day working on the JFLAP project! JFLAP is educational software that teaches students about automata, Turing machines, etc. As instructed by my supervisor, I created a blog here. I'm just going to copy and paste what I write there to here daily.

Here’s my first day blog:

Today is my first day of working on JFLAP, and I actually did quite a lot. I got two books in the morning: Formal Languages and Automata by Linz and JFLAP by Rodger and Finley. I finished reading the first two chapters of Linz's book. The first chapter introduces some basic concepts including language, grammar and automaton, while the second teaches deterministic finite automata (DFA) and nondeterministic finite automata (NFA). I learned that these two kinds of automata may seem different, but each can be transformed into the other.

I then downloaded version 7 of JFLAP and tested the software following the JFLAP intro book. The software is very easy to use, and since I had read the chapters in Linz's book, the graphs were familiar to me. There are also many other features besides DFA and NFA; I hope I can learn about them later in the summer.

In the afternoon I was given access to a server. To my frustration my account is not a sudoer, which means for now I can only build this blog with HTML and CSS. I will see if I can install WordPress later; that would make this website much prettier. In either case there will be many changes to this page for sure.

A lot left to learn, wish myself best of luck.

Blog Post System using PHP

Today I would like to talk in a little bit more detail about my blog post system written in PHP.

The main page looks like this:

Screen Shot 2016-05-08 at 5.08.49 PM

First, the database structure:

Screen Shot 2016-05-08 at 5.28.27 PM

The structure is actually pretty straightforward: one table for user authentication, one for posts and one for comments. For user authentication, the password hashed with md5 is stored in the database. When a user attempts to log in, their hashed input and the hash in the database are compared, a traditional approach. For each post, the two main pieces of information are topic and content; they form the body of a post. The author is stored simply as the username. The date is stored as a formatted string instead of a UNIX timestamp because somehow I could not get that to work :(. Each comment's associated post is stored as articleId. When I present the comments of an article, I query the database for that articleId. This might be slower than other methods such as keeping references, but since I'm not storing a million blog posts, it works just fine.
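The login comparison described above, sketched in Python for brevity (the real code is PHP, and the names here are stand-ins; md5 is of course a weak choice for passwords by today's standards):

```python
import hashlib

def check_login(username, password, user_table):
    # user_table maps username -> stored md5 hex digest, standing in
    # for the authentication table described above.
    hashed = hashlib.md5(password.encode()).hexdigest()
    return user_table.get(username) == hashed
```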

Recently I finished paging and the comment system. For paging, I first query the post table to get the total number of posts. Then, according to the articles_per_page variable set in config.php, I query the table again with a LIMIT to fetch the posts for a specific page. The page index is given via a GET request; if there is no such information in $_GET, the default value is 1, obviously.
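That paging logic, sketched with Python's sqlite3 instead of PHP/MySQL (table and column names are stand-ins): one COUNT query for the page count, then a LIMIT/OFFSET query per page.

```python
import math
import sqlite3

def fetch_page(conn, page, per_page):
    # page is 1-indexed, matching the $_GET default of 1 described above.
    (total,) = conn.execute("SELECT COUNT(*) FROM posts").fetchone()
    total_pages = max(1, math.ceil(total / per_page))
    rows = conn.execute(
        "SELECT topic, content FROM posts ORDER BY id DESC LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    ).fetchall()
    return rows, total_pages
```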

For now, comments can only be viewed after you click into a single article to see its details. At the bottom of the article, I query the comment table for the articleId. A helper method does this and returns the comments as an array of objects. I then simply loop over the array and echo them out on the page.

Posting a comment is a little different: the POST request is handled by another PHP file which does not present anything. After storing the comment in the database, the script routes back to the earlier article. In the POST request, only the content is passed; the articleId is passed via the superglobal variable $_SESSION. I'm not sure this is the best way, but it is surely easier to write than the curl method I found online.

Several problems I encountered:

  1. For creating a post, not only do I need to verify that the user is logged in when the page is presented, I also need to verify it when the POST request is received, because software such as Postman can easily forge a POST request and flood the database.
  2. On the frontend, I find the CSS property clear: both amazingly useful. I used float a lot in my page design, so a lot of the time I need it to keep divs in place.
  3. Typos are a bitch, especially those inside double quotes. When coding on a server there is no nice IDE to point out a syntax error or a typo, so I really need to be careful. One typo once took me twenty minutes to debug.
  4. Security. I gave my address to a friend to test it, and he hacked the site easily with some simple JavaScript, which forced me to filter any input the users give to the site. Now I block the word "script" completely, so evil people cannot alert() me every time I visit the blog.

Things that I will be working on:

  1. Keep user input in the session. Right now, when a user hits "comment" or "post" without being logged in, they are directed to the login page and back, but their input is lost. I definitely don't want them to type it all over again, so caching inputs is a good idea.
  2. Move login/logout to the main page as a small popup. Right now, when users click on login, they are taken to another page to enter their username and password. Keeping them on the same page would be less distracting.
  3. Add styled text and images in posts. Maybe I could add some buttons so users can upload images for posts. I have to be careful, though, because some users (such as my friend) could upload bad things to my beloved server.

That's pretty much it. I just finished my finals yesterday, and the good news is I got a perfect score on the algorithms final! Yayyy. This summer I plan to learn more about iOS and build projects with PHP, Swift and maybe a little JavaScript. My friend told me modern websites are mainly written in JavaScript, so I want to learn about that.

It’s been a while

Yes, it has been a long time without a post here. To be honest, I spent most of my spare time on League of Legends. This evil game…

It's almost the end of the semester and I still have three finals left. Outside of school, I learned some PHP, some Node, and also a little iOS and watchOS. I built this blog post system using PHP and am still updating it. I plan to introduce the following features: editing posts when logged in, adding timestamps and author info, and displaying posts on separate pages. These will take some time, but it will also be fun along the way!

I also ordered an Apple Watch a couple of days ago. Hope I can build some interesting apps with it! Maybe I'll build a watch version of the Duke CSA app, but that will be hard for sure.

There’s always so much to learn and so little time.