Reverse engineering Apple’s in-terminal emojis

I found a somewhat interesting feature in the Apple terminal, supported by both Terminal.app and iTerm2. Essentially, it allows for the printing of emojis in the terminal. I discovered this when I noticed that the Homebrew package manager prints beer and tap emojis when it is downloading and installing packages. I was curious about how these emojis are represented, so I used a program that I had written, called hex, which essentially just displays the hex values of characters entered by the user. I’ve used this program to find escape sequences for things like function keys and the like, allowing me to implement these inputs in my programs.

Here is the source code for hex. It’s a fairly simple program…


 1 // This program displays the octal, hex, decimal,
 2 // and byte character representations of characters
 3 // typed at the terminal.
 4 
 5 #include <stdio.h>
 6 #include <termios.h>
 7 #include <signal.h>
 8 
 9 struct termios term;
10 struct termios save;
11 
12 void stop( int );
13 void cont( int );
14 
15 int mainint argc, char **argv ){
16         tcgetattr0, &term );
17         save = term;
18         term.c_lflag &= ~( ICANON | ECHO | ECHONL );
19         signalSIGINT, stop );
20         signalSIGTSTP, stop );
21         signalSIGCONT, cont );
22         tcsetattr0TCSANOW, &term );
23         // Section where the program executes
24         // in raw mode:
25         unsigned char c;
26         for(;;){
27                 c = (unsigned chargetchar();
28                 printf"%3o\t%2x\t%3d\t%c\n", c, c, c, c );
29         }
30         // End raw mode:
31         tcsetattr0TCSANOW, &save );
32         return 0;
33 }
34 
35 // Restores terminal settings before allowing
36 // the signal.
37 void stop( int signum ){
38         tcsetattr0TCSANOW, &save );
39         signal( signum, SIG_DFL );
40         raise( signum );
41 }
42 
43 // This function undoes the cleanup performed
44 // by stop() when the process is suspended.
45 void cont( int signum ){
46         tcsetattr0TCSANOW, &term );
47         signalSIGTSTP, stop );
48 }

I copied the beer mug emoji from the terminal after downloading a package. I then started the hex program and hit Command-V, then copied the hex bytes (there were four of them) by hand into a file using a hex editor. To test it, I ran cat on the file I had created, and lo and behold – a beer emoji was printed to my terminal.

I then decided to test a concept, the concept of printing emojis in a program. I used assembly language for this, because I’m not entirely sure of the proper syntax for C, and I thought it would be faster to write if I did it in assembler. I tested several codes, and then when I found the result, I wrote a comment in the source file indicating what the emoji was. Be warned – this code is extremely tedious, even for an assembly program.


 1 global start
 2 
 3 segment .data
 4 str1:   db      0xf0,0x9f,0x8d,0xb0,0x0a,0x00 ; Cake emoji
 5 str2:   db      0xf0,0x9f,0x8d,0xba,0x0a,0x00 ; Beer emoji
 6 str3:   db      0xf0,0x9f,0x8d,0xa9,0x0a,0x00 ; Donut emoji
 7 str4:   db      0xf0,0x9f,0x8d,0x8e,0x0a,0x00 ; Apple emoji
 8 str5:   db      0xf0,0x9f,0x8d,0x81,0x0a,0x00 ; Maple leaf emoji
 9 str6:   db      0xf0,0x9f,0x8d,0x82,0x0a,0x00 ; Leaf emoji
10 str7:   db      0xf0,0x9f,0x8d,0x97,0x0a,0x00 ; Drumstick emoji
11 
12 segment .text
13 start:
14         push    dword 5
15         push    dword str1
16         push    dword 1
17         mov     eax,4
18         sub     esp,4
19         int     0x80
20 
21         push    dword 5
22         push    dword str2
23         push    dword 1
24         mov     eax,4
25         sub     esp,4
26         int     0x80
27 
28         push    dword 5
29         push    dword str3
30         push    dword 1
31         mov     eax,4
32         sub     esp,4
33         int     0x80
34 
35         push    dword 5
36         push    dword str4
37         push    dword 1
38         mov     eax,4
39         sub     esp,4
40         int     0x80
41 
42         push    dword 5
43         push    dword str5
44         push    dword 1
45         mov     eax,4
46         sub     esp,4
47         int     0x80
48 
49         push    dword 5
50         push    dword str6
51         push    dword 1
52         mov     eax,4
53         sub     esp,4
54         int     0x80
55 
56         push    dword 5
57         push    dword str7
58         push    dword 1
59         mov     eax,4
60         sub     esp,4
61         int     0x80
62 
63         push    dword 0
64         mov     eax,1
65         sub     esp,4
66         int     0x80

Here is what I get when I run the program:

emoji-run

It’s not particularly useful, especially considering that these emojis are not portable to other systems, but it’s an interesting concept to test.  It’s kind of confusing that the emojis use four bytes each, whereas the UTF-8 character set used by the terminal uses three bytes for each Unicode character.  I’m not sure how Apple implemented this.

Advertisements