Writing a simple search engine in PHP

Writing a search engine for your LAMP server is actually fairly simple. I realized I could use grep as the search algorithm and let users type regular expressions as their queries. I mean, why reinvent the wheel? Of course, a search engine that crawls the entire web would be far more complicated; this one only searches files on the local web server.

Here is the code I’ve written. It uses only about ten lines of PHP, and even some of those are rather superfluous (I should probably streamline it)…


<!DOCTYPE html>
<!-- A simple search engine written in PHP -->

<html>
<head>
<title>Search</title>
</head>

<body>
<?php
if( !isset( $_GET['query'] ) ){
?>
<!-- No query performed yet -->
<form action="<?php echo htmlspecialchars( $_SERVER['PHP_SELF'] ); ?>" method="get">
Enter a query:<br>
<input name="query" type="text"><br>
<input type="submit" value="Search">
</form>
<?php } else { ?>
<!-- Query has been performed -->
<?php
// escapeshellarg() quotes the query so the shell hands it to grep as a
// single pattern instead of interpreting it; "--" stops grep from
// treating a query that starts with "-" as an option.
exec( "grep -ril -- " . escapeshellarg( $_GET['query'] ) . " *", $files );
foreach( $files as $file ){
        // Escape file names before putting them into the HTML output.
        $f = htmlspecialchars( $file );
        echo "<a href=\"$f\">$f</a><br>\n";
}
?>
<?php } ?>
</body>
</html>

Since the search algorithm is simply a front end for grep, I didn’t really have to think about its implementation. Basically, the script has a decision statement that looks at the superglobal variable $_GET['query']. If no query has been submitted yet, it shows the search form. Otherwise, it shows the results of the query, which is of course a regular expression. The results are obtained by running grep -ril on the files in the current working directory (normally the directory containing the script): -r descends into subdirectories, -i ignores case, and -l prints only the names of files that contain a match. Each file name is then printed as a link.
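
Because the query typed into the form ends up inside a shell command line, it has to be quoted before it reaches grep; otherwise a query containing characters such as ; or $( ) would be interpreted by the shell. PHP’s escapeshellarg() handles this: on Unix-like systems it wraps the argument in single quotes and replaces each embedded single quote with '\''. The C function below is a minimal sketch of that same quoting rule, assuming a POSIX sh-compatible shell; the name shell_quote is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wraps s in single quotes for a POSIX shell.
 * Inside single quotes the shell takes every character literally,
 * except the single quote itself, which is written as '\''
 * (close the quote, an escaped quote, reopen the quote).
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *shell_quote( const char *s ){
        size_t quotes = 0;
        for( const char *p = s; *p; p++ )
                if( *p == '\'' )
                        quotes++;
        /* 2 outer quotes + 3 extra chars per embedded quote + NUL */
        char *out = malloc( strlen( s ) + 3 * quotes + 3 );
        if( out == NULL )
                return NULL;
        char *q = out;
        *q++ = '\'';
        for( const char *p = s; *p; p++ ){
                if( *p == '\'' ){
                        memcpy( q, "'\\''", 4 ); /* the 4 characters '\'' */
                        q += 4;
                }
                else{
                        *q++ = *p;
                }
        }
        *q++ = '\'';
        *q = '\0';
        return out;
}
```

For example, the query it's $HOME becomes 'it'\''s $HOME', which the shell passes to grep unchanged, dollar sign included.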

One thing that makes PHP code somewhat confusing is the way PHP can be interleaved with HTML. It’s one of those things you just have to get used to.

Possible enhancements include:

  • Using egrep (equivalent to grep -E) instead of grep so users can write extended regular expressions.
  • Adding the ability to search for images, videos, and other media on the server, rather than just web pages (this could be done by returning any such media that are used by pages that match the pattern).
  • Searching filenames in addition to file contents (you could use find for this).
  • Using a ranking algorithm (currently the results are listed in whatever order grep happens to find the files).
  • And of course adding CSS and other formatting to make the page more aesthetically pleasing.

Reverse engineering Apple’s in-terminal emojis

I found a somewhat interesting feature in the Apple terminal, supported by both Terminal.app and iTerm2: it can print emojis. I discovered this when I noticed that the Homebrew package manager prints beer and tap emojis while it downloads and installs packages. I was curious about how these emojis are represented, so I used a program I had written, called hex, which displays the octal, hex, and decimal values of each byte the user types, along with the character itself. I’ve used this program to find the escape sequences sent by keys such as the function keys, so that I could handle those keys in my programs.

Here is the source code for hex. It’s a fairly simple program…


// This program displays the octal, hex, decimal,
// and byte character representations of characters
// typed at the terminal.

#include <stdio.h>
#include <termios.h>
#include <signal.h>

struct termios term;
struct termios save;

void stop( int );
void cont( int );

int main( int argc, char **argv ){
        tcgetattr( 0, &term );
        save = term;
        // Turn off line buffering (ICANON) and echo, and return from
        // read as soon as a single byte is available:
        term.c_lflag &= ~( ICANON | ECHO | ECHONL );
        term.c_cc[VMIN] = 1;
        term.c_cc[VTIME] = 0;
        signal( SIGINT, stop );
        signal( SIGTSTP, stop );
        signal( SIGCONT, cont );
        tcsetattr( 0, TCSANOW, &term );
        // Section where the program executes in non-canonical mode.
        // Ctrl+D arrives as byte 4 here; EOF only occurs when input
        // is redirected from a file or pipe.
        int c;
        while( ( c = getchar() ) != EOF ){
                printf( "%3o\t%2x\t%3d\t%c\n", c, c, c, c );
        }
        // Restore the original terminal settings:
        tcsetattr( 0, TCSANOW, &save );
        return 0;
}

// Restores terminal settings, then lets the signal
// take its default action (terminate or suspend).
void stop( int signum ){
        tcsetattr( 0, TCSANOW, &save );
        signal( signum, SIG_DFL );
        raise( signum );
}

// This function undoes the cleanup performed
// by stop() when the process is resumed.
void cont( int signum ){
        (void) signum;
        tcsetattr( 0, TCSANOW, &term );
        signal( SIGTSTP, stop );
}

I copied the beer mug emoji from the terminal after downloading a package. I then started the hex program and hit Command-V, then copied the hex bytes (there were four of them) by hand into a file using a hex editor. To test it, I ran cat on the file I had created, and lo and behold – a beer emoji was printed to my terminal.
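
Those four bytes can also be decoded by hand. UTF-8 records the length of a sequence in the high bits of its first byte (11110xxx means four bytes) and gives every following byte the prefix 10xxxxxx; concatenating the remaining x bits yields the Unicode code point. For the beer mug bytes f0 9f 8d ba that gives U+1F37A, BEER MUG. The function below is a minimal sketch of that decoding; utf8_decode is a hypothetical name, and it deliberately skips some checks (overlong forms, surrogates) that a strict decoder would perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 sequence at s (n bytes
 * available) and returns its code point, or -1 if the lead byte or a
 * continuation byte has the wrong bit pattern or the input is short. */
int32_t utf8_decode( const unsigned char *s, size_t n ){
        int len;
        int32_t cp;
        if( n == 0 )
                return -1;
        if( s[0] < 0x80 )                 { len = 1; cp = s[0]; }        /* 0xxxxxxx */
        else if( ( s[0] & 0xE0 ) == 0xC0 ) { len = 2; cp = s[0] & 0x1F; } /* 110xxxxx */
        else if( ( s[0] & 0xF0 ) == 0xE0 ) { len = 3; cp = s[0] & 0x0F; } /* 1110xxxx */
        else if( ( s[0] & 0xF8 ) == 0xF0 ) { len = 4; cp = s[0] & 0x07; } /* 11110xxx */
        else
                return -1;
        if( n < (size_t) len )
                return -1;
        for( int i = 1; i < len; i++ ){
                if( ( s[i] & 0xC0 ) != 0x80 ) /* continuation bytes are 10xxxxxx */
                        return -1;
                cp = ( cp << 6 ) | ( s[i] & 0x3F );
        }
        return cp;
}
```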

Next, I wanted to see whether I could print emojis from a program of my own. I used assembly language for this, because I wasn’t entirely sure of the proper C syntax for it, and I thought it would be faster to write in assembler. I tried several byte sequences, and whenever one produced an emoji, I added a comment to the source file saying which emoji it was. Be warned – this code is extremely tedious, even for an assembly program.


global start

segment .data
str1:   db      0xf0,0x9f,0x8d,0xb0,0x0a,0x00 ; Cake emoji
str2:   db      0xf0,0x9f,0x8d,0xba,0x0a,0x00 ; Beer emoji
str3:   db      0xf0,0x9f,0x8d,0xa9,0x0a,0x00 ; Donut emoji
str4:   db      0xf0,0x9f,0x8d,0x8e,0x0a,0x00 ; Apple emoji
str5:   db      0xf0,0x9f,0x8d,0x81,0x0a,0x00 ; Maple leaf emoji
str6:   db      0xf0,0x9f,0x8d,0x82,0x0a,0x00 ; Leaf emoji
str7:   db      0xf0,0x9f,0x8d,0x97,0x0a,0x00 ; Drumstick emoji

segment .text
start:
        ; write( 1, str1, 5 ): 4 emoji bytes plus the newline.
        ; 32-bit macOS/BSD convention: arguments are pushed right to
        ; left, then 4 extra bytes where the kernel expects a return
        ; address; eax holds the system call number (4 = write).
        push    dword 5
        push    dword str1
        push    dword 1
        mov     eax,4
        sub     esp,4
        int     0x80

        push    dword 5
        push    dword str2
        push    dword 1
        mov     eax,4
        sub     esp,4
        int     0x80

        push    dword 5
        push    dword str3
        push    dword 1
        mov     eax,4
        sub     esp,4
        int     0x80

        push    dword 5
        push    dword str4
        push    dword 1
        mov     eax,4
        sub     esp,4
        int     0x80

        push    dword 5
        push    dword str5
        push    dword 1
        mov     eax,4
        sub     esp,4
        int     0x80

        push    dword 5
        push    dword str6
        push    dword 1
        mov     eax,4
        sub     esp,4
        int     0x80

        push    dword 5
        push    dword str7
        push    dword 1
        mov     eax,4
        sub     esp,4
        int     0x80

        ; exit( 0 ): system call 1
        push    dword 0
        mov     eax,1
        sub     esp,4
        int     0x80
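
For comparison, the same experiment in C only needs hex escape sequences (\xNN) inside ordinary string literals; printf passes the bytes through unchanged and the terminal does the rest. This sketch mirrors the seven strings of the assembly program; the names emojis and print_emojis are hypothetical, and the code points in the comments are the ones these byte sequences decode to.

```c
#include <stdio.h>

/* The assembly program's byte sequences as C string literals.
 * Each \xNN escape inserts one raw byte. */
static const char *const emojis[] = {
        "\xf0\x9f\x8d\xb0", /* cake,       U+1F370 */
        "\xf0\x9f\x8d\xba", /* beer,       U+1F37A */
        "\xf0\x9f\x8d\xa9", /* donut,      U+1F369 */
        "\xf0\x9f\x8d\x8e", /* apple,      U+1F34E */
        "\xf0\x9f\x8d\x81", /* maple leaf, U+1F341 */
        "\xf0\x9f\x8d\x82", /* leaf,       U+1F342 */
        "\xf0\x9f\x8d\x97", /* drumstick,  U+1F357 */
};

/* Prints each emoji on its own line, like the assembly version. */
void print_emojis( void ){
        for( size_t i = 0; i < sizeof emojis / sizeof emojis[0]; i++ )
                printf( "%s\n", emojis[i] );
}
```

Calling print_emojis() from main should print the same seven lines as the assembly program on a terminal that can display emojis.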

Here is what I get when I run the program:

[Screenshot of the program’s output in the terminal]

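It’s not particularly useful, especially since not every terminal or font can display these emojis, but it was an interesting concept to test. At first it seemed confusing that the emojis use four bytes each, since I had assumed UTF-8 uses three bytes per Unicode character. It turns out UTF-8 is a variable-length encoding: ASCII characters take one byte, code points up to U+07FF take two, code points up to U+FFFF take three, and everything above that takes four. These emojis sit above U+FFFF (the beer mug is U+1F37A), so they are ordinary Unicode characters in standard UTF-8 rather than something Apple-specific; what Terminal.app and iTerm2 add is the ability to draw them.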
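
The byte counts follow directly from how many bits a code point needs. The encoder below is a minimal sketch showing where the one- to four-byte cases come from; utf8_encode is a hypothetical name, and unlike a strict encoder it does not reject the surrogate range U+D800 to U+DFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes code point cp as UTF-8 into out (room for
 * 4 bytes) and returns the number of bytes used, or 0 if cp is beyond
 * U+10FFFF. Surrogates are not rejected in this sketch. */
size_t utf8_encode( uint32_t cp, unsigned char out[4] ){
        if( cp < 0x80 ){        /* 7 bits:  0xxxxxxx */
                out[0] = (unsigned char) cp;
                return 1;
        }
        if( cp < 0x800 ){       /* 11 bits: 110xxxxx 10xxxxxx */
                out[0] = (unsigned char) ( 0xC0 | ( cp >> 6 ) );
                out[1] = (unsigned char) ( 0x80 | ( cp & 0x3F ) );
                return 2;
        }
        if( cp < 0x10000 ){     /* 16 bits: 1110xxxx + 2 continuation bytes */
                out[0] = (unsigned char) ( 0xE0 | ( cp >> 12 ) );
                out[1] = (unsigned char) ( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
                out[2] = (unsigned char) ( 0x80 | ( cp & 0x3F ) );
                return 3;
        }
        if( cp < 0x110000 ){    /* 21 bits: 11110xxx + 3 continuation bytes */
                out[0] = (unsigned char) ( 0xF0 | ( cp >> 18 ) );
                out[1] = (unsigned char) ( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
                out[2] = (unsigned char) ( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
                out[3] = (unsigned char) ( 0x80 | ( cp & 0x3F ) );
                return 4;
        }
        return 0;
}
```

Encoding U+20AC (the euro sign) takes three bytes, e2 82 ac, while U+1F37A takes four, f0 9f 8d ba, which are exactly the bytes the hex program showed for the beer mug.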

C program to convert text to Pig Latin

Just a fun program I wrote the other day, mostly to exercise and maintain my C programming skills. The sentence is passed as a single quoted command-line argument. It works for the most part, though it only handles lowercase sentences without any punctuation, and I haven’t figured out how to get it to work with words where the first vowel is “y”.

This program was inspired by a program in /usr/games in my Slackware installation, which was also supposed to convert sentences to Pig Latin, but for some reason didn’t work at all (meaning I would give it a sentence and it would just do nothing).

 

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main( int argc, char **argv ){
        if( argc < 2 ){
                fprintf( stderr, "usage: %s \"a lowercase sentence\"\n", argv[0] );
                return 1;
        }
        int pos = 0;
        int len = strlen( argv[1] );
        while( pos <= len ){
                // Room for the longest possible word, "ay", and '\0':
                char str1[len + 3], str2[len + 3];
                for( int i = 0; i < len + 3; i++ ){
                        str1[i] = '\0';
                        str2[i] = '\0';
                }
                // Advance to the next word (skipping spaces, and dropping
                // any stray punctuation instead of looping forever on it):
                while( argv[1][pos] != '\0' && !isalpha( (unsigned char) argv[1][pos] ) ){
                        pos++;
                }
                if( argv[1][pos] == '\0' ) break;
                // Copy the next word:
                int j = 0;
                while( isalpha( (unsigned char) argv[1][pos] ) ){
                        str1[j++] = argv[1][pos++];
                }
                // Convert the word to Pig Latin. A word with no vowel
                // is kept as it is and only gets "ay" appended:
                int wlen = strlen( str1 );
                strcpy( str2, str1 );
                for( int i = 0; i < wlen; i++ ){
                        if( str1[i] == 'a' || str1[i] == 'e' || str1[i] == 'i' || str1[i] == 'o' || str1[i] == 'u' ){
                                strcpy( str2, str1 + i );           // Copy from the first vowel on
                                memcpy( str2 + wlen - i, str1, i ); // Copy part before first vowel
                                break;
                        }
                }
                strcpy( str2 + wlen, "ay" );
                // Print Pig Latin word:
                printf( "%s ", str2 );
        }
        printf( "\n" );
        return 0;
}
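
For the “y” problem, one common convention (an assumption here, not something the program above does) is to treat y as a consonant when it starts a word and as a vowel anywhere else, so “yellow” becomes “ellowyay” and “rhythm” becomes “ythmrhay”. The per-word step could be factored out roughly like this; pig_latin_word and is_vowel are hypothetical names, and words that start with a vowel keep the program’s existing behavior of simply getting “ay” appended.

```c
#include <stdbool.h>
#include <string.h>

/* a, e, i, o, u always count as vowels; 'y' counts only when it is
 * not the first letter of the word (assumed convention). */
static bool is_vowel( const char *word, size_t i ){
        if( word[i] != '\0' && strchr( "aeiou", word[i] ) != NULL )
                return true;
        return word[i] == 'y' && i > 0;
}

/* Hypothetical helper: writes the Pig Latin form of a lowercase word into
 * out, which must hold strlen(word) + 3 bytes. The letters before the
 * first vowel move to the end and "ay" is appended; a word without a
 * vowel is left in place and just gets "ay". */
void pig_latin_word( const char *word, char *out ){
        size_t n = strlen( word );
        size_t i = 0;
        while( i < n && !is_vowel( word, i ) )
                i++;
        memcpy( out, word + i, n - i );     /* from the first vowel on */
        memcpy( out + ( n - i ), word, i ); /* the leading consonants */
        strcpy( out + n, "ay" );
}
```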

 

Circular buffer simulation in C

Okay, time for a programming post. I’m going to share a small C program I wrote a while ago. It’s an interactive program that simulates a circular buffer. A circular buffer is a data structure where data elements are inserted at one end and taken off the other end. It’s essentially a fixed-size queue stored in an array: when insertion or removal reaches the end of the array, it wraps around to the beginning, so elements never have to be shifted.


/**********************************
 * circular-buffer.c - simulation *
 * of a circular buffer           *
 * Author: ZenHacker2015          *
 * Date: Started Jan 15, 2015     *
 *       Contd   Jan 16, 2015     *
 **********************************/

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int consume( void );
void produce( int );
void printbuf( void );
void increment( int ** );

int *buffer;    // array used as the circular buffer
int *start;     // pointer to the start of the data
int *end;       // pointer to the end of the data
int bufsize;    // size of the buffer, as given by command line argument
int count;      // number of items stored; tells "full" apart from "empty"

int main( int argc, char **argv ){
        // Initialization:
        if( argc < 2 || ( bufsize = atoi( argv[1] ) ) <= 0 ){
                fprintf( stderr, "usage: %s <buffer size>\n", argv[0] );
                return 1;
        }
        buffer = (int *) malloc( sizeof( int ) * bufsize );
        if( buffer == NULL ){
                perror( "malloc" );
                return 1;
        }
        start = end = &buffer[0];
        count = 0;
        char readbuffer[20];

        puts( "CIRCULAR BUFFER SIMULATOR" );
        puts( "Commands:" );
        puts( "consume - Consume an item from the buffer" );
        puts( "print - Print the contents of the buffer" );
        puts( "<number> - Add item with value <number> to the buffer" );
        puts( "Type Ctrl+C to exit." );

        // Loop and get input from user:
        for(;;){
                printf( "> " );   // prompt
                fflush( stdout ); // print prompt immediately
                // Unlike read(), fgets() always null-terminates the line:
                if( fgets( readbuffer, sizeof readbuffer, stdin ) == NULL )
                        break;    // end of input
                if( readbuffer[0] == 'c' ){
                        if( count == 0 )
                                printf( "Buffer is empty.\n" );
                        else
                                printf( "%d\n", consume() );
                }
                else if( readbuffer[0] == 'p' )
                        printbuf();
                // isdigit() is the portable test; isnumber() is BSD-only:
                else if( isdigit( (unsigned char) readbuffer[0] ) ){
                        if( count == bufsize )
                                printf( "Buffer is full.\n" );
                        else
                                produce( atoi( readbuffer ) );
                }
                else
                        printf( "Invalid command.\n" );
        }

        free( buffer );
        return 0;
}

// Consumes an item from the buffer (caller checks that it isn't empty)
int consume( void ){
        int val = *start;
        increment( &start );
        count--;
        return val;
}

// Places an item into the buffer (caller checks that it isn't full)
void produce( int item ){
        *end = item;
        increment( &end );
        count++;
}

// Prints the contents of the buffer, from start to end
void printbuf( void ){
        // Note: start == end holds for both a full buffer and an empty
        // buffer, so count is used to tell the two cases apart.
        if( count > 0 ){
                if( start < end ){
                        for( int *p = start; p < end; p++ ){
                                printf( "%d ", *p );
                        }
                }
                else{
                        // The data wraps around the end of the array:
                        for( int *p = start; p < buffer + bufsize; p++ ){
                                printf( "%d ", *p );
                        }
                        for( int *p = buffer; p < end; p++ ){
                                printf( "%d ", *p );
                        }
                }
        }
        printf( "\n" );
}

// Increments start/end when an item is consumed/produced,
// wrapping back to the beginning of the array
void increment( int **pointer ){
        if( *pointer - buffer < bufsize - 1 )
                (*pointer)++;
        else
                *pointer = buffer;
}
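
A common alternative to moving start and end pointers is to store the index of the oldest item plus a count of stored items, and to wrap with the % operator. The explicit count resolves the ambiguity mentioned in the printbuf() comment, where start == end could mean either a full or an empty buffer. This is a minimal sketch; struct ring and the ring_* functions are hypothetical names, and it rejects pushes into a full buffer rather than overwriting the oldest item (both policies are common).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Fixed-size FIFO queue stored in a wrap-around array. */
struct ring {
        int *data;
        size_t capacity;
        size_t head;  /* index of the oldest item */
        size_t count; /* number of items currently stored */
};

/* Allocates storage; returns false for capacity 0 or allocation failure. */
bool ring_init( struct ring *r, size_t capacity ){
        r->capacity = capacity;
        r->head = r->count = 0;
        r->data = capacity > 0 ? malloc( capacity * sizeof *r->data ) : NULL;
        return r->data != NULL;
}

void ring_free( struct ring *r ){
        free( r->data );
        r->data = NULL;
}

/* Appends item after the newest one; returns false if the buffer is full. */
bool ring_push( struct ring *r, int item ){
        if( r->count == r->capacity )
                return false;
        r->data[( r->head + r->count ) % r->capacity] = item;
        r->count++;
        return true;
}

/* Removes the oldest item into *item; returns false if the buffer is empty. */
bool ring_pop( struct ring *r, int *item ){
        if( r->count == 0 )
                return false;
        *item = r->data[r->head];
        r->head = ( r->head + 1 ) % r->capacity;
        r->count--;
        return true;
}
```

With a capacity of 3, pushing 1, 2, 3 fills the buffer and a fourth push fails; after one pop (which returns 1), a push of 4 lands in slot 0, and the remaining pops return 2, 3, 4 in order.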