Writing a simple search engine in PHP

Writing a search engine for your LAMP server is actually fairly simple. I realized I could simply use grep as the search algorithm, which also lets users search with regular expression metacharacters. I mean, why reinvent the wheel? Of course, a search engine that crawls the entire web would be much more complicated; this one just searches the files on the local web server.

Here is the code I’ve written. It uses only a handful of lines of PHP code, and even some of those are rather superfluous (I should probably streamline it):


<!DOCTYPE html>
<!-- A simple search engine written in PHP -->

<html>
<head>
<title>Search</title>
</head>

<body>
<?php
// Show the form until a non-empty query has been submitted.
if( !isset( $_GET['query'] ) || $_GET['query'] === '' ){
?>
<!-- No query performed yet -->
<form action="<?php echo htmlspecialchars( $_SERVER['PHP_SELF'] ); ?>" method="get">
Enter a query:<br>
<input name="query" type="text"><br>
<input type="submit" value="Search">
</form>
<?php } else { ?>
<!-- Query has been performed -->
<?php
// escapeshellarg() passes the query to grep as one quoted argument, so it
// cannot inject other shell commands; "--" stops a query that starts with
// "-" from being read as a grep option.
exec( "grep -ril -- " . escapeshellarg( $_GET['query'] ) . " *", $files );
$length = sizeof( $files );
for( $i = 0; $i < $length; $i++ ){
        // Escape each file name before writing it into the page.
        $name = htmlspecialchars( $files[$i] );
        echo( "<a href=\"" . $name . "\">" . $name . "</a><br>\n" );
}
?>
<?php } ?>
</body>
</html>

Since the search algorithm is simply a front end for grep, I didn’t really have to think about its implementation. Basically, the script has a decision statement that looks at the superglobal variable $_GET['query']. If no query has been set, the query hasn’t been submitted yet, so the script shows the search form. Otherwise, it shows the results of the query, which grep treats as a regular expression. The results are obtained by running grep recursively (-r) and case-insensitively (-i) over the files in the current working directory (normally the directory the script lives in) and listing (-l) every file that contains a match for the pattern.
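Because the query ends up in a shell command, one alternative is to do the matching in PHP itself. The following is a minimal sketch of that idea, not the code above: the function name search_files() is hypothetical, it assumes the query is a PCRE pattern (PHP's preg_* syntax, which differs in places from grep's basic regular expressions), and unlike the shell's *, it also picks up hidden files.

```php
<?php
// Rough pure-PHP equivalent of `grep -ril PATTERN *` (hypothetical helper).
// Assumes $pattern is a PCRE regex; returns paths relative to $dir, sorted.
function search_files(string $dir, string $pattern): array
{
    // Add delimiters; "i" = case-insensitive (like -i), "m" = ^ and $ match
    // at line boundaries, closer to grep's line-by-line matching.
    $regex = '~' . str_replace('~', '\~', $pattern) . '~im';
    if (@preg_match($regex, '') === false) {
        return []; // invalid pattern: report no results instead of an error
    }

    $dir = rtrim($dir, '/');
    $found = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::LEAVES_ONLY,
        RecursiveIteratorIterator::CATCH_GET_CHILD // skip unreadable subdirectories
    );
    foreach ($files as $file) {
        if (!$file->isFile() || !$file->isReadable()) {
            continue;
        }
        $contents = file_get_contents($file->getPathname());
        if ($contents !== false && preg_match($regex, $contents) === 1) {
            // Like -l: record the file name once, not every matching line.
            $found[] = substr($file->getPathname(), strlen($dir) + 1);
        }
    }
    sort($found);
    return $found;
}
```

In the search page, the exec() call could then become $files = search_files('.', $_GET['query']); and the rest of the loop stays the same.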

One thing that makes PHP code somewhat confusing is the way you can have a PHP script interleaved with HTML code. It’s one of those things you just have to get used to.

Possible enhancements include:

  • Using egrep (or its modern equivalent, grep -E) instead of grep so the user can use extended regular expressions.
  • Adding the ability to search for images, videos, and other media on the server, rather than just web pages (this could be done by returning any such media that are used by pages that match the pattern).
  • Searching filenames in addition to file contents (you could use find for this).
  • Using a ranking algorithm (currently the results are listed in whatever order grep finds them).
  • And of course adding CSS and other formatting to make the page more aesthetically pleasing.
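For the ranking enhancement, here is a minimal sketch of one simple scoring rule: count how many times the query matches in each file and list the best-scoring files first. The function name rank_by_matches() and the match-count scoring are assumptions for illustration; the pattern is again treated as a PCRE, so counts can differ slightly from what grep would match.

```php
<?php
// Hypothetical helper for the "ranking algorithm" enhancement: score each
// file by how many times the query matches in it, then sort best-first.
// Counting matches is an assumed, deliberately simple scoring rule.
function rank_by_matches(array $paths, string $pattern): array
{
    $regex = '~' . str_replace('~', '\~', $pattern) . '~im';
    if (@preg_match($regex, '') === false) {
        return $paths; // invalid pattern: keep the original order
    }

    $scores = [];
    foreach ($paths as $path) {
        $contents = @file_get_contents($path);
        $scores[$path] = ($contents === false) ? 0 : (int) preg_match_all($regex, $contents);
    }

    // Highest score first; ties are broken alphabetically by path.
    $ranked = array_keys($scores);
    usort($ranked, fn ($a, $b) => [$scores[$b], $a] <=> [$scores[$a], $b]);
    return $ranked;
}
```

It would slot in right after the exec() call, as $files = rank_by_matches($files, $_GET['query']);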

Another revamping

I have done another significant revamping of this site. All code snippets are now in color. I will share the Vim script I used to do this, so that other bloggers may benefit from my work:

" Script converts syntax highlighting
" into a form suitable for my blog.

let html_use_css = 0
TOhtml
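" Swap TOhtml's colors for your own. As printed, each replacement is the
" same as the original color, so put the color you want in the second
" half of each line. The e flag skips colors that are not in the file.
%s/00c000/00c000/ge
%s/c0c000/c0c000/ge
%s/0000ff/0000ff/ge
%s/00c0c0/00c0c0/ge
" Remove the line-break tags, since <pre> will preserve the newlines.
%s/<br>//ge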
w

To use this script, copy the code into a file named “blog.vim”, open the file you want to convert in Vim, and enter :so blog.vim. Open the resulting HTML file in your web browser of choice, view its source, and copy and paste the code into your blog (in HTML mode), starting after the <font face="monospace"> tag and ending before the final </font> tag. Surround the code with <pre><code> at the beginning and </code></pre> at the end in your HTML. You can modify the script to get the colors you want.
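The copy-and-paste step can also be scripted. This is a minimal sketch, assuming the TOhtml output (made with html_use_css = 0) wraps the code in a <font face="monospace"> element as described above; the function name extract_blog_snippet() is hypothetical.

```php
<?php
// Hypothetical helper: pull the highlighted code out of a TOhtml page and
// wrap it in <pre><code>, mirroring the manual steps described above.
function extract_blog_snippet(string $html): ?string
{
    $open = '<font face="monospace">';
    $start = strpos($html, $open);
    $end = strrpos($html, '</font>'); // the final </font> closes the wrapper
    if ($start === false || $end === false || $end < $start) {
        return null; // not the layout this sketch expects
    }
    $start += strlen($open);
    $code = substr($html, $start, $end - $start);
    // Same clean-up as the Vim script: <pre> keeps the newlines,
    // so the <br> tags are no longer needed.
    $code = str_replace('<br>', '', $code);
    return "<pre><code>" . trim($code, "\n") . "</code></pre>\n";
}
```

For example, echo extract_blog_snippet(file_get_contents('example.c.html')); would print the snippet ready to paste (the file name here is only an example).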

How to export syntax highlighting to a web page in Vim

This is one of the more advanced features of Vim, and one that’s not available in a lot of other text editors. You can create an HTML file that contains all the text of the file you are editing, colored with Vim’s syntax highlighting.

First, make sure syntax highlighting is turned on. If it isn’t, enter :syntax on.

You can then create the HTML file by entering the following simple command:

:TOhtml

This opens a new window containing an HTML version of the file you’re editing, with all of its syntax highlighting. Save it with :w.

Implementations of the :TOhtml script vary between Vim versions. Some create an HTML page styled with CSS by default, and others create a web page with <font> tags. CSS is better in my opinion because it makes it easier to change the text and background colors after you’ve created the HTML file. You can toggle CSS by setting Vim’s global variable html_use_css:

:let html_use_css=1

This will tell Vim to use CSS (of course you have to do this before you type :TOhtml).


Once the file is created, you can edit it as you please. The background color is white by default; I like to change it to black, and I like to make the foreground colors brighter.
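If you’d rather not brighten each color by hand, here is one possible way to automate it. This is a sketch under assumptions: the rule (scale each #rrggbb color so its strongest channel becomes ff, which keeps the hue) and the function names brighten_hex() and brighten_all() are illustrative only, and it needs PHP 7.4 or later for the fn syntax.

```php
<?php
// Hypothetical helper: brighten one #rrggbb color by scaling its channels
// so the strongest one becomes ff (e.g. #00c000 -> #00ff00).
function brighten_hex(string $hex): string
{
    $rgb = array_map('hexdec', str_split(ltrim($hex, '#'), 2));
    $max = max($rgb);
    if ($max === 0) {
        return $hex; // black has no hue to scale; leave it alone
    }
    $scaled = array_map(fn ($c) => (int) round($c * 255 / $max), $rgb);
    return '#' . vsprintf('%02x%02x%02x', $scaled);
}

// Apply brighten_hex() to every #rrggbb color in a chunk of HTML or CSS.
function brighten_all(string $text): string
{
    return preg_replace_callback('/#[0-9a-fA-F]{6}\b/', fn ($m) => brighten_hex($m[0]), $text);
}
```

Note that this leaves black and white untouched, so the white background (and any black default text) still has to be changed by hand.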


The result looks something like this (in Google Chrome):

[Screenshot: the exported HTML file viewed in Google Chrome]

Vim hack: applying an edit uniformly to a series of similar HTML files

Vim is a really great text editor, not just because it provides a lot of editing options, but also because it lends itself to a lot of creative tricks and hacks. This is one such hack, which I discovered a few years ago while exploring Vim’s keystrokes. It lets you apply an edit uniformly to several similar files, and it works especially well for HTML files. Sure, you can use CSS to quickly change the style of a series of HTML pages, but what if you want to change the content? This trick will let you do that.

Step 1: Open your files (for example, vim *.html). Open only the files you want to apply the edit to; otherwise you will get unwanted edits, or Vim will choke on one of the dissimilar files (the search in Step 3 fails there and the macro stops).

Step 2: In Normal mode, type q followed by a letter of your choice (say w). This starts recording your keystrokes into register w.

Step 3: Go to the point in the first file where the editing will start. Here it’s crucial to use the / or ? keystroke for a regular expression search rather than moving there manually: the spot will usually be on a different line in each file, but a search finds the same text every time.

Step 4: Apply your edit. It can be anything you like, as long as you stay on the current file (for now).

Step 5: Return to Normal mode (press Esc if you are still in Insert mode) and enter :w to save your changes.

Step 6: Enter :n to go to the next file.

Step 7: Press q to stop recording. Register w now holds the complete macro.

Step 8: Type :map <char1> @<char2>, where <char1> is the key that will run the macro and <char2> is the register you recorded into (example: :map = @w).

Step 9: Hold down the = key (or whatever key you chose for your macro). The edit will be applied almost instantly to all your files.

This hack is especially useful if you have many web pages that you want to change on a content level. Even if you have hundreds of files, the edit will be applied to all of them in a matter of seconds.
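Outside Vim, the same idea can be sketched in PHP. This is not the macro itself but a swapped-in analogue: the hypothetical function apply_uniform_edit() makes one regular-expression replacement at the first match in each file (like the search and edit in Steps 3 and 4) and saves the file (Step 5). Unlike the macro, it simply skips files where the pattern is not found.

```php
<?php
// Hypothetical PHP analogue of the macro: apply the same edit at the first
// match of $pattern in each file and save it; files without a match are
// skipped. Returns the list of files that were changed.
function apply_uniform_edit(array $paths, string $pattern, string $replacement): array
{
    $edited = [];
    foreach ($paths as $path) {
        $html = file_get_contents($path);
        if ($html === false) {
            continue; // unreadable file: leave it alone
        }
        // A limit of 1 mirrors the macro: one search, one edit per file.
        $new = preg_replace($pattern, $replacement, $html, 1, $count);
        if ($new !== null && $count === 1) {
            file_put_contents($path, $new); // like :w
            $edited[] = $path;
        }
    }
    return $edited;
}
```

For instance, apply_uniform_edit(glob('*.html'), '/Old footer/', 'New footer') would change the first “Old footer” in every HTML file in the current directory.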

NOTE: Is this hack not working for you? If so, please leave a comment on this blog post, and I will go back and fix whatever mistakes I made. Thanks.