File locator web front-end in PHP

When my file server resided in the Windows environment, I made use of the Everything search engine to index the files and to search for them both locally and through Everything’s built-in web server.

This latter functionality is what I wanted to replicate once I had built the ZFS-based FreeBSD file server and moved to it. Most UNIX-like systems provide the locate command, which uses a pre-built database to quickly find a string in the file names and paths on your server. So the obvious solution was to install Apache and PHP and write a web front-end for locate.

An alternative option is to use Solr, a Lucene search engine front-end. I, however, wished to have something simpler and custom-made. This was also a good opportunity to explore a new programming language.

I’ve never written PHP code before, as my main area is ASP.NET and C#, but learning the ropes of PHP was an enjoyable task and it is always good to learn another language. The result can be seen below.

The program searches for an arbitrary string in the locate database, optionally ignoring case. Another option lets the user restrict the search to the last segment of the path, which avoids a flood of near-duplicate hits when the string occurs only in the directory portion of the path. The program also highlights the hits.

This is what the simple UI of the program looks like:

Update 1

I’ve added some more desired functionality:

  • the program can now update the underlying locate database through the web interface
  • it now accepts ‘*’ and ‘?’ wildcards in the search string and highlights the results appropriately
  • it can now give direct links to the located content
  • it can now search for strings containing Unicode characters
  • highlighting is made Lynx-friendly
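
The wildcard support and highlighting listed above can be sketched roughly as follows. This is only an illustration, not the code of the program itself: the helper names wildcardToRegex and highlightMatches are made up, the choice of the <b> tag is an assumption, and the sketch escapes all other regular-expression metacharacters with preg_quote() before translating the wildcards.

```php
<?php
// Hypothetical helper: turn a search string with * and ? wildcards into a
// regular-expression fragment, escaping every other metacharacter first.
function wildcardToRegex(string $search): string
{
    // preg_quote() escapes regex metacharacters (including * and ?)
    // as well as the "/" delimiter passed as the second argument.
    $quoted = preg_quote($search, '/');
    // Turn the escaped wildcards back into their regex equivalents.
    return str_replace(array('\\?', '\\*'), array('.', '.*'), $quoted);
}

// Hypothetical helper: wrap every match in <b> tags (the tag is an assumption).
function highlightMatches(string $text, string $search, bool $ignoreCase): string
{
    // The U modifier makes .* match as little as possible.
    $pattern = '/(' . wildcardToRegex($search) . ')/U' . ($ignoreCase ? 'i' : '');
    return preg_replace($pattern, '<b>$1</b>', $text);
}

// The path below is made up for illustration.
echo highlightMatches('/data/docs/report.v2.txt', 'rep*t.v?', false), "\n";
// prints /data/docs/<b>report.v2</b>.txt
```

Escaping first means that a dot in the search string matches only a literal dot, and a slash cannot break the pattern delimiter.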

Forcing a database update involves running the update script as root, which then switches (su) to the user nobody. Apache (httpd), however, runs as a limited user such as www, which cannot start the script directly. To overcome this obstacle, I used a solution suggested in this Stack Overflow thread:

  1. Modify update.launcher.c (code below) to point to the update script, which is typically located in /etc/periodic/weekly/310.locate
  2. # gcc update.launcher.c -o update.launcher
  3. # chown root update.launcher
  4. # chmod u=rwx,go=xr,+s update.launcher
  5. Place the compiled launcher on your server and set the UPDATE_SCRIPT_LAUNCHER constant in locator.html to its path
  6. Verify that the LOCATE_DB_FILE constant points to the database file, so that the program is able to report the state of the database

Remember to change the value of the SEARCH_ROOT constant, which limits the search to a given directory tree.

If you want the program to display direct links to the located content, perform the following two steps:

  1. Create a symbolic link to the root of your searchable content, as defined in SEARCH_ROOT
  2. Update the VIEW_SYMLINK_PREFIX constant to point to that symlink, relative either to the web root or to the location of locator.html. (If this constant is not defined, the program will not generate any links.)
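
To illustrate how a located path could be turned into such a link, here is a minimal sketch. The function name buildViewLink and the example root and prefix are hypothetical. Unlike a plain string replacement, the sketch also URL-encodes each path segment, which is an addition of mine rather than something the program does.

```php
<?php
// Hypothetical helper: map a path under $searchRoot to a URL under $symlinkPrefix.
// Returns null when the path does not lie under the search root.
function buildViewLink(string $path, string $searchRoot, string $symlinkPrefix): ?string
{
    $root = rtrim($searchRoot, '/');
    // Accept the root itself or anything below "root/".
    if ($path !== $root && strncmp($path, $root . '/', strlen($root) + 1) !== 0) {
        return null;
    }
    $relative = (string) substr($path, strlen($root) + 1);
    // Encode each segment separately so that the "/" separators survive.
    $encoded = implode('/', array_map('rawurlencode', explode('/', $relative)));
    return rtrim($symlinkPrefix, '/') . '/' . $encoded;
}

// Both the root and the prefix below are made up for illustration.
echo buildViewLink('/tank/media/My Photos/a&b.jpg', '/tank/media', 'files'), "\n";
// prints files/My%20Photos/a%26b.jpg
```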

There are a few caveats and assumptions:

  • There is no thorough error checking involved
  • Unicode search is always case sensitive

locator.html

File Locator



Search for: (wildcards * and ? are allowed)
Ignore case
Search in last segment only



File name database is currently being updated.
Search results may be inaccurate.

    '; }

    $ret = array();
    // Build the locate pattern: everything under SEARCH_ROOT that contains the search string.
    $command = 'locate ' . ($ignoreCase ? '-i "' : '"') . SEARCH_ROOT . '*' . $searchString . '*"';
    exec($command, $ret);

    // Translate the wildcards into their regular-expression equivalents ("*" may match nothing).
    $word = str_replace(array("?", "*"), array(".", ".*"), $searchString);

    foreach ($ret as $line) {
        if ($lastSegmentSearch && !foundInLastSegment($line, $word, $ignoreCase)) {
            continue;
        }
        $find = highlight($line, $word, $ignoreCase);
        if (defined("VIEW_SYMLINK_PREFIX")) {
            // Map the file system path onto the symlink that the web server can see.
            $link = str_replace(SEARCH_ROOT, VIEW_SYMLINK_PREFIX, $line);
            print '<a href="' . $link . '">[View]</a> ';
        }
        print "$find<br />\n";
    }
}

function showDatabaseState() {
    if (updateLocatorIsRunning()) {
        print 'File name database is currently being updated.';
        return;
    } else {
        clearstatcache();
        date_default_timezone_set('UTC');
        $dbtime = date("D, d.m.Y, H:i:s", filemtime(LOCATE_DB_FILE));
        print 'File name database was last updated on ' . $dbtime;
    }
}

function updateDatabase() {
    if (updateLocatorIsRunning()) {
        print 'File name database is already being updated!';
        return;
    }
    // Start the setuid launcher in the background and discard its output.
    $command = UPDATE_SCRIPT_LAUNCHER . " > /dev/null 2>&1 &";
    exec($command);
    // Give the update script a moment to start before checking for it.
    sleep(1);
    if (updateLocatorIsRunning()) {
        print 'Started updating file name database.';
    } else {
        print 'File name database updater failed to start.';
    }
}

function updateLocatorIsRunning() {
    // The update runs as user nobody, so look for locate.updatedb among that user's processes.
    $ret = array();
    $command = "ps -U nobody -o command";
    exec($command, $ret);
    foreach ($ret as $line) {
        if (strstr($line, "locate.updatedb")) {
            return true;
        }
    }
    return false;
}

function foundInLastSegment($line, $searchString, $ignoreCase) {
    // The lookahead lets a match start only where no "/" follows, i.e. in the last path segment.
    $search = '/(?=[^\/]+$)' . $searchString . ($ignoreCase ? '/i' : '/');
    return preg_match($search, $line);
}

function highlight($text, $word, $ignoreCase) {
    // Plain <b> tags are assumed here; they also render in Lynx.
    return preg_replace("/($word)/U" . ($ignoreCase ? "i" : ""), "<b>$1</b>", $text);
}
?>
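
One thing worth noting about the listing above: the search string goes into the shell command line inside double quotes, so a string containing a double quote, a backtick or a dollar sign can change the command that gets executed. A minimal sketch of a safer way to build the command is shown below. The function name buildLocateCommand is hypothetical and the example values are made up. Also, as far as I know, escapeshellarg() is locale-sensitive and can drop non-ASCII bytes unless a UTF-8 locale has been selected with setlocale(), so this should be checked against the Unicode search before being relied upon.

```php
<?php
// Hypothetical helper: build the locate command line with the pattern
// passed as a single, safely quoted shell argument.
function buildLocateCommand(string $searchRoot, string $searchString, bool $ignoreCase): string
{
    $pattern = $searchRoot . '*' . $searchString . '*';
    // escapeshellarg() wraps the argument in single quotes, so the shell
    // treats characters such as " ; $ and ` as ordinary text.
    return 'locate ' . ($ignoreCase ? '-i ' : '') . escapeshellarg($pattern);
}

// The root and the hostile search string are made up for illustration.
echo buildLocateCommand('/tank/media', 'holiday"; rm -rf /', true), "\n";
```

Because the pattern stays inside single quotes, the shell does not expand the wildcards either, and locate receives them unchanged.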

update.launcher.c

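#include <stdlib.h>     /* system() */
#include <sys/types.h>
#include <unistd.h>     /* setuid() */

int main (int argc, char *argv[])
{
    /* Become root; this only works if the binary is owned by root and has the setuid bit set. */
    if (setuid (0) != 0)
        return 1;
    /* Adjust this path to where your system keeps the locate update script. */
    system ("/bin/sh /etc/periodic/weekly/310.locate");
    return 0;
}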

RegEx to match a substring after a delimiter

They say that if you have a problem and want to use RegEx to solve it, then you have two problems. So true! 🙂

My specific problem was that I wanted to search for a string within a substring after a delimiter sign, more precisely, in the last segment of a path. Here is an example:

/some/test_path/to/search/with_a_Test_file.txt

The RegEx, searching case-insensitively for “test”, should return a match only in the portion of the string after the last “/”.
All the suggestions I could find on Stack Overflow were concerned with matching the entire file name rather than a portion of it, so I had to learn some advanced RegEx. Fast.

The answer was something called “lookahead”, which is well explained on the Regular-Expressions.info site.

The resulting RegEx string looks like some serious swearing in a cartoon bubble… 🙂 Here is the pattern, as accepted by PHP’s preg_match() function:

/(?=[^\/]+$)test/i

According to my (rather limited) understanding of RegEx, the first portion in the parentheses, starting with “?=”, is the lookahead. It consumes nothing; it only asserts that everything from the current position to the end of the string contains no “/”, which is true only inside the last path segment. Then comes the search substring, “test”, which has to match at such a position, and, finally, “/i” is the switch that makes the match case-insensitive.
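
To see the effect of the lookahead, here is a small sketch that runs the pattern with and without it against the example path from above. The variable names are my own; PREG_OFFSET_CAPTURE is used only to show where in the string each match starts.

```php
<?php
$path = '/some/test_path/to/search/with_a_Test_file.txt';

// Without the lookahead, the first hit is inside the directory part.
preg_match('/test/i', $path, $plain, PREG_OFFSET_CAPTURE);

// With the lookahead, a match may only start where no "/" follows.
preg_match('/(?=[^\/]+$)test/i', $path, $last, PREG_OFFSET_CAPTURE);

printf("plain:        '%s' at offset %d\n", $plain[0][0], $plain[0][1]);
printf("last segment: '%s' at offset %d\n", $last[0][0], $last[0][1]);
// prints:
// plain:        'test' at offset 6
// last segment: 'Test' at offset 33
```

The plain pattern stops at “test_path”, while the lookahead version skips it and reports the “Test” in the file name.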