When my file server resided in the Windows environment, I made use of the Everything search engine to index the files and to search for them both locally and through Everything’s built-in web server.
This latter functionality is what I wanted to replicate once I built the ZFS-based FreeBSD file server and moved to it. All UNIX flavours have the locate command, which will use a pre-built database to quickly find a string in file names and paths on your server. So, the obvious solution was to install Apache and PHP and write a web front-end for locate.
An alternative option is to use Solr, a Lucene search engine front-end. I, however, wished to have something simpler and custom-made. This was also a good opportunity to explore a new programming language.
I’ve never written PHP code before, as my main area is ASP.NET and C#, but learning the ropes of PHP was an enjoyable task and it is always good to learn another language. The result can be seen below.
The program searches the locate database for an arbitrary string, optionally ignoring case. Another option lets the user restrict the search to the last segment of the path, which avoids a flood of near-duplicate hits when the string occurs only in the directory portion of the path. The program also highlights the hits.
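The last-segment restriction can be sketched roughly as follows. This is a minimal illustration, not the program's exact code: the helper name inLastSegment and the sample paths are hypothetical, and the search string is passed through preg_quote so that it is matched literally.

```php
<?php
// Hypothetical helper: true when $needle occurs in the final path segment.
function inLastSegment(string $path, string $needle, bool $ignoreCase): bool
{
    // (?=[^\/]*$) only succeeds where no slash follows, so the match
    // has to start inside the last segment of the path.
    $pattern = '/(?=[^\/]*$)' . preg_quote($needle, '/') . '/' . ($ignoreCase ? 'i' : '');
    return preg_match($pattern, $path) === 1;
}

var_dump(inLastSegment('/tank/photos/2012/photos.zip', 'photos', false)); // bool(true)
var_dump(inLastSegment('/tank/photos/2012/index.txt', 'photos', false));  // bool(false)
var_dump(inLastSegment('/tank/docs/README.TXT', 'readme', true));         // bool(true)
```

The second path is rejected because the string appears only in a directory name, which is exactly the kind of near-duplicate hit the option is meant to suppress.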
This is what the program's simple UI looks like:
Update 1
I’ve added some more desired functionality:
- the program can now update the underlying locate database through the web interface
- it now accepts ‘*’ and ‘?’ wildcards in the search string and highlights the results appropriately
- it can now give direct links to the located content
- it can now search for strings containing Unicode characters
- highlighting is made Lynx-friendly
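The wildcard handling and highlighting can be sketched along these lines. The helper names are hypothetical, and two details are assumptions rather than the program's actual behaviour: matches are wrapped in strong tags (plain markup that text browsers such as Lynx can also render), and '*' is translated to a lazy "zero or more characters" so that it mirrors how locate treats it.

```php
<?php
// Hypothetical helper: turn a search string with * and ? into a regex fragment.
function wildcardToRegex(string $search): string
{
    // Quote regex metacharacters first; preg_quote turns * into \* and ? into \?.
    $quoted = preg_quote($search, '/');
    // Then map the escaped wildcards to their regex equivalents.
    return str_replace(['\\*', '\\?'], ['.*?', '.'], $quoted);
}

// Hypothetical helper: wrap every match in <strong> (assumed markup).
function highlightMatch(string $text, string $search, bool $ignoreCase): string
{
    // The u modifier treats pattern and subject as UTF-8.
    $pattern = '/(' . wildcardToRegex($search) . ')/u' . ($ignoreCase ? 'i' : '');
    // preg_replace returns null on error (e.g. invalid UTF-8); fall back to the input.
    return preg_replace($pattern, '<strong>$1</strong>', $text) ?? $text;
}

echo highlightMatch('/tank/music/album.flac', 'al*m', false), "\n";
// prints /tank/music/<strong>album</strong>.flac
```

Quoting the search string before translating the wildcards keeps characters such as '.' or '(' from being interpreted as regular expression syntax.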
Forcing a database update involves running the update script as root, which then uses su to continue as user nobody. Apache (httpd), however, runs as the limited user www (or similar). To overcome this obstacle, I used the solution suggested in this Stack Overflow thread:
- Modify update.launcher.c (code below) to point to the update script, which is typically located in /etc/periodic/weekly/310.locate
- Compile the launcher as root, make root its owner and set the setuid bit:
# gcc update.launcher.c -o update.launcher
# chown root update.launcher
# chmod u=rwx,go=xr,+s update.launcher
- Place the compiled launcher on your server and set the UPDATE_SCRIPT_LAUNCHER constant in the PHP program to its path
- Verify that the LOCATE_DB_FILE constant points to the database file, so that the program is able to report the state of the database
Remember to change the value of the SEARCH_ROOT constant, which limits the search to one part of the file system.
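For orientation, the three constants mentioned so far could be defined near the top of the program roughly like this. The values are assumptions for illustration only: /var/db/locate.database is the stock FreeBSD location of the locate database, while the launcher path and the search root are hypothetical and must be replaced with your own.

```php
<?php
// Hypothetical location of the compiled setuid launcher; adjust to your own path.
define('UPDATE_SCRIPT_LAUNCHER', '/usr/local/www/bin/update.launcher');

// Stock FreeBSD location of the locate database; verify it on your system.
define('LOCATE_DB_FILE', '/var/db/locate.database');

// Hypothetical root of the searchable content; only paths below it are searched.
define('SEARCH_ROOT', '/tank/share/');
```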
If you want the program to display direct links to the located content, perform the following 2 steps:
- Create a symbolic link to the root of your searchable content, as defined in SEARCH_ROOT
- Update the VIEW_SYMLINK_PREFIX constant to point to that symlink, relative to the web root or to the location of locator.html. (If this constant is not defined, the program will not generate any links.)
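The mapping from a located path to a link can be sketched as below. This is an illustration under assumptions: the constant values and the helper name pathToLink are hypothetical, and the sketch additionally URL-encodes each path segment so that names containing spaces or other special characters still produce valid links.

```php
<?php
// Assumed example values; the real ones come from the constants described above.
const SEARCH_ROOT = '/tank/share/';
const VIEW_SYMLINK_PREFIX = 'files/';

// Hypothetical helper: map a file system path to a link below the symlink.
function pathToLink(string $path): ?string
{
    // Only paths below the search root can be reached through the symlink.
    if (strncmp($path, SEARCH_ROOT, strlen(SEARCH_ROOT)) !== 0) {
        return null;
    }
    $relative = substr($path, strlen(SEARCH_ROOT));
    // Encode every segment separately so the slashes stay intact.
    $encoded = implode('/', array_map('rawurlencode', explode('/', $relative)));
    return VIEW_SYMLINK_PREFIX . $encoded;
}

echo pathToLink('/tank/share/docs/annual report.pdf'), "\n";
// prints files/docs/annual%20report.pdf
```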
There are a few caveats and assumptions:
- There is no thorough error checking involved
- Unicode search is always case sensitive
locator.html
File Locator
|
File name database is currently being updated.
Search results may be inaccurate.
'; }
    $ret = array();
    // Note: $searchString is passed to the shell without escaping.
    $command = 'locate ' . ($ignoreCase ? '-i "' : '"') . SEARCH_ROOT . '*' . $searchString . '*"';
    exec($command, $ret);
    // Translate the locate wildcards into their regular expression equivalents.
    $word = str_replace(array("?", "*"), array(".", ".+"), $searchString);
    foreach ($ret as $line) {
        if ($lastSegmentSearch && !foundInLastSegment($line, $word, $ignoreCase)) {
            continue;
        }
        $find = highlight($line, $word, $ignoreCase);
        if (defined("VIEW_SYMLINK_PREFIX")) {
            // Map the file system path onto the symlink published by the web server.
            $link = str_replace(SEARCH_ROOT, VIEW_SYMLINK_PREFIX, $line);
            print '<a href="' . $link . '">[View]</a> ';
        }
        print "$find<br>\n";
    }
}

function showDatabaseState() {
    if (updateLocatorIsRunning()) {
        print 'File name database is currently being updated.';
        return;
    } else {
        // Drop cached file status so that filemtime() reports the current value.
        clearstatcache();
        date_default_timezone_set('UTC');
        $dbtime = date("D, d.m.Y, H:i:s", filemtime(LOCATE_DB_FILE));
        print 'File name database was last updated on ' . $dbtime;
    }
}

function updateDatabase() {
    if (updateLocatorIsRunning()) {
        print 'File name database is already being updated!';
        return;
    }
    // Start the setuid launcher in the background and discard its output.
    $command = UPDATE_SCRIPT_LAUNCHER . " > /dev/null 2>&1 &";
    exec($command);
    // Give the update script a moment to start before checking for it.
    sleep(1);
    if (updateLocatorIsRunning()) {
        print 'Started updating file name database.';
    } else {
        print 'File name database updater failed to start.';
    }
}

function updateLocatorIsRunning() {
    // The update script runs as user nobody, so look for it in that user's process list.
    $ret = array();
    $command = "ps -U nobody -o command";
    exec($command, $ret);
    foreach ($ret as $line) {
        if (strstr($line, "locate.updatedb")) {
            return true;
        }
    }
    return false;
}

function foundInLastSegment($line, $searchString, $ignoreCase) {
    // The lookahead only succeeds where no further slash follows, i.e. in the last path segment.
    $search = '/(?=[^\/]+$)' . $searchString . ($ignoreCase ? '/i' : '/');
    return preg_match($search, $line);
}

function highlight($text, $word, $ignoreCase) {
    // The U modifier makes the quantifiers ungreedy, so wildcards match as little as possible.
    return preg_replace("/($word)/U" . ($ignoreCase ? "i" : ""), "$1", $text);
}
?>
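Because the listing passes the search string to the shell as typed, one possible hardening step is to quote the whole pattern with escapeshellarg. The following is a sketch under assumptions, not part of the original program: the helper name buildLocateCommand is hypothetical, and the setlocale call assumes that an en_US.UTF-8 locale is installed, since escapeshellarg may otherwise drop non-ASCII bytes and break Unicode searches.

```php
<?php
// Hypothetical helper: build the locate command with the pattern quoted for the shell.
function buildLocateCommand(string $searchRoot, string $searchString, bool $ignoreCase): string
{
    // Assumption: a UTF-8 locale is available; without one, escapeshellarg()
    // may strip non-ASCII characters from the pattern.
    setlocale(LC_CTYPE, 'en_US.UTF-8');
    $pattern = $searchRoot . '*' . $searchString . '*';
    // escapeshellarg() wraps the pattern in single quotes and escapes embedded quotes.
    return 'locate ' . ($ignoreCase ? '-i ' : '') . escapeshellarg($pattern);
}

echo buildLocateCommand('/tank/share/', 'holiday 2012', true), "\n";
```

With single quotes around the pattern, the shell no longer expands variables, backticks or globs in the user's input, while locate still receives the '*' and '?' wildcards unchanged.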
update.launcher.c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
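int main (int argc, char *argv[])
{
    /* Become root; this requires the binary to be owned by root with the setuid bit set. */
    setuid (0);
    /* Run the locate database update script; adjust the path if yours differs. */
    system ("/bin/sh /etc/periodic/weekly/310.locate");
    return 0;
}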