File locator web front-end in PHP

Posted on July 18, 2012 by Stanislav

Reading time: 3 minutes

When my file server resided in the Windows environment, I made use of the Everything search engine to index the files and to search for them both locally and through Everything’s built-in web server.

This latter functionality is what I wanted to replicate once I built the ZFS-based FreeBSD file server and moved to it. All UNIX flavours have the locate command, which will use a pre-built database to quickly find a string in file names and paths on your server. So, the obvious solution was to install Apache and PHP and write a web front-end for locate.

An alternative option is to use Solr, a Lucene search engine front-end. I, however, wished to have something simpler and custom-made. This was also a good opportunity to explore a new programming language.

I’ve never written PHP code before, as my main area is ASP.NET and C#, but learning the ropes of PHP was an enjoyable task and it is always good to learn another language. The result can be seen below.

The program will search for an arbitrary string in the locate database, optionally ignoring the case. Another option lets the user restrict the search to the last segment of the path, which avoids a flood of nearly duplicate hits when the string occurs only in the directory portion of the path. The program will also highlight the hits.

This is what the simple UI of the program looks like:

Update 1

I’ve added some more desired functionality:

the program can now update the underlying locate database through the web interface
it now accepts ‘*’ and ‘?’ wildcards in the search string and highlights the results appropriately
it can now give direct links to the located content
it can now search for strings containing Unicode characters
highlighting is made Lynx-friendly
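
As a rough illustration of the wildcard support, the shell sketch below shows one way to turn a search string containing ‘*’ and ‘?’ into a regular expression that can be used for highlighting. The function name wildcard_to_regex is hypothetical and this is not the program's own PHP code; it only demonstrates the idea of escaping the regex metacharacters first and translating the wildcards afterwards.

```shell
#!/bin/sh
# Hypothetical helper (not part of locator.html): convert a locate-style
# wildcard pattern into an extended regular expression.
wildcard_to_regex() {
    # 1. backslash-escape regex metacharacters other than the two wildcards
    # 2. '?' becomes '.'  (any single character)
    # 3. '*' becomes '.*' (any run of characters, including none)
    printf '%s' "$1" | sed -e 's/[].[\\^$+(){}|]/\\&/g' -e 's/?/./g' -e 's/\*/.*/g'
}

# Example: highlight-style matching of a file name with grep -E
regex=$(wildcard_to_regex 'report*.t?t')
printf 'report_final.txt\n' | grep -E "$regex"
```

Escaping before translating matters: otherwise a literal dot in the search string would silently match any character.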

Forcing a database update involves running the update script as root, which then uses su to run as user nobody. Apache (httpd), however, runs under a limited user, www (or suchlike). To overcome this obstacle, I used a solution suggested in this Stack Overflow thread:

Modify update.launcher.c (code below) to point to the update script, which is typically located in /etc/periodic/weekly/310.locate
#gcc update.launcher.c -o update.launcher
#chown root update.launcher
#chmod u=rwx,go=xr,+s update.launcher
Place the program on your server and modify the UPDATE_SCRIPT_LAUNCHER constant in the program to point to it
Verify that the LOCATE_DB_FILE constant points to the database file, so that the program is able to report the state of the database

Remember to change the value of the SEARCH_ROOT constant, which limits the search location range.

If you want the program to display direct links to the located content, perform the following 2 steps:

Create a symbolic link to the root of your searchable content, as defined in SEARCH_ROOT
Update the VIEW_SYMLINK_PREFIX constant to point to that symlink, relative to the web root or relative to the location of locator.html. (If this constant is not defined, the program will not generate any links.)

There are a few caveats and assumptions:

There is no thorough error checking involved
Unicode search is always case sensitive

locator.html








File Locator






Search for: 
 
(wildcards * and ? are allowed)

 /> 
Ignore case

 /> 
Search in last segment only

 











File name database is currently being updated.
Search results may be inaccurate.

';
    }

    $ret = array();
    $command = 'locate ' . ($ignoreCase ? '-i "' : '"') . 
		SEARCH_ROOT . '*' . $searchString . '*"';

    exec($command, $ret);

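    // Escape regex metacharacters first, then translate the wildcards:
    // '?' matches any single character, '*' any run of characters (including none).
    $word = str_replace(array('\?', '\*'), array('.', '.*'),
			preg_quote($searchString, '/'));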

    foreach ($ret as $line)
    {
	if($lastSegmentSearch && !foundInLastSegment($line, $word, $ignoreCase))
	{
	    continue;
	}
	
	$find = highlight($line, $word, $ignoreCase);
	if(defined("VIEW_SYMLINK_PREFIX"))
	{
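	    // Map the path under SEARCH_ROOT onto the web-visible symlink.
	    $link = str_replace(SEARCH_ROOT, VIEW_SYMLINK_PREFIX, $line);
	    print '<a href="' . htmlspecialchars($link) . '">[View]</a> ';
	}
	print "$find<br />\n";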
    }
}

function showDatabaseState()
{
    if(updateLocatorIsRunning())
    {
	print 'File name database is currently being updated.';
	return;
    }
    else
    {
	clearstatcache();
	date_default_timezone_set('UTC');
	$dbtime = date("D, d.m.Y, H:i:s", filemtime(LOCATE_DB_FILE));
	print 'File name database was last updated on ' .
		$dbtime . '';
    }
}

function updateDatabase()
{
    if(updateLocatorIsRunning())
    {
	print 'File name database is already being updated!';
	return;
    }
    $command = UPDATE_SCRIPT_LAUNCHER . " > /dev/null 2>&1 &";
    exec($command);
    sleep(1);

    if(updateLocatorIsRunning())
    {
	print 'Started updating file name database.';
    }
    else
    {
	print 'File name database updater failed to start.';
    }
}

function updateLocatorIsRunning()
{
    $ret = array();
    $command = "ps -U nobody -o command";
    exec($command, $ret);
    foreach ($ret as $line)
    {
	if(strstr($line, "locate.updatedb"))
	{
	    return true;
	}
    }
    return false;
}

function foundInLastSegment($line, $searchString, $ignoreCase)
{
    $search = '/(?=[^\/]+$)' . $searchString . ($ignoreCase ? '/i' : '/');
    return preg_match($search, $line);
}

function highlight($text, $word, $ignoreCase)
{
    // NOTE: the original highlight markup was lost; <b> is an assumption.
    return preg_replace("/($word)/U" . ($ignoreCase ? "i" : ""),
                        "<b>$1</b>",
                         $text);
}
?>

update.launcher.c


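#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main (int argc, char *argv[])
{
    /* Become root; needs the binary to be owned by root with the setuid bit set. */
    if (setuid (0) != 0)
    {
        perror ("setuid");
        return 1;
    }
    /* Point this at the locate update script on your system. */
    system ("/bin/sh /etc/periodic/daily/320.locate");
    return 0;
}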

RegEx to match a substring after a delimiter

Posted on July 17, 2012 by Stanislav

Reading time: < 1 minute

They say that if you have a problem and want to use RegEx to solve it, then you have two problems. So true! 🙂

My specific problem was that I wanted to search for a string within a substring after a delimiter sign, more precisely, in the last segment of a path. Here is an example:
/some/test_path/to/search/with_a_Test_file.txt
The RegEx, searching without case sensitivity for “test”, should return a match only for the portion of the string after the last “/”.
All the suggestions I could find on Stack Overflow were concerned with matching the entire file name and not a portion of it, so I had to learn some advanced RegEx. Fast.

The answer was something called “lookahead”, which is well explained on the Regular-Expressions.info site.

The resulting RegEx string looks like some serious swearing in a cartoon bubble… 🙂 Here is the code, which is accepted by PHP’s preg_match() function:
/(?=[^\/]+$)test/i
According to my (rather limited) understanding of RegEx, the first portion in the parentheses, after the “?=”, is the lookahead, which asserts that only characters other than “/” remain between the current position and the end of the string, i.e. that the match starts inside the file name after the last “/”. Then comes the search substring, “test”, which is matched at that position and, finally, “/i” is the switch instructing a case-insensitive match.
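
For comparison, the same question ("does the string occur in the last segment of the path?") can be answered in plain shell without a lookahead. The sketch below swaps in a different technique: it strips everything up to the last “/” with parameter expansion and then searches only what is left. The function name in_last_segment is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if string $2 occurs, ignoring case,
# in the last segment of path $1.
in_last_segment() {
    last=${1##*/}    # drop everything up to and including the last "/"
    # -q: quiet, -i: ignore case, -F: treat $2 as a fixed string, not a regex
    printf '%s\n' "$last" | grep -qiF -- "$2"
}

if in_last_segment /some/test_path/to/search/with_a_Test_file.txt test; then
    echo "match in last segment"
fi
```

A path such as /some/test_path/to/search/file.txt would not match here, because “test” occurs only in the directory portion.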

Adding disks by label in ZFS and making those labels stick around

Posted on July 17, 2012 by Stanislav

Reading time: 2 minutes

When I started building my new file server, I decided to add the disks to ZFS vdevs by label and not by device id, i.e.:
#glabel label l1 /dev/ada0
#glabel label l2 /dev/ada1
After a reboot, those labelled disks suddenly started to show up as /dev/ada0 and /dev/ada1 again, and the labels disappeared from the /dev/label directory.

For the existing disks, I tried to offline each disk in turn and re-label it. A new problem turned up then: I could not replace the offlined /dev/adaX disks with the same disks under their labels, as zpool gave an error saying that the device “is part of active pool”.

After some further searching, I found out that I had to zero out the first and the last megabyte of the disk before labelling it and replacing it in the zpool:
#dd if=/dev/zero of=/dev/ada0 bs=1m count=1
#dmesg | grep ada0
<read the block count value, subtract 2048 and provide the result to the seek switch below>
#dd if=/dev/zero of=/dev/ada0 seek=358746954
#glabel label l1 /dev/ada0
#zpool replace zstore /dev/ada0 label/l1
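
The subtraction in the dmesg step can be sketched as a small shell helper. It assumes 512-byte sectors (the default block size of dd), so that 2048 sectors make up one megabyte. The function name last_mb_seek is hypothetical, and the sector count in the example is simply the value implied by the seek argument used above.

```shell
#!/bin/sh
# Hypothetical helper: given the disk size in 512-byte sectors (as reported
# by dmesg), print the sector at which the last megabyte starts.
last_mb_seek() {
    sectors=$1
    # 2048 sectors x 512 bytes = 1 MiB
    echo $(( sectors - 2048 ))
}

last_mb_seek 358749002    # prints 358746954
```

The printed value is what goes into the seek switch of the second dd command.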
At this point zpool status was again showing labels. However, after the next reboot, the labels were gone again and I was pretty frustrated. Back to the search engine.

On page 3 of some discussion of this matter, I noticed two additional steps, which should fix the problem. After performing the steps above, re-labelling and replacing the disks, I issued:
#zpool export zstore
#zpool import -d /dev/label zstore
The -d switch is what instructs zpool to read the disk references from a specific directory and it makes the labels stick around.

When I added subsequent new disks to the pool, I followed these steps to make the labels stick and to avoid re-labelling at a later point:

Zero out the first and the last part of each disk that will comprise the new vdev (especially important if the disk has been in use before and does not come straight from the factory)
Label each disk with glabel
#zpool add zstore raidz label/l5 label/l6 etc….
#zpool export zstore
#zpool import -d /dev/label zstore

And the labels never disappeared again.

This same procedure can be applied to labelling your ZIL and L2ARC devices.

Reporting correct space usage for Samba shared ZFS volumes

Posted on July 10, 2012 by Stanislav

Reading time: 2 minutes

ZFS is all the rage now, and there are lots of tutorials and how-tos out there covering most topics. There is one issue, though, for which I could not find any ready solution. When sharing a ZFS volume over Samba, Windows would report an incorrect total volume size. More precisely, Windows would always show the same value for both the total size and the free size, and both values would change as the volume got used.
This is obviously not what we want. Some digging uncovered that Samba relies internally on the result of the df program, which reports incorrect values for ZFS systems. More digging led to this page and to the man page of smb.conf, showing that it is possible to override the space usage detection behaviour by creating a custom script and pointing the Samba server to it using the following entry in smb.conf:

[global]

	dfree command = /usr/local/bin/dfree

The following shell script is where the magic lies (tested on FreeBSD):

#!/bin/sh

CUR_PATH=`pwd`

let USED=`zfs get -o value -Hp used $CUR_PATH` / 1024 > /dev/null
let AVAIL=`zfs get -o value -Hp available $CUR_PATH` / 1024 > /dev/null

let TOTAL = $USED + $AVAIL > /dev/null

echo $TOTAL $AVAIL

And the following is a variation which works on Linux (courtesy of commenter nem):

#!/bin/bash

CUR_PATH=`pwd`

USED=$((`zfs get -o value -Hp used $CUR_PATH` / 1024)) > /dev/null
AVAIL=$((`zfs get -o value -Hp available $CUR_PATH` / 1024)) > /dev/null

TOTAL=$(($USED+$AVAIL)) > /dev/null

echo $TOTAL $AVAIL

Make sure to check the comments section, as several variations of this script are posted there, for example accounting for both ZFS and non-ZFS shares on the same system!

I can’t use zpool list as it reports the total size for the pool, including parity disks, so the total size might be greater than the real usable total size.
zfs list could have been used if there was a way to display the information in bytes and not in human-readable form of varying granularity.
The solution was to use zfs get and then normalise the values reported to Samba to 1024-byte blocks. (I tried providing the third, optional, parameter of 1 as mentioned in the man pages, but Samba seemed to have trouble parsing really large byte values, so I ended up doing the normalisation in the script.)

Also, I can’t rely on the $1 input parameter to the script, as it turned out to always be equal to ‘.’, which is usable for df, but not for zfs. This ‘.’ led me to check the working directory of the invocation and, bingo, it turned out to be the root path of the requested volume, so I could simply get the value from pwd and pass it to zfs.
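
For a system that mixes ZFS and non-ZFS shares, one possible approach is to ask zfs first and fall back to df when that fails. The sketch below is my own assumption of how such a script could look and is not one of the variations from the comments; the function name dfree_report is hypothetical. It prints the total and available space in 1024-byte blocks, the same two values the scripts above print.

```shell
#!/bin/sh
# Hypothetical sketch: report "total available" in 1024-byte blocks,
# using zfs where possible and df everywhere else.
dfree_report() {
    dir=${1:-$(pwd)}    # Samba invokes the script in the share's root directory
    if command -v zfs >/dev/null 2>&1 &&
       used=$(zfs get -o value -Hp used "$dir" 2>/dev/null) &&
       avail=$(zfs get -o value -Hp available "$dir" 2>/dev/null); then
        # zfs reports bytes; normalise to 1024-byte blocks
        echo "$(( (used + avail) / 1024 )) $(( avail / 1024 ))"
    else
        # df -Pk columns: filesystem, 1024-blocks, used, available, ...
        df -Pk "$dir" | awk 'NR == 2 { print $2, $4 }'
    fi
}
```

To use it as a dfree command, the script would end with a plain call to dfree_report, so that the working directory set by Samba is picked up.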


Beorn's Beehive

About A Little Bit of Everything, but Mostly About the Truth

Monthly Archives: July 2012