File locator web front-end in PHP

Posted on July 18, 2012 by Stanislav

Reading time: 3 minutes

When my file server resided in the Windows environment, I made use of the Everything search engine to index the files and to search for them both locally and through Everything’s built-in web server.

This latter functionality is what I wanted to replicate once I built the ZFS-based FreeBSD file server and moved to it. All UNIX flavours have the locate command, which will use a pre-built database to quickly find a string in file names and paths on your server. So, the obvious solution was to install Apache and PHP and write a web front-end for locate.

An alternative option is to use Solr, a Lucene search engine front-end. I, however, wished to have something simpler and custom-made. This was also a good opportunity to explore a new programming language.

I’ve never written PHP code before, as my main area is ASP.NET and C#, but learning the ropes of PHP was an enjoyable task and it is always good to learn another language. The result can be seen below.

The program will search for an arbitrary string in the locate database, optionally ignoring case. Another option lets the user restrict the search to the last segment of the path, which avoids a flood of near-duplicate hits when the string occurs only in the directory portion of the path. The program will also highlight the hits.

This is what the simple UI of the program looks like:

Update 1

I’ve added some more desired functionality:

the program can now update the underlying locate database through the web interface
it now accepts ‘*’ and ‘?’ wildcards in the search string and highlights the results appropriately
it can now give direct links to the located content
it can now search for strings containing Unicode characters
highlighting is made Lynx-friendly
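
The wildcard support mentioned in the list above boils down to turning the user's wildcards into a regular expression before highlighting. A minimal shell sketch of that translation follows; the function name wildcard_to_regex is hypothetical, and it assumes that ‘*’ should match zero or more characters, as it does in locate patterns. Other regex metacharacters are passed through untouched.

```shell
# Translate locate-style wildcards into a regular expression.
# Assumption: '?' means exactly one character, '*' means zero or more.
# Hypothetical helper, not part of the PHP program.
wildcard_to_regex() {
    printf '%s\n' "$1" | sed -e 's/?/./g' -e 's/\*/.*/g'
}

# Possible use: filter locate output with the translated pattern.
#   locate "/tank/*$term*" | grep -E "$(wildcard_to_regex "$term")"
```

The path /tank/ in the usage comment is only a placeholder for whatever SEARCH_ROOT is on your system.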

Forcing a database update involves running the update script as root, which will then su to user nobody. Apache (httpd) runs under a limited user, www (or suchlike). To overcome this obstacle, I used a solution suggested in this Stack Overflow thread:

Modify update.launcher.c (code below) to point to the update script, which is typically located in /etc/periodic/weekly/310.locate
#gcc update.launcher.c -o update.launcher
#chown root update.launcher
#chmod u=rwx,go=xr,+s update.launcher
Place the program on your server and modify the UPDATE_SCRIPT_LAUNCHER constant in the program
Verify that the LOCATE_DB_FILE constant points to the database file, so that the program is able to report the state of the database

Remember to change the value of the SEARCH_ROOT constant, which limits the search location range.

If you want the program to display direct links to the located content, perform the following 2 steps:

Create a symbolic link to the root of your searchable content, as defined in SEARCH_ROOT
Update the VIEW_SYMLINK_PREFIX constant to point to that symlink, relative to the web root or to the location of locator.html. (If this constant is not defined, the program will not generate any links.)
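
The two steps above amount to a prefix swap: the search root at the start of each located path is replaced by the symlink prefix. Here is a rough shell sketch of that mapping. The function name to_view_link and the example paths are hypothetical; unlike the PHP program, it only touches the leading part of the path.

```shell
# Map a located path to a web link by swapping the search root for the
# symlink prefix. Hypothetical helper for illustration.
# $1 = path reported by locate
# $2 = search root (the SEARCH_ROOT value)
# $3 = symlink prefix (the VIEW_SYMLINK_PREFIX value)
to_view_link() {
    case "$1" in
        "$2"*) printf '%s%s\n' "$3" "${1#"$2"}" ;;  # root found: swap it
        *)     printf '%s\n' "$1" ;;                # outside root: unchanged
    esac
}

# Example with made-up paths:
#   ln -s /tank/media /usr/local/www/data/files
#   to_view_link /tank/media/films/a.avi /tank/media/ files/
```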

There are a few caveats and assumptions:

There is no thorough error checking involved
Unicode search is always case sensitive

locator.html

File Locator

Search for: (wildcards * and ? are allowed)
Ignore case
Search in last segment only

File name database is currently being updated.
Search results may be inaccurate.

';
    }

    $ret = array();
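    // escapeshellarg() guards against shell injection. It is locale-sensitive:
    // set a UTF-8 LC_CTYPE via setlocale() or multibyte characters may be dropped.
    $command = 'locate ' . ($ignoreCase ? '-i ' : '') .
		escapeshellarg(SEARCH_ROOT . '*' . $searchString . '*');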

    exec($command, $ret);

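    // Escape regex metacharacters first, then turn the wildcards into regex
    // equivalents: '?' matches one character, '*' matches zero or more.
    $word = str_replace(array('\?', '\*'), array('.', '.*'),
		preg_quote($searchString, '/'));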

    foreach ($ret as $line)
    {
	if($lastSegmentSearch && !foundInLastSegment($line, $word, $ignoreCase))
	{
	    continue;
	}
	
	$find = highlight($line, $word, $ignoreCase);
	if(defined("VIEW_SYMLINK_PREFIX"))
	{
	    // Swap the search root for the symlink prefix to get a web path.
	    $link = str_replace(SEARCH_ROOT, VIEW_SYMLINK_PREFIX, $line);
	    print '<a href="' . htmlspecialchars($link) . '">[View]</a> ';
	}
	print "$find<br />\n";
    }
}

function showDatabaseState()
{
    if(updateLocatorIsRunning())
    {
	print 'File name database is currently being updated.';
	return;
    }
    else
    {
	clearstatcache();
	date_default_timezone_set('UTC');
	$dbtime = date("D, d.m.Y, H:i:s", filemtime(LOCATE_DB_FILE));
	print 'File name database was last updated on ' .
		$dbtime . '';
    }
}

function updateDatabase()
{
    if(updateLocatorIsRunning())
    {
	print 'File name database is already being updated!';
	return;
    }
    $command = UPDATE_SCRIPT_LAUNCHER . " > /dev/null 2>&1 &";
    exec($command);
    sleep(1);

    if(updateLocatorIsRunning())
    {
	print 'Started updating file name database.';
    }
    else
    {
	print 'File name database updater failed to start.';
    }
}

function updateLocatorIsRunning()
{
    $ret = array();
    $command = "ps -U nobody -o command";
    exec($command, $ret);
    foreach ($ret as $line)
    {
	if(strstr($line, "locate.updatedb"))
	{
	    return true;
	}
    }
    return false;
}

function foundInLastSegment($line, $searchString, $ignoreCase)
{
    $search = '/(?=[^\/]+$)' . $searchString . ($ignoreCase ? '/i' : '/');
    return preg_match($search, $line);
}

function highlight($text, $word, $ignoreCase)
{
    return preg_replace("/($word)/U" . ($ignoreCase ? "i" : ""),
                        "$1",
                         $text);
}
?>

update.launcher.c


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main (int argc, char *argv[])
{
    setuid (0);
    /* Point this at your update script. */
    system ("/bin/sh /etc/periodic/weekly/310.locate");
    return 0;
}

RegEx to match a substring after a delimiter

Posted on July 17, 2012 by Stanislav

Reading time: < 1 minute

They say that if you have a problem and want to use RegEx to solve it, then you have two problems. So true! 🙂

My specific problem was that I wanted to search for a string within a substring after a delimiter sign, more precisely, in the last segment of a path. Here is an example:
/some/test_path/to/search/with_a_Test_file.txt
The RegEx, searching without case sensitivity for “test” should return a match only for the portion of the string after the last “/”.
All the suggestions I could find on Stack Overflow were concerned with matching the entire file name rather than a portion of it, so I had to learn some advanced RegEx. Fast.

The answer was something called “lookahead”, which is well explained on the Regular-Expressions.info site.

The resulting RegEx string looks like some serious swearing in a cartoon bubble… 🙂 Here is the code, which is accepted by PHP’s preg_match() function:
/(?=[^\/]+$)test/i
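According to my (rather limited) understanding of RegEx, the first portion in parentheses, starting with “?=”, is the lookahead. It consumes no characters; it only asserts that, from the position where the match starts, nothing but non-“/” characters follow up to the end of the string, which is true only inside the last segment of the path. Then comes the search substring, “test”, which can therefore match only within that segment, and, finally, “/i” is the switch instructing a case-insensitive match.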
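
For tools without lookahead support, the same restriction can be expressed differently: require that only non-“/” characters follow the search term up to the end of the line. The following is a small shell sketch of that idea using grep -E. The function name in_last_segment is hypothetical, and the search term is treated as an extended regular expression.

```shell
# Succeed only if the term occurs in the last segment of the path.
# Equivalent in effect to the lookahead pattern, but written as
# "term, then only non-slash characters until the end of the line".
# $1 = path, $2 = search term (an extended regular expression)
in_last_segment() {
    printf '%s\n' "$1" | grep -qiE "$2[^/]*\$"
}

# Example:
#   in_last_segment /some/test_path/to/search/with_a_Test_file.txt test && echo match
```

The -i switch plays the role of “/i” in the PHP pattern.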

Adding disks by label in ZFS and making those labels stick around

Posted on July 17, 2012 by Stanislav

Reading time: 2 minutes

When I started building my new file server, I decided to add the disks to ZFS vdevs by label and not by the device id, i.e.:
#glabel label l1 /dev/ada0
#glabel label l2 /dev/ada1
After a reboot, those labelled disks suddenly started to show up as /dev/ada0 and /dev/ada1 again and the labels disappeared from /dev/label directory.

For the existing disks, I tried to offline each disk in turn and re-label it. A new problem turned up then: I could not replace the offlined /dev/adaX disks with their labelled counterparts, as zpool reported that the device “is part of active pool”.

After some further searching, I found out that I had to zero out the first and the last megabyte of the disk before labelling it and replacing in zpool:
#dd if=/dev/zero of=/dev/ada0 bs=1m count=1
#dmesg | grep ada0
<read the block count value, subtract 2048 and provide the result to the seek switch below>
#dd if=/dev/zero of=/dev/ada0 seek=358746954
#glabel label l1 /dev/ada0
#zpool replace zstore /dev/ada0 label/l1
At this point zpool status was again showing labels. However, after the next reboot, the labels were gone again and I was pretty frustrated. Back to the search engine.

On page 3 of some discussion of this matter, I noticed two additional steps, which should fix the problem. After performing the steps above and re-labelling and replacing the disks, I issued:
#zpool export zstore
#zpool import -d /dev/label zstore
The -d switch is what instructs zpool to read the disk references from a specific directory and it makes the labels stick around.

When I added subsequent new disks to the pool, I followed these steps to make the labels stick and to avoid re-labelling at a later point:

Zero out the first and the last part of each disk that will comprise the new vdev (especially important if the disk has been in use before and does not come straight from the factory)
Label each disk with glabel
#zpool add zstore raidz label/l5 label/l6 etc….
#zpool export zstore
#zpool import -d /dev/label zstore

And the labels never disappeared again.

This same procedure can be applied to labelling your ZIL and L2ARC devices.
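
The seek value used when zeroing the last megabyte is simple arithmetic: the disk's sector count minus the number of sectors in one megabyte (2048 for 512-byte sectors). A minimal shell sketch of that calculation follows. The function name last_mib_seek is hypothetical, and the sector count in the example is simply the seek value used earlier plus 2048.

```shell
# Compute the dd seek offset (in sectors) of the last megabyte of a disk.
# $1 = total sector count as reported by dmesg
# $2 = sector size in bytes (optional, assumed to be 512)
last_mib_seek() {
    sectors=$1
    sector_size=${2:-512}
    echo $(( $sectors - 1048576 / $sector_size ))
}

# Example: a disk with 358749002 sectors of 512 bytes
#   last_mib_seek 358749002
# The result is then passed to: dd if=/dev/zero of=/dev/adaX seek=<result>
```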

Reporting correct space usage for Samba shared ZFS volumes

Posted on July 10, 2012 by Stanislav

Reading time: 2 minutes

ZFS is all the rage now and there are lots of tutorials and how-tos out there covering most topics. There is one issue, though, for which I could not find a ready solution. When sharing a ZFS volume over Samba, Windows would report an incorrect total volume size. More precisely, Windows would always show the same value for both total size and free size, and that value would change as the volume got used.
This is obviously not what we want. Some digging uncovered that Samba relies internally on the result of the df program, which reports misleading values for ZFS file systems. More digging led to this page and to the man page of smb.conf, showing that it is possible to override the space usage detection behaviour by creating a custom script and pointing the Samba server to it using the following entry in smb.conf:

[global]

	dfree command = /usr/local/bin/dfree

The following shell script is where the magic lies (tested on FreeBSD):

#!/bin/sh

CUR_PATH=`pwd`

let USED=`zfs get -o value -Hp used $CUR_PATH` / 1024 > /dev/null
let AVAIL=`zfs get -o value -Hp available $CUR_PATH` / 1024 > /dev/null

let TOTAL = $USED + $AVAIL > /dev/null

echo $TOTAL $AVAIL

And the following is a variation which works on Linux (courtesy of commenter nem):

#!/bin/bash

CUR_PATH=`pwd`

USED=$((`zfs get -o value -Hp used $CUR_PATH` / 1024)) > /dev/null
AVAIL=$((`zfs get -o value -Hp available $CUR_PATH` / 1024)) > /dev/null

TOTAL=$(($USED+$AVAIL)) > /dev/null

echo $TOTAL $AVAIL

Make sure to check the comments section, as several variations of this script are posted there, for example ones that account for both ZFS and non-ZFS shares on the same system!

I can’t use zpool list as it reports the total size for the pool, including parity disks, so the total size might be greater than the real usable total size.
zfs list could have been used if there was a way to display the information in bytes and not in human-readable form of varying granularity.
The solution was to use zfs get and then normalise the values reported to Samba to 1024-byte blocks. (I tried providing the third, optional, parameter of 1 as mentioned in the man pages, but Samba seemed to have trouble parsing really large byte values, so I ended up doing the normalisation in the script.)

Also, I can’t rely on the $1 input parameter to the script, as it turned out to always be equal to ‘.’, which is usable for df, but not for zfs. This ‘.’ led me to check the working directory of the invocation and, bingo, it turned out to be the root path of the requested volume, so I could simply get the value from pwd and pass it to zfs.
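
The normalisation itself is easy to try in isolation. Here is a minimal sketch that takes the two byte counts as arguments instead of calling zfs and prints the “total available” pair in 1024-byte blocks, as the scripts above do. The function name dfree_line is hypothetical.

```shell
# Turn byte counts into the "<total> <available>" line that Samba reads
# from the dfree command, expressed in 1024-byte blocks.
# $1 = used bytes, $2 = available bytes (for instance from zfs get -Hp)
dfree_line() {
    used_kb=$(( $1 / 1024 ))
    avail_kb=$(( $2 / 1024 ))
    echo "$(( $used_kb + $avail_kb )) $avail_kb"
}

# Example: 3 TB used and 1 TB available (decimal terabytes)
#   dfree_line 3000000000000 1000000000000
```

Keeping the arithmetic in a function makes it possible to check the output by hand before pointing smb.conf at the real script.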
