Filesystem as B-tree database?

Hello,

I have a system, which produces many (1000+) files. Now, they are all put
into the same directory,
so access to that directory becomes very slow (directory entries have to be
scaned sequentially).

I am thinking about writing a resource manager, that would look like a
directory, but it would
have entries stored in b-trees. Before I start, I would like to check if
somebody hasn’t done it already.

Thanks,

Pavol Kycina

If the files are classifiable, it might make more sense in terms of time
spent to just use the filesystem as your tree. If you could get a fairly
good distribution of filenames from a-z, you can cut the search time to
about 1/26 and if you were to create a tree using the first two letters,
even more. It’s probably a little easier to code quickly (but perhaps not
as much fun).

cheers,

Kris

Pavol Kycina <kycina@microstep-hdo.sk> wrote:

Hello,

I have a system, which produces many (1000+) files. Now, they are all put
into the same directory,
so access to that directory becomes very slow (directory entries have to be
scaned sequentially).

I am thinking about writing a resource manager, that would look like a
directory, but it would
have entries stored in b-trees. Before I start, I would like to check if
somebody hasn’t done it already.

Thanks,

Pavol Kycina



Kris Warkentin
kewarken@qnx.com
(613)591-0836 x9368
“Computer science is no more about computers than astronomy is about telescopes”
–E.W.Dijkstra

Hi Pavol

I had such a product for QNX4. I will be porting it to Nto but i am
currently porting a different data base product to Nto.

Just how urgent/import is this to you?

Bill Caroselli

“Pavol Kycina” <kycina@microstep-hdo.sk> wrote in message
news:3bcaa626$1@asrpx.mshdo

Hello,

I have a system, which produces many (1000+) files. Now, they are all put
into the same directory,
so access to that directory becomes very slow (directory entries have to
be
scaned sequentially).

I am thinking about writing a resource manager, that would look like a
directory, but it would
have entries stored in b-trees. Before I start, I would like to check if
somebody hasn’t done it already.

Thanks,

Pavol Kycina
\

Rob Krten had a prog fo QNX 4 that was designed to handle this sort
of situation. If he doesn’t pipe up here, check his web site
http://www.parse.com We had a similar situation we resolved by
using the regularity of the file names (all numeric). We used
directory trees, where each level had 100 sublevels (two digits),
and wrote functions to split the filename into chunks to figure
where it goes. Access is quick enough, but we are still talking
about thousands of files, so doing a pruning or tape recovery
takes forever. I think Rob’s approach would be better.

Richard

Pavol Kycina wrote:

Hello,

I have a system, which produces many (1000+) files. Now, they are all put
into the same directory,
so access to that directory becomes very slow (directory entries have to be
scaned sequentially).

I am thinking about writing a resource manager, that would look like a
directory, but it would
have entries stored in b-trees. Before I start, I would like to check if
somebody hasn’t done it already.

Thanks,

Pavol Kycina

You might also be able to use a hashing algorithm to create your
directories. Arbitrarily decide on a number of buckets and then just do
hashing with a linear probe in whatever directory you get. If you hashed
with a hundred directories, you’d only have to search through ~10 files in
each dir (for 1000 files).

Or is that just too wacky?

cheers,

Kris

Richard R. Kramer <rrkramer@kramer-smilko.com> wrote:

Rob Krten had a prog fo QNX 4 that was designed to handle this sort
of situation. If he doesn’t pipe up here, check his web site
http://www.parse.com > We had a similar situation we resolved by
using the regularity of the file names (all numeric). We used
directory trees, where each level had 100 sublevels (two digits),
and wrote functions to split the filename into chunks to figure
where it goes. Access is quick enough, but we are still talking
about thousands of files, so doing a pruning or tape recovery
takes forever. I think Rob’s approach would be better.

Richard

Pavol Kycina wrote:

Hello,

I have a system, which produces many (1000+) files. Now, they are all put
into the same directory,
so access to that directory becomes very slow (directory entries have to be
scaned sequentially).

I am thinking about writing a resource manager, that would look like a
directory, but it would
have entries stored in b-trees. Before I start, I would like to check if
somebody hasn’t done it already.

Thanks,

Pavol Kycina


Kris Warkentin
kewarken@qnx.com
(613)591-0836 x9368
“Computer science is no more about computers than astronomy is about telescopes”
–E.W.Dijkstra

Richard R. Kramer <rrkramer@kramer-smilko.com> wrote:

Rob Krten had a prog fo QNX 4 that was designed to handle this sort
of situation. If he doesn’t pipe up here, check his web site
http://www.parse.com > We had a similar situation we resolved by
using the regularity of the file names (all numeric). We used
directory trees, where each level had 100 sublevels (two digits),
and wrote functions to split the filename into chunks to figure
where it goes. Access is quick enough, but we are still talking
about thousands of files, so doing a pruning or tape recovery
takes forever. I think Rob’s approach would be better.

Well, then, I’ll pipe up :slight_smile:

There are two programs that are potential solutions here, depending on
the type of access you’re doing.

Both are documented in Doctor Dobb’s Journal:

Arbitrary Text Retrieval (DDJ November 1995)
Improving USENET News Performance (DDJ May 1996)

The arbitrary text retrieval one (as presented) simply makes a multi-level
1/N tree of the directory structure. I use this in the reverse direction
for reverse telephone number lookup – a telephone number, such as +16135998316
gets stored as /data/phoneDB/1/6/1/3/5/9/9, which is a flat ASCII file that
contains up to 10000 telephone number entries. That’s the one that I think
Richard is referring to.

The second one is a resource manager (under QNX 4) that manifests a virtual
filesystem out of a series of expiry-ordered heap files. It’s a database
management exercise mostly.

Lemme know if you need more information; or as Richard said, visit
www.parse.com and click on the “Free Stuff” to download the source (at least
for the VFNews one, I don’t know if the arbitrary text one has been released).

Cheers,
-RK


Richard

Pavol Kycina wrote:

Hello,

I have a system, which produces many (1000+) files. Now, they are all put
into the same directory,
so access to that directory becomes very slow (directory entries have to be
scaned sequentially).

I am thinking about writing a resource manager, that would look like a
directory, but it would
have entries stored in b-trees. Before I start, I would like to check if
somebody hasn’t done it already.

Thanks,

Pavol Kycina


Robert Krten, PARSE Software Devices +1 613 599 8316.
Realtime Systems Architecture, Consulting and Training at www.parse.com

Thanks to all for the valuable input. Now I am rather busy, so this project
must wait.

Thanks,

Pavol Kycina


<nospam94@parse.com> wrote in message news:9qmsoc$hbf$2@inn.qnx.com

Richard R. Kramer <> rrkramer@kramer-smilko.com> > wrote:
Rob Krten had a prog fo QNX 4 that was designed to handle this sort
of situation. If he doesn’t pipe up here, check his web site
http://www.parse.com > We had a similar situation we resolved by
using the regularity of the file names (all numeric). We used
directory trees, where each level had 100 sublevels (two digits),
and wrote functions to split the filename into chunks to figure
where it goes. Access is quick enough, but we are still talking
about thousands of files, so doing a pruning or tape recovery
takes forever. I think Rob’s approach would be better.

Well, then, I’ll pipe up > :slight_smile:

There are two programs that are potential solutions here, depending on
the type of access you’re doing.

Both are documented in Doctor Dobb’s Journal:

Arbitrary Text Retrieval (DDJ November 1995)
Improving USENET News Performance (DDJ May 1996)

The arbitrary text retrieval one (as presented) simply makes a multi-level
1/N tree of the directory structure. I use this in the reverse direction
for reverse telephone number lookup – a telephone number, such as
+16135998316
gets stored as /data/phoneDB/1/6/1/3/5/9/9, which is a flat ASCII file
that
contains up to 10000 telephone number entries. That’s the one that I
think
Richard is referring to.

The second one is a resource manager (under QNX 4) that manifests a
virtual
filesystem out of a series of expiry-ordered heap files. It’s a database
management exercise mostly.

Lemme know if you need more information; or as Richard said, visit
www.parse.com > and click on the “Free Stuff” to download the source (at
least
for the VFNews one, I don’t know if the arbitrary text one has been
released).

Cheers,
-RK


Richard

Pavol Kycina wrote:

Hello,

I have a system, which produces many (1000+) files. Now, they are all
put
into the same directory,
so access to that directory becomes very slow (directory entries have
to be
scaned sequentially).

I am thinking about writing a resource manager, that would look like a
directory, but it would
have entries stored in b-trees. Before I start, I would like to check
if
somebody hasn’t done it already.

Thanks,

Pavol Kycina


Robert Krten, PARSE Software Devices +1 613 599 8316.
Realtime Systems Architecture, Consulting and Training at > www.parse.com