24 Million entries and I need to what?

Levi Pearson levipearson at gmail.com
Sat Dec 28 05:46:15 MST 2013


On Sat, Dec 28, 2013 at 4:52 AM, S. Dale Morrey <sdalemorrey at gmail.com> wrote:
> This won't make sense unless you have some background with distributed &
> redundant monitoring setups, but...
>
> Years ago I was working for a company and I wrote an alerting system that
> did a url shortening trick so alerts could be sent over SMS.
> The server needed to be simplistic and resilient with as few entry/attack
> points as possible because this was for a major infrastructure system
> monitoring several thousand machines for dozens of businesses.  Obviously
> the whole system was a prime target for attack.  It would have been a
> disaster to have a SQL injection or some other nonsense go down, so we
> ruled out a typical LAMP stack right away.
>
> The solution was to have nagios write the alert acknowledgment URL
> (shortened) to /tmp as a meta-redirect to the actual acknowledgement page
> (which could reside at any one of a dozen or so servers).  We then used a
> purpose built webserver (only responded to get, would only allow a length
> of x characters etc) to serve the page as static HTML, letting the browser
> handle the redirect to the actual acknowledgment page.  Since all alerts
> expire after a set time, and because this was just the acknowledgement URL,
> not the alert itself, having them disappear after a reboot was sort of a
> non-issue.
>
> We had tens of thousands of shortened URLs stored in /tmp with no problems.
> Obviously it wasn't millions and they had actual data instead of being the
> result of touch, but the concept is the same.

Sounds like a wise and reasonable solution to the problem. I am not a
huge fan of the LAMP stack, especially when security is paramount or
resources are tight and flexibility isn't a concern.

I'd definitely consider tens of thousands to be within the design
scope of the filesystem, though pushing the fringes of it.  But
consider that 20 million is roughly three orders of magnitude larger.

> I've mentioned before that I have major issues with the box I'm on,
> particularly in the filesystem dept.  I'm seriously of a mind that the HD
> may be failing.  I didn't realize that there was any mirroring of /tmp to
> the HD and frankly it worries me that this occurred.  I'm tempted to spin
> up a few different EC2 instances with various filesystems and repeat the
> experiment just to see if they all have the same issue.  Obviously it's not
> what a filesystem is intended for, but one would tend to think that 24
> million files would be a snap for any modern filesystem to handle.

I wouldn't assume that.  Modern applications like web browsers store
their cache files in a shallow hierarchy of directories, and I would
guess that design is a reaction to performance bottlenecks they hit
in modern filesystems.
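
For illustration, here is a minimal Python sketch of that kind of
shallow hierarchy.  It is not how any particular browser lays out its
cache; the function names, the choice of SHA-1, and the two levels of
two hex digits are all assumptions made for the example.  The idea is
just to spread names across many small directories by hash prefix
instead of putting everything in one.

```python
import hashlib
from pathlib import Path


def sharded_path(root, name, levels=2, width=2):
    """Map a name to root/ab/cd/name using leading hex digits of its hash.

    Hypothetical layout: levels=2 and width=2 give 256 * 256 = 65,536
    leaf directories.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, name)


def touch_sharded(root, name):
    """Create an empty marker file under its shard directory."""
    path = sharded_path(root, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    return path
```

With those defaults, 24 million entries would average somewhere around
370 files per leaf directory, which is well inside what filesystems
are normally asked to handle.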

While it's possible that you tickled a hardware fault, it is also a
real possibility that you tickled a software scaling bug by pushing
things well beyond normal limits.

> I guess I look at a filesystem differently.  I view any filesystem as
> nothing more than a database of connected blocks which are joined to
> represent files.  I started looking at them this way when I realized that
> rm (or in the case of DOS del), doesn't remove the file itself, it merely
> removes the entry from the file allocation table/tree/whatever.
> Therefore it made sense in my mind to take advantage of what ought to be
> one of the fastest DB's around, i.e. the filesystem to solve my problems.

Well, filesystems are definitely a sort of database, and modern
filesystems are pretty fast at some things, but as millions of files
in a single directory is not a common use case for a filesystem, I
seriously doubt much time has been spent specifically optimizing it in
most filesystems.  I understand ReiserFS was supposed to do
particularly well with many small files in comparison with ext3, but I
haven't really kept track of what ext4 and other filesystems have done
in that area.

Just because something can be thought of as a database doesn't mean
it's going to perform well at some arbitrary data management task,
especially if that task is well outside of the normal operating
parameters of the system.

> Looks like I was wrong and I'm exploring the options.  Thanks for the info
> about the Cuckoo hashes.

You almost certainly don't need to implement one, but they're an
interesting and relatively new idea.  They're somewhat related to
Bloom filters, which have been around for a while and give a time- and
space-efficient method of testing set membership, with a small
probability of false positives and no false negatives.  If your normal
membership test is slow (say it requires disk access and some
computation) and you expect that most tests will be negative, you can
use a Bloom filter as a cheap first-stage screen, then only run your
expensive test on positive results from the Bloom filter.  The Bloom
filter lookups can be parallelized as well.
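
A rough Python sketch of that two-stage arrangement follows.  The
class and function names are made up for the example, the sizing uses
the usual textbook formulas, and deriving the bit positions from a
single SHA-256 digest is just one convenient choice.  Treat it as an
illustration under those assumptions rather than a tuned
implementation.

```python
import hashlib
import math


class BloomFilter:
    """Small Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(8, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def is_member(key, screen, expensive_lookup):
    """Run the expensive test only when the filter says 'maybe'."""
    if not screen.might_contain(key):
        return False              # definite miss, no disk access needed
    return expensive_lookup(key)  # could be a false positive, so confirm
```

Here expensive_lookup stands in for whatever the slow test is (a disk
lookup, say).  At a 1% false-positive rate the filter costs a bit under
10 bits per entry, so 24 million entries would fit in roughly 29 MB of
memory.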

        --Levi

