On Mon, Aug 24, 2009 at 10:17 AM, Dan Linder
<dan@linder.org> wrote:
Guys,
I'm looking at rewriting some of the store/retrieve code in a project
I'm working on. The current method uses the Data::Dumper and eval()
code to store data to a hierarchical directory structure on disk.
Over the weekend I all but eliminated the hard-disk overhead by moving
the data to a temporary RAM disk -- sadly, the speed-ups were too
small to notice. This tells me that the overall Linux file-system
caching is working quite well. (Yay!) Unfortunately, this leads me
(again) to determine that the Dumper/eval() code is probably the
bottleneck. (Definitely not what they were designed for, but they work
remarkably well nonetheless...)
Eval is more than likely your biggest bottleneck. Dumper not so much, but heavy use of eval in any language can create a bottleneck in nothing flat.
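If you want numbers before rewriting anything, the core Benchmark module will time that round trip for you. Here is a rough sketch -- the $record structure is invented for the example, and Storable (also a core module) is thrown in purely as a point of comparison, not as something from your list:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Storable  qw(nfreeze thaw);
use Benchmark qw(cmpthese);

our $VAR1;    # Data::Dumper's default variable name, assigned by eval()

# Invented stand-in for one record's worth of on-disk data.
my $record = {
    host  => 'web01',
    ports => [ 80, 443, 8080 ],
    tags  => { env => 'prod', owner => 'dan' },
};

my $dumped = Dumper($record);     # Perl source text, which eval() must re-parse
my $frozen = nfreeze($record);    # compact binary, no parsing on the way back

# Run each for at least 2 CPU seconds and print a comparison table.
cmpthese( -2, {
    'Dumper + eval'   => sub { eval $dumped; my $copy = $VAR1; },
    'Storable + thaw' => sub { my $copy = thaw($frozen); },
} );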
So, I started investigating alternatives:
* A true database with client/server model (i.e. MySQL, PostgreSQL, etc)
Use MySQL or PostgreSQL if you are going to have many concurrent hits against the Perl script that is executing. They handle concurrent access well, and they also solve the locking problem mentioned below about SQLite.
* An embedded database such as SQLite (others?)
SQLite is a great database system for file-based data storage. Unfortunately, it stores its data in a binary format, so you can't exactly use grep, vi, etc., to read the contents of the database file. And unlike its big brothers, it only allows one write transaction (one exclusive lock) at a time on a database file; this is to prevent corruption of the data. (And yes, even a plain read query takes a lock on the file.)
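You can soften the locking a little from Perl by setting a busy timeout, so a handle that runs into a lock retries for a while instead of dying immediately, and by batching writes into one transaction. A rough sketch -- project.db and the nodes table are made-up names, and sqlite_busy_timeout() assumes a reasonably recent DBD::SQLite:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=project.db', '', '',
                        { RaiseError => 1, AutoCommit => 1 } );

# Wait up to 5 seconds for a competing lock instead of failing right away.
$dbh->sqlite_busy_timeout(5000);

$dbh->do('CREATE TABLE IF NOT EXISTS nodes (name TEXT, payload TEXT)');

# Group writes into one transaction so the write lock is taken once,
# not once per INSERT.
$dbh->begin_work;
my $sth = $dbh->prepare('INSERT INTO nodes (name, payload) VALUES (?, ?)');
$sth->execute( 'web01', 'first payload'  );
$sth->execute( 'web02', 'second payload' );
$dbh->commit;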
* Continue using the filesystem+directory structure using
freeze()/thaw() from the FreezeThaw CPAN module (speed improvement?)
I dunno if freeze()/thaw() will do you much good, though -- you would still be serializing and re-parsing the whole structure on every access, just with a different encoder than Dumper/eval().
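It is cheap to measure, at least: freeze() hands back a string you could drop into the existing per-node files, and thaw() gives the structures back. A minimal sketch with invented data, assuming FreezeThaw is bundled the same way as your other pure-Perl modules:

use strict;
use warnings;
use FreezeThaw qw(freeze thaw);

# Invented record, standing in for one file's worth of data.
my %node = ( host => 'web01', ports => [ 80, 443 ] );

# freeze() takes a list of structures and returns one string.
my $string = freeze( \%node );

# ... write $string out to the per-node file, read it back later ...

# thaw() returns the same list of structures.
my ($node_ref) = thaw($string);
print "host is $node_ref->{host}\n";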
* Use a DBD module to store/retrieve these files (i.e. DBD::File,
DBD::CSV, etc) (benefit here is that a simple change in the DB setup
code will mean a change from DBD::File to DBD::SQLite or
DBD::Pg should be fairly short work)
DBD, driven through DBI, is a great front end to use for database storage, as it gives you a common API across many different DB backends. If you want consistency and the ability to test different database storage engines, then I would strongly recommend you go the DBI/DBD route.
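Switching backends really is close to a one-line change in the connect() call; everything written against the DBI handle stays put. A sketch -- the DSNs and the nodes table are hypothetical, and it assumes DBD::SQLite (or DBD::Pg) is installed and the table already exists:

use strict;
use warnings;
use DBI;

# Pick the backend with one string; the rest of the code is unchanged.
my $dsn = 'dbi:SQLite:dbname=project.db';
# my $dsn = 'dbi:Pg:dbname=project;host=dbhost';   # swap in PostgreSQL later

my $dbh = DBI->connect( $dsn, '', '', { RaiseError => 1 } );

my $sth = $dbh->prepare('SELECT name, payload FROM nodes WHERE name = ?');
$sth->execute('web01');
while ( my $row = $sth->fetchrow_hashref ) {
    print "$row->{name}: $row->{payload}\n";
}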
Internally I have some constraints:
* We'd like to keep the number of non-core Perl modules down
(currently we're 90% core), and a couple customers are extremely
sensitive to anything that is not supplied by their OS provider
(Solaris and HPUX for example).
This is true in many facets, but you'll find that MySQL and SQLite are among the most commonly distributed database packages on most operating systems (aside from Windows, but we won't go there).
* We would also like to keep the files on disk and in a
human-readable form so the end users and support staff can peruse this
data with simple tools (grep, vi, etc).
Again, as stated above, SQLite and MySQL won't let you use grep, vi, etc., to view the data directly, but simple tools can be written to get the same effect, and they can be tailored to specific tasks instead of digging through hundreds of lines of data to find one field.
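For example, a few lines of DBI are enough to turn a table back into grep-able text for the support staff; this assumes the same hypothetical project.db and nodes table as above (the sqlite3 command-line shell would get you the same thing interactively):

#!/usr/bin/perl
# Tiny "dump it as text" tool:  ./dump_nodes.pl | grep web01
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=project.db', '', '',
                        { RaiseError => 1 } );

my $sth = $dbh->prepare('SELECT name, payload FROM nodes ORDER BY name');
$sth->execute;
while ( my ( $name, $payload ) = $sth->fetchrow_array ) {
    print "$name\t$payload\n";    # one record per line, tab separated
}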
* The remaining 10% that is non-core Perl modules are local copies of
"pure perl" CPAN modules we've merged into the source code branch
directly. (We do this because the code runs on Solaris/SPARC,
Solaris/x86_64, Linux/x86, Linux/ia64, HPUX/PA-RISC, HPUX/ia64, etc)
My personal pick at the moment is SQLite (it is provided natively in
Solaris 10, and easy to install on Linux platforms), but I question if
the speed up it provides will be over-shadowed by the constant
spawning of the sqlite binary each time an element of data is queried.
(Anyone know if there is a way to leave a persistent copy of SQLite
running in memory that future copies hook into? Getting a bit far
afield from the initial SQLite implementation goals...)
Now, I come to this after explaining the above to you, and I will be direct and to the point. SQLite's binary (BLOB) data types, while they may seem heavyweight in terms of data allocation, actually cost very little in overall speed. That is especially true when you only need to look at specific fields and couldn't care less about the rest. As with anything else, SQLite does have overhead, but not nearly as much as you might think: it only allocates the data needed to return the results of a SQL query or to insert data into the database.
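(One side note on your spawning question: DBD::SQLite compiles the SQLite engine right into the Perl process, so nothing spawns a separate sqlite binary per query.) If you do store serialized structures as BLOBs, the only Perl-side wrinkle is binding them with the SQL_BLOB type so 8-bit data and embedded NULs survive; here is a sketch against a made-up frozen_nodes table:

use strict;
use warnings;
use DBI qw(:sql_types);

my $dbh = DBI->connect( 'dbi:SQLite:dbname=project.db', '', '',
                        { RaiseError => 1 } );

$dbh->do('CREATE TABLE IF NOT EXISTS frozen_nodes (name TEXT PRIMARY KEY, frozen BLOB)');

# Store an opaque serialized structure as a BLOB.
my $blob = "\x00\x01 pretend this is frozen data \xff";
my $ins  = $dbh->prepare('INSERT OR REPLACE INTO frozen_nodes (name, frozen) VALUES (?, ?)');
$ins->bind_param( 1, 'web01' );
$ins->bind_param( 2, $blob, SQL_BLOB );
$ins->execute;

# Pull back only the column you care about.
my ($frozen) = $dbh->selectrow_array(
    'SELECT frozen FROM frozen_nodes WHERE name = ?', undef, 'web01' );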
The SQLite team has put a lot of effort into optimizing the engine so that it stores and retrieves data as efficiently as possible while staying fast and correct. Many Linux distributions (Ubuntu among them) ship SQLite and lean on it for a fair amount of their own internal storage. Using SQLite has its advantages, but also its downsides. If you want to avoid database locking issues, then I suggest MySQL. If you're looking for a lightweight solution that is quick, and locking is not a big worry, then I would suggest SQLite.
Thanks for any insight,
DanL
--
"Quis custodiet ipsos custodes?" (Who can watch the watchmen?) -- from
the Satires of Juvenal
"I do not fear computers, I fear the lack of them." -- Isaac Asimov (Author)
_______________________________________________
Omaha-pm mailing list
Omaha-pm@pm.org
http://mail.pm.org/mailman/listinfo/omaha-pm