Heya Dan,
It would help to have a bit of information about the general structure
you are currently using, before giving any ideas about storing the data
in a SQL Database. Obviously your doing Directories and Files, but the
structure helps in determining the way to make things work in the SQL
Database. Poor design of the SQL Database, will lead to poor execution
of SQL instructions, as much more is required to get the data you want.
To give you some idea of how I would convert a file system database into
a SQL Database, I'll give you an example of a file structure database,
and a description of what entry means, then show the resulting database
structure for SQL.
An example, of a simple design using Folders:
db/
cust_records/
cust1_info.dat
cust1_purchase.dat
cust2_info.dat
cust2_purchase.dat
inventory/
item1_info.dat
item2_info.dat
cust#_info.dat contains: Customer Name, Address, Phone Number, Shipping
Info, and such.
cust#_purchase.dat contains: Customer's a record of all the purchases
that a customer has made.
item#_info.dat contains: Name of Item, Description, Price, Quantity,
Shipping price, Shipping Weight.
Now, to convert this into a SQL Database, I would formulate it as such:
db
customers
id - INTEGER, PRIMARY KEY, AUTOINCREMENT
name - STRING
address - STRING
phone - STRING
ship_to - STRING
transactions
id - INTEGER, PRIMARY KEY, AUTOINCREMENT
cust_id - INTEGER, PRIMARY KEY
item_id - INTEGER
quantity - INTEGER
purchased_date - INTEGER
inventory
id - INTEGER, PRIMARY KEY, AUTOINCREMENT
item_name - STRING
description - STRING
price - STRING
total_per_quantity - INTEGER
ship_price - INTEGER
ship_weight - INTEGER
Now some explaining about what the right hand side is all about in the
above layout. The first field in all tables are 'id', which is marked as
INTEGER, PRIMARY KEY, AUTOINCREMENT. Integer denotes a number, of
course, Primary Key tells the SQL engine to make quick look ups based
upon this field being one of the more often checked fields to look up
records in a database. Finally the Auto increment (Which is one word in
SQL), denotes the fact that each new record put into the SQL database,
should take the total number of rows, and add one to that number, to
assign the identification number for this record. And lastly, a String
is a variable length of text data to be stored. Most SQL engines will
allow for 5 or 6 paragraphs worth of text, but this can be expensive in
storage and retrieval. If you know that a field is only going to be so
many characters, such as Phone, maximum being 13 characters, then you
can use VARCHAR(13) as the maximum length of the data that is going to
be stored in that field.
There's also FLOAT, which allows for decimal points, but Integers in
most SQL Engines will take decimal numbers, and keep the decimals. But
it's always best to see what data types a SQL engine supports, before
making a final decision. Most DBI's will automatically provide a way to
store common data types in the database, at their best formulation to
save as much space for the database engine to handle. So look at Perl's
DBI for Constructing Tables to see what assistance it will bring you.
Lastly, one other data type I didn't cover in the above database, is the
BLOB data type. BLOB data types are for storing Binary data in, should
you find the need to store some binary data in the database.
With blobs, there are no conversions done to store the data in the
database, it's stored as is (As in, as you provide it to the SQL
Database), and can contain any valid byte sequence in it. Meaning,
anything between 0 and 255 can be stored here. Most SQL Engines will
store UTF-8/16 characters in strings without stripping them, but when in
doubt, you can use the Blob data type.
Now, with the explanations of the data types out of the way, the
structure is efficiently designed, for the simple fact, that if you have
the ID of the customer, you can get all the items that they purchased,
and get each items information from the id's that you get from the
transaction table. You can even use the SQL Instruction JOIN to get all
the data you need in a single execute, for example, if you wanted to get
the name of the person, the name of the item, and the total cost, you
could simply do:
SELECT name, item_name, price
FROM transactions
JOIN customers
ON customers.id <http://customers.id> = transactions.cust_id
JOIN inventory
ON inventory.id <http://inventory.id> = transactions.item_id;
This will return a list of all transactions in the format of:
name | item_name | price
Examples being:
"John Doe","ASUS PC",299.99
"Mary Johnson", "Microsoft Mouse", 19.99
etc, etc.
It makes cross-table look ups a lot easier, to get the relevant data for
what you need, and only what you need. And all of it is handled by the
SQL Engine, not Perl, or whatever high level language you use, so the
execution speed is greatly improved.
HTH,
Mario
On Fri, Oct 30, 2009 at 1:31 PM, Dan Linder <dan@linder.org
<mailto:dan@linder.org>> wrote:
I'm taking on the task of converting our in-house tool to use the Perl
DBI module to replace the Data::Dumper/eval() it currently uses to
store and retrieve data. Not pretty, but it has worked pretty well
for the small data sets we've been using.
We now have some people commenting on the speed - some have pages take
7+ minutes to bring up waiting for the back-end perl code to ripple
through the directory structure and eval() the necessary files to
build the page. The "eval" function seems to be the bulk of the time
as I expected...
What I'm looking for is some general comments and discussion about the
mental task of mapping these hash tables into a SQL table. I'm not
really looking for a tool, more a high level discussion about ways to
store the data and still remain flexible.
Dan
--
******************* ***************** ************* ***********
******* ***** *** **
"Quis custodiet ipsos custodes?" (Who can watch the watchmen?) -- from
the Satires of Juvenal
"I do not fear computers, I fear the lack of them." -- Isaac Asimov
(Author)
** *** ***** ******* *********** ************* *****************
*******************
_______________________________________________
Omaha-pm mailing list
Omaha-pm@pm.org <mailto:Omaha-pm@pm.org>
http://mail.pm.org/mailman/listinfo/omaha-pm
--
Mario Steele
http://www.trilake.net
http://www.ruby-im.net
http://rubyforge.org/projects/wxruby/
http://rubyforge.org/projects/wxride/
------------------------------------------------------------------------
_______________________________________________
Omaha-pm mailing list
Omaha-pm@pm.org
http://mail.pm.org/mailman/listinfo/omaha-pm