[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Omaha.pm] Mapping Perl structures to a SQL table...



Here are some thought:

I have found that when designing a database one needs to be at peace with the flexibility / performance balance. (Come to think of it isn't that why we use Perl? You can write faster code with C but you can write code faster with Perl)

If you want high flexibility you might use a Entity-Attribute-Value model. But eventually you come to the point where performance requirements force you to think in more ridged terms. You will then apply performance based design principles within the context of the information domain.

Since you are converting a file system database to a DBMS structure, you already have some structure you can exploit. Files, Directories (which are files), file attributes (fixed number of attributes and values). The real flexibility comes when you need to store the contents of the file. But then again how much performance do you need/ Your performance comes in when you are trying find and map the structure. Once you find the file base on the attributes you then will display it.

I would start with three tables:
1. Files
2. FileAttributes
2. FileContent

Files would have the following columns
  1. Id
  2. ParentId ( the id of the directory this file is in )
  3. Name
4... All your permission attributes like user read permissions, group permission
  5... All you file attributes like creation date time, mod date time, size

Then you second table would contain non indexed attributes of the file or directory. Category, Subject, Author, status

Then your third table would be the actual file contents.

You would then use the DBMS to provide performance structure (indexes, constraints, aggregation, transactions, journaling)








Mario Steele wrote:
Heya Dan,

It would help to have a bit of information about the general structure you are currently using, before giving any ideas about storing the data in a SQL Database. Obviously your doing Directories and Files, but the structure helps in determining the way to make things work in the SQL Database. Poor design of the SQL Database, will lead to poor execution of SQL instructions, as much more is required to get the data you want.

To give you some idea of how I would convert a file system database into a SQL Database, I'll give you an example of a file structure database, and a description of what entry means, then show the resulting database structure for SQL.
An example, of a simple design using Folders:

db/
  cust_records/
    cust1_info.dat
    cust1_purchase.dat
    cust2_info.dat
    cust2_purchase.dat
  inventory/
    item1_info.dat
    item2_info.dat

cust#_info.dat contains: Customer Name, Address, Phone Number, Shipping Info, and such. cust#_purchase.dat contains: Customer's a record of all the purchases that a customer has made.

item#_info.dat contains: Name of Item, Description, Price, Quantity, Shipping price, Shipping Weight.

Now, to convert this into a SQL Database, I would formulate it as such:

db
  customers
    id                   - INTEGER, PRIMARY KEY, AUTOINCREMENT
    name                 - STRING
    address              - STRING
    phone                - STRING
    ship_to              - STRING
  transactions
    id                   - INTEGER, PRIMARY KEY, AUTOINCREMENT
    cust_id              - INTEGER, PRIMARY KEY
    item_id              - INTEGER
    quantity             - INTEGER
    purchased_date       - INTEGER
  inventory
    id                   - INTEGER, PRIMARY KEY, AUTOINCREMENT
    item_name            - STRING
    description          - STRING
    price                - STRING
    total_per_quantity   - INTEGER
    ship_price           - INTEGER
    ship_weight          - INTEGER

Now some explaining about what the right hand side is all about in the above layout. The first field in all tables are 'id', which is marked as INTEGER, PRIMARY KEY, AUTOINCREMENT. Integer denotes a number, of course, Primary Key tells the SQL engine to make quick look ups based upon this field being one of the more often checked fields to look up records in a database. Finally the Auto increment (Which is one word in SQL), denotes the fact that each new record put into the SQL database, should take the total number of rows, and add one to that number, to assign the identification number for this record. And lastly, a String is a variable length of text data to be stored. Most SQL engines will allow for 5 or 6 paragraphs worth of text, but this can be expensive in storage and retrieval. If you know that a field is only going to be so many characters, such as Phone, maximum being 13 characters, then you can use VARCHAR(13) as the maximum length of the data that is going to be stored in that field.

There's also FLOAT, which allows for decimal points, but Integers in most SQL Engines will take decimal numbers, and keep the decimals. But it's always best to see what data types a SQL engine supports, before making a final decision. Most DBI's will automatically provide a way to store common data types in the database, at their best formulation to save as much space for the database engine to handle. So look at Perl's DBI for Constructing Tables to see what assistance it will bring you. Lastly, one other data type I didn't cover in the above database, is the BLOB data type. BLOB data types are for storing Binary data in, should you find the need to store some binary data in the database.

With blobs, there are no conversions done to store the data in the database, it's stored as is (As in, as you provide it to the SQL Database), and can contain any valid byte sequence in it. Meaning, anything between 0 and 255 can be stored here. Most SQL Engines will store UTF-8/16 characters in strings without stripping them, but when in doubt, you can use the Blob data type.

Now, with the explanations of the data types out of the way, the structure is efficiently designed, for the simple fact, that if you have the ID of the customer, you can get all the items that they purchased, and get each items information from the id's that you get from the transaction table. You can even use the SQL Instruction JOIN to get all the data you need in a single execute, for example, if you wanted to get the name of the person, the name of the item, and the total cost, you could simply do:

SELECT name, item_name, price
FROM transactions
JOIN customers
  ON customers.id <http://customers.id> = transactions.cust_id
JOIN inventory
  ON inventory.id <http://inventory.id> = transactions.item_id;

This will return a list of all transactions in the format of:
name | item_name | price

Examples being:

"John Doe","ASUS PC",299.99
"Mary Johnson", "Microsoft Mouse", 19.99
etc, etc.

It makes cross-table look ups a lot easier, to get the relevant data for what you need, and only what you need. And all of it is handled by the SQL Engine, not Perl, or whatever high level language you use, so the execution speed is greatly improved.

HTH,

Mario


On Fri, Oct 30, 2009 at 1:31 PM, Dan Linder <dan@linder.org <mailto:dan@linder.org>> wrote:

    I'm taking on the task of converting our in-house tool to use the Perl
    DBI module to replace the Data::Dumper/eval() it currently uses to
    store and retrieve data.  Not pretty, but it has worked pretty well
    for the small data sets we've been using.

    We now have some people commenting on the speed - some have pages take
    7+ minutes to bring up waiting for the back-end perl code to ripple
    through the directory structure and eval() the necessary files to
    build the page.  The "eval" function seems to be the bulk of the time
    as I expected...

    What I'm looking for is some general comments and discussion about the
    mental task of mapping these hash tables into a SQL table.  I'm not
    really looking for a tool, more a high level discussion about ways to
    store the data and still remain flexible.

    Dan

    --
    ******************* ***************** ************* ***********
    ******* ***** *** **
    "Quis custodiet ipsos custodes?" (Who can watch the watchmen?) -- from
    the Satires of Juvenal
    "I do not fear computers, I fear the lack of them." -- Isaac Asimov
    (Author)
    ** *** ***** ******* *********** ************* *****************
    *******************
    _______________________________________________
    Omaha-pm mailing list
    Omaha-pm@pm.org <mailto:Omaha-pm@pm.org>
    http://mail.pm.org/mailman/listinfo/omaha-pm




--
Mario Steele
http://www.trilake.net
http://www.ruby-im.net
http://rubyforge.org/projects/wxruby/
http://rubyforge.org/projects/wxride/


------------------------------------------------------------------------

_______________________________________________
Omaha-pm mailing list
Omaha-pm@pm.org
http://mail.pm.org/mailman/listinfo/omaha-pm