I work in a Software Development organization where designing flexible, easily maintained software is a foremost concern. When it comes to storing settings (name/value, also called key/value, pairs) in a database, defining a table with two columns: Name and Value seemed like a good, flexible design since the database schema doesn’t need to change to add new pairs, even if the key has never been used before. There are several important problems with this approach, however. The alternative, defining a database column for each key to be stored, has much to offer despite the need to update two things (code & schema) instead of just one (code).
First, there is a potentially significant space cost to storing key strings. In fact, in our application the key/value tables are among the largest in the approximately 30 GB databases. These would not be stored at all if the keys were columns. In fact, storing key strings violates a basic tenet of database design: avoid duplication of data. It is possible to re-factor the database so that there is a single list of key values (the Key table), which the Values table then references via a foreign key. But now we start to see the problem with this approach: fetching a key/value requires a join and additional effort by the DB.
Second, there is also cost in terms of query performance. An SQL query to fetch a single setting requires a WHERE clause. This means the db engine has to inspect several rows to find the desired data. Effectively the engine loads all name/value pairs into memory and filters out the ones that don’t match the name we want. By using columns to store keys instead, the database engine has far fewer rows to fetch and no filtering to do. On large tables the performance difference can be staggering. Good indexes can help, but the fundamental problem remains the same: the DBMS must work harder for any query involving name/value pairs.
Third, name/value pairs introduce a usability problem. Database users don’t know the data schema because some fields are hidden in the key values. To know what keys exist at all one must query the database. This adds a whole new layer of implementation complexity to any app that wants to make use of the data.
Fourth, using key/values restrict the type of the values. Keys are usually strings, but values must also share the same data type. Usually this leads to a profusion of strings which are hard to analyze and use lots of space. While it is conceivable to define several value columns (IntValue, FloatValue, DateValue, etc.) it is then up to the user or application using the data to know which field to look in for the correct data.
Fifty, name/value pairs eliminate the possibility to enfore certain types of data integrity. For example, it is impossible to set a NOT NULL constraint on key/value pairs. It is also impossible to represent the fact that some key/value pairs should appear together. Expecting applications to enforce data integrity ignores the whole point of a relational databases: to guarantees reliability and consistency of data. Pushing data integrity enforcement out to the application layer is just plain wrong.
There is a final reason, in our specific case, which is that we don’t want Software Engineers adding fields willy-nilly anyway. Doing so can affect the load on the database and other systems, and it requires changes to other supporting systems. Requiring a schema change helps to ensure the change is justified.
In summary, using name/value pairs to facilitate the occasional addition of a new setting name does not justify the constant problems this practice introduces on the back end.
Furious effort is no substitute for good design.