Kirk Martinez

Architecture of a Technical Life

Key Value Pairs in Database Design

Posted by kmartine622 on October 16, 2009

I work in a Software Development organization where designing flexible, easily maintained software is a foremost concern.  When it comes to storing settings (name/value, also called key/value, pairs) in a database, defining a table with two columns: Name and Value seemed like a good, flexible design since the database schema doesn’t need to change to add new pairs, even if the key has never been used before.  There are several important problems with this approach, however.  The alternative, defining a database column for each key to be stored, has much to offer despite the need to update two things (code & schema) instead of just one (code).

First, there is a potentially significant space cost to storing key strings.  In fact, in our application the key/value tables are among the largest in the approximately 30 GB databases.  These would not be stored at all if the keys were columns.  In fact, storing key strings violates a basic tenet of database design: avoid duplication of data.  It is possible to re-factor the database so that there is a single list of key values (the Key table), which the Values table then references via a foreign key.  But now we start to see the problem with this approach: fetching a key/value requires a join and additional effort by the DB.

Second, there is also cost in terms of query performance.  An SQL query to fetch a single setting requires a WHERE clause.  This means the db engine has to inspect several rows to find the desired data.   Effectively the engine loads all name/value pairs into memory and filters out the ones that don’t match the name we want.  By using columns to store keys instead, the database engine has far fewer rows to fetch and no filtering to do. On large tables the performance difference can be staggering.  Good indexes can help, but the fundamental problem remains the same: the DBMS must work harder for any query involving name/value pairs.

Third, name/value pairs introduce a usability problem.  Database users don’t know the data schema because some fields are hidden in the key values. To know what keys exist at all one must query the database.  This adds a whole new layer of implementation complexity to any app that wants to make use of the data.

Fourth, using key/values restrict the type of the values.  Keys are usually strings, but values must also share the same data type.  Usually this leads to a profusion of strings which are hard to analyze and use lots of space.  While it is conceivable to define several value columns (IntValue, FloatValue, DateValue, etc.) it is then up to the user or application using the data to know which field to look in for the correct data.

Fifty, name/value pairs eliminate the possibility to enfore certain types of data integrity.  For example, it is impossible to set a NOT NULL constraint on key/value pairs.  It is also impossible to represent the fact that some key/value pairs should appear together.  Expecting applications to enforce data integrity ignores the whole point of a relational databases: to guarantees reliability and consistency of data.  Pushing data integrity enforcement out to the application layer is just plain wrong.

There is a final reason, in our specific case, which is that we don’t want Software Engineers adding fields willy-nilly anyway.  Doing so can affect the load on the database and other systems, and it requires changes to other supporting systems.  Requiring a schema change helps to ensure the change is justified.

In summary, using name/value pairs to facilitate the occasional addition of a new setting name does not justify the constant problems this practice introduces on the back end.

Furious effort is no substitute for good design.

Key Value Pairs in Database Design

Posted in General | Leave a Comment »

The Pepper Source

Posted by kmartine622 on October 16, 2009

I’ve planted herbs and tomatoes before, but I’ve always wanted to have a “real” garden. Well, this is the year! Two years ago I installed an 18 foot diameter above-ground pool in our back yard.  I took it down last fall because it didn’t get used in proportion to the amount of work I had to put in to maintain it. Of course, that left a nice circle of dead grass which was a perfect place to start the garden! After digging up half of the circle, my wife wisely told me to stop. Now, weeks after starting, I finally have a brick border, about half a cubic yard of added compost, and a garden that is actually growing stuff!

What’s in the garden?

  • Snowpeas
  • Snap beans (bush)
  • Carrots
  • Tomato (Early Girl, and Sweet 100 Plus)
  • Japanese Eggplant
  • Zucchini
  • Watermelon
  • Green onions
  • Peppers (Red and Yellow Bell, and Jalapeno)
  • White Pumpkin
  • Marigolds
  • Sunflowers
  • Basil

Herb garden in containers:

  • Basil
  • Thyme
  • Cilantro
  • Mint
  • Sage

For more details, take a look at my family site.

Posted in General | Tagged: | Leave a Comment »

Twitter: Too Broadly Scoped?

Posted by kmartine622 on April 5, 2009

Although Twitter is the true inspiration for this idea, I can’t let mine go without a name.  In the following paragraphs you will read about ideas such as Switter (or possibly Alterer).  I hope you find the following compelling, and grant me time to pursue its implementation.

Twitter?

In case you don’t know, Twitter allows messages you broadcast to all those “following” you to be at most 140 characters long.  Everyone’s profile is public, and anyone may follow anyone else.  Twitter, a dynamic process, can be controlled and contained in a number of ways.  Some of these may be usable in business or other settings in which universal broadcast capability is not helpful or desired.

Switter?

Specifically, I’m proposing a Twitter in which only those who can access the Software and IP Engineering intranet can register.  Twitter can be a revolutionary way to communicate across boundaries of time and space, facilitation a kind of conversation that is difficult to sustain in any other medium.  I provide several links to corporate use cases in the references (1,2,3).

Ignoring the life-altering communication paradigm shift that Twitter is, the idea is also useful as a basic, real-time, long-range, broadcast communication system which can be integrated with existing corporate systems very easily.  By allowing employees to register their phone number and receive text messages from other employees (without actually exchanging their phone numbers), the corporate network is extended to be a wireless network with no loss of security.  Employees may also elect to read messages online or receive them via email or even an RSS/ATOM feed which can be integrated into other web pages.  Furthermore, by allowing intranet applications to register accounts and drive feeds, running status updates can be generated, and anyone interested may subscribe.  Applications might include: ATS, build status, document publishing, organizational updates, code warnings, QHM, and administrative updates on any application department-wide.

But facilitating human communication remains the most important objective of this system.  It is hard to describe the perspective-changing effect of being connected to a group of people and data sources you select.  It can allow any employee Worldwide to be chosen as someone nearby to share your environment.  This creates a positive community of support and innovation.  By projecting ideas into the Ethernet, it becomes possible to work more closely as a team; discover things you wouldn’t know to ask, but which can help you a great deal; meet new people through shared interests; get ideas and feedback from a diverse set of followers; provide input and advice to developments of interest; collaborate quickly and spontaneously from anywhere.

How?

In order to create this application, we need to be able to send and receive text messages from user cell phones.  Sending can be done from the intranet, but to receive them we need a direct Internet connection.  We can use software.altera.com for that, but it will have to forward the sweets (which are pure text and not processed in any way) to the intranet server through an SSH tunnel.  Other than that tricky part, all we need are:

A database to track:

  • Users; emails, phones, default preferences & schedules
  • User group memberships (search on # too slow)
  • Rs and Es (graph nodes)
  • Es include optional schedule/preferences
  • User timelines
  • Users (@); can generate groups (#), public

Some web pages:

  • Registration
  • View user (reuse org identity for non-anonymous system)
  • “Sweet” (send)
  • “Follow” (add target to user.E)
  • Timelines: user, group, public in HTML & RSS

And a few tools:

  • tinyurl.com functionality on the intranet
  • Command-line tool to sweet, follow
  • We will want to be able to search sweets (4)

References?

1. http://www.fastforwardblog.com/2008/01/30/trends-to-watch-twitter-in-the-enterprise/
2. http://confusedofcalcutta.com/2007/12/27/thinking-about-push-and-pull-and-twitter-in-the-enterprise/
3. http://onlinefacilitation.wikispaces.com/Twitter+Collaboration+Stories
4. http://www.techcrunch.com/2009/03/05/its-time-to-start-thinking-of-twitter-as-a-search-engine/

Posted in General | Leave a Comment »

Web 2.0 Expo

Posted by kmartine622 on April 4, 2009

I took the day off work Friday to visit the Web 2.0 Expo (and get a much-needed haircut!).  I was mainly looking for a way to address some immediate business needs at my company.  As I mentioned before, Sharepoint is being pushed hard from the top down, although my personal experiences with it leave much to be desired.  Many of the vendors at the Web 2.0 Expo told me many of their customers are ex-Sharepoint users because “it just doesn’t do what we want it to.”  I’ve argued on my corporate blog about the need for something better, but I haven’t been able to suggest very many compelling alternatives, especially when management is concerned about the risks of adopting startup-type products.  Sharepoint is seen as a fully-featured, Microsoft-integrated Web 2.0 solution, but the fact is, it’s mostly low-quality, propriety tool with little competition (and therefore little incentive for MS to improve it).  I think my best chance is to argue for simple, pointed solutions to tangible problems.  This can guarantee business value while avoiding the dangers of being locked in to a sub-standard tool.

The immediate business need we face is a way for users to easily share, collaborate on, and publish documents instead of emailing them around which puts a strain on our email infrastrcuture and leads to low visibility (docs aren’t publictly visible or stored long term) and fragile collaboration (by necessity one person can be a bottleneck, and it’s usually the guy who doesn’t keep up with his email).  Ideally, the solution we’d like would integrate with Outlook (and OWA), allow people to attach a file as they do today, but when the email is sent, the attachment is published to a web repository and the email included just a link.  Security and collaborative tools could be built into the web repository tool.  I didn’t find exactly this solution at the Expo, but I will be looking more closely at several companies.   Not just for this problem, but also others that looked relevant to my job:

Vignette – a pretty big company with a wide set of offerrings, their Intranet Collaboration tools might be the best fit for what I’m looking for.

qtask – “The transparency you seek.  The accountability you need.  The results you demand.”  It’s an all-in-one collaboration solution which integrates with email.  Sales guy I spoke to didn’t explain very well how it might solve my problem, but the literature seemed promising.

mindtouch – “social enterprise collaboration”

kapow -”boutique web design firm” – we are thinking about contracting out the visual spiffing-up of our intranet.

TamTamy – “Enterprise social network”

JackBe – executive mashups

yola – “stop browsing, start building” – web site builder

caspio – “Do-It-Yourself” database apps

Amazon web services – I’ve known about this for years, and when I build my first Web 2.0 venture you can bet Amazon will be hosting it for me!

There are also a few products I’d like to have just for my own personal benefit!

SmartMotion Technology’s SwingChair – I sat in this baby with the full ergo-workstation setup and didn’t want to leave!  The armrests that extend from the sides of the keyboard are a cheap alternative to the whole setup (the chairs are about $500).  Really nice, though.  They adjust to your butt so your lumbar is always supported appropriately even as you lean back or forward.

Safari Books Online – for the cost of a technical book ($35) each month, you can have full text access to all the books Safari has collected.  This is awesome for programmers like me!  One can often find info on the web, but being able to search a more authoritative (reviwed, editied, market-driven) data source is definitely valuable.  I’ll be signing up for their free 15-day trial…

Posted in General | 2 Comments »

Mashups

Posted by kmartine622 on March 20, 2009

“If you do the reasoning for people, they don’t have to work it out.  They just have to tune in — or as we say, #followfriday.”

- Kirk Martinez

Yeah, we’ve all seen those map mashups all the time.

But what if the map was a projection of a hyperdimensional space, illuminated with the valued set of data to mash up?  That’s just the thing, though.  What data is valuable to mash up?  What purpose are we serving, exactly?  In my software engineering organization, for example, we’re really focused on the core product and eliminating what isn’t essential for survival…  We just need to get stuff done as quickly and cheaply as possible.  Trouble is, observing the system from outside is both long and costly, but it’s the only way to ensure we stay on track in a larger sense.

Where’s the payoff?  Making engineers more efficient.

We provide tools to facilitate the generation, testing, maintaining and debugging of code.  What can we mash up?

  • Version control data (adds, deletes, a tree of code files)
  • Build data (success rates, code dependencies, run-time dependencies)
  • File access data
  • Regression testing data (official and un-official)
  • Software error rates in the field
  • Customer complaints
  • Customer activity, tool use
  • More version control data (in-branch edits)
  • Stack trace analysis
  • Reported customer defects fixed how quickly?

There are plenty of data points, it’s just a question of using them to a given aim.  At this point, it’s also a question of weighing the relative difficulting of extracting said information.  Let’s see if I can rank them…

Posted in General | Leave a Comment »

Twitter

Posted by kmartine622 on March 15, 2009

Wow.  We really do live in a changed world.  Twitter reduces our six degrees of separation down to zero.  It is now possible for me to directly contact anyone in the world that I know about, probably within the hour.  Better still, the tastiest tidbits of the Internet pantry are served up daily by those who share my interests.  At a high level, aggregate twitter traffic seems to be all about cultural (or sub-cultural) memes, but there are many rich sub-cultures with their own micro-memes shared (RT) within and across sub-networks.  Some good articles on Twitter:

Who?

What?

Where?

Why?

How?

Posted in ITM 198a | Tagged: | Leave a Comment »

ITM 198a Week 2: Chapters 2 & 3

Posted by kmartine622 on March 13, 2009

Reading about the need for top-down control in public wikis (Casarez, et. al, 2009, p. 39) I was reminded of a character from the movie I watched with my kids the other day, City of Ember. Mankind, on the brink of self-annhialation builds a city far underground and locks the inhabitants up for 200 years. Every year the children are assigned jobs, and the energetic hero (who wants to work on the generator because it keeps failing) ends up in pipe-works. There, he meets an old, narcoleptic pipe-worker. Whenever our hero asks him what this or that is for, he always says he doesn’t know, repeating, “that’s not my job.” Of course, at the climax of the movie, when the massive turbine is about to spin out of control destroying the city and all hope of salvation, the old man climbs on top of it, opens a small hatch, and inserts a tool to keep the turbine moving. When the hero asks him, “how do you know you’re supposed to do that?” the old man replies, “it’s my job!” I see the same kind of attitude — or perhaps more of a human or cultural characteristic — at work where people will update the wiki related to their job areas, but shy away from larger changes that cut across other areas of responsibility. While it hasn’t been necessary to set up controls, there has certainly been a need for a small group to take charge of the overall organization of the wiki. There are also other practical concerns to achieve an effective wiki: consistent content layouts (templates) are helpful and usage guidelines can improve usability. One issue we’ve seen is that WikiWords: words with no whitespace between them which automatically become links, are not understood by our search engines. It is therefore a recommended practice to put the properly spaced title in the wiki page so it can be indexed.

Our book has mentioned WYSIWYG editors a couple times now, and I just have to chime in. In my experience WYSIWYG is not workable. Wikis allow users to put a variety of codes (wiki-specific and HTML) in their pages, and WYSIWYG editors simply can’t handle most of these codes. While this is partly a technical problem, in fact, to unleash the full power of wiki editing one simply must invest a bit of time to learn the wiki syntax. One cannot even say, “beginners can use WYSIWYG” because if they edit other people’s pages with a WYSIWYG editor, they will as best be confused, and at worst corrupt the page beyond recognition. Wiki syntax is not that hard (on par with HTML, I’d say), and it’s a small price to pay to take part in the Great Conversation.

In the software engineering department where I work, there is clear value in connecting together the various aspects of our work. We need to unify: code, documentation, designs, help, history, use, plans, problems, projects, and teams. Thus far, URLs and the wiki have been the most effective way to make these kinds of connections, but we are still a long way from the goal of having them all effectively connected. We need to enhance our intranet to include all those elements in a collaborative framework that will empower developers and users to bring about the goal of unifying our operations. The key is, as the book says (p. 54), “making it overwhelmingly simple for users to participate.” Since the =immediate= return to users is nil, it needs to cost nothing. The cumulative gain, however, is of great value.

On page 59 our intrepid authors state, “enterprises are and must be controlled ecosystems with a clearly defined and easily identifiable number of contributors and sources.” I boldly disagree. The authors want to have it both ways: leave enterprise control fully intact while still leveraging the power of crowds. But if an organization is really committed to its user base, they should be willing to listen and adapt, and in fact =be led= by their users. The authors state as much just pages before. The fact is enterprises must be willing to GIVE UP some control by allowing the crowd to have a say. In the end, everyone will be better off. And corporate leaders that don’t believe this won’t be corporate leaders much longer because people will naturally support businesses that give them a voice, and those that don’t will, in due course, fail. The authors may not “argue for a popular uprising and revolution of the workers against The Man”, but the revolution has already come.

References:

Casarez, et. al. (2009). Reshaping Your Business with Web 2.0. McGraw Hill.

Posted in ITM 198a | Leave a Comment »

It’s official…I’m a blogger!

Posted by kmartine622 on March 12, 2009

The only blogging I’ve done before was on my family home page, which unfortunately I can’t link to anymore because I’ve let my .Mac account lapse.  Yeah, times are tough.  I’ve got to look into hosting it somewhere else…for free.  Anyway, this post is just a shout out to my homies in ITM198a at GGU.  You can also find me on LinkedIn and Facebook.

I’m not sure how I’m going to find time to blog much.  I’m kept quite busy by my two daughters (5 and 8), demanding wife, job, school, and eternally unfinished projects…  Despite all that, there are plenty of times when I don’t feel like doing anything productive, so I guess I’ll just use that time to rant here.   Hope it isn’t too boring for you.  :-)

Posted in ITM 198a | 4 Comments »