balancing priorities
Friday, June 29th, 2007
there are 4 main things i take into consideration when designing and coding a (piece of a) system. every decision is an evaluation and maximization of those factors.
- correctness
- performance
- flexibility
- low-maintenance
correctness
there’s a functional bar above which you must be to ride. if you don’t meet it you’ve made a piece of shit. that much is straightforward, but above that bar is a different story. there’s completely, 100% functionally correct, and if you’re building the flight controller for the space shuttle or a 777 then you probably need to aim for it. otherwise, you can probably trade off more than you would think. in a system that serves up ads no one is going to die if an ad that is supposed to get 1000 impressions is served 1001 times. someone might lose a penny and someone else might get a free placement, but that’s not the end of the world.
prototypes are another good example, and most startups should really consider themselves in the prototyping stage for a lot longer than you would think. with a prototype it doesn’t matter if you drop a few requests every once in a while and the user gets an error or has to try again. people won’t put up with shoddy work, but they will accept a few issues now and then. is it really worth the extra week it will take to get your (stealth mode) dog dating google mashup completely bug free, or would you be better off moving on to new features?
data duplication is another good example of a correctness trade-off that sometimes has to be made (often in order to scale or span geographic space). i’ve met a lot of people who were unable to let go of correctness (i call it the phd problem, not because it’s limited to phds, but because a majority of them seem to have it).
takeaway: sometimes you need to trade unnecessary correctness for gains in other areas.
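to make the ad example above concrete, here is a minimal sketch of what trading a little correctness can look like. it is not how any real ad server works; `SharedCount`, `AdServer`, `simulate`, and the batch size are all made up for illustration. each server checks the cap against a shared count but only writes its own impressions back in batches, so the fleet can overshoot the cap by a handful of impressions in exchange for far fewer writes to the shared store.

```python
class SharedCount:
    """stand-in for the shared store (a db row, a cache key, ...). hypothetical."""

    def __init__(self):
        self.value = 0


class AdServer:
    """serves an ad until the shared count reaches the cap.

    impressions are counted locally and flushed to the shared count
    every `batch` serves, so the shared count lags behind reality.
    """

    def __init__(self, shared, cap, batch):
        self.shared = shared
        self.cap = cap
        self.batch = batch
        self.pending = 0  # impressions served here but not yet flushed

    def try_serve(self):
        # the check uses the (possibly stale) shared count, not the true total
        if self.shared.value >= self.cap:
            return False
        self.pending += 1
        if self.pending >= self.batch:
            self.shared.value += self.pending
            self.pending = 0
        return True


def simulate(cap=1000, batch=7, servers=2):
    """round-robin requests across the fleet; return total impressions served."""
    shared = SharedCount()
    fleet = [AdServer(shared, cap, batch) for _ in range(servers)]
    served = 0
    while True:
        progressed = False
        for server in fleet:
            if server.try_serve():
                served += 1
                progressed = True
        if not progressed:
            return served


if __name__ == "__main__":
    total = simulate()
    print("cap 1000, served", total)
```

in this sketch the overshoot is bounded by roughly `servers * batch` impressions. if that bound is more than the business can tolerate, shrink the batch; if it isn’t, you just bought a lot of simplicity for a few free placements.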
performance
performance just doesn’t matter 90% of the time. if you know how to do it, it’s pretty easy and can be achieved cheaply, but if you don’t you’re in trouble b/c you’ll spend enormous effort for little gain by optimizing the wrong things. and you probably don’t need to anyway. if performance is a consideration/requirement the key is simplicity (a concept that i will likely repeat in anything i write, and in fact will repeat in the following two categories). performance starts with the db schema. it just about ends there as well, and if it weren’t for the api (in the web service world) it would.
i will return to performance and share as much as i can about it in future posts, but for now i’m going to confine myself to saying that its importance is defined completely by the current and short-to-medium term needs of the system. if you need scale now or will in the next few months, take it into consideration and trade other things for it. if you need scale in six months then start to think about it and plan for it, but don’t build it, just know how you’re going to. if you think you’ll need to scale at a point further out than that then don’t bother; give away performance in trade for overall simplicity.
flexibility
startups survive on it and it makes life orders of magnitude more bearable. build exactly what you need to do what you’re trying to do today and nothing more, but in a way that lets you extend it with minimal changes to do whatever it is you end up needing to do tomorrow. this is hard. don’t implement the functionality you’re sure you’ll need in a couple weeks b/c you will need it, but you won’t know exactly what it will look like and you’ll get it wrong if you do it now. the best advice i can give is to plan for and think about every way you can see the system going, but don’t build any of it. just keep it in mind when you build only what you immediately need. if you get this right the payoff will be getting the new features working in a couple days rather than a few weeks. there’s no magic here, you just have to do it over and over and learn from your mistakes.
simple things are easier to extend. minimal/efficient apis allow you more freedom to change around the internals without bothering/changing the clients. using generic structures (maps) in the api or in the database (serialized or in a secondary table) is another way to get there. so long as you don’t need to search or sort by that data, things like this will let you add new information to the call and/or db without any changes to the system. producers and consumers of the data need to know about the changes but the intermediate system doesn’t. prevent the system from caring about the details of how it’s used and it won’t have to change when it needs to be used in a new way.
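a rough sketch of the generic-map idea above, under some assumptions: the `Event` type, its field names, and the dict-as-row storage are all invented for illustration, and json is just one possible serialization. the fields the system searches or sorts by get real columns; everything else rides along in an opaque `extras` map that the intermediate system never inspects.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Event:
    # fields the intermediate system actually searches or sorts by
    user_id: int
    kind: str
    # opaque bag: producers and consumers agree on its contents,
    # the system in the middle just carries it
    extras: Dict[str, Any] = field(default_factory=dict)


def to_row(event: Event) -> Dict[str, Any]:
    """flatten an event into a row; extras becomes one serialized column."""
    return {
        "user_id": event.user_id,
        "kind": event.kind,
        "extras": json.dumps(event.extras, sort_keys=True),
    }


def from_row(row: Dict[str, Any]) -> Event:
    """rebuild an event from a row without knowing what extras contains."""
    return Event(row["user_id"], row["kind"], json.loads(row["extras"]))


if __name__ == "__main__":
    # a producer starts sending a brand new field; nothing in between changes
    event = Event(42, "click", {"campaign": "spring", "referrer": "email"})
    print(from_row(to_row(event)))
```

the cost is the one already mentioned: you can’t cheaply search or sort by anything in `extras`. the day you need to, promote that field to a real column.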
low-maintenance
low maintenance starts with flexibility, but it doesn’t end there. the more complex your solution the more that can and will go wrong with it. when presented with the choice between a system that monitors itself, brings nodes in and out of the cluster, and gets them up to speed, and a dirt simple caching system like memcached, go with simple. chances are that it will go down less often than the complex system gets into broken states that it can’t recover from on its own. this isn’t just limited to solutions you pick, it also applies to stuff you build. the more complicated it is and the more moving parts it has, the more will go wrong. it might be designed to handle all of the failure modes, but in practice it won’t, period. let things fail, work around the failures or better yet don’t care about them. the best solutions simply rely on redundant components. at most they fail over to a hot/cold standby, nothing more complicated than that. the situations where the complexity is preferable are few and far between.
if your service doesn’t care about how it’s used you won’t have to change it when you get that new use case that needs to be implemented by the end of the week. the code changes you need to make might only take you a couple hours, but if you add in the changes required to clients, which may also need to be made in unrelated systems, it adds up. on top of that the release, build, and test processes, as well as the coordination required to do a synchronized push of several dependent pieces, will quickly eat up time and turn what should be a couple days’ work for a new feature into a week or so, …
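as a sketch of "rely on redundant components and at most fail over to a standby", something like the following is about as complicated as it needs to get. the function name `first_that_works` and the callables passed to it are hypothetical; in a real system each one would wrap a call to a primary or standby node.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_that_works(replicas: Sequence[Callable[[], T]]) -> T:
    """try each redundant component in order and return the first result.

    no health checks, no cluster membership, no recovery logic:
    a failure just means moving on to the next replica.
    """
    last_error: Optional[Exception] = None
    for call in replicas:
        try:
            return call()
        except Exception as err:  # let it fail, then work around it
            last_error = err
    raise RuntimeError("all replicas failed") from last_error


if __name__ == "__main__":
    def primary():
        raise ConnectionError("primary is down")

    def standby():
        return "value from standby"

    print(first_that_works([primary, standby]))
```

catching every `Exception` is deliberately blunt here. the point is that the caller doesn’t need to understand why a component failed, only that another one is available.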
…
finding the optimal relative importance of the four factors, and thus making the correct design decisions, is a black art, one that i have worked on mastering but will never complete. it’s the type of thing where i look back at what i did yesterday and can’t stand it, and would do it differently now that i’ve learned more… while i’ve talked about the 4 factors in the context of systems, they apply at all levels, and in fact building good solutions requires thinking in these terms at every level. i’ll hopefully come back and revisit these priorities in the near future in other settings. if you have questions ask away.