Lighthouse Problems... Solved!
I just wanted to let everyone know that I’m pretty sure we squashed the nasty ‘white-screen of death’ in Lighthouse. It turns out it had to do with the way I had things configured on the server. The great folks at Engineyard helped me track down the error message after poring over a bunch of log files, calling strace on processes, etc. It’s been fixed, so things should keep running smoothly now.
If you care about a more technical response, it had to do with the way I was rotating logs. I was using the Ruby Logger library from a snippet of code I’ve carried with me for a while. It’s from my days of deploying on Textdrive and getting nastygrams about constantly growing log files. As you may have gathered, I’m used to more experienced system administrators handling the apps I write. Engineyard has been a huge help for me in keeping Lighthouse running this long, with clustered web servers, memcached, and background job processing.
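For anyone who hasn’t seen Logger-based rotation before, here is a minimal sketch of what it generally looks like. This is not the snippet I was actually running; the helper name, file path, and limits below are all made up for illustration. It only shows the standard Logger.new(logdev, shift_age, shift_size) form, where an integer shift_age is the number of files to keep and shift_size is the size at which the current file gets rotated.

```ruby
require "logger"
require "tmpdir"

# Hypothetical limits for the demo: keep 5 files, rotate at roughly 1 KB.
KEEP_FILES = 5
MAX_BYTES  = 1024

# Hypothetical helper: builds a Logger that rotates by size.
# With an integer shift_age, Logger keeps that many files in total
# (the live file plus numbered copies such as production.log.0).
def build_rotating_logger(path, keep: KEEP_FILES, max_bytes: MAX_BYTES)
  Logger.new(path, keep, max_bytes)
end

# Write into a throwaway directory so the demo touches nothing real.
log_dir  = Dir.mktmpdir("rotation-demo")
log_path = File.join(log_dir, "production.log")

logger = build_rotating_logger(log_path)
200.times { |i| logger.info("request #{i} handled") }
logger.close

# List the live log file and whatever rotated copies were left behind.
rotated = Dir.glob("#{log_path}*").sort
puts rotated
```

In a real app the limits would be far larger, and the path would point at the application’s actual log file.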
One thing that has come out of this is a better monit configuration. Monit is a tool that keeps tabs on the running application processes and restarts them if needed. We recently added URLs that monit will also periodically access directly to ensure that the application is working fine. Finally, I set up a script to email me when any of the processes die. Even monit wasn’t able to properly detect the various ‘white-screen’ issues we were having…
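To give a rough idea of what a URL check can look for, here is a small Ruby sketch. It is not our monit configuration, and it assumes (my guess, not something confirmed above) that a ‘white screen’ shows up as either a non-200 status or a blank response body. The method names and the example URL are hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical rule: a page is healthy only if it returns 200
# and the body contains something other than whitespace.
def healthy_response?(status, body)
  status.to_i == 200 && !body.to_s.strip.empty?
end

# Hypothetical helper: fetches the URL and applies the rule above.
# Any network error or timeout is treated as unhealthy.
def check_url(url, timeout: 5)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: timeout,
                             read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  healthy_response?(response.code, response.body)
rescue StandardError
  false
end

# Example usage (the URL is a placeholder, not a real endpoint):
#   puts "app is down" unless check_url("http://localhost:3000/health")
```

A check like this catches the case where the process is alive and answering but serving an empty page, which a plain process monitor would miss.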



Discussion
We’ve moved to God instead of monit for a range of reasons, most importantly its flexibility and detailed logging of everything it’s doing. http://god.rubyforge.org
I asked Ezra about it several months ago while monit was being set up, but Engineyard seems to prefer monit. I think they are looking for other solutions though.