All our web servers went down at the same time. We restarted the servers and re-opened the site. After a while the web servers’ load jumped up until they went down again. This process repeated itself during the 12-hour down period.
What caused it
We have built redundancy into all our hardware. In fact, we just added a backup load balancer (the machine that directs traffic to different web servers) at the beginning of the week. The chance that the hardware of all the web servers failed together was extremely low. That, along with other signs we were seeing, ruled out a hardware problem. It wasn’t the traffic either. The incoming traffic pattern was close to the norm.
That left us with the programs. But it was difficult to narrow the search further. We looked at the heavy-duty programs like search indexing and the similarity engine. We spent hours looking but found nothing. Finally, tipped off by two unrelated books that strangely shared the same number of readers, we found a book that, as a result of a number of merges, was claimed to have over 100,000 editions (!). Certain programs that touched this book duly went nuts. We fixed it and re-opened the site an hour after we found the culprit.
What now
We are reviewing the book merge process at the moment. The merging function is now temporarily off-line until we’ve added some ways to prevent this from happening.
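As an illustration of the kind of safeguard a merge process could use, here is a minimal Python sketch. It is not our actual merge code: the `merge_editions` function, the `MergeRejected` exception and the `MAX_EDITIONS` limit are all hypothetical names and values chosen for the example. The idea is simply to refuse any merge that would leave a single book with an implausible number of editions.

```python
# Hypothetical guard for a book merge: every name and the limit below
# are assumptions made for illustration, not aNobii's real code.
MAX_EDITIONS = 500  # assumed upper bound for a plausible number of editions


class MergeRejected(Exception):
    """Raised when a merge would produce an implausibly large book."""


def merge_editions(target, source, max_editions=MAX_EDITIONS):
    """Return the combined set of edition IDs, or refuse the merge.

    `target` and `source` are sets of edition identifiers (e.g. ISBNs).
    """
    merged = set(target) | set(source)
    if len(merged) > max_editions:
        raise MergeRejected(
            "merge would create a book with %d editions (limit %d)"
            % (len(merged), max_editions)
        )
    return merged
```

A rejected merge could then be queued for manual review instead of being applied automatically.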
A while ago we were contacted by the team behind an interesting piece of software called “Barcode Monster”. It lets you scan barcodes with a standard web cam.
This is by no means a new idea. We ourselves have tried half a dozen programs that aim to do the same. Yet none matched our standards for speed and accuracy, so we never recommended any to aNobii members.
When we tried the beta version of Barcode Monster, we were simply blown away. Wow.
Speed: their demo video is no exaggeration. It is indeed as snappy as the video suggests.
Accuracy: we’ve used it to scan every single book we have in our office, and Barcode Monster gave the correct result every single time (not counting the time with a worn-out barcode sticker).
Our experience with it over the last couple of weeks has been excellent, and we’d like to hear what people think before we officially endorse it. Got a web cam? It would be great if you could download the free trial version, test it out, and let us know what you think. Does it work with your web cam? Have you come across any issues? Write to us at contact@anobii.com. We’ll report the findings in a week.
Happy new year!
* A note from the Barcode Monster team: the software requires sharp images. Since you’ll be scanning up close, you’ll need to adjust the focus manually. If your web cam does not support manual focus, Barcode Monster may not work.
Many of you have probably found that on aNobii, it was much easier to find a book by ISBN than by keywords. To put it mildly, our keyword search ranged from not very good to barely functional, depending on the language you used in the search.
We are happy to announce that this shall no longer be.
A while ago, we decided to rebuild the search from the ground up. The goals we set for ourselves:
Faster
The old one was slow, slow, slow, especially if you typed in a lot of words, or if the words were rather common.
More integrated
In the old one, you had to choose to search in one language only. And that often frustrated those who were multi-lingual. Must fix.
More comprehensive
When our internal search returns no matches, you should be able to do a web search right away.
Recent search
You should be able to see and click on your most recent search terms.
The other day a friend pointed at the “beta” sign next to our logo and asked when we’d move beyond that. The embarrassing truth is we’ve long gotten used to the beta sign and pushed it to the back of our minds.
Normally, you won’t want to wait till your new app is perfect to launch (see ship crap -> improve). So there’s a phase in which your app is public but crappy in some non-core areas. That’s when a beta sign is useful, to communicate a more accurate expectation and warn people ahead of likely and unpleasant surprises (told-ya!).
The thing is, once we put that sign up, we have little incentive to take it down. Why remove the shield that seemingly protects you from harsh criticism? Let’s wait a bit. Nobody gets hurt by the sign anyway …, so the thinking went.
And then one day, we just started to forget about the sign altogether … until a curious friend pointed it out.
Today, a week after the 7,000,000th book was added, we are officially moving out of beta. Hurray!
We’ve updated the shelf pages. One change is what happens when you go from one page to another via the “Next” or “Previous” button. In the old way, the whole page would refresh. Now, only the shelf part will change. The rest simply stays still.
This way of changing content is commonly called “Ajax”. The spirit is that only the parts that need to change will change, resulting in lighter load and speedier browsing.
But there’s a downside. Browsers are not designed for Ajax use. If you don’t refresh the whole page, the back (and forward) button will not work, and neither will history and bookmarks. That’s a BIG downside, considering that the back button is among the most used browser functions.
There are hacks for fixing this. We surveyed a few and settled on HistoryManager. Problem is, the JavaScript framework we use is MooTools 1.2 and HM only works with 1.1. We could wait, but the thought of a broken back button on aNobii was simply too painful to bear. So we opened HM up and did the update ourselves.
Thanks to volunteer efforts around the world, aNobii now supports 12 interface languages, with a couple more to come. Big thanks to everyone involved!
The current translation platform has a few drawbacks:
1. In the translation spreadsheet, the un-translated texts are mixed with translated ones, making them difficult to spot.
2. It’s difficult to know which text is used on which page. As a result, texts that are no longer used remain in the spreadsheet, making the translation process longer.
3. HTML tags are embedded into the texts. For example, to display this:
See all books on wishlist
The text to be translated would be something like:
See all books on <a href="person_wishlist.php?all_d=1&pid=35&">wishlist</a>
Problem is, if the stuff inside the <a> tag is copied wrongly during translation, the link will not work properly for that particular language, and the mistake will be difficult to debug.
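One common way to avoid this class of mistake is to keep the markup out of the translatable text altogether and leave only neutral placeholders for translators to carry over. The Python sketch below illustrates the idea. It is not a description of our upgraded platform: the `{link_start}` and `{link_end}` markers, the function names and the French string are all made up for the example.

```python
import html

# Placeholder names are hypothetical; translators only ever see these markers.
LINK_START = "{link_start}"
LINK_END = "{link_end}"

# Illustrative strings only, not taken from the real translation spreadsheet.
TRANSLATIONS = {
    "en": "See all books on {link_start}wishlist{link_end}",
    "fr": "Voir tous les livres de la {link_start}liste de souhaits{link_end}",
}


def has_valid_markers(template):
    """Check that each marker appears exactly once and in the right order."""
    if template.count(LINK_START) != 1 or template.count(LINK_END) != 1:
        return False
    return template.index(LINK_START) < template.index(LINK_END)


def render_link_text(template, href):
    """Insert the real <a> tag at render time, outside the translator's reach."""
    if not has_valid_markers(template):
        raise ValueError("translation is missing a link marker: %r" % template)
    opening = '<a href="%s">' % html.escape(href, quote=True)
    return template.replace(LINK_START, opening).replace(LINK_END, "</a>")
```

With this arrangement a broken translation can be caught by `has_valid_markers` when it is submitted, rather than showing up later as a dead link in one language.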
To fix these, we are now taking our translation platform offline for an upgrade. It’ll be back in a couple of days. Stay tuned!
The issue described in the previous post has now been resolved. No more NFS, and no more of the ugly errors its malfunctions caused. And as a side benefit, pages now load quicker. Hooray!
At around 7:00pm GMT November 21, our website became inaccessible for 1 hour. Again, we are deeply sorry for the downtime. What happened was that our Network File System became unmountable and needed a reboot.
The majority of our data is stored in the database. The rest are caches of temporary data in the form of files.
When you have one web server, you can put all those files into that server’s local file system. When you have more than one, you need to put the files where all the web servers can access them.
One of the easier ways is to use a Network File System (NFS). With NFS, the web servers will see the folders on the shared file server as if they were local.
The biggest drawback of NFS is that when something goes wrong and the shared file server becomes unmountable, the web servers will stop, as if there were something wrong with their “real” local file systems. In short, it drags everyone down.
To make things worse, with this type of error, we cannot display anything friendlier than the discomfortingly worded “Service unavailable : Internal error : 132” page.
We have been moving away from NFS recently, but there is still some older stuff that we haven’t converted. Rest assured we’ll be speeding up to achieve independence from NFS in the next couple of days.
At around 5:50pm GMT November 14, our website became inaccessible for 8 hours. We are deeply sorry for the downtime.
Here’s what happened:
We upgraded PHP on our servers.
The upgrade made our service unavailable.
We failed to notice that until 1:30am. We reversed the upgrade in 15 minutes and the site came back up.
What should have been a minor issue turned into a major downtime (8 long hours!) simply because we were not alerted to it quickly enough. Definitely not something to be proud of. We have now wised up and included this type of issue in our monitoring and alert system.
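To give an idea of what such a check can look like, here is a minimal Python sketch of a site health probe. It is an illustration under stated assumptions, not our actual monitoring setup: the function names are hypothetical, the alert is just a callback (printing by default), and a real system would run the check on a schedule and notify someone by email or SMS.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for `url`, or 0 if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout and similar problems.
        return 0


def check_site(url, fetch=fetch_status, alert=print):
    """Probe the site once; call `alert` with a message if it looks down."""
    status = fetch(url)
    healthy = 200 <= status < 400
    if not healthy:
        alert("ALERT: %s returned status %d" % (url, status))
    return healthy
```

Run every minute or so, a probe like this would have flagged a “Service unavailable” response shortly after the upgrade instead of hours later.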
We’ve always taken extra care to protect your data and make sure the hardware is always-on. We are using this incident as a wake-up call to ensure our software gets the same level of care as well.
Whenever I visit someone’s shelf, the first thing I look at is how much our tastes overlap. This is what aNobii used to show:
Easy to understand but sometimes too simplistic to be helpful. Perhaps I’ve read a book three times while the other person got it as a gift and left it to gather dust ever since. Perhaps I found a book a total waste of time while the other person hailed it as the best thing ever written. Not to forget that the absence of the same book could sometimes speak louder than its presence.
So we’ve spiced things up by including a slew of different ingredients into cooking up the “taste compatibility” score. Here’s the new look (heavily influenced by Last.fm’s take):
We experimented a fair bit before settling on the current formula, but it’s by no means final. Let us know what you think of it!
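We are not publishing the formula here, but the Python sketch below shows how a score can go beyond simple overlap. It is purely illustrative and is not the formula aNobii uses: the function name, the 1 to 5 rating scale and the way the two factors are combined are all assumptions. It multiplies the share of books two people have in common by how closely their ratings of those books agree.

```python
def taste_compatibility(mine, theirs, max_rating=5):
    """Toy compatibility score between 0.0 and 1.0 (hypothetical formula).

    `mine` and `theirs` map a book ID to a rating from 1 to `max_rating`.
    """
    shared = mine.keys() & theirs.keys()
    if not shared:
        return 0.0
    union = mine.keys() | theirs.keys()
    # Factor 1: how much of the combined shelf both people own.
    overlap = len(shared) / len(union)
    # Factor 2: 1.0 when ratings match exactly, 0.0 when they are opposite.
    agreement = sum(
        1 - abs(mine[book] - theirs[book]) / (max_rating - 1) for book in shared
    ) / len(shared)
    return overlap * agreement
```

Under this toy formula, two people who own the same book but rate it 5 and 1 score 0.0, while a plain overlap count would have treated them as a match.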