Saturday, April 14, 2012

Facebook caching

Economy of scale takes on a new meaning at Google, Facebook and large cloud companies. I worked at F5 for a year, and learned a fair amount about deduping, link optimization and other strange techniques. I just ran into an issue with Facebook that highlights this, and the fix may prove useful to others who encounter the same problem.

My wife has a blog, The Forever Marriage, where she discusses her forthcoming book. She blogs about 3 times per week, and always updates her Facebook status with a link to the latest entry. Facebook normally shows a nice picture and some text from the post, but for the latest entry it would only show the raw URL.

The issue was some strange HTML that crept in due to a WordPress plugin, easily fixed. But Facebook had cached the page the first time the link was posted, and thereafter only used that cached copy. Any time she tried to post the link again, Facebook would only show the original, broken version. I verified this by watching the web server logs as she was posting the link: no activity in the logs, so Facebook was not fetching the page again.
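For anyone who wants to repeat that log check, here is a minimal sketch in Python. It assumes an access log in the common "combined" format, and it relies on Facebook's link crawler identifying itself with a user agent containing facebookexternalhit. The function name, the sample log lines and the paths are made up for illustration.

```python
def crawler_hits(log_lines, path, agent="facebookexternalhit"):
    """Return the log lines where the given crawler requested the given path."""
    # A request line looks like: "GET /some-path/ HTTP/1.1", so the path
    # is surrounded by spaces. The user agent appears later on the same line.
    return [line for line in log_lines if agent in line and f" {path} " in line]


# Made-up sample entries (assumption: combined log format).
sample_log = [
    '203.0.113.5 - - [14/Apr/2012:09:12:01 -0700] "GET /latest-post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [14/Apr/2012:09:13:44 -0700] "GET /older-post/ HTTP/1.1" 200 4096 "-" "facebookexternalhit/1.1"',
]

hits = crawler_hits(sample_log, "/latest-post/")
print(len(hits))  # no crawler request for the new post means Facebook used its cache
```

If the count stays at zero while the link is being posted, Facebook is most likely serving its cached copy rather than fetching the page again.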

Economy of scale: imagine millions of people posting links. A few links will be posted by only one person. The majority probably get posted by 5 or 10 people, and a few get reposted by hundreds or thousands of people. Fetching each page once and caching the result turns out to be a large savings in bandwidth. I would estimate that it reduces the bandwidth used for fetching links by 90 to 95 percent!
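To see how an estimate like that can come about, here is a rough back-of-the-envelope sketch in Python. The mix of links is entirely hypothetical (I have no real Facebook numbers), and it assumes every page costs about the same to fetch and that a cached entry never expires.

```python
# Hypothetical mix: (number of distinct links, times each one is posted).
# A few links posted once, most posted by a handful of people, a few go viral.
link_mix = [(100, 1), (880, 8), (20, 200)]


def cache_savings(mix):
    """Fraction of page fetches avoided if each distinct link is fetched only once."""
    unique_links = sum(count for count, _ in mix)            # fetches with a cache
    total_posts = sum(count * posts for count, posts in mix)  # fetches without one
    return 1 - unique_links / total_posts


savings = cache_savings(link_mix)
print(f"{savings:.1%}")
```

With these made-up numbers, 1,000 distinct links account for 11,140 posts, so the savings come out at roughly 91 percent. Shift the mix toward more viral links and the figure climbs higher.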

In case you encounter a similar issue, this website was very helpful: Rajesh Rana . net. It directs you to the Facebook Developer debugging page, where you can clear the cache entry for your page.