
Tuesday, May 4, 2010

Interesting technical facts about Facebook


Most of us use Facebook as a social networking application. It is famous and popular, and running a site of that size takes real dedication and precision. Following are some interesting facts:

  • #6 site on the internet
  • 500 total employees
  • 200 in engineering
  • 25 in Infrastructure Engineering
  • One of the largest MySQL installations in the world
  • Big user and contributor to Memcached
  • More than 10k servers in production
  • 6,000 logical databases in production



Photo Storage and Management at Facebook:

  • Photo facts:
  • 6.5B photos in total
  • 4 to 5 sizes of each picture are materialized (30B files)
  • 475k images/second
  • Mostly served via CDN (Akamai & Limelight)
  • 200k profile photos/second
  • 100m uploads/week
  • Stored on NetApp filers
  • First level caching via CDN (Akamai & Limelight)
  • 99.8% hit rate for profiles
  • 92% hit rate for remainder
  • Second level caching for profile pictures only via Cachr (non-profile goes directly against file handle cache)
  • Based upon a modified version of evhttp using memcached as a “backing” store
  • Since Cachr is independent of memcached, a Cachr failure doesn’t lose the cached state (see the read-path sketch after this list)
  • 1 TB of cache over 40 servers
  • Delivers microsecond response
  • Redundancy so no loss of cache contents on server failure
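
To make the read path above concrete, here is a minimal sketch of the two-level lookup for profile photos: the CDN first, then Cachr (memcached-backed), and only then the photo servers. The LruCache class and fetch_from_photo_server function are stand-ins invented for illustration; the post names the layers (CDN, Cachr, photo servers) but not their interfaces.

    from collections import OrderedDict

    class LruCache:
        """Tiny in-memory cache standing in for a CDN edge or a Cachr node."""
        def __init__(self, capacity):
            self._data = OrderedDict()
            self._capacity = capacity

        def get(self, key):
            if key not in self._data:
                return None
            self._data.move_to_end(key)          # mark as most recently used
            return self._data[key]

        def put(self, key, value):
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._capacity:
                self._data.popitem(last=False)   # evict the least recently used entry

    def fetch_from_photo_server(photo_key):
        """Stand-in for the NetApp-backed photo-server tier (the slow path)."""
        return b"<jpeg bytes for " + photo_key.encode() + b">"

    cdn = LruCache(capacity=1000000)     # first-level cache (Akamai / Limelight)
    cachr = LruCache(capacity=100000)    # second level, profile photos only

    def read_profile_photo(photo_key):
        blob = cdn.get(photo_key)        # ~99.8% of profile reads end here
        if blob is None:
            blob = cachr.get(photo_key)  # Cachr, with memcached as its backing store
            if blob is None:
                blob = fetch_from_photo_server(photo_key)
                cachr.put(photo_key, blob)
            cdn.put(photo_key, blob)
        return blob

Given the hit rates above, the slow path is taken only a tiny fraction of the time, which is what lets a comparatively small photo-server tier keep up.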

Photo Servers

  • Non-profile requests go directly against the photo servers
  • Profile requests reach the photo servers only when they miss the Cachr cache
  • File Handle Cache (FHC)
  • Based upon lighttpd and uses memcached as backing store
  • Reduces metadata workload on NetApp servers
  • Issue: the filename-to-inode lookup is a serious scaling problem: it either 1) drives many I/Os or 2) wastes too much memory with a very large metadata cache
  • They have extended the Linux kernel to allow NFS file opens via inode number rather than filename to avoid the NetApp scaling issue (see the sketch after this list)
  • The inode numbers are stored in the FHC
  • This technique offloads the NetApp servers dramatically.
  • Note that files are write-only. Modifications write a new file and delete the old one, so existing handles fail and a fresh metadata lookup is driven.
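
A minimal sketch of the file handle cache idea, assuming a hypothetical open_by_inode() function standing in for the kernel extension described above (stock Linux exposes no such call) and a plain dict standing in for the memcached backing store:

    import os

    fhc = {}    # path -> inode number; stand-in for the memcached-backed FHC

    def open_by_inode(inode):
        """Hypothetical: represents Facebook's kernel extension for NFS open-by-inode."""
        raise NotImplementedError("requires the modified Linux kernel")

    def open_photo(path):
        inode = fhc.get(path)
        if inode is not None:
            # Fast path: skip the filename-to-inode directory walk on the NetApp.
            return open_by_inode(inode)
        # Slow path: one full metadata lookup, then cache the inode for next time.
        fd = os.open(path, os.O_RDONLY)
        fhc[path] = os.fstat(fd).st_ino
        return fd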

Issues with this architecture:

  • Netapp storage overwhelmed by metadata (3 disk I/Os to read a single photo).
  • The original design required 15 I/Os for a single picture (due to deeper directory hierarchy I’m guessing)
  • Tracking last access time, last modified time, etc. has no value to Facebook. They really only need a blob store, but they are paying for a full filesystem
  • Heavy reliance on CDNs and caches means the NetApp tier is essentially pure backup
  • 92% of non-profile and 99.8% of profile picture requests are served from the CDN
  • Most of the remainder are served from the caching layers

Solution: Haystacks

  • Haystacks are a user level abstraction where lots of data is stored in a single file
  • They store an independent index that is vastly more efficient than filesystem metadata
  • Roughly 1 MB of metadata per 1 GB of data
  • Order of magnitude better on average than standard NetApp metadata
  • 1 disk seek for all reads with any workload
  • Most likely stored on XFS
  • Expect each haystack to be about 10G (with an index)
  • The speaker says a Haystack is a lot like a LUN and could be implemented on one; the actual implementation is via NFS onto NetApp, where the photos were previously stored (a minimal sketch follows this list)
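
Here is a minimal sketch of the Haystack idea under those assumptions: many photos appended to one large file, with a compact in-memory index mapping each photo key to an (offset, length) pair so any read costs a single seek. Class and method names are illustrative, not Facebook's actual implementation.

    import os

    class Haystack:
        """Toy append-only photo store: one big file plus a compact index."""
        def __init__(self, path):
            self._file = open(path, "a+b")
            self._index = {}                  # photo_key -> (offset, length)

        def write(self, photo_key, data):
            self._file.seek(0, os.SEEK_END)   # append-only: never overwrite in place
            offset = self._file.tell()
            self._file.write(data)
            self._file.flush()
            self._index[photo_key] = (offset, len(data))

        def read(self, photo_key):
            offset, length = self._index[photo_key]   # metadata always hits in memory
            self._file.seek(offset)                   # exactly one disk seek per read
            return self._file.read(length)

        def delete(self, photo_key):
            # As noted below, deletion only drops the index entry; the bytes stay
            # in the bulk file and are simply never referenced again.
            self._index.pop(photo_key, None)

    store = Haystack("haystack_01.dat")       # each haystack is expected to be ~10 GB
    store.write("1234_profile_small", b"...jpeg bytes...")
    photo = store.read("1234_profile_small")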

Net of what’s happening:
  • Haystack always hits on the metadata
  • Plan to replace NetApp
  • Haystack is a win over NetApp, but they’ll likely run it over XFS (originally developed by Silicon Graphics)
  • Want more control of the cache behavior

Each Haystack format:
Haystack record: version number, magic number, length, data, checksum.
Index record: version, photo key, photo size, start, length.
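
Packing those fields might look something like the sketch below. The field order follows the list above, but the byte widths, the little-endian layout, the made-up magic value, and the CRC32 checksum are assumptions for illustration only.

    import struct
    import zlib

    # Needle header: version (1 byte), magic number (4 bytes), data length (8 bytes)
    NEEDLE_HEADER = struct.Struct("<BIQ")
    # Index record: version, photo key (64-bit), photo size, start offset, length
    INDEX_RECORD = struct.Struct("<BQIQQ")
    MAGIC = 0xFACEB00C    # made-up magic number for this sketch

    def pack_needle(version, data):
        header = NEEDLE_HEADER.pack(version, MAGIC, len(data))
        checksum = struct.pack("<I", zlib.crc32(data) & 0xFFFFFFFF)
        return header + data + checksum       # version, magic, length, data, checksum

    def pack_index_record(version, photo_key, photo_size, start, length):
        return INDEX_RECORD.pack(version, photo_key, photo_size, start, length)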

  • Not planning to delete photos at all, since the delete rate is VERY low and the resources that would be recovered are not worth the work to recover them at Facebook’s scale of usage. Deletion just removes the entry from the index, which makes the data unavailable, but they don’t bother to actually remove it from the Haystack bulk storage.

Q: Why not store the index in an RDBMS? A: They feel it would drive too many I/Os and reintroduce the problems they are trying to avoid (I’m not completely convinced, but I do understand that simplicity and being in control have value).

• They still plan to use the CDN but are hoping to reduce their dependence on it. They are considering becoming their own CDN (Facebook is absolutely large enough to do this cost effectively today).

• They are considering using SSDs in the future.

• Not interested in hosting with Google or Amazon. Compute is already close to the data, and they are working to get both closer to users, but they don’t see a need for GAE or AWS at Facebook’s scale.

• The Facebook default is to use databases. Photos are the largest exception but most data is stored in DBs. Few actions use transactions and joins though.

• Almost all data is cached twice: once in memcached and then again in the DBs.
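
That double caching is the classic cache-aside pattern: check memcached first, fall back to the database on a miss, and repopulate memcached on the way out. A minimal sketch, with a dict standing in for memcached and query_mysql() a made-up stand-in for the sharded MySQL tier:

    memcache = {}    # stand-in for the memcached tier

    def query_mysql(sql, params):
        """Made-up stand-in for a query against one of the ~6,000 logical databases."""
        return {"id": params[0], "name": "example user"}

    def get_user(user_id):
        key = "user:%d" % user_id
        row = memcache.get(key)
        if row is None:              # cache miss: fall through to the database...
            row = query_mysql("SELECT * FROM users WHERE id = %s", (user_id,))
            memcache[key] = row      # ...and populate memcached on the way back
        return row                   # the row now lives in both tiers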



Random bits:
  • Canada: 1 out of 3 Canadians use Facebook.
  • What is the strategy in China? A:“not to do what Google did” :-)
  • Looking at de-duping and other commonality exploiting systems for client to server communications and storage (great idea although not clearly a big win for photos).
  • 90% of Indians access the internet via a mobile device. Facebook is very focused on mobile and international markets.
