
Most of us use Facebook as social network application. Its famous, popular so it requires devotion and accuracy to manage all this. Following are some interesting facts
- #6 site on the internet
- 500 total employees
- 200 in engineering
- 25 in Infrastructure Engineering
- One of the largest MySQL installations in the world
- Big user and contributor to Memcached
- More than 10k servers in production
- 6,000 logical databases in production
 
Photo Storage and Management at Facebook:
- Photo facts:
- 6.5B photos in total
- 4 to 5 sizes of each picture is materialized (30B files)
- 475k images/second
- Mostly served via CDN (Akamai & Limelight)
- 200k profile photos/second
- 100m uploads/week
- Stored on netapp filers
- First level caching via CDN (Akamai & Limelight)
- 99.8% hit rate for profiles
- 92% hit rate for remainder
- Second level caching for profile pictures only via Cachr (non-profile goes directly against file handle cache)
- Based upon a modified version of evhttp using memcached as a “backing” store
- Since cachr is independent from memcachd, cachr failure doesn’t lose state
- 1 TB of cache over 40 servers
- Delivers microsecond response
- Redundancy so no loss of cache contents on server failure
 
Photo Servers
- Non-profile requests go directly against the photo-servers
- Only profile requests that miss the cachr cache.
- File Handle Cache (FHC)
- Based upon lighttpd and uses memcached as backing store
- Reduces metadata workload on NetApp servers
- Issue: filename to inode lookup is a serious scaling issue: 1) drives many I/Os or 2) wastes too much memory with a very large metadata caceh
- They have extended the Linux kernel to allow NFS file opens via inode number rather than filename to avoid the NetApp scaling issue.
- The inode numbers are stored in the FHC
- This technique offloads the NetApp servers dramatically.
- Note that files are write only. Mods write a new file and delete the old ones so the handles will fail and a new metadata lookup will be driven.
 
Issues with this architecture:
- Netapp storage overwhelmed by metadata (3 disk I/Os to read a single photo).
- The original design required 15 I/Os for a single picture (due to deeper directory hierarchy I’m guessing)
- Tracking last access time, last modified etc. has no value to Facebook. They really only need a blob store but they are using a filesystem at additional expense
- Heavy reliance on CDNs and caches such that netapp is basically almost pure backup
- 92% of non-profile and 99.8% of profile pictures are stored in CDN
- Many of the rest are almost all stored in caching layers
 
Solution: Haystacks
- Haystacks are a user level abstraction where lots of data is stored in a single file
- Store an independent index vastly more efficient than the file store
- 1M of metadata/1G of data
- Order of magnitude better on average than standard NetApp metadata
- 1 disk seek for all reads with any workload
- Most likely store in XFS
- Expect each haystack to be about 10G (with an index)
- Speaker equates a Haystack to be a lot like a LUN and could be implemented on a LUN. The actual implementation is via NFS onto NetApp as photos were previously stored
 
Net of what’s happening:
- Haystack always hits on the metadata
- Plan to replace NetApp
- Haystack is a win over NetApp but we’ll likely run over XFS (originally done by Silicon Grapics)
- Want more control of the cache behavior
Each Haystack Format:
Version number, Magic number, Length, Data, Checksum, Index format, Version, Photo key, Photo size, Start, Length.
- Not planning to delete photos at all since delete rate is VERY low so it the resource that would be recovered are not worth the work to recover them in the Facebook usage. Deletion just removes the entry from the index which makes the data unavailable but they don’t bother to actually remove it from the Haystack bulk storage system.
Q: Why not store the index in a RDBMS? Feels that it’ll drive too many I/Os and have the problems they are trying to avoid (I’m not completely convinced but do understand that simplicity and being in control has value).
• They still plan to use the CDN but they are hoping to reduce their dependence on CDN. They are considering becoming their own CDN (Facebook is absolutely large enough to be able to do this cost effectively today).
• They are considering using to SSDs in the future.
• Not interested in hosting with Google or Amazon. Compute is already close to the data and they are working to get both closer to users but don’t see a need/use for GAE or AWS at the Facebook scale.
• The Facebook default is to use databases. Photos are the largest exception but most data is stored in DBs. Few actions use transactions and joins though.
• Almost all data is cached twice: once in memcached and then again in the DBs.
Random bits:
- Canada: 1 out of 3 Canadians use Facebook.
- What is the strategy in China? A:“not to do what Google did” :-)
- Looking at de-duping and other commonality exploiting systems for client to server communications and storage (great idea although not clearly a big win for photos).
- 90% Indians access internet via a mobile device. Facebook very focused on mobile and international.
 
0 comments:
Post a Comment
thank you for visiting our site..!