
Google Cloud Computing Technology: Bigtable (Overseas Course Slides)


Performance Improvements
• Tablet Recovery
– Perform a minor compaction on the tablet before offloading it to another tablet server
– 2 minor compactions remove the need for the recovering tablet server to deal with recovery log entries
– No synchronization needed to read from SSTables
– Only the memtable is mutable
– Use mark-and-sweep garbage collection for obsolete SSTables, with the METADATA table holding the set of roots
– Split tablets quickly by letting child tablets share the SSTables of the parent tablet
• Write operations increase the size of memtable and commit log
– Longer log, longer recovery
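The mark-and-sweep garbage collection mentioned in the Tablet Recovery list above can be sketched as follows. This is a minimal illustration, not Bigtable's code: the function name `FindObsoleteSSTables` and the plain string file names are assumptions, and the METADATA table is modeled as a list of per-tablet SSTable lists.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: each inner vector lists the SSTables referenced by one
// tablet's METADATA entry (the "roots" of the collection).
std::vector<std::string> FindObsoleteSSTables(
    const std::vector<std::vector<std::string>>& metadata_roots,
    const std::vector<std::string>& all_sstables) {
  // Mark phase: every SSTable reachable from some tablet is live.
  std::set<std::string> live;
  for (const auto& tablet_files : metadata_roots) {
    live.insert(tablet_files.begin(), tablet_files.end());
  }
  // Sweep phase: anything not marked is obsolete and can be deleted.
  std::vector<std::string> obsolete;
  for (const auto& file : all_sstables) {
    if (live.count(file) == 0) obsolete.push_back(file);
  }
  return obsolete;
}
```

An SSTable shared by two child tablets after a split stays live as long as either child still references it, which is what makes the cheap split safe.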
Performance Improvements
• Locality group (grouping multiple column families)
Motivation
• High scalability
Bigtable: A Distributed Storage System for Structured Data
April 21, 200?
– Scale to petabytes of data
– Thousands of machines
• Bloom filters
– Bloom filters for a particular locality group in an SSTable
– Reduce the need to read from disk if the SSTable is not in memory
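A Bloom filter lets a reader skip an SSTable that cannot contain the requested row/column pair. The sketch below is a simplified, hypothetical filter, not Bigtable's implementation: the class name, the double-hashing scheme built on `std::hash`, and the sizes used are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified Bloom filter over string keys.
class BloomFilter {
 public:
  // num_bits must be greater than zero.
  BloomFilter(std::size_t num_bits, std::size_t num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {}

  // Record that `key` (for example a row/column pair) is in the SSTable.
  void Add(const std::string& key) {
    for (std::size_t i = 0; i < num_hashes_; ++i) bits_[Index(key, i)] = true;
  }

  // false: the key is definitely absent, so the disk read can be skipped.
  // true: the key may be present (false positives are possible).
  bool MightContain(const std::string& key) const {
    for (std::size_t i = 0; i < num_hashes_; ++i) {
      if (!bits_[Index(key, i)]) return false;
    }
    return true;
  }

 private:
  // Double hashing: derive the i-th probe position from two base hashes.
  std::size_t Index(const std::string& key, std::size_t i) const {
    std::size_t h1 = std::hash<std::string>{}(key);
    std::size_t h2 = std::hash<std::string>{}(key + "#salt") | 1;
    return (h1 + i * h2) % bits_.size();
  }

  std::vector<bool> bits_;
  std::size_t num_hashes_;
};
```

A tablet server would consult `MightContain` before touching disk; only a `true` answer requires actually reading the SSTable.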
• Tablets are offloaded to other tablet servers in case of failure
• Rebuild tablets by reading and applying mutations from the commit log
• Sort the commit log
• Partition the log into chunks to allow parallelism
• Two log-writing threads per tablet server to prevent hiccups due to GFS latency
– A "tablet" is a row range (a set of ordered rows)
Implementation
• Client library
• Master Server
– Only 1 master at any time, guaranteed by Chubby
– Assigning tablets to tablet servers
– Detecting addition/expiration of tablet servers
– Balancing load of tablet servers
– Garbage collection of GFS files
• High performance
• High availability
• Wide applicability
– 60 Google products using Bigtable (Analytics, Finance, Earth, Orkut, …)
• Monitor temporal changes
– Block Cache
• Scan Cache: higher-level cache that stores the key-value pairs returned by the SSTable to the tablet server
• Useful when reading the same data over and over again
• Block Cache: lower-level cache that stores SSTable blocks read from GFS
• Useful when reading data close to data recently read
– (row:string, column:string, time:int64) → string
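The mapping from (row, column, time) to a string can be pictured as an ordinary sorted map. The sketch below is only an illustration of the data model, not of how Bigtable stores data: `CellKey`, `CellKeyLess`, `SortedCellMap`, and `ReadLatest` are hypothetical names, and the newest-first ordering of timestamps is an assumption made so that the latest version is found first.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical key type: (row, column, timestamp).
struct CellKey {
  std::string row;
  std::string column;
  std::int64_t timestamp;
};

// Order by row, then column, then newest timestamp first.
struct CellKeyLess {
  bool operator()(const CellKey& a, const CellKey& b) const {
    if (a.row != b.row) return a.row < b.row;
    if (a.column != b.column) return a.column < b.column;
    return a.timestamp > b.timestamp;
  }
};

using SortedCellMap = std::map<CellKey, std::string, CellKeyLess>;

// Return the most recent value for (row, column), or nullptr if none exists.
const std::string* ReadLatest(const SortedCellMap& cells,
                              const std::string& row,
                              const std::string& column) {
  // The probe sorts before every stored version of (row, column), so
  // lower_bound lands on the newest version if one exists.
  CellKey probe{row, column, std::numeric_limits<std::int64_t>::max()};
  auto it = cells.lower_bound(probe);
  if (it == cells.end() || it->first.row != row ||
      it->first.column != column) {
    return nullptr;
  }
  return &it->second;
}
```

Because the map is sorted by row first, a contiguous range of keys corresponds to a contiguous range of rows, which is the property that tablets rely on.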
The Data Model
API
• Enables read, write, and delete of tables, column families, rows, and column family metadata (access control)
• Caching
– Improve read performance using a 2-level cache
– Scan Cache
Performance Improvements
• Commit-log
– single commit log per tablet server
– append mutations to a single file, co-mingling mutations for different tablets
– complicates recovery
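Because one log file mixes mutations for many tablets, recovery is simpler once the log is sorted so that each tablet's entries sit together. The sketch below assumes a sort key of (table, row, sequence number); the `LogEntry` fields and both function names are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical log record in a shared, co-mingled commit log.
struct LogEntry {
  std::string table;
  std::string row;
  std::uint64_t sequence;  // position in the original log
  std::string mutation;
};

// Sort so that entries for the same table and row range are contiguous and
// stay in their original write order.
void SortForRecovery(std::vector<LogEntry>& log) {
  std::sort(log.begin(), log.end(), [](const LogEntry& a, const LogEntry& b) {
    return std::tie(a.table, a.row, a.sequence) <
           std::tie(b.table, b.row, b.sequence);
  });
}

// Collect mutations for rows in [start_row, end_row) of `table`.
// Requires a log already sorted by SortForRecovery.
std::vector<std::string> MutationsForTablet(
    const std::vector<LogEntry>& sorted_log, const std::string& table,
    const std::string& start_row, const std::string& end_row) {
  // Binary search for the first entry at or after (table, start_row).
  auto it = std::lower_bound(
      sorted_log.begin(), sorted_log.end(), start_row,
      [&table](const LogEntry& e, const std::string& row) {
        return std::tie(e.table, e.row) < std::tie(table, row);
      });
  std::vector<std::string> out;
  // The tablet's entries are contiguous, so stop at the first one past it.
  for (; it != sorted_log.end() && it->table == table && it->row < end_row;
       ++it) {
    out.push_back(it->mutation);
  }
  return out;
}
```

After sorting, a recovering server reads one contiguous run of entries for its tablet instead of scanning the whole log.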
• METADATA tablets define logical Tables
• Table (logical grouping)
– Tablet (*)
• Tablet Log (1)
– Written on GFS
• Tablet Servers
• memtable (1)
• Location of the root tablet is in the master lock file.
Tablet Assignment
• Master keeps track of the set of live tablet servers and tablet assignments
– When tablet server starts it acquires a lock on a unique file in a specific directory.
Tablet Serving
• Master pings for liveness of tablet server
– Failure: the tablet server reports that it has lost its lock, or the master fails to reach the server
Compactions
– In memory
• SSTable (*)
– Written on GFS, immutable.
– 10 to 1000 tablets per tablet server
– Handles read/write requests from client applications
– Splits tablets when they grow too large
Overview
• Similar to a database, but with no relational data model; essentially a key-value store
• Sparse, distributed, persistent, multidimensional sorted map
Building Blocks
• Chubby
– Distributed locking service
– Uses the Paxos algorithm for consensus
• GFS
– Runs on the same box as Bigtable
– Underlying file system
• Can use regex for row and column matching
• memtable row is copy-on-write
• reads and writes occur in parallel
Real Applications
• Google Analytics
• Google Earth
• Personalized Search
• Exploiting Immutability
Lessons
• Expect failures
– Fail-stop failures
– Memory and network corruption
– Clock skews
– Hung machines
– Extended and asymmetric network partitions
– Failures in underlying components (Chubby)
– Overflow of GFS quotas
// Open the table
Table *T = OpenOrDie("/bigtable/web/webtable");
// Write a new anchor and delete an old anchor
RowMutation r1(T, "n.www");
r1.Set("anchor:", "CNN");
r1.Delete("anchor:");
Operation op;
Apply(&op, &r1);
– separate SSTable for each locality group
– segregating column families that are not usually accessed together gives more efficient reads
– some SSTables can be declared to be in-memory (loaded lazily)
– compress each SSTable block separately for a locality group (can read portions of an SSTable without decompressing the entire thing)
– two-pass compression scheme (1st pass: Bentley and McIlroy’s scheme; 2nd pass: a fast compression algorithm)
– emphasize speed over space reduction