当前位置:文档之家› 开源时序数据库opentsdb介绍

开源时序数据库opentsdb介绍

OpenTSDBThe Distributed, Scalable, Time Series Database For your modern monitoring needsCollect, store and serve billions of data points with no loss of precisionBenoît “tsuna” Sigouretsuna@15464 points retrieved, 932 points plotted in 100msTired of 10+ yearold monitoringsystems?Common problems include:•Centralized data storage (SPoF)•Limited storage space•Data deteriorates over time•Plotting a custom graph is hard•Doesn’t scale to:•>>10s of billions of data points•>1000s of metrics•New data every few secondsOld PlowOpenTSDB •First open-source monitoring system built on an open-source distributed database •Collect all the metrics you can imagine every few seconds •Store them forever •Retain granular data •Make custom graphs on the fly •Plug it into your alerting system•Do capacity planning L e t ’s t a ke a d e e p d i v e i n s i d e 203 by David ClavelHBaseDistributed Scalable Reliable EfficientBright Ideas by purplemattfishKey concepts•Data Points(time, value)•Metricsproc.loadavg.1m•T agshost=web42 pool=static•Metric + Tags = Time Seriesput proc.loadavg.1m 1234567890 0.42 host=web42 pool=staticThe Big Picture™Applications tcollectorPeriodic pollingTSD TSDTSD HBase Push data pointsPut BrowserGraph request(HTTP)one machineone or more serversScan Linux custom metrics /proc •Deploy tcollector (agent) on all your servers •Run as many TSDs as you need •Write custom collectors for your custom applications OpenTSDB’spush model12 Bytes Per Datapoint4TB per year for 1000 machines12 Bytes Per Datapoint2 t o 3What’s new?•Faster write path •T wo fsck -type tools (because sh*t happens)•Wider rows•More memory efficient What’s hot (just in for OSCON)•Compacted rows / improved schema (reduces data size by 6x, allows reading >6M points/s)Misc:•More unit tests •Forward compatibility with future variable length encoding •Improved build system BETAOpenTSDB @150 Million Datapoints/Dayin a typical datacenter 600•Over 70 billion data points stored (only 720GB on disk)• 1 year anniversary as the main production monitoring system •Completely replaced Ganglia + Munin + Cacti mix(4x g r o w t h i n 6 m o n t h s )(after 5x LZOcompression)Demo Time!Recipe For Good Performance •#1 rule: keep good data locality•Know your access pattern•Use a key structure that yields good locality for your access pattern •Avoid wide rows with big keys and many small cells •OpenTSDB’s secret ingredient: asynchbase•Fully asynchronous, non-blocking HBase client •Written from the ground up to be thread-safe for server apps •Far fewer threads, far less lock contention, uses less memory •}}}052001028}}047001Column olumn Family: nam : name Colum Column Family: id y: id Row Keymetricstagktagvmetricstagktagvhoststaticproc.loadavg.1mhostproc.loadavg.1m525201001C olumn Fumn FamiFamily: tRow Key+0+15+20...+1890...+36000.690.510.420.990.72}=1234566000+1890}73-107-5112}}}052001028}}047001Implications of the SchemaC olumn F umn Fami Family: tRow Key+0+15+20...+1890...+36000.690.510.420.990.72•Queries always need data points for a metric and time range •All data points for a given metric next to each other •All data points for a time range next to each other •Compact data + data locality = efficient range scansC olumn F umn Fami Family: tRow Key+0...+10...+25......0.690.510.42C olumn F umn Fami Family: tRow Key+0...+10...+25......0.690.510.42Row Key +0+10+25+0+10+250.690.510.420.690.510.42Step 1: Concatenate allcolumns and valuesC olumn F umn Fami Family: tRow Key+0...+10...+25......0.690.510.42Row Key +0+10+25+0+10+250.690.510.42Step 2: Delete individual values0.690.510.42100% Natural, Organic Free &Open-SourceF o rk m e o n G it H u bLiked what you saw?Set it up in 15 minutes•JDK + Gnuplot 1 minute (1 command)•Single-node HBase 4 minutes (3 commands)•OpenTSDB 5 minutes (5 commands)•Deploy tcollector 5 minutes¿ Questions ?Benoît “tsuna” SigoureF o rk m e o n G it Hu bk th i s i s c o o l ?Under the HoodNettysu asyncasync hbaseTSDcore Local Disk (cache)su async async hbaseTSD coreLocal Disk (cache)put proc.loadavg.1m 1234567890 0.42 host=web42 pool=static1s delaymax.su async async hbaseTSD coreLocal Disk (cache)put proc.loadavg.1m 1234567890 0.42 host=web42 pool=staticsu async async hbaseTSD coreLocal Disk (cache)scanGET /q?...su async async hbaseTSD coreLocal Disk (cache)scanwrite to cacheGET /q?...Gnuplot。

相关主题