
Storage Solutions for Big Data Analytics


Hadoop Explained: MapReduce, HDFS
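HDFS spreads data across multiple storage nodes.
HDFS was designed on the assumption that storage nodes can fail, so it keeps three or more copies of each piece of data on different nodes, which makes the system as a whole reliable.
An HDFS file system is made up of a cluster of server nodes; each server serves block data over the network using HDFS's own block protocol.
The HDFS NameNode is at risk of being a single point of failure.
IBM removes the single point of failure while also improving file system performance: Elastic Storage - SNC (available as beta code)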
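The replication idea above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not how HDFS is implemented: the round-robin placement, the node names, and the helper functions `place_blocks` and `readable_blocks` are all hypothetical, and real HDFS placement is decided by the NameNode using rack awareness.

```python
REPLICATION = 3  # the slide states "three or more" copies


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy placement policy for illustration only; real HDFS placement is
    rack-aware and managed by the NameNode.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=8, nodes=nodes)

# Two nodes fail, yet every block keeps at least one live replica.
survivors = readable_blocks(placement, failed_nodes=["node2", "node4"])
print(len(survivors) == len(placement))

# If all three holders of block 0 fail, that block becomes unreadable.
lost_case = readable_blocks(placement, failed_nodes=["node1", "node2", "node3"])
print(0 in lost_case)
```

With three replicas per block, any two node failures leave every block readable in this model; data is lost only when all holders of the same block fail together.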
Big Data Platform Capabilities
Information Ingest Real-time Analytics Warehouse & Data Marts Analytic Appliances
Advanced Analytics/ New Insights
Cognitive
Outage Mgmt
Information Integration & Governance
Systems Security Storage
Predict which customers suit which time-of-use tariffs or demand/response services
Billing systems
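Cleanse and validate data before it is loaded into the data warehouse; the data may come from many customers, billing systems, or outage protection systems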
Big Data & Analytics
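The right decision, in the right place, at the right time
Speed: respond promptly to business opportunities that can arise at any moment, which requires a flexible, real-time infrastructure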
System of Record (SoR)
The dynamics of SoR and SoE:
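– Increase flexibility and efficiency by optimizing workloads and resource deployment
– Improve IT economics by adopting new technologies, including those based on open standards
Systems of Insight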
Creative, holistic thought, intuition Systems Of Engagement
Hadoop and Streams
New Approach
Data Warehouse Transaction Data Internal App Data Structured Mainframe Data
What Do You Have? ISV Solutions
Social Network
New Infrastructure Leverages Data Types
Real-time Analytics
Streams
Data in Motion
Video/Audio Network/Sensor Entity Analytics Predictive Information Ingestion and Operational Information Landing Area, Analytics Zone and Archive
A New Architecture for Big Data Analytics
All Data
Data Zone
IBM Watson Foundations Application Zone
New/Enhanced Applications
Meters
Real-time Data Processing & Analytics
What is happening?
Resource Planning
Smart Metering
Resource planning: more accurate forecasting of electricity usage
Customer Service / Customer Operations
Improve customer satisfaction
Regulatory compliance: achieve genuinely effective compliance
Case Study: Strengthening Smart Metering with Big Data Analytics
All Data
Integration and transformation of massive data volumes
Stream Computing
InfoSphere Streams
Low-latency analytics on streaming data: Velocity, Variety & Volume, Data-In-Motion
MPP Data Warehouse
Netezza High Capacity Appliance
Queryable archive of structured data
Customer self-serve portals
What is happening?
Analyze customers' electricity usage and detect power theft, meter tampering, and similar behavior
ERP
Location
Operational data zone
Customers
Landing, Exploration and Archive data zone
Warehouse
BI and Predictive Analytics
Streams
Raw Data Structured Data Text Analytics Data Mining Entity Analytics Machine Learning
BigInsights
Navigation and Discovery
Smart Analytics System Netezza 1000
BI plus customized analytics on structured data
Data
Operational analytics on structured data
InfoSphere Warehouse
High-capacity data analytics on structured data
Informix Timeseries
Time-structured analytics
Fraud / theft protection
What action should I take?
Decision management
What did I learn, what’s best?
Cognitive
Why did it happen?
Reporting and analysis
Call Centers
Multimedia Web Logs Social Data Text Data: emails Sensor data: images
Repeatable Linear
Accumulation
Systems of Insight Unstructured Enterprise Exploratory Integration Dynamic and Context
IBM Storage Solutions
Storage for Data Analytics
IBM STG 谢文华 wenhuax@
© Copyright IBM Corporation 2014
Extending from Enterprise Data to Big Data
Structured, analytical, logical Systems of Record
Traditional Approach
Relationship management: build and maintain a single view of the grid
Grid
Real-time pricing for time-of-use tariffs, or timely demand/response services
What could happen?
Predictive analytics and modeling
On premise, Cloud, As a service
Data in Many Forms
Information Governance, Security and Business Continuity
IBM Big Data Platform
InfoSphere BigInsights
What is Hadoop?
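What: Open-source software that distributes data computation across a cluster of commodity servers and storage
Why: Traditional computing architectures scale up (vertically), loading data onto the CPU through faster SANs, large memory, and multi-level caches, which is relatively expensive.
What: Hadoop splits a large data set into smaller data sets and distributes them across many ordinary servers, which is a scale-out (horizontal) model.
Why: Scalable, Flexible, Cost Effective, Fault Tolerant
Components: MapReduce, HDFS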
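The split-and-distribute model described above can be sketched in a few lines of Python. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop's API: the function names `split_records`, `map_phase`, `shuffle`, and `reduce_phase` and the sample input are assumptions made for the example, and each split stands in for the work one server would do.

```python
from collections import defaultdict


def split_records(records, num_splits):
    """Partition a large list into smaller splits, one per simulated server."""
    return [records[i::num_splits] for i in range(num_splits)]


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in the split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_outputs):
    """Shuffle: group all emitted values by key across every split."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["big data needs storage", "big clusters store big data"]
splits = split_records(lines, num_splits=2)

# Each split is mapped independently, which is what lets the work scale out.
counts = reduce_phase(shuffle(map_phase(s) for s in splits))
print(counts["big"], counts["data"])  # prints "3 2"
```

Because each split is mapped independently, adding servers adds capacity without changing the program, which is the scale-out property the slide contrasts with scale-up architectures.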
Deep Analytics data zone EDW and data mart zone
Discovery and exploration
Intelligence Analysis
Exploration, Integrated Warehouse, and Mart Zones
Discovery Deep Reflection Operational Predictive
Decision Management

Data at Rest
Stream Processing Data Integration Master Data