当前位置:
文档之家› SQLServerAnalysis Service学习资料.ppt
SQLServerAnalysis Service学习资料.ppt
Subject-Oriented
• Subject-Oriented data is organized around major subject areas of an enterprise and is useful for an enterprise-wide understanding of those subjects. For Example, a banking operational system maintains independent records of customer savings, loans, and other transactions. A warehouse pulls this independent data together to provide financial information. The data from diverse sources is transformed so that it is consistent and meaningful for the warehouse. So the main work is around the fact table.
Recommend:
1 is very clear and understandable 2 can utilize fully the index of database linked. It is more efficient than 1. So Dimension table can adopt 1 Original table can adopt 2
– DTS design and schedule – The dimension data is incremental, it is to say that they can be inserted and updated, but can’t be deleted. And dimension data must be unique. – The original or fact data is incremental, but according to requirement, it can be updated or deleted under the control. The full process is danger and impossible when the data volume is too huge.
history and temporary table
The large-volume table can be divided into:
– Temporary table – History table It may be inconvenient. But it can improve the performance, temporary table is used to make fact table quickly. Temporary data should be transfer into history table periodically.
– – – – – – Original data from data source Dimension Measurement dimension granularity Star model or snowflake model OLAP model
• •
Integrated
• • Data on a given subject is integrated. In many organizations, data resides in diverse independent systems, making it difficult to integrate the information into a single set of meaningful data from analysis. A key characteristic of a warehouse is that data is completely consolidated or integrated. Data is structured in a globally accepted manner, even when the underlying source data is structured differently (conforming dimension). Integration and transformation processes can be time-consuming and costly. It requires commitment from every part of the organization, particularly top-level managers who make the decisions and allocate resources and funds. So the main work may be like :
SQL Server 2000 Analysis Service
Agenda
• Data Warehouse Concept • Database Design (Physics and Logic) • Backup
Data Warehouse Concept
A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data designed to support management DSS needs.
• • • •
Database Design
• • • • • • • Database design File Group design History and temporary table design No Partition but view Database link Log table Increment ETL
– Classify by Business – Classify by process – Classify by data (history and temporary) Recommend by process – Bidw_org original data storage – Bidw_fact fact data storage – Bidw_dim dimension data storage
• • •
Time-Variant
• Warehouse data is by nature historical; data is retained for a long time, from two to ten years, compared with one to three months of data for a typical operational system. The data allows for analysis of past and present trends, and for forecasting, using what-if scenarios. Base the Time-Variant and data volume, we must consider : The design about the database The design about the table The OLAP increment
No Partition but View
It is still the performance bottleneck of SQL Server.
– SQL Server 2000 not support the partition – SQL Server 2000 encounter bottleneck when the data volume is more than 10,000,000. – But It support view union. – The large-volume table can be divide into different table like tablename_yyyymm, then create a view to union them. Like create view v_fact_table as select * from t_fact_table_200601 union all select * from t_fact_table_200602 union all select * from t_fact_table_200603
– – – – Universal dimension from different department. Dimension design include NULL value or violate constraint Define mid-exchange table But we didn’t control the data quality caused by man-made
•
Nonvolatile
• Typically, data in the data warehouse is read-only (less volatile than operational systems). Data is loaded into the data warehouse for the first-time load, and then refreshed regularly. Warehouse data is accessed by business users. Warehouse operations typically involve: Loading the initial set of warehouse data (often called the first-time load) Refreshing the data regularly (called the refresh cycle) So the main work is around: