Wednesday, August 24, 2011

Data warehouse glossary


For our ease lets divide our topic in 2 components of data warehouse and there we will get to know the common terms that are commonly used in the particular components. Since many terms can be used in across section, if you feel like it should be the part of other component or I have missed some, comments are appreciated.


Disclaimer: My sole objective here is to put across most of the jargon used in data warehousing not segregation of data warehouse into components.
  • Data warehouse
  • Business Intelligence ( reporting)
Some commonly used terminology in Data warehouse:
  • Aggregates: Physical rows in the database, almost always created by summing the other records in the database for the purpose of improving query performance. There are many   FACTS tables in the data warehouse that are summed for selected dimensions from the original fact table. Thus these tables will be having fewer rows and hence the query   execution will be faster.
  • Attribute: A column field in the DIM (Dimension) table.
  • Conformed dimension: If two dimensions are same or one is exactly the perfect subset of other, they are called conformed dimension. The results produced by either of these dimension with any fact should be same.
  • Data extract: Process of copying from source system (in this case operational system) in order to load in the target system (Data warehouse).
  • Data Mart: A logical and physical subset of data warehouse that caters the requirement for the specific area of the business process.
  • Data Mining: Is referred as finding the pattern from the most atomic data.
  • Data presentation Area: Is the place where data is organized, stored, and available for direct querying or from any data access tools or from analytics tool.
  • Data Staging area: Is the area where data is kept temporarily after extracting from the source system and is used for data clean, transform, de dup, and archive (in some case).
  • Data warehouse: Is the subject oriented, integrated, time-variant, and non-volatile collection of data for supporting organization’s decision making process.
  • Degenerate Dimension: Dimension key, such as invoice number or transaction number or bill number, that has no attributes and hence does not joins with the actual dimension table.
  • Dimension table: The table in data warehouse that has descriptive attribute columns for a entity.
  • Meta Data: Any data maintained to support the operations or use of data warehouse.
  • Slowly changing dimensions (SCD): The tendency of dimension rows to change gradually or occasionally over time. A type 1 SCD is a dimension whose attributes are overwritten when the value of an attribute changes. A type 2 SCD is a dimension where a new row is created when the value of an attribute changes. A type 3 SCD is a dimension where an alternate old column is created when an attribute changes.
     Some commonly used terminology in Business Intelligence:
  • AdHoc Queries / Reports: Queries: Reports that are formulated/ designed by the user on the spur of the moment.
  • Analytic processing: Using data for analytic purpose to support business decision making.
  • Business Intelligence: Is the generic term to describe the leveraging the organization’s internal and external asset for better Business decision.
  • Cubes: Name of dimensional structure on a OLAP database platform, originally referring to the simple three dimension. Like in case of the product, time & place.
  • ODS: A physical set of tables sitting between operational system and data warehouse. ODS caters the demand of immediate / real time reporting from the dashboard.
  • Slice and Dice: Ability to access a data warehouse through any of its dimension equally. Slicing and dicing is the process of separating and combining warehouse data in seemingly endless combinations.