Monday, 23 July 2012

Analysis of Business Workflows & Pipelines

As businesses expand systematically across geographies, they accumulate both simple and complex business processes. These processes combine numerous workflows and pipelines representing the life cycles of business events. Many data warehouse (DW) sponsors deprioritize the analytical needs of workflows and pipelines, which ironically are a very rich source of analytical reports that can act as a game changer for that particular business.


Let us start with an example of a business workflow for an online retail store. The company, named “eBayz”, sells commodities online. Customers can log into the website and place orders. For out-of-stock items, a purchase order is raised to the respective supplier; the item is shipped to the warehouse and then follows the in-stock workflow. In-stock products are dispatched from the warehouse to a local store, and from there they are delivered to the customers. Payment can be made while placing the order or at the time of delivery.


With the existing analytics, business users found that customer complaints related to delivery issues had increased. The “Customer Complaints” analytics showed the trend graph touching all-time peaks for the last three months, and the reactions captured were...


Alas! This is going to hurt our business. What do we do now?
What is the problem?
Where is the problem?
How do we know about the problem?

These are business questions which cannot be answered from the regular analytics. Here the accumulated snapshot fact serves as a rescue boat. The whole lifecycle/pipeline/workflow of the order is captured in an accumulated snapshot fact table: each event in the lifecycle has a date, and each event date is captured as a separate column. For our example, the fact table design may look like the one below to capture the complete pipeline.
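As a minimal sketch of this accumulated snapshot design, the table below keeps one row per order and revisits (updates) it as each milestone completes. Only `OrderPlacedDateId`, `DeliveryDateId` and `ArrivedToStoreLag` appear in the post itself; the other milestone and lag column names here are illustrative assumptions, and `sqlite3` merely stands in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Accumulated snapshot grain: one row per order.  Milestone columns
# other than OrderPlacedDateId / DeliveryDateId / ArrivedToStoreLag
# are hypothetical names for this sketch.
cur.execute("""
    CREATE TABLE FactOnlineOrders (
        OrderId                  INTEGER PRIMARY KEY,
        OrderPlacedDateId        INTEGER NOT NULL,  -- one role of DimDate
        ArrivedAtWarehouseDateId INTEGER,           -- NULL until the event occurs
        ArrivedToStoreDateId     INTEGER,
        DeliveryDateId           INTEGER,           -- another role of DimDate
        ArrivedToStoreLag        INTEGER,           -- days from order to store arrival
        OverallLag               INTEGER            -- days from order to delivery
    )
""")

# The row is inserted once and then updated as each milestone completes.
cur.execute("INSERT INTO FactOnlineOrders (OrderId, OrderPlacedDateId) VALUES (1, 20120601)")
cur.execute("""
    UPDATE FactOnlineOrders
       SET ArrivedToStoreDateId = 20120607,
           ArrivedToStoreLag    = 6
     WHERE OrderId = 1
""")

row = cur.execute(
    "SELECT OrderPlacedDateId, ArrivedToStoreLag FROM FactOnlineOrders WHERE OrderId = 1"
).fetchone()
print(row)  # (20120601, 6)
```

Because every milestone date and every lag sits in its own column of the same row, trend charts like the one on “ArrivedToStoreLag” discussed below are simple single-column aggregations.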




Interestingly, the time lags are also part of the fact table. With this design we can find the answers to the above business questions: we can very easily see where the lag is greatest, and that is likely the problem area. On examining the questions above, it was found that over the same period the trend chart for “ArrivedToStoreLag” was also touching peaks, implying that the problem area was late product shipping to the local store. On investigation it was found that, to save shipping cost, products were kept on hold for more days, which resulted in the peaks in the “Customer Complaints” analytics. This answers all the related questions while raising some new ones, like…

What drove the cost saving for shipping? Was it the low in-stock commodities?
How many new orders and how many undelivered orders do we have on a daily basis?

These new business questions are also supported by the current design. New orders can be queried directly by joining the DimDate and FactOnlineOrders entities, whereas undelivered orders are really a daily snapshot calculated using more than one role of the date dimension. Here, the undelivered orders for 4-June-2012 are the records having “DeliveryDate” greater than and “OrderPlacedDateId” less than 4-June-2012, involving two different roles of the date dimension.
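The undelivered-orders snapshot can be sketched as below, assuming the same hypothetical FactOnlineOrders table (again with `sqlite3` standing in for the warehouse). The two date columns are two roles of the one date dimension; an order whose delivery event has not happened yet (a NULL `DeliveryDateId` in this sketch) also counts as undelivered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE FactOnlineOrders (
        OrderId INTEGER, OrderPlacedDateId INTEGER, DeliveryDateId INTEGER
    )
""")
cur.executemany("INSERT INTO FactOnlineOrders VALUES (?, ?, ?)", [
    (1, 20120601, 20120603),  # delivered before 4-June: not in flight
    (2, 20120602, 20120610),  # delivered after 4-June: in flight on 4-June
    (3, 20120603, None),      # not yet delivered: in flight
    (4, 20120605, None),      # placed after 4-June: not counted
])

# Undelivered orders as of 4-June-2012: placed before that date,
# delivered after it (or not delivered at all).  Both predicates
# filter the same row through two roles of the date dimension.
undelivered = cur.execute("""
    SELECT COUNT(*) FROM FactOnlineOrders
    WHERE OrderPlacedDateId < 20120604
      AND (DeliveryDateId IS NULL OR DeliveryDateId > 20120604)
""").fetchone()[0]
print(undelivered)  # 2
```

Running the same query for each calendar date yields the daily undelivered-orders trend without any extra snapshot table.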

Another question, from the DW designer’s perspective, is: “Is this the only design supporting the above features and answering all the business questions in the most performant manner?” The answer lies in the individual’s assessment of the problem statements being targeted by the analytics. Here is one more design approach: capture the pipeline in a transactional manner, i.e. store the lifecycle in multiple rows rather than putting it together as a single row. Obviously it has its own pros and cons, and they are discussed below along with the changed fact design and a new dimension.




The latter design moves the multiple columns into rows, i.e. the fact table is transformed from an accumulated snapshot grain to a transactional grain. Event grouping gives us the flexibility to keep more than one lifecycle in the same dimension without turning it into a junk dimension. The new design answers the questions asked in the initial state, but restrictively. In the former design we can easily keep multiple time lags, like the individual lags plus the overall lag of the process, which cannot be done with the latter approach. Moreover, it is also difficult to get other advanced answers, like the daily counts of undelivered orders, orders waiting at the warehouse, or orders waiting at the store, which we were getting easily from the former approach. One design to answer these questions is to create one table per question on a daily basis, but that needs very strong archive and purge policies or the data will grow exponentially.
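The transactional-grain alternative can be sketched as follows. All table, column, and event names here are illustrative assumptions; the point is that rows are appended per event (never updated), an `EventGroup` column lets several lifecycles share one event dimension, and a lag now needs a self-join between two event rows instead of a simple column subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The EventGroup column lets several lifecycles share one event
# dimension without it degrading into a junk dimension.
cur.execute("CREATE TABLE DimEvent (EventId INTEGER PRIMARY KEY, EventName TEXT, EventGroup TEXT)")
cur.executemany("INSERT INTO DimEvent VALUES (?, ?, ?)", [
    (1, "OrderPlaced",    "OnlineOrder"),
    (2, "ArrivedAtStore", "OnlineOrder"),
    (3, "Delivered",      "OnlineOrder"),
])

# Transactional grain: one row per order per event, append-only.
cur.execute("CREATE TABLE FactOrderEvents (OrderId INTEGER, EventId INTEGER, EventDateId INTEGER)")
cur.executemany("INSERT INTO FactOrderEvents VALUES (?, ?, ?)", [
    (1, 1, 20120601), (1, 2, 20120607), (1, 3, 20120609),
    (2, 1, 20120602),  # order 2 still in flight
])

# A lag now requires a self-join between two event rows.  (Subtracting
# date keys works here only because both dates fall in the same month;
# a real query would join DimDate and diff calendar dates.)
lag = cur.execute("""
    SELECT d.EventDateId - p.EventDateId
      FROM FactOrderEvents p
      JOIN FactOrderEvents d ON d.OrderId = p.OrderId
     WHERE p.OrderId = 1 AND p.EventId = 1 AND d.EventId = 2
""").fetchone()[0]
print(lag)  # 6
```

The self-join per lag, multiplied across every lag and every “waiting at stage X” count, is exactly the restrictiveness discussed above.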


From the above discussion, the accumulated snapshot provides the more flexible and preferred design. The Kimball Group advocates serving the analytical needs of workflows and pipelines with accumulated snapshots; the behavior of accumulated snapshot facts is detailed in “Design Tip #37, #130 & #145” at the Kimball Group. Microsoft’s SSAS supports workflow and pipeline scenarios with accumulated snapshot facts very nicely, with a few tricks in place, particularly around role-playing dimensions. Mosha’s blog post “Counting 'in-flight' events in MDX” discusses in detail how to handle these scenarios in the cube and in MDX.
