Data history tracking

History tracking is a common feature of information systems, especially enterprise systems and systems where auditing is important. System operators want to know who made each data change in the system and when. This is particularly true of financial and health information systems.



Different software takes different approaches. Web content editing software such as Wikipedia and WordPress uses a versioning mechanism when the content of an entry is edited. It keeps every change made to an item, so all modifications made over several years still exist and can be reverted to. This approach usually works well when only one or two database tables are tracked this way, and only one or two of their fields. Otherwise the database grows very large after a while. In the past, developers who stored data on WORM (write once, read many) media were forced into the versioning approach by the nature of the medium: they could not change saved records, so they inserted new records with an incremented version number instead.
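The versioning idea can be illustrated with a small in-memory sketch. It is not how Wikipedia or WordPress actually store revisions; the names Revision and VersionedStore are hypothetical and exist only for this example. The point it shows is that saving never overwrites anything, and a revert is simply one more version.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One immutable version of an entry (hypothetical structure)."""
    entry_id: str
    version: int
    content: str
    author: str
    saved_at: datetime


class VersionedStore:
    """Append-only store: saving adds a new revision, it never overwrites."""

    def __init__(self) -> None:
        self._revisions: dict[str, list[Revision]] = {}

    def save(self, entry_id: str, content: str, author: str) -> Revision:
        history = self._revisions.setdefault(entry_id, [])
        # The version number is the previous count plus one.
        rev = Revision(entry_id, len(history) + 1, content, author,
                       datetime.now(timezone.utc))
        history.append(rev)
        return rev

    def current(self, entry_id: str) -> Revision:
        return self._revisions[entry_id][-1]

    def history(self, entry_id: str) -> list[Revision]:
        return list(self._revisions[entry_id])

    def revert(self, entry_id: str, version: int, author: str) -> Revision:
        # Reverting is itself recorded as a new revision, so nothing is lost.
        old = self._revisions[entry_id][version - 1]
        return self.save(entry_id, old.content, author)
```

Because every save stores the full content again, the storage cost grows with the number of edits, which is why the approach fits a small number of tracked tables and fields.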


Full History

Some software solutions use a full history strategy. They blindly log every record change automatically. For example, suppose you modify an Order and its OrderItems. This translates to multiple record updates at the database level. A History table is populated with the same number of records, each holding the old value and the current value of a modified record (the serialized value of the whole row), the time of the change, and the person who triggered it. This approach makes it possible to find every tiny change of information, but it also creates large amounts of redundant data, a large percentage of which will never be used. Because the logging is triggered automatically, it may also cause cyclic triggering problems in addition to performance overhead. The event sourcing technique can reduce the performance and storage footprint of the full history approach.
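A minimal sketch of the full history idea follows, using plain Python objects in place of real database tables and triggers. HistoryLog and TrackedTable are hypothetical names introduced for illustration; a real system would more likely do this in the database or the persistence layer. Every update writes one history entry containing the serialized old and new row, the user, and the time.

```python
import json
from datetime import datetime, timezone


class HistoryLog:
    """Stands in for the History table: one entry per modified record."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def record(self, table: str, key: int, old: dict, new: dict, user: str) -> None:
        self.rows.append({
            "table": table,
            "key": key,
            "old_value": json.dumps(old, sort_keys=True),  # whole row, serialized
            "new_value": json.dumps(new, sort_keys=True),
            "changed_by": user,
            "changed_at": datetime.now(timezone.utc).isoformat(),
        })


class TrackedTable:
    """A table whose every update is logged automatically."""

    def __init__(self, name: str, history: HistoryLog) -> None:
        self.name = name
        self.history = history
        self._rows: dict[int, dict] = {}

    def insert(self, key: int, row: dict) -> None:
        self._rows[key] = dict(row)

    def update(self, key: int, changes: dict, user: str) -> None:
        old = dict(self._rows[key])
        new = {**old, **changes}
        self._rows[key] = new
        # The caller does nothing special: logging happens on every update.
        self.history.record(self.name, key, old, new, user)


# Modifying one Order and its two OrderItems yields three history entries.
history = HistoryLog()
orders = TrackedTable("Order", history)
items = TrackedTable("OrderItem", history)
orders.insert(1, {"status": "new"})
items.insert(10, {"order_id": 1, "qty": 1})
items.insert(11, {"order_id": 1, "qty": 2})

orders.update(1, {"status": "confirmed"}, user="alice")
items.update(10, {"qty": 3}, user="alice")
items.update(11, {"qty": 5}, user="alice")
```

Note that each entry stores the whole row twice even when a single field changed, which is where the redundancy comes from.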


Process Level Log

Referential integrity issues, performance overhead, redundancy, and wasted storage are obvious weaknesses of the versioning and full history approaches. There is another approach that cannot be automated as they can and is not as accurate, but does not suffer from these weaknesses. It relies on a manual call to a logging utility in each high-level entry point of the system. A process here refers to a use case scenario of the system from the end user's point of view, and each process begins at one of the system's high-level entry points. The developer fills in a brief description of the data change, its purpose, and the user or system that triggered it. The log can also include some detailed information about the record changes for advanced auditing. A sample class diagram would look like the following.