Saturday, 11 April 2015

Caching Concepts and ATG Commerce Repository Caching

In General:

Caching:


In computing, a cache (/ˈkæʃ/ KASH) is a component that stores data so future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation, or a duplicate of data stored elsewhere.


How does a cache work?

When data is needed, the cache is checked to see if it contains that data. If it does, the data is served from the cache without having to access the main data store; this is known as a 'cache hit'. If the data is not in the cache, it is transferred from the data store; this is known as a 'cache miss'. When the cache fills up, items are ejected from the cache to make space for new items.
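As a minimal, ATG-agnostic illustration, the hit/miss logic can be sketched in Java like this (Map stands in for the cache and the store function for the main data store; the names are mine):

import java.util.Map;
import java.util.function.Function;

public final class CacheAside {
    /** Serve hits from the cache; load misses from the backing store and remember them. */
    public static <K, V> V getOrLoad(Map<K, V> cache, K key, Function<K, V> store) {
        V value = cache.get(key);
        if (value == null) {            // cache miss
            value = store.apply(key);   // go to the main data store
            cache.put(key, value);      // keep it for future requests
        }
        return value;                   // cache hit returns without touching the store
    }
}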

What are some of the popular caching algorithms?

Some of the most popular and theoretically important algorithms are FIFO, LRU, LFU, LRU2, 2Q and time-based expiration. Most of the names describe the strategy used to eject items from the cache when it gets full; the time-based expiration algorithms are the exception.


FIFO (First In First Out): Items are added to the cache as they are accessed, putting them in a queue or buffer and not changing their location in the buffer; when the cache is full, items are ejected in the order they were added. Cache access overhead is constant time regardless of the size of the cache. The advantage of this algorithm is that it's simple and fast; it can be implemented using just an array and an index. The disadvantage is that it's not very smart; it doesn't make any effort to keep more commonly used items in cache.


Summary for FIFO: fast, not adaptive, not scan resistant
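In Java, a minimal FIFO sketch also falls out of LinkedHashMap: insertion-order mode plus a removeEldestEntry() override ejects the oldest entry once the cap is reached (the class name is mine):

import java.util.LinkedHashMap;
import java.util.Map;

public class FifoCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public FifoCache(int capacity) {
        super(16, 0.75f, false); // false = insertion order, i.e. FIFO
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // eject the first-in entry when full
    }
}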


LRU - (Least Recently Used): Items are added to the cache as they are accessed; when the cache is full, the least recently used item is ejected. This type of cache is typically implemented as a linked list, so that an item in cache, when it is accessed again, can be moved back up to the head of the queue; items are ejected from the tail of the queue. Cache access overhead is again constant time. This algorithm is simple and fast, and it has a significant advantage over FIFO in being able to adapt somewhat to the data access pattern; frequently used items are less likely to be ejected from the cache. The main disadvantage is that it can still get filled up with items that are unlikely to be reaccessed soon; in particular, it can become useless in the face of scans over a larger number of items than fit in the cache. Nonetheless, this is by far the most frequently used caching algorithm.


Summary for LRU: fast, adaptive, not scan resistant
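The same LinkedHashMap trick yields a minimal LRU sketch; constructing the map in access-order mode makes get() move an entry to the most-recently-used end, so the eldest entry is always the least recently used:

import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order, i.e. LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // eject the least-recently-used entry when full
    }
}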


LRU2 - (Least Recently Used Twice): Items are added to the main cache the second time they are accessed; when the cache is full, the item whose second most recent access is oldest is ejected. Because of the need to track the two most recent accesses, access overhead increases logarithmically with cache size, which can be a disadvantage. In addition, accesses have to be tracked for some items not yet in the cache. There may also be a second, smaller, time-limited cache to capture temporally clustered accesses, but the optimal size of this cache relative to the main cache depends strongly on the data access pattern, so there's some tuning effort involved. The advantage is that it adapts to changing data patterns, like LRU, and in addition won't fill up from scanning accesses, since items aren't retained in the main cache unless they've been accessed more than once.


Summary for LRU2: not especially fast, adaptive, scan resistant


2Q - (Two Queues): Items are added to an LRU cache as they are accessed. If accessed again, they are moved to a second, larger, LRU cache. Items are typically ejected so as to keep the first cache at about 1/3 the size of the second. This algorithm attempts to provide the advantages of LRU2 while keeping cache access overhead constant, rather than having it increase with cache size. Published data seems to indicate that it largely succeeds.


Summary for 2Q: fairly fast, adaptive, scan resistant
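A deliberately simplified Java sketch of the idea (the published algorithm further splits the first queue into A1in/A1out; class and field names are mine):

import java.util.LinkedHashMap;

public class TwoQueueCache<K, V> {
    private final int a1Capacity, amCapacity;
    private final LinkedHashMap<K, V> a1; // first-access queue, insertion order = FIFO
    private final LinkedHashMap<K, V> am; // main cache, access order = LRU

    public TwoQueueCache(int mainCapacity) {
        this.amCapacity = mainCapacity;
        this.a1Capacity = Math.max(1, mainCapacity / 3); // ~1/3 ratio from the text
        this.a1 = new LinkedHashMap<>(16, 0.75f, false);
        this.am = new LinkedHashMap<>(16, 0.75f, true);
    }

    public V get(K key) {
        V value = am.get(key);      // LRU hit also moves the entry to the MRU end
        if (value != null) return value;
        value = a1.remove(key);     // second access: promote into the main cache
        if (value != null) {
            am.put(key, value);
            if (am.size() > amCapacity) evictOldest(am);
        }
        return value;
    }

    public void put(K key, V value) {
        if (am.containsKey(key)) { am.put(key, value); return; }
        a1.put(key, value);         // first access lands in the FIFO queue
        if (a1.size() > a1Capacity) evictOldest(a1);
    }

    private void evictOldest(LinkedHashMap<K, V> cache) {
        cache.remove(cache.keySet().iterator().next()); // head = eviction victim
    }
}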


LFU - (Least Frequently Used): Frequency of use data is kept on all items. The most frequently used items are kept in the cache. Because of the bookkeeping requirements, cache access overhead increases logarithmically with cache size; in addition, data needs to be kept on all items whether or not in the cache. The advantage is that long term usage patterns are captured well, incidentally making the algorithm scan resistant as well; the disadvantage, besides the larger access overhead, is that the algorithm doesn't adapt quickly to changing usage patterns, and in particular doesn't help with temporally clustered accesses.


Note: This is sometimes referred to as "perfect LFU", which is in contrast to "in cache LFU". The latter retains frequency of use data only on items that are already in the cache, and generally does not perform as well.


Summary for LFU: not fast, captures frequency of use, scan resistant
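A minimal Java sketch of the "in cache LFU" variant from the note above; for brevity it evicts with an O(n) scan rather than the heap implied by the logarithmic overhead mentioned earlier:

import java.util.HashMap;
import java.util.Map;

public class LfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> counts = new HashMap<>(); // use count per cached item

    public LfuCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        V value = values.get(key);
        if (value != null) counts.merge(key, 1L, Long::sum); // bump use count on a hit
        return value;
    }

    public void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            K victim = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<K, Long> e : counts.entrySet()) { // find the least-used entry
                if (e.getValue() < min) { min = e.getValue(); victim = e.getKey(); }
            }
            values.remove(victim);
            counts.remove(victim);
        }
        values.put(key, value);
        counts.merge(key, 1L, Long::sum);
    }
}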


Simple time-based expiration - Data in the cache is invalidated based on absolute time periods. Items are added to the cache and remain in the cache for a specific amount of time.


Summary for Simple time-based expiration: Fast, not adaptive, not scan resistant.
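A minimal Java sketch of simple time-based expiration: each entry records its insertion time and is lazily evicted on read once the absolute period has elapsed:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long insertedAt;
        Entry(V value, long insertedAt) { this.value = value; this.insertedAt = insertedAt; }
    }

    private final long ttlMillis;
    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis()));
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() - e.insertedAt > ttlMillis) {
            map.remove(key); // expired: lazily evict on read
            return null;
        }
        return e.value;
    }
}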


Extended time-based expiration - Data in the cache is invalidated based on relative time periods. Items are added to the cache and remain in the cache until they are invalidated at certain points in time, such as every five minutes or each day at 12:00.


Summary for Extended time-based expiration: Fast, not adaptive, not scan resistant.


Sliding time-based expiration - Data in the cache is invalidated by specifying the amount of time an item is allowed to sit idle in the cache after its last access.


Summary for Sliding time-based expiration: Fast, adaptive, not scan resistant.
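A minimal Java sketch of sliding expiration: the same shape as the TTL sketch above, except that every successful get() refreshes the timestamp, so only items that sit idle past the timeout expire:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SlidingTtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccess;
        Entry(V value, long lastAccess) { this.value = value; this.lastAccess = lastAccess; }
    }

    private final long idleTimeoutMillis;
    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

    public SlidingTtlCache(long idleTimeoutMillis) { this.idleTimeoutMillis = idleTimeoutMillis; }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis()));
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        long now = System.currentTimeMillis();
        if (now - e.lastAccess > idleTimeoutMillis) {
            map.remove(key);   // idle too long: evict
            return null;
        }
        e.lastAccess = now;    // sliding window: reset the idle clock
        return e.value;
    }
}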


Working set - Based on Dr Peter Denning's classic "Working Set" paper from ACM Computing Surveys (CSUR), Volume 2, Issue 3 (September 1970).


Data in the cache is marked with a flag on every access. The cache is periodically checked; recently accessed members are considered part of the "working set", and members not in the working set are candidates for removal. The size of the cache is not defined directly; rather, the frequency of the periodic checks indirectly controls how many items are deleted.

Summary for Working Set: Fast, adaptive, theoretically near optimal, not scan resistant.


Other algorithms - there are other caching algorithms available that have been tested in published papers. Some of the popular ones include CLOCK, GCLOCK, and LRD (Least Reference Density). Of possible interest is IBM's Adaptive Replacement Cache (ARC) paper (see ARC: A Self-Tuning, Low Overhead Replacement Cache, presentation), which includes some useful tables giving overhead times and hit ratios as functions of cache size and some other parameters.




Caching in ATG Repository

We cache data in ATG for better system performance, but when the underlying data changes frequently we need to clear the cached data so that the server serves the updated values rather than stale ones.

Disadvantage: There is a possibility of deadlock if we don’t manage the caching properly.

Example:
<item-descriptor name=".." cache-mode="..">
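For instance, a hypothetical descriptor could spell out its caching behavior like this (the name and sizes are illustrative, not from a shipped repository definition):

<item-descriptor name="product" cache-mode="simple" item-cache-size="1000" query-cache-size="0">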

For each item descriptor, an SQL repository generally maintains two caches:
Item Caches

Query Caches

Note: Item descriptors within an inheritance tree share the same item cache



ITEM CACHES:

Item caches hold the values of repository items, indexed by repository IDs. Item caching can be explicitly enabled for each item descriptor. Even if caching is explicitly disabled, item caching occurs within the scope of each transaction.



An item cache entry is invalidated when that item is updated. The scope of an entry’s invalidation depends on its caching mode. For example, when an item is changed under simple caching mode, only the local cache entry is invalidated; other ATG instances are not notified. ATG provides several different Distributed Caching Modes to invalidate items across multiple instances.

QUERY CACHES:

Query caches hold the repository IDs of items that match given queries. When a query returns repository items whose item descriptor enables query caching, the result set is cached as follows:


The query cache stores the repository IDs.

The item cache stores the corresponding repository items.

Subsequent iterations of this query use the query cache’s result set and cached items. Any items that are missing from the item cache are fetched again from the database.

Query caching is turned off by default. If items in your repository are updated frequently, or if repeated queries are rare, the benefits of query caching might not justify the overhead that is incurred by maintaining the cache.
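To turn query caching on for a descriptor, give it a non-zero query-cache-size in the repository definition (the values below are illustrative):

<item-descriptor name="promotion" cache-mode="simple" item-cache-size="500" query-cache-size="100">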

A query cache entry can be invalidated for two reasons:

A cached item property that was specified in the original query is modified.


Items of a queried item type are added to or removed from the repository.

Caching Modes

1. Disabled:
                Caching takes place in a transaction-specific cache during a transaction, and that cache is flushed when the transaction terminates. The idea is that you don’t want any caching of these items, but ATG still performs transaction-local caching for performance reasons; from the perspective of other repository users the items are not cached. In my experience this cache mode does not work correctly and instead behaves like simple caching. I’d be happier if ATG dispensed with the “transaction local cache” and just did no caching whatsoever!
2. Simple Caching

          In simple caching each ATG instance maintains an item cache for use by repository users on that instance. There is no synchronization between instances and changes made by other instances may not be seen. 
When caching mode is set to simple, each server maintains its own cache in memory. A server obtains changes to an item’s persistent state only after the cached entry for that item is invalidated. This mode is suitable for read-only repositories such as product catalogs, where changes are confined to a staging server, and for architectures where only one server handles a given repository item type.

3. Locked Caching

           A multi-server application might require locked caching, where only one ATG instance at a time has write access to the cached data of a given item type. You can use locked caching to prevent multiple servers from trying to update the same item simultaneously—for example, Commerce order items, which can be updated by customers on an external-facing server and by customer service agents on an internal-facing server. By restricting write access, locked caching ensures a consistent view of cached data among all ATG instances.

           Locked caching is based on write locks and read locks. If no servers have a write lock for an item, any number of servers may have a read lock on that item. When a server requests a write lock, all other servers are instructed to release their read locks. Once an item is write locked, no other servers may get a read lock or write lock until the first server releases its write lock. In other words, once a server has a write lock on an item, all access to that item is blocked until the write is completed.

A server requests a read lock the first time it tries to access an item. Once the server has a read lock on the item, it holds that read lock until the lock manager notifies the server to release it. At that time, it drops the item from its cache.

A write lock is requested whenever a server calls getItemForUpdate(), or the first time setPropertyValue() is called, and released at the end of the transaction.
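As a hedged sketch of that flow (it assumes a transaction is already active; the item ID and property name are illustrative):

import atg.repository.MutableRepository;
import atg.repository.MutableRepositoryItem;
import atg.repository.RepositoryException;

public class OrderUpdateSketch {
    public void touchOrder(MutableRepository orderRepository) throws RepositoryException {
        // Under locked caching, getItemForUpdate() acquires the write lock for the item.
        MutableRepositoryItem order = orderRepository.getItemForUpdate("o12345", "order");
        order.setPropertyValue("description", "updated by CSR"); // write lock is held here
        orderRepository.updateItem(order);
        // The write lock is released when the enclosing transaction completes.
    }
}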





 Prerequisites:   Locked caching has the following prerequisites:

Item descriptors that specify locked caching must disable query caching by setting their query-cache-size attribute to 0.

A repository with item descriptors that use locked caching must be configured to use a ClientLockManager component; otherwise, caching is disabled for those item descriptors. The repository’s lockManager property is set to a component of type atg.service.lockmanager.ClientLockManager.

At least one ClientLockManager on each ATG instance where repositories participate in locked caching must be configured to use a ServerLockManager.

A ServerLockManager component must be configured to manage the locks among participating ATG instances.

Repository Example:

From ./atg_bootstrap.war/WEB-INF/ATG-INF/DCS/config/config.jar atg/commerce/invoice/invoiceRepository.xml:
  ... display-property="invoiceNumber" cache-mode="locked">
  ... display-property="address1" cache-mode="locked">
  <item-descriptor name="paymentTerms" display-name-resource="itemDescriptorPaymentTerms" sub-type-property="type" version-property="version" cache-mode="locked">

From ./atg_bootstrap.war/WEB-INF/ATG-INF/DAF/Search/Routing/config/config.jar atg/search/routing/repository/SearchConfigurationRepository.xml:
  <item-descriptor name="swapcheck" sub-type-property="stagingState" last-modified-property="lastCheckTime" cache-mode="locked" query-cache-size="0">

From ./atg_bootstrap.war/WEB-INF/ATG-INF/DAF/Search/Index/config/config.jar atg/search/repository/IncrementalItemQueue.xml:
  <item-descriptor name="configAndRepository" cache-mode="locked">
  <item-descriptor name="item" default="true" id-separator="|" cache-mode="locked">
  <item-descriptor name="versionedItem" id-separator="|" cache-mode="locked">
  <item-descriptor name="searchConfig" id-separator="|" cache-mode="locked">

From ./atg_bootstrap.war/WEB-INF/ATG-INF/DAF/Deployment/liveconfig/config.jar atg/deployment/deployment.xml:
  <item-descriptor name="deployment" cache-mode="locked">
  <item-descriptor name="threadBatch" cache-mode="locked">
  <item-descriptor name="failureInfo" cache-mode="locked">


  • ClientLockManager Component:
         For each SQL repository that contains any item descriptors with cache-mode="locked", you must set the lockManager property of the Repository component to refer to a ClientLockManager. ATG comes configured with a default client lock manager, which you can use for most purposes. In the SQL repository's properties file:

lockManager=/atg/dynamo/service/ClientLockManager

A ClientLockManager component must be configured as follows:

useLockServer: true enables this component to connect to a ServerLockManager
lockServerAddress: Host address of the ServerLockManager and, if set, the backup ServerLockManager
lockServerPort: The ports used on the ServerLockManager hosts, listed in the same order as in lockServerAddress

When you first install the ATG platform, the ClientLockManager component has its useLockServer property set to false, which disables use of the lock server. In order to use locked-mode repository caching, you must set this property to true. This setting is included in the ATG platform liveconfig configuration layer, so you can set the useLockServer property by adding the liveconfig configuration layer to the environment for all your ATG servers. You must also set the lockServerPort and lockServerAddress properties to match the ports and hosts of your ServerLockManager components. For example, suppose you have two ServerLockManagers, one running on host tartini and port 9010, and the other running on host corelli and port 9010. You would configure the ClientLockManager like this:

$class=atg.service.lockmanager.ClientLockManager
lockServerAddress=tartini,corelli
lockServerPort=9010,9010
useLockServer=true
          
·         ServerLockManager Component:

The ServerLockManager is only ever used in a repository call. Recall that an ATG repository is simply an Object Relational Mapping framework. An item descriptor represents either a single table, or multiple tables in the case of a join. Regardless, if the data is something that should only be updated, or perhaps even read, by a single session at a time, the access to the data can be managed via a lock manager. There are multiple cache modes for an item descriptor. The only time a lock manager is used is when the cache mode is either locked or distributed.

Out of the box, very few repositories require a ServerLockManager.

The other caching modes, simple, distributed tcp, distributed jms, and distributed hybrid, do not require a ServerLockManager.
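A minimal sketch of the ServerLockManager's properties file, assuming the conventional Nucleus path and the port used elsewhere in this post (the values are illustrative):

# /atg/dynamo/service/ServerLockManager.properties (assumed path and port)
$class=atg.service.lockmanager.ServerLockManager
port=9010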


Example:

In repository definition files you can configure the cache mode, cache sizes, and other cache-related properties, and you can check cache statistics on the dyn/admin page of these components.
The important thing to be aware of is the locked caching mode, where one lock server in the cluster manages lock requests and the other servers run client lock managers that talk to it, with a configurable timeout for obtaining locks. You shouldn’t use this lock mode unless you actually need it, as in the OrderRepository, for example.

• /atg/dynamo/service/ServerLockManager
• /atg/dynamo/service/ClientLockManager

Example of how you can obtain a lock over an order:

TransactionDemarcation td = new TransactionDemarcation();
try {
    // Join (or start) a transaction before taking the lock.
    td.begin(getTransactionManager(), TransactionDemarcation.REQUIRED);

    // Block until this instance holds the write lock for the order.
    getClientLockManager().acquireWriteLock(pOrderId);

    // Register the lock with the current transaction so it is released
    // automatically when the transaction completes.
    LockReleaser lr = new LockReleaser(getClientLockManager(),
            getTransactionManager().getTransaction());
    lr.addWriteLock(pOrderId);

    // ... update the order here ...
} catch (Exception de) {
    // ... log the failure and let the transaction roll back ...
    return false;
} finally {
    try {
        td.end();
    } catch (TransactionDemarcationException tde) {
        // nothing useful to do if ending the demarcation fails
    }
}

The Lock Manager Architecture:

·         A repository configured to point to ClientLockManager
·         ClientLockManager points to ServerLockManager

The ClientLockManager opens a single socket to the ServerLockManager, manages the messages, and wakes up waiting clients when locks become available.

4. Distributed TCP caching:
          Distributed TCP caching uses the das_gsa_subscriber table to record which servers have an interest in knowing when a repository item of a given item descriptor changes. When an item is invalidated by any server configured for distributed TCP caching, the server that initiated the change looks up all interested servers and sends a message to each over a TCP socket. Delivery of these invalidation events is not guaranteed.
          In this mode, all the caches in a cluster are kept synchronized by sending invalidation events when a repository item is updated or deleted. These events are sent to each member of the cluster, serially, via a TCP connection. Yes, every instance in the cluster has a TCP connection to every other instance in the cluster. This amounts to a total of N * (N - 1) socket connections (where N is the number of cluster members) to support cache invalidation; in one of the clusters I support this number was approaching 11,000. In order to identify cluster members, a database table, DAS_GSA_SUBSCRIBER, is maintained by each instance, identifying the item types it treats as distributed and the host/port it is listening on for invalidation event connections. To ATG’s credit, they use the same TCP connection to distribute all distributed invalidation events.
    
5. Distributed JMS caching:
          Distributed JMS caching persists the invalidation messages in the database. Use it when delivery of the invalidation event to every instance must be guaranteed.
          This cache mode works like distributed TCP but uses a Patch Bay message source/sink pair to deliver cache invalidation events over a JMS topic. DistributedJMS is the new kid on the cache-mode block, joining the fray with ATG 7.0. This, to me, is a very promising delivery mechanism for invalidation events, but it falls short in that it is based on SQL JMS, which only supports persistent JMS destinations that operate in a polled manner. The whole purpose of a cache is to avoid disk/database I/O, so a distribution scheme that relies on database I/O doesn’t make much sense. In addition, the polled nature of SQL JMS can easily introduce latency into event distribution. However, plugging in a third-party JMS provider that supports in-memory topics could be just the ticket.

6. Distributed Hybrid caching:
          Distributed hybrid caching stores all items to be cached for the given repository in a GSA cache server. All an instance invalidating a repository item must do is send a message to this server, which then sends invalidation messages to only those servers that have the item cached. This greatly reduces network traffic; just ensure you have enough memory in the cache server to store everything.
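For reference, each of these modes is selected per item descriptor through the cache-mode attribute in the repository definition (the descriptor name here is illustrative; simple, locked, and disabled are the remaining values):

<item-descriptor name="order" cache-mode="distributed">
<item-descriptor name="order" cache-mode="distributedJMS">
<item-descriptor name="order" cache-mode="distributedHybrid">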
         
Troubleshooting the ServerLockManager:

Enable debug logging (loggingDebug=true) on the ClientLockManager and ServerLockManager components to trace lock requests.

TCPDUMP – If the ClientLockManager can talk to the ServerLockManager, you should see a message come across on port 9010. You should also see the repository item Id (the primary key) of the item (row) being locked. Even if it can’t obtain the lock, you will see the network request as the ServerLockManager will add it to a waiting list. If you don’t see the request, troubleshoot the network and the ClientLockManager.

The Cache Droplet:

The Cache droplet (component /atg/dynamo/droplet/Cache) caches rendered content that changes infrequently; it is especially useful when producing that content involves a lot of processing or database interaction.

<dsp:droplet name="/atg/dynamo/droplet/Cache">
<dsp:param name="key" value="${category.repositoryId}_${userLocale}"/>
<dsp:oparam name="output">
....
</dsp:oparam>
</dsp:droplet>

Example:
The main page of a typical ATG Commerce website often has many targeters and other rich content that is expensive to compute.
Often, the easiest way to speed up the load time of a home page is to wrap any targeters within a Cache droplet.

          The key parameter needs to be sufficiently unique. For example, if your home page has a version for each locale, then the locale should be part of the cache key.

Required Input Parameters

key
Lets you have more than one view of the content, based on a value that uniquely defines each view. For example, if content is displayed one way for members and another way for non-members, you can pass the value of the member trait as the key parameter.

Optional Input Parameters

hasNoURLs
Determines how cached URLs are rendered for future requests. By setting hasNoURLs to false, you specify that subsequent requests for the cached content cause URLs to be rewritten on the fly, assuming URL rewriting is enabled. A setting of true for hasNoURLs causes URLs to be saved and rendered exactly as they are currently (without session or request IDs), regardless of whether URL rewriting is enabled.

cacheCheckSeconds
The interval after content is cached until the cache is regenerated. If omitted, the interval is set from the defaultCacheCheckSeconds property in the Cache servlet bean’s properties file.

Open Parameters

output
The code enclosed by the output open parameter is cached.

Clearing the cache:
You can determine how often data is flushed for a given Cache instance on a JSP, or for all instances of Cache. To remove cached content associated with a particular instance of Cache, set the cacheCheckSeconds input parameter on that Cache instance to the frequency at which the associated data should expire. If you omit this parameter, the Cache.defaultCacheCheckSeconds property is used (the default value is 60 seconds).
The Cache.purgeCacheSeconds property determines how often content cached by any Cache servlet bean is flushed. The default is 21600 seconds (6 hours). Cache purging also occurs when a JSP is removed or recompiled.
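For example, a variant of the earlier snippet that re-checks its cached fragment every five minutes (the 300-second value is illustrative):

<dsp:droplet name="/atg/dynamo/droplet/Cache">
<dsp:param name="key" value="${category.repositoryId}_${userLocale}"/>
<dsp:param name="cacheCheckSeconds" value="300"/>
<dsp:oparam name="output">
....
</dsp:oparam>
</dsp:droplet>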

