A Technology Blog About Code Development, Architecture, Operating System, Hardware, Tips and Tutorials for Developers.

Showing posts with label GARBAGE COLLECTION. Show all posts
Showing posts with label GARBAGE COLLECTION. Show all posts

Tuesday, August 14, 2012

SPEED UP WITH MEMCACHED

Is your website running into performance bottlenecks? Does the database or backend feel like a really expensive resource, even though you’ve got a huge cluster set up to improve parallel processing? Read on to find out why you should be including Memcached, in your SOA based application.



Caching is a concept that almost all developers use in some form or the other in their applications. It’s basically about storing a piece of information in memory so that it can be retrieved quickly. Caching is mostly used for data that is accessed repeatedly, so that instead of calculating/retrieving from the disk repeatedly, which takes time, we can instead directly look it up in the cache, which is much faster. Caches can be used at multiple places in the application stack, so you have quite a few options when it comes to choosing where to cache, what to cache and how to cache.

Here are some of the techniques on how and where you might like to cache data:

  • Browser caching: As Web developers might be aware, some data can be cached on the client-side in the browser, like images, etc., so that they are automatically used when repeated requests for that resource are made.
  • Server-side caching: Data or objects can alternatively be cached on the server-side itself. This can either be a local server cache, a centralised caching server or a distributed cache.
  • Local database query cache: A good database caches the database queries or data internally, so as to improve the speed of looking up data as well as the performance of the database.
You may choose to implement a cache in one way or another, or you might use a combination of more than one technique to cache different types of data at separate levels. But more importantly, it is helpful to know whether you even need caching in the particular application/use-case you are thinking about.

Most people, in the process of implementing a cache, actually lose out because it was wrongly implemented. So the cache ends up slowing down the application, instead of speeding it up. Getting fancy software with fancy features doesn’t always make sense, but using even the modest ones in the right way, does.

Why you need Memcached

This discussion assumes that you have set up a cluster, and you want to implement caching. In this case, what happens if you start caching on each node independently? You will see that some nodes face memory issues, while others have quite a bit of memory left. Moreover, most of the data stored in their individual caches is redundant.

This calls for a centralised caching mechanism that makes sense, such that the data being cached is evenly distributed and unique for the whole cluster. And memcached is the right thing to choose.

It provides a solution in which the available memory in the cache is the sum of that on all nodes on which the Memcached instance is running. So if, for example, you have 10 nodes, with each being allocated 1 GB of memory for caching, you get a total of 10 GB of cache available for the whole cluster. Here are some features in Memcached that might lure you into using it within the context of your application:

  • Easy scalability: This feature is applicable for almost any software with the tag of “distributed”, but still, it is worth noting that Memcached needs minimal configuration to add a new node, with almost no special interconnect requirements. Your available memory in the cache just increases on the fly.
  • Hidden complexity: Memcached hides beneath it all the complexity associated with storing/retrieving the data from the cache. All we need to provide is the key associated with the data. The whole task of determining which node to store the data on, or to retrieve it from, is done by the Memcached client itself.
  • Minimal impact of a node failure: Even if a particular Memcached node does fail, it has almost no impact on the overall cache other than reducing the available memory, and a minor increase in the number of cache misses.
  • Flexible architecture: Memcached does not impose a restriction on all nodes to have a uniform cache size. So, some of your nodes with less physical memory can be set up to contribute perhaps only 512 MB to the cluster, while others may have 2 GB of memory dedicated for the Memcached instance. Apart from this, you can even run more than one instance of Memcached on a single node.
  • Multiple clients available: Memcached has client APIs available for various languages like PHP, C++, Java, Python, Ruby, Perl, .NET, Erlang, ColdFusion and even more.
  • Cross-platform: Memcached is available for a wide variety of platforms including Linux, BSD and Windows.
  • Multi-fetch: With the help of this feature, we can request values for more than one key at once, instead of querying them in a loop, one by one, which takes a lot of network round-trips.
  • Constant time functions: It takes the same amount of time to perform an operation in memory, whether it is a single key or a hundred. This corresponds to the Multi-fetch feature discussed before.



Saturday, December 25, 2010

GARBAGE COLLECTION IN JAVA - PERFORMANCE CONSIDERATION

11:54:00 AM Posted by Satish , , , , No comments


There are two primary measures of garbage collection performance. Throughput is the percentage of total time not spent in garbage collection, considered over long periods of time. Throughput includes time spent in allocation (but tuning for speed of allocation is generally not needed.) Pauses are the times when an application appears unresponsive because garbage collection is occurring.

Users have different requirements of garbage collection. For example, some consider the right metric for a web server to be throughput, since pauses during garbage collection may be tolerable. However, in an interactive graphics program even short pauses may negatively affect the user experience.

Some users are sensitive to other considerations. Footprint is the working set of a process, measured in pages and cache lines. On systems with limited physical memory or many processes, footprint may dictate scalability. Promptness is the time between when an object becomes dead and when the memory becomes available, an important consideration for distributed systems, including remote method invocation (RMI).

In general, a particular generation sizing chooses a trade-off between these considerations. For example, a very large young generation may maximize throughput, but does so at the expense of footprint, promptness, and pause times. young generation pauses can be minimized by using a small young generation at the expense of throughput. To a first approximation, the sizing of one generation does not affect the collection frequency and pause times for another generation.

There is no one right way to size generations. The best choice is determined by the way the application uses memory as well as user requirements. For this reason the virtual machine's choice of a garbage collector are not always optimal, and may be overridden by the user in the form of command line options.

GARBAGE COLLECTION IN JAVA - GENERATIONS

11:31:00 AM Posted by Satish , , , , 1 comment
At initialization, a maximum address space is virtually reserved but not allocated to physical memory unless it is needed. The complete address space reserved for object memory can be divided into the young and tenured generations.

The young generation consists of eden plus two survivor spaces. Objects are initially allocated in eden. One survivor space is empty at any time, and serves as a destination of the next, copying collection of any live objects in eden and the other survivor space. Objects are copied between survivor spaces in this way until they are old enough to be tenured, or copied to the tenured generation.
Other virtual machines, including the production virtual machine for the J2SE Platform version 1.2 for the Solaris Operating System, used two equally sized spaces for copying rather than one large eden plus two small spaces. This means the options for sizing the young generation are not directly comparable.





A third generation closely related to the tenured generation is the permanent generation. The permanent generation is special because it holds data needed by the virtual machine to describe objects that do not have an equivalence at the Java language level. For example objects describing classes and methods are stored in the permanent generation.