Java Memory Model

The Java memory model is defined by JSR-133, which took effect in Java 5.

Instruction Reordering

For example, if a thread writes to field a and then to field b, and the value of b does not depend on the value of a, then the compiler is free to reorder these operations, and the cache is free to flush b to main memory before a. There are a number of potential sources of reordering, such as the compiler, the JIT, and the cache.

An example:

r1 and r2 are local variables; A and B are shared variables.
Initially, A == B == 0.

Thread 1	Thread 2
1: r2 = A;	3: r1 = B;
2: B = 1;	4: A = 2;

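Intuitively, running this code should never produce r2 == 2, r1 == 1. If r2 is 2, statement 4 ran before statement 1, so statement 3 also ran before statement 2; B was still 0 when Thread 2 read it, and r1 must therefore be 0.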

In practice, however, a compiler may reorder instructions as long as the semantics of single-threaded execution are unchanged, for example:

Thread 1	Thread 2
B = 1;	        r1 = B;
r2 = A;	        A = 2;

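With this ordering, the result r2 == 2, r1 == 1 becomes possible. Another example, this time caused by forward substitution (initially p == q and p.x == 0):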

Thread 1	Thread 2
r1 = p;	        r6 = p;
r2 = r1.x;	r6.x = 3;
r3 = q;	 
r4 = r3.x;	 
r5 = r1.x;

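Here the compiler may reuse the value read into r2 for r5, because both are reads of r1.x with no intervening write in Thread 1. If Thread 2's write to r6.x takes effect between the reads into r2 and r4, the result is r2 == r5 == 0 but r4 == 3, as if p.x had changed from 0 to 3 and back again.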
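The first example above can be tried out with a small litmus test. The sketch below is only an illustration: the class name `ReorderingLitmus` and the method `run` are made up for this note, and the surprising outcome r2 == 2, r1 == 1 is permitted by the memory model but not guaranteed to show up, so its count may well be zero on a given machine. A dedicated harness such as OpenJDK's jcstress is far better at provoking such outcomes.

```java
// Litmus test for the first example above. All names are illustrative.
final class ReorderingLitmus {
    // Shared fields, deliberately not volatile so that reordering is permitted.
    static int A, B;
    // Each thread stores its local result here; read only after join().
    static int r1, r2;

    /** Runs the race `trials` times; the index is (r2 == 2 ? 2 : 0) + (r1 == 1 ? 1 : 0). */
    static int[] run(int trials) {
        int[] counts = new int[4];
        for (int t = 0; t < trials; t++) {
            A = 0;
            B = 0;
            Thread t1 = new Thread(() -> { r2 = A; B = 1; }); // statements 1 and 2
            Thread t2 = new Thread(() -> { r1 = B; A = 2; }); // statements 3 and 4
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            counts[(r2 == 2 ? 2 : 0) + (r1 == 1 ? 1 : 0)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = run(10_000);
        System.out.println("r2 == 0, r1 == 0: " + c[0]);
        System.out.println("r2 == 0, r1 == 1: " + c[1]);
        System.out.println("r2 == 2, r1 == 0: " + c[2]);
        System.out.println("r2 == 2, r1 == 1: " + c[3] + " (possible only through reordering)");
    }
}
```

Starting a fresh pair of threads per trial keeps the code short but makes the two bodies overlap only rarely, which is one reason the last counter usually stays at zero.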

happens-before

If one action happens-before another, then the first is visible to and ordered before the second.

An unlock on a monitor happens before every subsequent lock on that same monitor.
A write to a volatile field happens before every subsequent read of that same volatile.
A call to start() on a thread happens before any actions in the started thread.
All actions in a thread happen before any other thread successfully returns from a join() on that thread.
The default initialization of any object happens-before any other actions (other than default-writes) of a program.
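The start() and join() rules above can be illustrated with plain, non-volatile fields. This is a minimal sketch; `StartJoinVisibility` and `roundTrip` are hypothetical names chosen for the example.

```java
// Visibility through Thread.start() and Thread.join() only; names are illustrative.
final class StartJoinVisibility {
    // Plain fields: no volatile, no locks.
    static int input;
    static int output;

    static int roundTrip(int value) {
        input = value;                 // written before start()
        // start() happens-before every action in the worker,
        // so the worker is guaranteed to see the write to input.
        Thread worker = new Thread(() -> output = input * 2);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        // All actions in the worker happen-before join() returns,
        // so the write to output is guaranteed to be visible here.
        return output;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(21)); // prints 42
    }
}
```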

volatile

Memory Barriers

// C++ implementation with explicit memory barriers
// Should work on any platform, including DEC Alphas
// From "Patterns for Concurrent and Distributed Objects",
// by Doug Schmidt
template <class TYPE, class LOCK> TYPE *
Singleton<TYPE, LOCK>::instance (void) {
    // First check
    TYPE* tmp = instance_;
    // Insert the CPU-specific memory barrier instruction
    // to synchronize the cache lines on multi-processor.
    asm ("memoryBarrier");
    if (tmp == 0) {
        // Ensure serialization (guard
        // constructor acquires lock_).
        Guard<LOCK> guard (lock_);
        // Double check.
        tmp = instance_;
        if (tmp == 0) {
                tmp = new TYPE;
                // Insert the CPU-specific memory barrier instruction
                // to synchronize the cache lines on multi-processor.
                asm ("memoryBarrier");
                instance_ = tmp;
        }
    }
    return tmp;
}
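For comparison, a Java counterpart of the C++ code above might look like the following sketch. Under the JSR-133 memory model the volatile field takes over the role of the explicit barriers; the class name `LazySingleton` is made up for this note.

```java
// Double-checked locking in Java (Java 5 or later). Names are illustrative.
final class LazySingleton {
    // volatile is essential: without it the idiom is broken in Java.
    private static volatile LazySingleton instance;

    private LazySingleton() {}

    static LazySingleton getInstance() {
        LazySingleton tmp = instance;            // first check (volatile read)
        if (tmp == null) {
            synchronized (LazySingleton.class) { // plays the role of Guard<LOCK>
                tmp = instance;                  // double check under the lock
                if (tmp == null) {
                    tmp = new LazySingleton();
                    instance = tmp;              // volatile write publishes the object
                }
            }
        }
        return tmp;
    }
}
```

The volatile write cannot be reordered with the constructor's stores, and the volatile read cannot be reordered with later reads of the object's fields, which is what the two asm ("memoryBarrier") lines achieve in the C++ version.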

On all processors discussed below, it turns out that instructions that perform StoreLoad also obtain the other three barrier effects, so StoreLoad can serve as a general-purpose (but usually expensive) Fence.

In addition, the special final-field rule requires a StoreStore barrier in:

x.finalField = v; StoreStore; sharedRef = x;

LoadLoad Barriers

Load1; LoadLoad; Load2

ensures that Load1's data are loaded before data accessed by Load2 and all subsequent load instructions are loaded. In general, explicit LoadLoad barriers are needed on processors that perform speculative loads and/or out-of-order processing in which waiting load instructions can bypass waiting stores. On processors that guarantee to always preserve load ordering, the barriers amount to no-ops.

StoreStore Barriers

Store1; StoreStore; Store2

ensures that Store1's data are visible to other processors (i.e., flushed to memory) before the data associated with Store2 and all subsequent store instructions. In general, StoreStore barriers are needed on processors that do not otherwise guarantee strict ordering of flushes from write buffers and/or caches to other processors or main memory.

LoadStore Barriers

Load1; LoadStore; Store2

ensures that Load1's data are loaded before all data associated with Store2 and subsequent store instructions are flushed. LoadStore barriers are needed only on those out-of-order processors in which waiting store instructions can bypass loads.

StoreLoad Barriers

Store1; StoreLoad; Load2

ensures that Store1's data are made visible to other processors (i.e., flushed to main memory) before data accessed by Load2 and all subsequent load instructions are loaded. StoreLoad barriers protect against a subsequent load incorrectly using Store1's data value rather than that from a more recent store to the same location performed by a different processor. Because of this, on the processors discussed below, a StoreLoad is strictly necessary only for separating stores from subsequent loads of the same location(s) as were stored before the barrier. StoreLoad barriers are needed on nearly all recent multiprocessors, and are usually the most expensive kind. Part of the reason they are expensive is that they must disable mechanisms that ordinarily bypass cache to satisfy loads from write-buffers. This might be implemented by letting the buffer fully flush, among other possible stalls.
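Since Java 9 the four barrier kinds are reachable from Java code through the static fence methods of java.lang.invoke.VarHandle: loadLoadFence(), storeStoreFence(), acquireFence() (LoadLoad plus LoadStore), releaseFence() (LoadStore plus StoreStore) and fullFence() (all four, including StoreLoad). The sketch below assumes Java 9 or later; the class `FenceSketch` and its methods are hypothetical names. It passes a value between threads using a plain field, an opaque-mode flag and explicit fences.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;

// Message passing with explicit fences (Java 9+). Names are illustrative.
final class FenceSketch {
    static int data;                                        // plain field
    static final AtomicInteger ready = new AtomicInteger(); // flag, accessed in opaque mode

    static void writer(int value) {
        data = value;              // Store1
        VarHandle.releaseFence();  // LoadStore + StoreStore: Store1 stays before Store2
        ready.setOpaque(1);        // Store2
    }

    static int reader() {
        while (ready.getOpaque() == 0) { // Load1, repeated until the flag is set
            Thread.onSpinWait();
        }
        VarHandle.acquireFence();  // LoadLoad + LoadStore: Load1 stays before Load2
        return data;               // Load2
    }

    /** Starts a writer thread and returns the value the calling thread observes. */
    static int demo(int value) {
        ready.set(0);
        Thread t = new Thread(() -> writer(value));
        t.start();
        int seen = reader();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo(42)); // prints 42
    }
}
```

In ordinary application code a volatile field or a lock is usually the simpler choice; explicit fences are mainly of interest when studying how those constructs are built.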

How volatile Is Implemented

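Conceptually, a variable lives in shared main memory, and each thread may also work with its own private copy (for example in registers or CPU caches). When the JVM runs in -server mode, it optimizes aggressively for throughput, and a thread may keep reading its private copy. Declaring the variable volatile forces every read to go to shared memory. The volatile keyword therefore makes updates to an instance variable visible across threads.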

volatile does not guarantee atomicity for compound operations such as v++.
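The missing atomicity shows up with a shared counter. In this sketch (the class `VolatileCounterDemo` and the method `run` are made-up names) several threads increment a volatile int and an AtomicInteger side by side. The volatile counter may lose updates, so it typically ends up at or below the expected total, while the atomic counter always reaches it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// volatile ++ versus AtomicInteger; names are illustrative.
final class VolatileCounterDemo {
    static volatile int volatileCount;
    static final AtomicInteger atomicCount = new AtomicInteger();

    /** Returns {volatileCount, atomicCount} after all increments have finished. */
    static int[] run(int threads, int perThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    volatileCount++;               // read, add, write: three steps, not atomic
                    atomicCount.incrementAndGet(); // single atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] c = run(4, 100_000);
        System.out.println("volatile: " + c[0] + ", atomic: " + c[1] + ", expected: 400000");
    }
}
```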

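For a volatile variable, Java guarantees that every read comes from main memory rather than from a thread-local copy.
Individual reads and writes of a volatile variable are atomic, including for long and double.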

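volatile does more than guarantee visibility: since JSR-133 it also restricts instruction reordering. Under the old memory model, ordinary field accesses could be reordered freely around volatile accesses; the revised model no longer allows this. The barriers a compiler would have to insert for a sample method are marked in the comments: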

class X {                                       
    int a, b;                       
    volatile int v, u;                            
    void f() {                       
      int i, j;                       
                            
      i = a;   // load a                    
      j = b;   // load b                    
      i = v;   // load v                    
               //     LoadLoad            
      j = u;   // load u                    
               //     LoadStore            
      a = i;   // store a                   
      b = j;   // store b                   
               //     StoreStore        
      v = i;   // store v                   
               //     StoreStore        
      u = j;   // store u                   
               //     StoreLoad        
      i = u;   // load u                   
               //     LoadLoad        
               //     LoadStore        
      j = b;   // load b                    
      a = i;   // store a                   
    }                       
}

When to Use volatile

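A typical use is a volatile flag that controls when a loop exits. In general, volatile should be used only when all of the following conditions hold: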

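Writes to the variable do not depend on its current value, or only a single thread ever updates it.
The variable does not take part in an invariant together with other state variables.
No locking is needed when the variable is accessed.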
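The loop-exit case might be sketched as follows. `StopFlagWorker` and its methods are hypothetical names, and Thread.onSpinWait() assumes Java 9 or later. The flag meets all three conditions: the write does not depend on the current value, the flag is not tied to other state, and no lock is involved.

```java
// A volatile flag that stops a worker loop; names are illustrative.
final class StopFlagWorker implements Runnable {
    // Without volatile, the worker might never observe the update.
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {        // volatile read on every iteration
            Thread.onSpinWait(); // stand-in for real work
        }
    }

    void stop() {
        running = false;         // plain assignment, independent of the old value
    }

    /** Starts a worker, asks it to stop, and reports whether it terminated. */
    static boolean demo() {
        StopFlagWorker worker = new StopFlagWorker();
        Thread t = new Thread(worker);
        t.start();
        try {
            Thread.sleep(10);
            worker.stop();
            t.join(5_000);       // wait up to five seconds for the loop to exit
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```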

synchronized

Mutual exclusion: only one thread can hold a monitor at once, so synchronizing on a monitor means that once one thread enters a synchronized block protected by a monitor, no other thread can enter a block protected by that monitor until the first thread exits the synchronized block.
Visibility: synchronization ensures that memory writes by a thread before or during a synchronized block are made visible in a predictable manner to other threads which synchronize on the same monitor.
- After we exit a synchronized block, we release the monitor, which has the effect of flushing the cache to main memory, so that writes made by this thread can be visible to other threads.
- Before we can enter a synchronized block, we acquire the monitor, which has the effect of invalidating the local processor cache so that variables will be reloaded from main memory. We will then be able to see all of the writes made visible by the previous release.
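Both aspects can be seen in a counter guarded by a single monitor. In this sketch (`SynchronizedCounter` and `run` are made-up names) mutual exclusion keeps increments from interleaving, and the release and acquire of the monitor make each thread's writes visible to the next thread that locks it.

```java
// A counter protected by its own monitor; names are illustrative.
final class SynchronizedCounter {
    private int count; // guarded by "this"

    synchronized void increment() {
        count++; // safe: only one thread at a time can be inside
    }

    synchronized int get() {
        return count; // acquiring the monitor makes earlier writes visible
    }

    /** Runs `threads` workers, each incrementing `perThread` times; returns the total. */
    static int run(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```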

volatile vs. synchronized

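volatile is a lightweight form of thread synchronization and usually performs somewhat better than synchronized; volatile can be applied only to variables.
Accessing a volatile variable from multiple threads never blocks, whereas synchronized can block.
volatile guarantees visibility but not atomicity; synchronized guarantees atomicity and, indirectly, visibility.
volatile addresses the visibility of a variable across threads, whereas synchronized addresses coordinated access to shared resources by multiple threads.
synchronized guarantees both mutual exclusion and visibility: every thread that enters a synchronized method or block sees all modifications previously made under the same lock.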

Under the new memory model, it is still true that volatile variables cannot be reordered with each other. The difference is that it is now no longer so easy to reorder normal field accesses around them.

Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire.

In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.

class VolatileExample {
  int x = 0;
  volatile boolean v = false;
  public void writer() {
    x = 42;
    v = true;
  }

  public void reader() {
    if (v == true) {
      //uses x - guaranteed to see 42.
    }
  }
}

see: