1) Disabling caching is the simplest mechanism, but it can cost significant CPU performance. To achieve the highest performance, processors are pipelined to run at high frequency and to run from caches, which offer very low latency. Caching data that is accessed multiple times improves performance significantly and reduces DRAM accesses and power consumption; marking shared data as “non-cached” therefore sacrifices both performance and power efficiency.

2) Hardware cache coherency schemes are commonly used because they offer better performance and push less complexity, and fewer correctness issues, onto software. Depending on the number of processors/caches in the multiprocessor subsystem and on the cache policies, different coherency protocols can be implemented in hardware to keep all caches coherent. Commonly used schemes are snoop-based protocols (which work well for a small number of processors/caches) and directory-based protocols (more suitable for larger systems). These can be further classified as update vs. invalidate protocols (depending on whether written data is propagated to the other caches or their copies are invalidated), and by the number of states used to track each cache line (MESI / MESIF / MOESI / MOESIF); a minimal sketch of a MESI-style state machine appears after this list.

3) For many years, caches have been used to speed up memory access and overcome bottlenecks in uniprocessor computer systems. All the reasons for using caches with uniprocessors also apply to multiprocessor systems, only more acutely: a typical shared-bus SMP system would have its performance severely limited by the bus cycle time without local cache memory. The existence of caches local to each processor, however, introduces the problem of cache coherence. Multiple copies of the same data can exist in different caches simultaneously, due to shared data structures or process migration between processors. Each processor must be sure that when it reads a line of memory from its cache, the line has not already been overwritten in the cache of some other processor or in main memory by a transaction initiated by another processor. This is the essence of the cache coherence problem, for which several techniques have been applied.

4) In multiprocessor systems, memory should provide a set of locations that hold values, and a read of a location should return the latest value written to it. This property must hold for threads or processes running on different processors to communicate through memory: a read returns the latest written value regardless of which processor wrote it. Maintaining this property in the presence of caches is the cache coherence problem. The same kind of problem arises even in uniprocessors when I/O operations occur. Most I/O transfers are performed by direct memory access (DMA) devices that move data between memory and the peripheral component without involving the processor. When the DMA device writes to a location in main memory, unless special action is taken, the processor may continue to see the old value if that location was previously present in its cache (the second sketch below illustrates the cache maintenance this requires). The techniques and support used to solve the multiprocessor cache coherence problem also solve the I/O coherence problem, and essentially all microprocessors today provide support for multiprocessor cache coherence.
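The sketch below models the MESI states of a single cache line as seen by one cache. The event names, the `others_have_copy` snoop-response flag, and the transition table are illustrative assumptions; a real protocol also transfers data, issues bus transactions, and performs writebacks, none of which is shown here.

```c
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

typedef enum {
    LOCAL_READ,   /* this processor reads the line          */
    LOCAL_WRITE,  /* this processor writes the line         */
    BUS_READ,     /* another cache reads the line (snooped) */
    BUS_WRITE     /* another cache writes the line          */
} mesi_event;

/* Next-state function for one line in one cache. 'others_have_copy'
 * models the snoop response that distinguishes Exclusive from Shared
 * when a read miss is filled. */
static mesi_state mesi_next(mesi_state s, mesi_event e, int others_have_copy)
{
    switch (e) {
    case LOCAL_READ:   /* a hit keeps the state; a miss fills E or S   */
        return (s == INVALID) ? (others_have_copy ? SHARED : EXCLUSIVE) : s;
    case LOCAL_WRITE:  /* gain ownership; other copies are invalidated */
        return MODIFIED;
    case BUS_READ:     /* a Modified line is written back, then shared */
        return (s == INVALID) ? INVALID : SHARED;
    case BUS_WRITE:    /* another writer takes ownership               */
        return INVALID;
    }
    return s;
}

int main(void)
{
    static const char *name[] = { "Invalid", "Shared", "Exclusive", "Modified" };
    mesi_state s = INVALID;
    s = mesi_next(s, LOCAL_READ, 0);  /* read miss, no other copy -> Exclusive */
    s = mesi_next(s, LOCAL_WRITE, 0); /* silent upgrade           -> Modified  */
    s = mesi_next(s, BUS_READ, 1);    /* snooped read, write back -> Shared    */
    printf("final state: %s\n", name[s]);
    return 0;
}
```

The extra states in MESIF / MOESI / MOESIF refine exactly this table, adding Forward or Owned states so that a shared or dirty line can be supplied cache-to-cache without a trip to memory.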
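The second sketch shows the cache maintenance a CPU must perform around a DMA transfer on a system without hardware I/O coherence. `cache_clean_range`, `cache_invalidate_range`, and `dma_device_writes_memory` are hypothetical stand-ins, implemented here as stubs so the example compiles and runs, for what would really be architecture-specific cache-maintenance instructions or an OS DMA API.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUF_LEN 64
static uint8_t buf[BUF_LEN];

/* Hypothetical stand-ins for real platform primitives (stubs here). */
static void cache_clean_range(void *a, size_t n)      { printf("clean      %zu bytes @ %p\n", n, a); }
static void cache_invalidate_range(void *a, size_t n) { printf("invalidate %zu bytes @ %p\n", n, a); }
static void dma_device_writes_memory(void *dst, size_t n)
{
    memset(dst, 0xAB, n);   /* model the device writing straight to DRAM */
    printf("DMA wrote  %zu bytes\n", n);
}

/* Receiving data from a device on a system WITHOUT hardware I/O
 * coherence: the CPU must manage its own cache around the transfer. */
int main(void)
{
    /* 1) Write back any dirty CPU lines covering the buffer, so a later
     *    eviction cannot overwrite the freshly DMA'd data.              */
    cache_clean_range(buf, BUF_LEN);

    /* 2) The device writes the buffer directly to main memory,
     *    bypassing the CPU caches.                                      */
    dma_device_writes_memory(buf, BUF_LEN);

    /* 3) Invalidate stale cached copies so subsequent CPU reads fetch
     *    the new data from DRAM instead of old cached values.           */
    cache_invalidate_range(buf, BUF_LEN);

    printf("first byte seen by CPU: 0x%02X\n", buf[0]);
    return 0;
}
```

On a system whose coherence hardware also snoops DMA traffic, steps 1 and 3 become unnecessary, which is exactly the simplification referred to above: the support that solves multiprocessor coherence solves I/O coherence as well.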