THESIS
2016
xiii, 69 pages : illustrations ; 30 cm
Abstract
As computer technology has progressed by leaps and bounds over the last few decades,
we have reached a point where further performance enhancements cannot be achieved by
feature size scaling alone, owing to inherent physical limitations. Raising the operating
frequency of traditional computing systems now yields only diminishing returns because
of the well-known memory wall, ILP (Instruction Level Parallelism) wall and power wall
issues. We have therefore entered a phase
where multiple computing cores are brought together into a single processor to boost
performance by exploiting the inherent parallelism in complex applications. This has led
to a shift in focus from computation to communication. The interconnection network
between processors and memory defines the memory latency and memory bandwidth,
which have a greater bearing on system performance than the compute power of the processors
themselves. Providing an efficient communication infrastructure between the multiple cores
and the off-chip memory has become increasingly imperative. The Network-on-Chip (NoC)
paradigm enables an efficient and scalable infrastructure for future interconnection networks.
The worst-case network throughput is an important performance metric of interconnection
networks and it is of particular concern to real-time systems. The existing model for
throughput analysis assumes that the router nodes will not constrain the throughput,
and hence the ideal throughput of the network is determined completely by congestion
in the channels/links. In many recently proposed NoCs, however, the real router design
has a significant impact on the delivered throughput. This work thus re-examines
the worst-case throughput issue with the router model taken into account. In the first part of this
thesis, we present an extended framework for analytically evaluating the throughput
constrained by both the routers and the channels with typical network and router settings,
and commonly used routing algorithms including DOR and ROMM.
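As an illustration of the channel-limited model that the abstract refers to as the existing approach, the following is a minimal sketch and not the framework developed in the thesis. It assumes a k x k mesh, XY dimension-order routing, unit-bandwidth channels, and one flow per source node; the function names (`dor_path`, `channel_limited_throughput`, `transpose`) are hypothetical. Router constraints, which the thesis adds, are deliberately not modeled here.

```python
from collections import Counter


def dor_path(src, dst):
    """Directed channels used by XY dimension-order routing on a 2D mesh."""
    (x, y), (dx, dy) = src, dst
    hops = []
    # Route fully along the X dimension first.
    while x != dx:
        nx = x + (1 if dx > x else -1)
        hops.append(((x, y), (nx, y)))
        x = nx
    # Then route along the Y dimension.
    while y != dy:
        ny = y + (1 if dy > y else -1)
        hops.append(((x, y), (x, ny)))
        y = ny
    return hops


def channel_limited_throughput(k, permutation):
    """Ideal throughput (fraction of channel bandwidth) for one traffic pattern.

    Assumption: each node injects a single flow at unit rate, so the load on
    a channel is the number of flows crossing it, and throughput is limited
    by the most heavily loaded channel.
    """
    load = Counter()
    for x in range(k):
        for y in range(k):
            src = (x, y)
            for channel in dor_path(src, permutation(src)):
                load[channel] += 1
    max_load = max(load.values(), default=0)
    return 1.0 if max_load == 0 else min(1.0, 1.0 / max_load)


def transpose(node):
    """Transpose permutation: node (x, y) sends to node (y, x)."""
    x, y = node
    return (y, x)


if __name__ == "__main__":
    print(channel_limited_throughput(4, transpose))
```

Under these assumptions, transpose traffic on a 4 x 4 mesh loads the busiest channel with three flows, so the sketch reports a throughput of one third of channel bandwidth. A worst-case analysis would take the minimum of this quantity over all admissible traffic patterns, which the sketch does not attempt.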
In the second part, we address the major performance bottleneck in heterogeneous chip
multiprocessors, namely, the memory bandwidth between the on-chip system and off-chip
memory. We use multi-criteria optimization to derive efficient memory controller placement
schemes that optimize off-chip memory access for general heterogeneous applications, and
extend the approach to domain-specific applications. Experimental results demonstrate that our
method can accelerate such applications by improving the average network latency and
link utilization over existing schemes, with minimal change to network components.