SCP²: Ways to Have a Nice Graduate Student Life

Self-motivated: Graduate student life offers a great deal of freedom, and you do research for yourself, not for your advisor. Plan several short-term milestones and clear them one by one, like completing quests in a game.
Credible: Credibility is the most basic virtue for having a wonderful relationship with your colleagues and advisors.
Patient: Earning a degree is a marathon. Unless you are racing against a paper deadline, take at least one day off per week to avoid burnout.
Productive: Let your efforts bear fruit by publishing papers or filing patents at the end of a project, as these leave a solid record on your CV. People other than your colleagues almost always judge you by your official outputs.

How to Read Papers Properly?

Creativity originates from repeated learning and imitation, and this holds in any field. Historically, people have improved existing methods by learning from the literature. Because computer science has grown rapidly over the past few decades, we cannot come up with novel ideas from a blank slate. Reading papers should therefore be a habit of your own, not something your advisors or managers have to enforce, even if you plan to pursue a career in industry.

There are two ways to read a paper: (1) PERUSE and (2) SKIM. Skimming is used for a broad literature survey when looking for a topic: we quickly grasp the problem and the high-level key idea from the introduction and the beginning of the methodology section.

In fact, perusing is probably the method you need to learn first when doing research. To peruse a paper, keep the following questions in mind:

What is the problem definition of this paper?
Is there a previous approach? What are the limitations of the previous approach?
What is the proposed key idea for overcoming the aforementioned limitations? What are the practical challenges to realizing the idea?
Why would the proposed approach outperform the previous approach from the technical perspective (i.e., rationale behind design choices)?
How do the authors address those challenges in detail in the methodology part?
In the evaluation part, how do the authors evaluate their idea? By simulation? Real machine-based emulation?
What are the evaluation configurations? What is the rationale behind those configurations?
What are the general tendencies of results? How do the authors explain some outliers?
Above all, what are the commonly used approaches to realizing an idea, considering papers across various fields?

Papers for Fresh Graduate Students

To get your feet wet in computer architecture (e.g., to understand common architectural designs and terminology), I recommend reading the following high-quality papers published over the past few decades!

Microarchitectures

J. A. Fisher et al., “Very Long Instruction Word Architectures and the ELI-512,” ISCA, 1983.
J. E. Smith et al., “The Microarchitecture of Superscalar Processors,” Proceedings of the IEEE, 1995.
K. Olukotun et al., “The Case for a Single-Chip Multiprocessor,” ASPLOS, 1996.
A. Moshovos et al., “JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers,” HPCA, 2001.
G. Hinton et al., “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal Q1, 2001.
S. J. Eggers et al., “Simultaneous Multithreading: A Platform for Next-Generation Processors,” IEEE Micro, 2002.
O. Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors,” HPCA, 2003.
P. Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro, 2005.
E. Lindholm et al., “NVIDIA Tesla: A Unified Graphics and Computing Architecture,” IEEE Micro, 2008.
A. Jaleel et al., “Achieving Non-Inclusive Cache Performance with Inclusive Caches,” MICRO, 2010.
E. Blem et al., “Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures,” HPCA, 2013.

Proper uses of performance metrics

L. K. John, “More on Finding a Single Number to Indicate Overall Performance of a Benchmark Suite,” ACM SIGARCH Computer Architecture News, 2004.
A. A. Alameldeen et al., “IPC Considered Harmful for Multiprocessor Workloads,” IEEE Micro, 2006.
S. Eyerman et al., “Restating the Case for Weighted-IPC Metrics to Evaluate Multiprogram Workload Performance,” IEEE Computer Architecture Letters, 2014.
L. Eeckhout, “R. I. P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead,” IEEE Computer Architecture Letters, 2025.

Data prefetch

S. Palacharla et al., “Evaluating Stream Buffers as a Secondary Cache Replacement,” ISCA, 1994.
K. J. Nesbit et al., “Data Cache Prefetching using a Global History Buffer,” HPCA, 2004.
S. Srinath et al., “Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers,” HPCA, 2007.
S. H. Pugsley et al., “Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers,” HPCA, 2014.
M. Shevgoor et al., “Efficiently Prefetching Complex Address Patterns,” MICRO, 2015.
J. Kim et al., “Path Confidence based Lookahead Prefetching,” MICRO, 2016.

Virtual memory

J. Huck et al., “Architectural Support for Translation Table Management in Large Address Space Machine,” ISCA, 1993.
J. Navarro et al., “Practical, Transparent Operating System Support for Superpages,” OSDI, 2002.
A. Seznec, “Concurrent Support of Multiple Page Sizes on a Skewed Associative TLB,” IEEE Transactions on Computers, 2003.
R. Bhargava et al., “Accelerating Two-Dimensional Page Walks for Virtualized Systems,” ASPLOS, 2008.
T. W. Barr et al., “Translation Caching: Skip, Don’t Walk (the Page Table),” ISCA, 2010.
B. Pichai et al., “Architectural Support for Address Translation on GPUs,” ASPLOS, 2014.
J. Power et al., “Supporting x86-64 Address Translation for 100s of GPU Lanes,” HPCA, 2014.

Non-volatile memory

B. Yang et al., “A Low Power Phase-Change Random Access Memory using a DCW Scheme,” ISCAS, 2007.
M. K. Qureshi et al., “Scalable High Performance Main Memory System Using PCM Technologies,” ISCA, 2009.
B. C. Lee et al., “Architecting Phase Change Memory as a Scalable DRAM Alternative,” ISCA, 2009.
S. Cho et al., “Flip-N-write: A Simple Deterministic Technique to Improve PRAM Write Performance, Energy and Endurance,” MICRO, 2009.
M. K. Qureshi et al., “Enhancing Lifetime and Security of PCM-Based Main Memory with Start-Gap Wear Leveling,” MICRO, 2009.
S. Schechter et al., “Use ECP, not ECC, for Hard Failures in Resistive Memories,” ISCA, 2010.
M. K. Qureshi et al., “Security Refresh,” ISCA, 2011.
R. Wang et al., “SD-PCM: Constructing Reliable Super Dense Phase Change Memory under Write Disturbance,” ASPLOS, 2016.
X. Liu et al., “Binary Star: Coordinated Reliability in Heterogeneous Memory Systems for High Performance and Scalability,” MICRO, 2019.

DRAM and RowHammers

Y. Kim et al., “A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM,” ISCA, 2012.
Y. Kim et al., “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA, 2014.
E. Lee et al., “TWiCe: Preventing Row-hammering by Exploiting Time Window Counters,” ISCA, 2019.
J. S. Kim et al., “Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques,” ISCA, 2020.

Memory compression

R. B. Tremaine et al., “IBM Memory Expansion Technology (MXT),” IBM Journal of Research and Development, 2001.
D. A. Wood et al., “Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches,” Technical Report 1500 UW-Madison, 2004.
X. Chen et al., “C-PACK: A High-Performance Microprocessor Cache Compression Algorithm,” IEEE TVLSI, 2010.
G. Pekhimenko et al., “Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches,” PACT, 2012.
G. Pekhimenko et al., “Linearly Compressed Pages: A Low-Complexity, Low-Latency Main Memory Compression Framework,” MICRO, 2013.
J. Kim et al., “Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures,” ISCA, 2016.
E. Choukse et al., “Compresso: Pragmatic Main Memory Compression,” MICRO, 2018.

Memory security

B. Gassend et al., “Caches and Merkle Trees for Efficient Memory Authentication,” HPCA, 2003.
C. Yan et al., “Improving Cost, Performance, and Security of Memory Encryption and Authentication,” ISCA, 2006.
B. Rogers et al., “Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly,” MICRO, 2007.
L. Ren et al., “Design Space Exploration and Optimization of Path Oblivious RAM in Secure Processors,” ISCA, 2013.
S. Gueron, “A Memory Encryption Engine Suitable for General Purpose Processors,” eprint iacr, 2016.
A. Awad et al., “Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers,” ASPLOS, 2016.
A. Awad et al., “Triad-NVM: Persistency for Integrity-Protected and Encrypted Non-Volatile Memories,” ISCA, 2019.
K. A. Zubair et al., “Anubis: Ultra-Low Overhead and Recovery Time for Secure Non-Volatile Memories,” ISCA, 2019.
J. Zhou et al., “Lelantus: Fine-Granularity Copy-on-Write Operations for Secure Non-Volatile Memories,” ISCA, 2020.

Chiplets

D. Stow et al., “Cost Analysis and Cost-Driven IP Reuse Methodology for SoC design Based on 2.5D/3D Integration,” ICCAD, 2016.
A. Arunkumar et al., “MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability,” ISCA, 2017.
S. Naffziger et al., “Pioneering Chiplet Technology and Design for the AMD EPYC and Ryzen Processor Families,” ISCA, 2021.

Disaggregation

K. Lim et al., “Disaggregated Memory for Expansion and Sharing in Blade Servers,” ISCA, 2009.
J. Gu et al., “Efficient Memory Disaggregation with INFINISWAP,” USENIX NSDI, 2017.
Y. Shan et al., “LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation,” USENIX OSDI, 2018.
I. Calciu et al., “Rethinking Software Runtimes for Disaggregated Memory,” ASPLOS, 2021.
S.-S. Lee et al., “MIND: In-Network Memory Management for Disaggregated Data Centers,” SOSP, 2021.

Near data processing

J. Ahn et al., “A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing,” ISCA, 2016.
H. Asghari-Moghaddam et al., “Chameleon: Versatile and Practical Near-DRAM Acceleration Architecture for Large Memory Systems,” ISCA, 2016.
M. Alian et al., “Application-Transparent Near-Memory Processing Architecture with Memory Channel Network,” MICRO, 2018.
M. He et al., “Newton: A DRAM-maker’s Accelerator-in-Memory (AiM) Architecture for Machine Learning,” MICRO, 2018.
S. Lee et al., “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product,” ISCA, 2021.
L. Ke et al., “Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM,” IEEE Micro, 2022.
