Publications
Author key: My Student, *(co-advised student with Prof. Solihin)
- D. Adak, H. Zhou, E. Rotenberg, and A. Awad, “SpecMPK: Efficient In-Process Isolation with Speculative and Secure Permission Update Instruction”, The 31st International Symposium on High Performance Computer Architecture (HPCA-31), 2025.
- A. Meher, Y. Liu and H. Zhou, “Error Mitigation of Hamiltonian Simulations from an Analog-based Compiler (SimuQ)”, IEEE International Conference on Quantum Computing and Engineering (QCE24), 2024.
- S. Mohapatra and H. Zhou, “Understanding Error Sensitivity of Quantum Circuits”, IEEE International Conference on Quantum Computing and Engineering (QCE24), 2024.
- D. Baron, H. Patil, and H. Zhou, “Qubit-Wise Majority Vote: Maximum Likelihood Quantum Error Mitigation for Algorithms with a Single Correct Output”, IEEE International Conference on Quantum Computing and Engineering (QCE24), 2024.
- S. Faghih and H. Zhou, “Dynamic Runtime Assertions in Quantum Ternary Systems”, IEEE International Conference on Quantum Computing and Engineering (QCE24), 2024.
- A. Yudha, J. Xue, Q. Lou, H. Zhou, and Y. Solihin, “BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-wise Comparisons”, the International Conference on Parallel Architectures and Compilation Techniques (PACT), 2024.
- S. Yuan, A. Awad, and H. Zhou, “Delta Counter: Bandwidth-Efficient Encryption Counter Representation for Secure GPU Memory”, IEEE Transactions on Dependable and Secure Computing (TDSC), April, 2024.
- Y. Jin, Z. Li, F. Hua, T. Hao, H. Zhou, Y. Huang, and E. Zhang, “Tetris: A Compilation Framework for VQA Applications in Quantum Computing”, the 51st International Symposium on Computer Architecture (ISCA), 2024. (Artifact available) (Distinguished Artifact Award)
- P. Li, J. Liu, A. Gonzales, Z. Saleem, H. Zhou, and P. Hovland, “QuTracer: Mitigating Quantum Gate and Measurement Errors by Tracing Subsets of Qubits”, the 51st International Symposium on Computer Architecture (ISCA), 2024. (Artifact available) (Best Paper Candidate)
- R. Abdullah, H. Lee, H. Zhou, and A. Awad, “Salus: Efficient Security Support for CXL-Expanded GPU Memory”, The 30th International Symposium on High Performance Computer Architecture (HPCA-30), 2024. (Artifact available)
- P. Li, J. Liu, H. Patil, P. Hovland, and H. Zhou, “Enhancing Virtual Distillation with Circuit Cutting for Quantum Noise Mitigation”, the 41st IEEE International Conference on Computer Design (ICCD-2023), 2023.
- Y. Tozlu and H. Zhou, “PBVR: Physically Based Rendering in Virtual Reality”, The 2023 IEEE International Symposium on Workload Characterization (IISWC-2023), 2023. (Artifact)
- H. Patil, P. Li, J. Liu, and H. Zhou, “Folding-Free ZNE: A Comprehensive Quantum Zero-Noise Extrapolation Approach for Mitigating Depolarizing and Decoherence Noise”, the IEEE International Conference on Quantum Computing and Engineering (QCE’23), 2023.
- A. Frejj*, H. Zhou, and Y. Solihin, “SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers”, The 29th International Symposium on High Performance Computer Architecture (HPCA-29), 2023.
- R. Abdullah, H. Zhou, and A. Awad, “Plutus: Bandwidth-Efficient Memory Security for GPUs”, The 29th International Symposium on High Performance Computer Architecture (HPCA-29), 2023.
- P. Li, J. Liu, Y. Li, and H. Zhou, “Exploiting Quantum Assertions for Error Mitigation and Quantum Program Debugging”, a special session paper in the 40th IEEE International Conference on Computer Design (ICCD-2022), 2022.
- A. Yudha, J. Meyer, S. Yuan, H. Zhou, and Y. Solihin, “LITE: a Low-Cost Practical Inter-Operable GPU TEE”, The 36th ACM International Conference on Supercomputing (ICS-2022), 2022.
- S. Yuan, A. Awad, A. Yudha, Y. Solihin, and H. Zhou, “Adaptive Security Support for Heterogeneous Memory on GPUs”, The 28th International Symposium on High Performance Computer Architecture (HPCA-28), Feb. 2022.
- J. Liu, P. Li, and H. Zhou, “Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing”, The 28th International Symposium on High Performance Computer Architecture (HPCA-28), Feb. 2022. (HPCA-28 Distinguished Artifact Award)
- J. Ravi, T. Nguyen, H. Zhou, and M. Becchi, “PILOT: a Runtime System to Manage Multi-Tenant GPU Unified Memory Footprint”, 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 2021.
- A. Frejj*, H. Zhou, and Y. Solihin, “Bonsai Merkle Forests: Efficiently Achieving Crash Consistency in Secure Persistent Memory”, The 54th International Symposium on Microarchitecture (MICRO–54), 2021.
- S. Yuan, Y. Solihin, and H. Zhou, “PSSM: Achieving Secure Memory for GPUs with Partitioned and Sectored Security Metadata”, The 35th ACM International Conference on Supercomputing (ICS-2021), 2021.
- S. Yuan, A. Yudha, Y. Solihin, and H. Zhou, “Analyzing Secure Memory Architecture for GPUs”, the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’2021), Mar. 2021
- J. Liu and H. Zhou, “Systematic Approaches for Precise and Approximate Quantum State Runtime Assertion”, The 27th International Symposium on High Performance Computer Architecture (HPCA-27), Feb. 2021.
- J. Liu, L. Bello, and H. Zhou, “Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits”, International Symposium on Code Generation and Optimization (CGO-2021), 2021. (Artifact included)
- J. Liu and H. Zhou, “Reliability Modeling of NISQ-Era Quantum Computers”, The 2020 IEEE International Symposium on Workload Characterization (IISWC-2020), 2020.
- A. Yudha, K. Kimura, H. Zhou, and Y. Solihin, “Scalable and Fast Lazy Persistency on GPUs”, The 2020 IEEE International Symposium on Workload Characterization (IISWC-2020), 2020.
- A. Frejj*, S. Yuan, H. Zhou, and Y. Solihin, “Persist-Level Parallelism: Streamlining Integrity Tree Updates for Secure Non-Volatile Memory”, The 53rd International Symposium on Microarchitecture (MICRO–53), 2020.
- J. Liu, A. Kafi, X. Shen and H. Zhou, “MKPipe: A Compiler Framework for Optimizing Multi-Kernel Workloads in OpenCL for FPGA”, The 34th ACM International Conference on Supercomputing (ICS-2020), 2020.
- J. Liu, G. Byrd, and H. Zhou, “Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation”, The 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2020), 2020. (Artifact: benchmarks and source code)
- C. Zhao, et al., Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU. Future Gener. Comput. Syst. 112: 1093-1105 (2020)
- H. Guan, L. Ning, Z. Lin, X. Shen, H. Zhou, and S. Lim, “In-Place Zero-Space Memory Protection for CNN”, 23rd Conf. on Neural Information Processing Systems (NeurIPS), 2019.
- H. Zhou and G. T. Byrd, “Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation”, in IEEE Computer Architecture Letters (CAL), 2019. doi: 10.1109/LCA.2019.2935049
- Zhen Lin, Mohammad Alshboul, Yan Solihin, and Huiyang Zhou, “Exploring Memory Persistency Models for GPUs”, in the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2019.
- Z. Lin, H. Dai, M. Mantor, and H. Zhou, “Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution”, in ACM Transactions on Architecture and Code Optimization (TACO), 2019.
- Z. Lin, U. Mathur, and H. Zhou, “Scatter-and-Gather Revisited: High-Performance Side-Channel-Resistant AES on GPUs”, The 12th workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-2019), 2019. (Source code)
- Y. Zhong, C. Li, H. Zhou, and G. Wang, “Developing Noise-Resistant Three-Dimensional Single Particle Tracking Using Deep Neural Networks”, Analytical Chemistry, 2018.
- H. Dai, Z. Lin, C. Li, C. Zhao, F. Wang, N. Zheng, and H. Zhou, “Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls”, in 24th International Symposium on High Performance Computer Architecture (HPCA-24), Feb. 2018.
- Z. Lin, M. Mantor, and H. Zhou, “GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and A Novel Way to Improve TLP”, in ACM Transactions on Architecture and Code Optimization (TACO), Issue 1, 2018
- H. Dai, C. Li, Z. Lin and H. Zhou, “The Demand for a Sound Baseline in GPU Memory Architecture Research”, 14th Annual Workshop on Duplicating, Deconstructing and Debunking (WDDD), held with ISCA-2017, 2017. (Source code)
- A. Verma, H. Zhou, S. Booth, R. King, J. Coole, J. Marshall, A. Keep, and W. Feng, “Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs”, in the 54th Design Automation Conference (DAC-2017), 2017. (Sample code in github)
- G. Chen, Y. Zhao, X. Shen, and H. Zhou, “EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU”, in the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’17), 2017.
- Y. Zhang, S. Li, S. Yan, and H. Zhou, “A Cross-Platform SpMV Framework on Many-Core Architectures”, ACM Transactions on Architecture and Code Optimization (TACO), Vol. 13, Issue 4, Nov. 2016.
- C. Zhao, F. Wang, Z. Lin, H. Zhou, and N. Zheng, “Selective GPU Cache Bypassing for Un-Coalesced Loads”, in the 22nd IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2016.
- Z. Lin, L. Nyland, and H. Zhou, “Enabling Efficient Preemption for SIMT Architectures with Lightweight Context Switching”, in the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’16), 2016
- C. Li, Y. Yang, M. Feng, C. Srimat, and H. Zhou, “Optimizing Memory Efficiency for Deep Convolutional Networks on GPUs”, in the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’16), 2016. (best student paper finalist)
- Q. Jia and H. Zhou, “Tuning Stencil Codes in OpenCL for FPGAs”, in the 34th IEEE International Conference on Computer Design (ICCD-2016), 2016. (source code)
- G. Chen, H. Zhou, X. Shen, J. Gahm, N. Venkat, S. Booth and J. Marshall, “OpenCL-Based Erasure Coding on Heterogeneous Architectures”, in the 27th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2016), 2016. (presentation, containing the results of the FPGA on an Intel Broadwell processor using SVM)
- H. Dai, S. Gupta, C. Li, C. Kartsaklis, M. Mantor, H. Zhou, “A Model-Driven Approach to Warp/Thread-Block Level GPU Cache Bypassing”, in the 53rd Design Automation Conference (DAC-2016), 2016.
- S. Gupta and H. Zhou, “Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing”, in the 44th International Conference on Parallel Processing (ICPP 2015), Sept. 2015.
- Q. Jia, M. B. Padia, K. Amboju, and H. Zhou, “An Optimized AMPM-based Prefetcher Coupled with Configurable Cache Line Sizing”, JILP Workshop on Computer Architecture Competitions (JWAC): 2nd Data Prefetching Championship (DPC2), held with ISCA-42, June 2015.
- C. Li, S. Song, H. Dai, A. Sidelnik, S. Hari, and H. Zhou, “Locality-Driven Dynamic GPU Cache Bypassing”, in the 29th International Conference on Supercomputing (ICS’15), June 2015.
- K. Mayank, H. Dai, J. Wei and H. Zhou, “Analyzing Graphics Processor Unit (GPU) Instruction Set Architectures”, Poster paper in the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’2015), Mar. 2015
- P. Xiang, Y. Yang, M. Mantor, N. Rubin and H. Zhou, “Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture”, the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2015), May 2015
- Y. Yang, C. Li, and H. Zhou, “CUDA-NP: Realizing nested thread-level parallelism in GPGPU applications.” Journal of Computer Science and Technology (JCST) 30(1): 3–19 Jan. 2015
- C. Li, Y. Yang, Z. Lin, and H. Zhou, “Automatic Data Placement into GPU On-Chip Memory Resources”, 2015 ACM International Symposium on Code Generation and Optimization (CGO’2015), Feb., 2015. (talk)
- Y. Yang and H. Zhou, “A Highly Efficient FFT Using Shared-Memory Multiplexing”, a book chapter in Numerical Computations with GPUs (Editor: Volodymyr Kindratenko), Springer 2014. (source code)
- Y. Yang, P. Xiang, M. Mantor, N. Rubin, L. Hsu, Q. Dong and H. Zhou, “A Case for a Flexible Scalar Unit in SIMT Architecture”, in the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS’2014), May 2014
- C. Li, Y. Yang, H. Dai, S. Yan, F. Mueller and H. Zhou, “Understanding the Tradeoffs between Software-Managed vs. Hardware-Managed Caches in GPUs”, in the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’2014), March 2014 (Talk)
- P. Xiang, Y. Yang, and H. Zhou, “Warp-Level Divergence in GPUs: Characterization, Impact and Mitigation”, in the 20th International Symposium on High Performance Computer Architecture (HPCA-20), Feb., 2014. (Talk)
- Y. Yang and H. Zhou, “CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications”, in the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’14), Feb. 2014. (Talk)
- S. Yan, C. Li, Y. Zhang, and H. Zhou, “yaSpM: Yet Another SpMV Framework on GPUs”, in the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’14), Feb. 2014. (Source code) (Talk)
- S. Gupta, P. Xiang, and H. Zhou, “Analyzing locality of memory references in GPU architectures”, in ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (MSPC), co-located with PLDI 2013, June 2013.
- P. Xiang, Y. Yang, M. Mantor, N. Rubin, L. Hsu, and H. Zhou, “Exploiting Uniform Vector Instructions for GPGPU Performance, Energy Efficiency, and Opportunistic Reliability Enhancement”, in the 27th International Conference on Supercomputing (ICS’13), June 2013.
- S. Gupta, H. Gao, and H. Zhou, “Adaptive Cache Bypassing for Inclusive Last Level Caches”, in the 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2013), May 2013.
- S. Gupta, P. Xiang, Y. Yang, and H. Zhou, “Locality principle revisited: A probability-based quantitative approach”, Journal of Parallel and Distributed Computing (JPDC), 73(7): 1011-1027, 2013 (a special issue on the Best Papers: International Parallel and Distributed Processing Symposium (IPDPS) 2010, 2011 and 2012)
- Y. Yang, P. Xiang, M. Mantor, N. Rubin, and H. Zhou, “Shared Memory Multiplexing: A Novel Way to Improve GPGPU Throughput”, in the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT’12), Sept. 2012.
- Y. Yang, P. Xiang, M. Mantor, and H. Zhou, “Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs”, in the 41st International Conference on Parallel Processing (ICPP 2012), Sept. 2012.
- Y. Yang and H. Zhou, “The Implementation of a High Performance GPGPU Compiler”, in International Journal of Parallel Programming (IJPP) 41(6): 768-781, 2013.
- J. Kong, O. Acıiçmez, J.-P. Seifert and H. Zhou, “Architecting Against Software Cache-based Side Channel Attacks”, in IEEE Trans. Computers (TC) 62(7): 1276-1288 (2013).
- S. Gupta, P. Xiang, Y. Yang, and H. Zhou, “Locality Principle Revisited: A Probability-Based Quantitative Approach”, in the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012) (best paper in the architecture track), May, 2012. (locality computation code)
- Y. Yang, P. Xiang, M. Mantor, and H. Zhou, “CPU-Assisted GPGPU on Fused CPU-GPU Architectures”, in the 18th International Symposium on High Performance Computer Architecture (HPCA-18), Feb., 2012.
- Y. Yang, P. Xiang, J. Kong, M. Mantor, and H. Zhou, “A Unified Optimizing Compiler Framework for Different GPGPU Architectures”, in ACM Transactions on Architecture and Code Optimization (TACO), Vol. 9, Num. 2, 2012.
- Y. Yang and H. Zhou, “Developing a High Performance GPGPU Compiler using Cetus”, Cetus Users and Compiler Infrastructure Workshop, held with International Conference on Parallel Architectures and Compilation Techniques (PACT’11), Oct., 2011
- N. Bhansali, C. Panirwla, and H. Zhou, “Exploring Correlation for Indirect Branch Prediction”, in 2nd JILP Workshop on Computer Architecture Competitions (JWAC-2): Championship Branch Prediction, held with ISCA-38, June, 2011.
- M. Dimitrov and H. Zhou, “Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs”, in the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), pp 311-321, May, 2011.
- M. Dimitrov and H. Zhou, “Combining Local and Global History for High Performance Data Prefetching”, Journal of Instruction-Level Parallelism (JILP), Vol. 13, 2011
- J. Kong and H. Zhou, “Improving Privacy and Lifetime of PCM-based Main Memory”, The 40th IEEE/IFIP Conference on Dependable Systems and Networks (DSN 2010) (DCCS track), July, 2010.
- Y. Yang, P. Xiang, J. Kong, and H. Zhou, “A GPGPU Compiler for Memory Optimization and Parallelism Management”, The ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI’2010), June, 2010. (the open-source compiler code is available here) (talk given at PLDI’2010)
- J. Kong, et al., “Accelerating MATLAB Image Processing Toolbox Functions on GPUs”, The 3rd workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3), held with the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XV), Mar. 2010. (the OpenCL code is available here) (talk given at GPGPU-3)
- M. Dimitrov and H. Zhou, “Anomaly-based Bug Prediction, Isolation, and Validation: An Automated Approach for Software Debugging”, The 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XIV), Mar. 2009. (The automated debugging tool can be downloaded here) (talk given at ASPLOS-XIV).
- M. Dimitrov, M. Mantor, and H. Zhou, “Understanding Software Approaches for GPGPU Reliability”, The 2nd workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-2), held with the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XIV), Mar. 2009. (talk given at GPGPU-2)
- J. Kong, O. Acıiçmez, J.-P. Seifert and H. Zhou, “Hardware-Software Integrated Approaches to Defend Against Software Cache-based Side Channel Attacks”, The 15th International Symposium on High Performance Computer Architecture (HPCA-15), Feb., 2009. (talk)
- M. Dimitrov and H. Zhou, “Combining Local and Global History for High Performance Data Prefetching”, The 1st Journal of Instruction-Level Parallelism (JILP) Data Prefetching Championship (DPC-1), held with 15th International Symposium on High Performance Computer Architecture (HPCA-15), Feb., 2009. (ranked 2nd) (talk, code)
- J. Kong, O. Acıiçmez, J.-P. Seifert and H. Zhou, “Deconstructing New Cache Designs for Thwarting Software Cache-based Side Channel Attacks”, The 2nd ACM Computer Security Architecture Workshop (CSAW-2), held in conjunction with 15th ACM Conference on Computers and Communication Security (CCS-2008), pp. 25-34, Oct. 2008. (talk given at CSAW-2)
- H. Gao, Y. Ma, M. Dimitrov, and H. Zhou, “Address-Branch Correlation: A Novel Locality for Long-Latency Hard-to-Predict Branches”, The 14th International Symposium on High Performance Computer Architecture (HPCA-14), pp. 74-85, Feb., 2008. (talk given at HPCA-14)
- M. Dimitrov and H. Zhou, “Unified Architectural Support for Soft-Error Protection or Software Bug Detection”, International Conference on Parallel Architectures and Compilation Techniques (PACT’07), pp. 73-82, Sept. 2007. (talk given at PACT’07)
- H. Gao and H. Zhou, “PMPM: Prediction by Combining Multiple Partial Matches”, Journal of Instruction-Level Parallelism (JILP), pp. 1-18, Vol. 9, 2007.
- Y. Ma, H. Gao, M. Dimitrov, and H. Zhou, “Optimizing Dual-Core Execution for Power Efficiency and Transient-Fault Recovery”, IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 8, pp. 1080-1093, Aug., 2007
- H. Gao and H. Zhou, “PMPM: Prediction by Combining Multiple Partial Matches”, 2nd Championship Branch Prediction (CBP-2) held with the 39th International Symposium on Microarchitecture (MICRO-39), pp. 19-24, Dec. 2006. (finalist in both the realistic and idealistic tracks, code for the realistic track, code for the idealistic track, talk given at CBP-2)
- M. Dimitrov and H. Zhou, “Locality-based Information Redundancy for Processor Reliability”, 2nd Workshop on Architectural Reliability (WAR-2) held in conjunction with 39th International Symposium on Microarchitecture (MICRO-39), pp. 29-36, Dec. 2006. (presentation given at WAR-2)
- J. Kong, C. Zou, and H. Zhou, “Improving Software Security via Runtime Instruction-Level Taint Checking”, Workshop on Architectural and System Support for Improving Software Dependability (ASID) held in conjunction with 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XII), pp. 18-24, October, 2006. (presentation given at ASID’06)
- Y. Ma and H. Zhou, “Efficient Transient-Fault Tolerance for Multithreaded Processors Using Dual-Thread Execution”, IEEE International Conference on Computer Design (ICCD), pp. 120-126, October, 2006. (presentation given at ICCD’06)
- Y. Ma, H. Gao, and H. Zhou, “Using Indexing Functions to Reduce Conflict Aliasing in Branch Prediction Tables”, IEEE Transactions on Computers (TC), pp. 1057-1061, August, 2006.
- H. Zhou, “A Case for Fault Tolerance and Performance Enhancement using Chip Multi-Processors”, IEEE Computer Architecture Letters (CAL), pp. 1-4, Sept. 2005.
- H. Zhou, “Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window”, Proceedings of the 2005 International Conference on Parallel Architectures and Compilation Techniques (PACT’05), pp. 231-242, Sept. 2005. (presentation given at PACT’05)
- H. Gao and H. Zhou, “Adaptive Information Processing: An Effective Way to Improve Perceptron Branch Predictors”, Journal of Instruction-Level Parallelism (JILP), pp. 1-10, Vol. 7, 2005.
- H. Zhou and T. M. Conte, “Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction”, IEEE Transactions on Computers (TC), pp. 897-912, July 2005.
- H. Gao and H. Zhou, “Adaptive Information Processing: An Effective Way to Improve Perceptron Branch Predictors”, Champion, In the 1st Championship Branch Prediction (CBP-1) held with the 37th International Symposium on Microarchitecture (MICRO-37), Dec. 2004. (presentation, download the code, the simulation framework).
- H. Zhou, M. Toburen, E. Rotenberg, and T. M. Conte, “Adaptive Mode Control: A Static-Power-Efficient Cache Design”, ACM Transactions on Embedded Computing Systems (TECS), pp. 347-372, vol. 2, no. 3, August, 2003.
- H. Zhou and T. M. Conte, “Enhancing Memory Level Parallelism via Recovery-Free Value Prediction”, The 2003 International Conference on Supercomputing (ICS’03), pp. 326-335, June 2003.
- H. Zhou, J. Flanagan, and T. M. Conte, “Detecting Global Stride Locality in Value Streams”, The 30th ACM/IEEE International Symposium on Computer Architecture (ISCA-30), pp. 324-335, June 2003.
- H. Zhou and T. M. Conte, “Code Size Efficiency in Global Scheduling for ILP Processors”, The 6th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT-6) held in conjunction with HPCA-8, pp. 79-90, February 2002.
- H. Zhou, M. Toburen, E. Rotenberg, T. M. Conte, “Adaptive Mode Control: A Static-Power-Efficient Cache Design”, Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT’01), pp. 61-70, Sept. 2001.
- H. Zhou, M. D. Jennings, T. M. Conte, “Tree Traversal Scheduling: A Global Scheduling Technique for VLIW/EPIC Processors”, The 14th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC’01), LNCS 2624, pp. 223-238, Springer Verlag, August, 2001 (2003).
- A. A. Kassim, H. Zhou, S. Ranganath, Automatic IC orientation checks, Machine Vision and Applications, Vol. 12, No. 3, pp. 107-112, 2000.
- H. Zhou, A. A. Kassim, S. Ranganath, A fast algorithm for detecting die extrusion defects in IC packages, Machine Vision and Applications, Vol. 11, No. 1, pp. 37-41, 1998.
- Zhou Huiyang, Qu Liangsheng, Li Aihua, Test Sequencing and Diagnosis in Electrical System with Decision Table, Microelectronics and Reliability, Vol.36, No.9, pp.1167-1175, 1996.
Refereed Educational Publication
- H. Gao, M. Dimitrov, J. Kong, and H. Zhou, “Experiencing Various Massively Parallel Architectures and Programming Models for Data-Intensive Applications”, Workshop on Computer Architecture Education (WCAE-08), held in conjunction with ISCA-35, 2008. (talk given at WCAE-08)
Technical Reports
- Y. Mao, H. Zhou, and X. Gui, “Exploring deep neural networks for branch prediction”, Technical Report, ECE Department, N. C. State University, Sep. 2017.
- H. Zhou, “Code size aware compilation for real-time applications”, Technical Report, CS department, University of Central Florida, July 2003.
- H. Zhou and T. M. Conte, “Performance modeling of memory latency hiding techniques”, Technical Report, ECE Department, N. C. State University, January 2003.
- H. Zhou and T. M. Conte, “Using Performance Bounds to Guide Pre-scheduling Code Optimizations”, Technical Report, ECE Department, N. C. State University, Sep. 2002.
- M. D. Jennings, H. Zhou, T. M. Conte, “A Treegion-based Unified Approach to Speculation and Predication in Global Instruction Scheduling”, Technical Report, ECE Department, N. C. State University, August 2001.
- H. Zhou, C. Fu, E. Rotenberg, T. Conte, “A study of value speculative execution and mispeculation recovery in superscalar microprocessors”, Technical Report, ECE Department, N. C. State University, Jan., 2001.
- H. Zhou, M. Toburen, E. Rotenberg, T. Conte, “Adaptive Mode Control: A Low-Leakage Power-Efficient Cache Design”, Technical Report, ECE Department, N. C. State University, Nov., 2000.
Software Release
An Open-Source GPGPU Compiler (CUDA/OpenCL-to-CUDA/OpenCL code optimizer)