Portfolio item number 1
Short description of portfolio item number 1
Published in NeurIPS 2016, 2016
We introduce Binarized Neural Networks (BNNs): neural networks with binary weights and activations at run-time. At training time, the binary weights and activations are used when computing the parameter gradients. During the forward pass, BNNs drastically reduce memory size and accesses and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power efficiency.
Recommended citation: Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio. (2016). "Binarized Neural Networks." NeurIPS 2016. https://proceedings.neurips.cc/paper/2016/file/d8330f857a17c53d217014ee776bfd50-Paper.pdf
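As a quick illustration of the binarize-with-a-straight-through-estimator idea, here is a minimal PyTorch sketch (not the paper's code release); the `BinarizeSTE`/`BinaryLinear` names and the layer sizes are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map to {-1, +1} (plain torch.sign would map 0 to 0).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Hard-tanh STE: pass the gradient through only where |x| <= 1.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

class BinaryLinear(nn.Linear):
    """Linear layer whose real-valued master weights are binarized on the forward pass."""

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)   # {-1, +1} weights at run-time
        return F.linear(x, w_bin, self.bias)

# The optimizer updates the real-valued weights; the forward pass uses their binarized copy.
layer = BinaryLinear(16, 4)
loss = layer(torch.randn(8, 16)).pow(2).mean()
loss.backward()
```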
Published in CVPR 2020, 2020
We propose repeating instances within a batch, each copy with a different data augmentation. This simple modification consistently improves generalization.
Recommended citation: Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry. (2020). "Augment Your Batch: Improving Generalization Through Instance Repetition." CVPR 2020. https://openaccess.thecvf.com/content_CVPR_2020/papers/Hoffer_Augment_Your_Batch_Improving_Generalization_Through_Instance_Repetition_CVPR_2020_paper.pdf
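A minimal sketch of the instance-repetition idea, assuming a torchvision-style pipeline; the `collate_with_repeats` helper, the repeat factor of 4, and the use of `FakeData` are illustrative, not the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import FakeData

augment = transforms.Compose([
    transforms.RandomResizedCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def collate_with_repeats(batch, repeats=4):
    """Repeat every sampled (image, label) pair `repeats` times,
    applying an independent random augmentation to each copy."""
    images, labels = [], []
    for img, label in batch:
        for _ in range(repeats):
            images.append(augment(img))   # different random draw per copy
            labels.append(label)
    return torch.stack(images), torch.tensor(labels)

loader = DataLoader(FakeData(size=64, image_size=(3, 32, 32)),
                    batch_size=8, collate_fn=collate_with_repeats)
x, y = next(iter(loader))   # effective batch of 8 * 4 = 32 augmented instances
```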
Published in CVPR 2020, 2020
We propose methods for compressing models without access to the original training data, generating synthetic data that matches the statistics of the original dataset.
Recommended citation: Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry. (2020). "The Knowledge Within: Methods for Data-Free Model Compression." CVPR 2020. https://openaccess.thecvf.com/content_CVPR_2020/papers/Haroush_The_Knowledge_Within_Methods_for_Data-Free_Model_Compression_CVPR_2020_paper.pdf
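One way to read the statistics-matching step, sketched under the assumption that BatchNorm running statistics serve as the target; the tiny stand-in model, hook names, and hyperparameters are illustrative, and a pretrained network would be used in practice.

```python
import torch
import torch.nn as nn

# A small stand-in for a pretrained network with BatchNorm layers.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
).eval()

# Hook the input of every BatchNorm layer so we can read its batch statistics.
bn_inputs = {}
def make_hook(name):
    def hook(module, inputs, output):
        bn_inputs[name] = inputs[0]
    return hook

bn_layers = {n: m for n, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}
for name, m in bn_layers.items():
    m.register_forward_hook(make_hook(name))

# Optimize synthetic images so their batch statistics match the stored running statistics.
x = torch.randn(16, 3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):                                   # more steps in practice
    opt.zero_grad()
    model(x)
    loss = 0.0
    for name, m in bn_layers.items():
        inp = bn_inputs[name]
        mean = inp.mean(dim=(0, 2, 3))
        var = inp.var(dim=(0, 2, 3), unbiased=False)
        loss = loss + (mean - m.running_mean).pow(2).mean() \
                    + (var - m.running_var).pow(2).mean()
    loss.backward()
    opt.step()
# `x` can then serve as calibration data for compressing or quantizing the model.
```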
Published in ICML 2021, 2021
We minimize the quantization error of each layer by optimizing its parameters over a small calibration set, breaking the 8-bit barrier for post-training quantization without significant overfitting.
Recommended citation: Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry. (2021). "Accurate Post Training Quantization With Small Calibration Sets." ICML 2021. http://proceedings.mlr.press/v139/hubara21a/hubara21a.pdf
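A hedged sketch of per-layer calibration, assuming a simple learned-scale uniform quantizer with a straight-through estimator; the paper's actual procedure optimizes more than a single scale, so treat the names and hyperparameters here as illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(w, scale, num_bits=4):
    """Uniform symmetric quantization with a straight-through estimator for round/clamp."""
    qmax = 2 ** (num_bits - 1) - 1
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    q = (q - w / scale).detach() + w / scale      # STE: identity gradient through rounding
    return q * scale

def calibrate_layer(layer, calib_inputs, num_bits=4, steps=200):
    """Tune the layer's quantization scale to minimize its output error
    against the FP32 layer on a small calibration set."""
    with torch.no_grad():
        fp_out = layer(calib_inputs)
    qmax = 2 ** (num_bits - 1) - 1
    scale = (layer.weight.detach().abs().max() / qmax).clone().requires_grad_(True)
    opt = torch.optim.Adam([scale], lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        w_q = quantize(layer.weight, scale, num_bits)
        out = F.linear(calib_inputs, w_q, layer.bias)
        loss = (out - fp_out).pow(2).mean()
        loss.backward()
        opt.step()
    return scale.detach()

layer = nn.Linear(64, 32)
scale = calibrate_layer(layer, torch.randn(256, 64))   # 256 calibration samples
```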
Published in NeurIPS 2021, 2021
We suggest a novel transposable fine-grained sparsity mask for N:M sparsity, allowing acceleration of both forward and backward passes. We formulate finding the optimal mask as a min-cost flow problem.
Recommended citation: Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Joseph Naor, Daniel Soudry. (2021). "Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks." NeurIPS 2021. https://proceedings.neurips.cc/paper/2021/file/b0490b85e92b64dbb5db76bf8fca6a82-Paper.pdf
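To make the constraint concrete, a small sketch that brute-forces a transposable 2:4 mask for a single 4x4 block (two kept entries in every row and every column); the paper scales this up with a min-cost flow formulation, so the exhaustive search here is purely illustrative.

```python
import itertools
import torch

def transposable_2to4_mask(block):
    """Find a 4x4 binary mask with exactly two ones in every row AND every column
    (so both the block and its transpose satisfy 2:4 sparsity), maximizing the
    retained absolute magnitude, by exhaustive search."""
    best_mask, best_score = None, -1.0
    row_options = list(itertools.combinations(range(4), 2))   # 2 kept entries per row
    for choice in itertools.product(row_options, repeat=4):
        mask = torch.zeros(4, 4)
        for r, cols in enumerate(choice):
            mask[r, list(cols)] = 1.0
        if not torch.all(mask.sum(dim=0) == 2):               # column constraint
            continue
        score = (mask * block.abs()).sum()
        if score > best_score:
            best_score, best_mask = score, mask
    return best_mask

w = torch.randn(4, 4)
m = transposable_2to4_mask(w)
assert torch.all(m.sum(dim=0) == 2) and torch.all(m.sum(dim=1) == 2)
```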
Published in ICLR 2023, 2023
We examine how N:M sparsity can be used for neural gradients. We show that, unlike weights and activations, gradients require an unbiased minimum-variance pruning mask. We design such masks and show that 1:2 and 2:4 sparsity work well.
Recommended citation: Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry. (2023). "Minimum Variance Unbiased N:M Sparsity for the Neural Gradients." ICLR 2023. https://openreview.net/pdf?id=vuD2xEtxZcj
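A minimal sketch of an unbiased 1:2 estimator in the spirit of the paper: keep one entry per pair with probability proportional to its magnitude and rescale by the inverse probability; the helper name and the sampling check at the end are illustrative, not the paper's exact construction.

```python
import torch

def mvue_1of2(grad):
    """Unbiased 1:2 pruning sketch (assumes an even number of elements): in every
    pair of entries, keep exactly one, chosen with probability proportional to its
    magnitude, and rescale so that E[output] equals the dense gradient."""
    g = grad.reshape(-1, 2)
    mag = g.abs()
    total = mag.sum(dim=1, keepdim=True).clamp_min(1e-12)
    p_first = mag[:, :1] / total                       # P(keep the first of the pair)
    keep_first = torch.rand_like(p_first) < p_first
    out = torch.zeros_like(g)
    # The kept entry is rescaled by 1/p so the estimator stays unbiased.
    out[:, :1] = torch.where(keep_first, g[:, :1] / p_first, out[:, :1])
    out[:, 1:] = torch.where(keep_first, out[:, 1:], g[:, 1:] / (1 - p_first))
    return out.reshape(grad.shape)

g = torch.randn(8, 16)
samples = torch.stack([mvue_1of2(g) for _ in range(10000)])
print((samples.mean(dim=0) - g).abs().max())           # close to 0: the estimator is unbiased
```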
Published in ICLR 2024, 2024
We present a method to train and fine-tune high-end DNNs to utilize cheaper, low-bit accumulators with no significant degradation in accuracy, addressing the computational bottleneck of high-precision accumulation.
Recommended citation: Yaniv Blumenfeld, Itay Hubara, Daniel Soudry. (2024). "Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators." ICLR 2024. https://openreview.net/pdf?id=wMbe8fVjgf
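This is not the paper's training method, but a small simulation of the constraint it targets: an integer matmul whose partial sums saturate in a narrow accumulator, showing how the result diverges from a wide-accumulator reference.

```python
import torch

def matmul_saturating_accumulator(a_q, b_q, acc_bits=16):
    """Integer matmul whose partial sums are clamped to a narrow accumulator
    after every addition (a simulation of the hardware constraint, not a kernel)."""
    lo, hi = -(2 ** (acc_bits - 1)), 2 ** (acc_bits - 1) - 1
    m, k = a_q.shape
    k2, n = b_q.shape
    assert k == k2
    acc = torch.zeros(m, n, dtype=torch.int64)
    for i in range(k):                          # accumulate one rank-1 update at a time
        acc = acc + a_q[:, i:i+1].to(torch.int64) * b_q[i:i+1, :].to(torch.int64)
        acc = acc.clamp(lo, hi)                 # saturate to the accumulator's range
    return acc

a = torch.randint(-8, 8, (4, 256), dtype=torch.int8)    # 4-bit-range operands
b = torch.randint(-8, 8, (256, 4), dtype=torch.int8)
out12 = matmul_saturating_accumulator(a, b, acc_bits=12)
out32 = matmul_saturating_accumulator(a, b, acc_bits=32)
print((out12 != out32).float().mean())                   # fraction of outputs hit by saturation
```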
Published in Transactions on Machine Learning Research (TMLR), 2025
We propose Foldable SuperNet (FoldSN), a novel method for merging multiple Transformer models trained on different tasks and initializations into a single, scalable SuperNet. This approach enables dynamic resource allocation and efficient multi-task inference.
Recommended citation: Edan Kinderman, Itay Hubara, Haggai Maron, Daniel Soudry. (2025). "Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks." TMLR 2025. https://openreview.net/pdf?id=6FqwLestHv
Published in arXiv, 2025
We present Block-Sparse FlashAttention (BSFA), a drop-in replacement that accelerates long-context inference while preserving model quality by addressing the quadratic complexity bottleneck.
Recommended citation: Daniel Ohayon, Itay Lamprecht, Itay Hubara, Israel Cohen, Daniel Soudry, Noam Elata. (2025). "Block Sparse Flash Attention." arXiv preprint arXiv:2512.07011. https://arxiv.org/pdf/2512.07011
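A reference (non-fused) sketch of the block-sparsity idea, assuming query block i attends only to a selected set of key/value blocks; the actual method implements this inside a FlashAttention-style kernel, so the function and mask here are illustrative.

```python
import math
import torch

def block_sparse_attention(q, k, v, block_mask, block=64):
    """Reference block-sparse attention: query block i attends only to key/value
    blocks j where block_mask[i, j] is True; all other blocks are skipped."""
    T, d = q.shape
    out = torch.zeros_like(q)
    nb = T // block
    for i in range(nb):
        qi = q[i*block:(i+1)*block]
        cols = [j for j in range(nb) if block_mask[i, j]]
        kj = torch.cat([k[j*block:(j+1)*block] for j in cols])
        vj = torch.cat([v[j*block:(j+1)*block] for j in cols])
        attn = torch.softmax(qi @ kj.T / math.sqrt(d), dim=-1)
        out[i*block:(i+1)*block] = attn @ vj
    return out

T, d, block = 256, 64, 64
q, k, v = (torch.randn(T, d) for _ in range(3))
nb = T // block
mask = torch.eye(nb, dtype=torch.bool) | (torch.rand(nb, nb) < 0.25)   # keep only a few blocks
out = block_sparse_attention(q, k, v, mask, block)
```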
Talk given at NeurIPS 2016 about Binarized Neural Networks.
Talk given at NeurIPS 2017 about closing the generalization gap in large batch training.
Talk given at NeurIPS 2018 about the efficiency of Quantized Neural Networks.
Talk given at ICML 2021 about Accurate Post Training Quantization with Small Calibration Sets.
Talk given at NeurIPS 2021 about Accelerated Sparse Neural Training.
Undergraduate ML labs, Technion - Israel Institute of Technology, 2014
Basic ML lab course: hands-on Python/MATLAB implementations of algorithms such as Naive Bayes, KNN, and SVM.
Computer vision course, Technion - Israel Institute of Technology, 2014
Gave a recurring guest lecture on deep learning and CNNs as part of a broader computer vision course.