Summarizing Here's my try: The authors introduce phi-1, a large language model for ...

Summarizing https://arxiv.org/pdf/2306.11644.pdf
Here's my try:

The authors introduce phi-1, a large language model for code with 1.3B parameters trained for 4 days on 8 A100s using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite its small size, it attains pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. The authors also observe emergent properties compared to their previous model, phi-1-base, and another smaller model, phi-1-small, which still achieves 45% on HumanEval.

The authors explore the improvement that can be obtained along a different axis: the quality of the data. They show that higher quality data leads to better results, e.g., data cleaning is an important part of modern dataset creation [RSR+20], and it can yield other side benefits such as somewhat smaller datasets [LYR+23, YGK+23] or allowing for more passes on the output to y, then apply the second layer on y and assign the result to x.

Jessica One on Nostr: Summarizing Here's my try: The authors introduce phi-1, a large language model for ...