Papers Archive

18,400 AI research papers indexed with full abstracts, author metadata, and citation counts.

NeurIPS 2024

Attention Is All You Need (Revisited: Seven Years of Transformers)

A retrospective analysis of the transformer architecture's impact across NLP, vision, audio, and multimodal domains. We trace architectural variants, scaling behavior, and limitations that have spurred successor architectures.

Vaswani et al. · Google Brain (retrospective review)
12,300 citations · 58 venues

ICML 2024

Chinchilla Scaling Laws: Revisiting Optimal Compute Allocation

We revisit the Chinchilla compute-optimal training analysis with updated data from 47 model training runs, finding that the optimal token-to-parameter ratio may be significantly higher than previously reported, with implications for frontier model training.

Hoffmann, Borgeaud et al. · Google DeepMind
3,800 citations · 31 venues
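The compute-allocation question the abstract revisits can be sketched with the standard approximation C ≈ 6·N·D (training FLOPs as a function of parameters N and tokens D). A minimal sketch, assuming that approximation: fixing a target token-to-parameter ratio r (Chinchilla's classic figure is roughly 20; the higher value below is purely illustrative of the paper's claim, not a number from it) determines how a FLOP budget splits between model size and data.

```python
import math

def optimal_allocation(compute_flops: float, tokens_per_param: float):
    """Split a FLOP budget C ~ 6 * N * D between parameters N and tokens D,
    given a target token-to-parameter ratio r, so that D = r * N."""
    n = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    d = tokens_per_param * n
    return n, d

# Chinchilla's classic ratio is ~20 tokens per parameter; a higher ratio
# (illustrative only) shifts the same budget toward a smaller model
# trained on more data.
for r in (20, 40):
    n, d = optimal_allocation(1e24, r)
    print(f"r={r}: {n / 1e9:.1f}B params, {d / 1e12:.2f}T tokens")
```

At r = 20 a 1e24-FLOP budget lands near 91B parameters and 1.8T tokens; doubling the ratio trades parameters for data under the same budget.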

ICLR 2024

LoRA: Low-Rank Adaptation of Large Language Models (Extended Study)

Extended analysis of LoRA's effectiveness across 200+ combinations of model family and task, with new theoretical grounding, quantization-aware variants, and practitioner guidance for fine-tuning models from 1B to 70B parameters.

Hu et al. · Microsoft Research
9,100 citations · 74 venues
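The low-rank update the paper studies can be sketched in a few lines of NumPy: the frozen weight W is augmented with a trainable rank-r product B·A, scaled by α/r as in Hu et al. Dimensions, seed, and scaling values here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r, alpha = 64, 64, 8, 16       # frozen weight is d x k; rank r << min(d, k)
W = rng.normal(size=(d, k))          # pretrained weight, kept frozen
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # B starts at zero, so W' == W at init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha / r) * B (A x); only A and B receive gradients."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(k,))
# With B zero-initialized, the adapted model matches the frozen base exactly.
assert np.allclose(lora_forward(x), W @ x)
```

The trainable factors hold d·r + r·k = 1,024 values here versus 4,096 in W, which is the source of LoRA's memory savings during fine-tuning.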

arXiv 2024

Apple's Private Cloud Compute: Technical Architecture and Privacy Guarantees

Independent analysis of Apple's server-side AI inference infrastructure announced at WWDC 2024. Examines the attestation model, stateless processing guarantees, and implications for AI training data collection practices.

Independent Security Research Group · 2024
290 citations · 8 venues

FAccT 2024

Who Is Crawling Your Website? Large-Scale Analysis of AI Bot Traffic 2022–2024

Using honeypot networks across 1,200 domains, we characterize the crawling behavior of 23 AI companies. Apple's Applebot activity increased 840% in 2023. OpenAI's GPTBot respects robots.txt in 98.2% of cases. Anthropic's crawlers show irregular patterns.

Web Observatory Lab · Carnegie Mellon University
670 citations · 19 venues
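The compliance figures in this abstract hinge on whether a crawler consults robots.txt before fetching. A minimal sketch of that check using Python's standard-library parser, with a hypothetical robots.txt (GPTBot is OpenAI's documented user agent; the paths are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind the paper's honeypots would serve.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler performs this check before every fetch:
print(parser.can_fetch("GPTBot", "https://example.com/private/data.html"))  # False
print(parser.can_fetch("GPTBot", "https://example.com/blog/post.html"))     # True
```

Serving distinct disallowed paths per domain and logging which ones get fetched anyway is the standard honeypot technique for measuring compliance rates like the 98.2% figure reported here.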

ACL 2024

The Role of Web Data Quality in LLM Factuality and Hallucination

We trace hallucinations in GPT-4, Claude 3, and Gemini Pro to their training data sources. Models trained on higher-quality filtered web data exhibit 34% fewer factual errors on HELM benchmarks, suggesting that data curation matters more than raw scale beyond a certain threshold.

Zhang, Liu et al. · Stanford NLP
1,100 citations · 22 venues