Apple Unveils OpenELM: Small, Open Source AI Models Engineered for On-Device Operation
The competition in the field of generative AI is heating up as tech giants like Google, Samsung, and Microsoft intensify their efforts to bring advanced AI capabilities to PCs and mobile devices. Joining the fray, Apple has unveiled OpenELM, a new family of open-source large language models (LLMs) that can operate entirely on a single device without the need for cloud servers.
Released just hours ago on the AI code community platform Hugging Face, OpenELM comprises small models optimized to perform text-generation tasks efficiently. The release signals Apple's push to put capable, privacy-preserving AI directly on users' devices: developers gain state-of-the-art natural language processing capabilities while users' data stays local rather than traveling to cloud servers.
The OpenELM project consists of eight models in total: four pre-trained models and four instruction-tuned models, at parameter sizes of 270 million, 450 million, 1.1 billion, and 3 billion. Parameters are the learned connections between artificial neurons in a large language model (LLM), and a higher parameter count typically indicates greater performance and capability, though this is not always the case. The range of sizes allows developers and researchers to choose the model that best suits their specific needs and computing resources, enabling a wide range of natural language processing applications.
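As a concrete illustration, the smallest instruction-tuned variant can be loaded with Hugging Face's transformers library. This is a minimal sketch, assuming the model IDs published on Hugging Face and the Llama 2 tokenizer the release pairs with the models:

```python
# Minimal sketch: loading an OpenELM checkpoint with Hugging Face transformers.
# The model ID and tokenizer pairing are taken from the Hugging Face release;
# OpenELM ships custom modeling code, so trust_remote_code must be enabled.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-270M-Instruct"  # smallest instruction-tuned variant
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Apple did not publish its own tokenizer; the release pairs the models with
# the Llama 2 tokenizer, which is gated and requires approved access.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```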
Pre-training is the process of training a large language model (LLM) on a vast corpus of text so that it learns the structure and patterns of language. Because the objective is simply to predict the next token, a pre-trained model can generate coherent and potentially helpful text, but when prompted with specific user requests it may produce responses that are not directly relevant or actionable.
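To make that objective concrete, here is an illustrative toy sketch of next-token prediction with a cross-entropy loss. The randomly initialized logits are purely hypothetical stand-ins for a real model's output; this is not Apple's training code:

```python
# Illustrative toy sketch of the pre-training objective: next-token prediction.
import torch
import torch.nn.functional as F

vocab_size = 10
token_ids = torch.tensor([[1, 4, 2, 7, 3]])  # one training sequence (batch of 1)

# A real model would produce these logits; here they are randomly initialized.
logits = torch.randn(1, token_ids.shape[1], vocab_size, requires_grad=True)

# Shift so the logits at position t are scored against the token at t+1.
pred = logits[:, :-1, :].reshape(-1, vocab_size)
target = token_ids[:, 1:].reshape(-1)

# Cross-entropy over the vocabulary is the standard pre-training loss.
loss = F.cross_entropy(pred, target)
loss.backward()  # in a real run, gradients would update the model's weights
print(f"next-token loss: {loss.item():.3f}")
```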
Instruction tuning, by contrast, fine-tunes a pre-trained model on datasets of instruction-response pairs so that it produces outputs relevant to a specific user request. Rather than continuing the generic next-token objective, the model learns from explicit examples of requests and desired answers, such as providing step-by-step directions in response to a query about baking bread.
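A hedged sketch of how a single instruction-tuning example might be prepared is shown below. The prompt template and helper function are hypothetical, but the label-masking convention, supervising only the response tokens, is standard in Hugging Face-style fine-tuning:

```python
# Hypothetical helper for one instruction-tuning example. The prompt template
# and function name are illustrative assumptions, not Apple's recipe.
IGNORE_INDEX = -100  # positions with this label are ignored by the loss

def build_example(tokenizer, instruction: str, response: str) -> dict:
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    response_ids = tokenizer(response, add_special_tokens=False).input_ids
    input_ids = prompt_ids + response_ids + [tokenizer.eos_token_id]
    # Mask the prompt so the fine-tuning loss supervises only the response.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [tokenizer.eos_token_id]
    return {"input_ids": input_ids, "labels": labels}
```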
By combining pre-training with instruction tuning, developers can enhance the capabilities of large language models, ensuring that they not only produce coherent text but also deliver more relevant and useful responses to user queries. This approach enables the development of more effective conversational AI systems that better understand and address user needs and preferences.
Apple is releasing the weights of its OpenELM models under its "sample code license," together with various training checkpoints, performance statistics, and instructions for pre-training, evaluation, instruction tuning, and parameter-efficient fine-tuning. Notably, the license does not restrict commercial use or modification; it requires only that anyone redistributing the Apple Software in its entirety and without modification retain certain notices, including Apple's original notice and disclaimers. Developers and researchers can therefore freely use and modify the OpenELM models, including for commercial purposes, as long as they meet those redistribution requirements.
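For the parameter-efficient fine-tuning path the release documents, one common approach is LoRA via the peft library. The sketch below is an assumption-laden illustration rather than Apple's published recipe; in particular, the target module name is a guess about OpenELM's custom attention code:

```python
# Sketch of parameter-efficient fine-tuning with LoRA via the peft library.
# The target module name is an assumption about OpenELM's custom modeling code.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M", trust_remote_code=True  # OpenELM ships custom code
)

config = LoraConfig(
    r=8,                          # rank of the low-rank adapter matrices
    lora_alpha=16,                # scaling factor applied to adapter updates
    target_modules=["qkv_proj"],  # assumed name of the attention projection
    task_type="CAUSAL_LM",
)

# Wraps the base model; only the small adapter matrices remain trainable.
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```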
Apple emphasizes that the OpenELM models are provided without any safety guarantees. This disclaimer acknowledges the possibility that the models may produce outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. This cautionary statement underscores the importance of responsible use and evaluation of AI models, especially in contexts where their outputs may have significant consequences.
The release of the OpenELM models represents a departure from Apple’s traditional approach of secrecy and closed development processes. It is part of a series of surprising moves by the company to contribute to the open-source AI community. In October, Apple made headlines with the quiet release of Ferret, an open-source language model with multimodal capabilities. Despite Apple’s reputation for secrecy, these releases demonstrate its willingness to engage with the wider AI research and development community, contributing to the advancement of AI technologies in an open and collaborative manner.