Meta’s New Large-Language Model to Run on Intel and Qualcomm Hardware


Meta has released Llama 3, its newest large language model (LLM), promising a safer, more accurate generative AI experience. Alongside the LLM, Meta introduced the Llama Guard 2, Code Shield, and CyberSec Eval 2 trust and safety tools to help ensure compliance with industry and user safety expectations. While Meta is still developing further Llama 3 models, the company is releasing the first two to the public now.

 

Users can experiment with Meta’s first two LLMs now. Image used courtesy of Meta
 

The open-source Llama 3 bakes safety into the models and provides multi-platform hardware support. Meta notes that support for Llama 3 will soon be available on all major platforms, including cloud providers and model API providers. Companies slated to host the Llama 3 LLM include AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, Nvidia NIM, and Snowflake. The LLM will also be supported on hardware from AMD, AWS, Dell, Intel, Nvidia, and Qualcomm.

 

Qualcomm and Intel Quick to Run Llama 3 on Hardware Platforms

Processors for generative AI must move large amounts of data quickly and perform math in massively parallel operations. This holds whether the processor is a graphics processing unit (GPU), neural processing unit (NPU), or tensor processing unit (TPU) working in concert with a high-powered CPU. GPUs, NPUs, and TPUs can be standalone high-powered co-processors or cores integrated into system-on-chip (SoC) processors.

Qualcomm uses an SoC approach to bring Llama 3 to its mobile processors. The company worked with Meta during Llama 3 development to ensure that the LLM would be compatible with its flagship Snapdragon products. The Snapdragon processors come with AI-capable NPU, CPU, and GPU cores.

 


Intel validated its AI product portfolio for the first Llama 3 8B and 70B models. Image used courtesy of Intel
 

Intel also collaborated with Meta in developing Llama 3 for data center-level processors. Intel had optimized its Gaudi 2 AI accelerators for Llama 2, Meta’s prior version of the LLM, and has now demonstrated the accelerators’ compatibility with Llama 3. Intel’s Xeon, Core Ultra, and Arc processors have also been validated with Llama 3.

 

How Do LLMs, Particularly Llama 3, Work?

LLMs convert a human-readable data set into a machine-interpretable one, allowing generative AI to approximate the human experience of building on a prior knowledge base. The modeling process does this by tokenizing words, much as a software compiler takes keywords and tokenizes them into CPU opcodes. Rules like grammar, syntax, and punctuation are likewise tokenized to govern how the AI interprets input and generates output.
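As a rough illustration of the idea (the vocabulary and helper below are invented for this sketch and bear no relation to Meta’s actual tokenizer), a minimal tokenizer in Python might look like this:

```python
import re

# Toy illustration: text is split into pieces, and each piece is
# mapped to an integer token, much as a compiler maps keywords to
# opcodes. The vocabulary here is invented for demonstration.
vocab = {"the": 0, "duck": 1, "swims": 2, ".": 3, "<unk>": 4}

def tokenize(text: str) -> list[int]:
    # Split on words and punctuation, then look each piece up,
    # falling back to an "unknown" token.
    pieces = re.findall(r"\w+|[^\w\s]", text.lower())
    return [vocab.get(p, vocab["<unk>"]) for p in pieces]

print(tokenize("The duck swims."))  # -> [0, 1, 2, 3]
```

Production tokenizers work on subword fragments rather than whole words, but the principle of mapping text to integers is the same.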

Generally, the more parameters a model is trained with, the more accurate and human-like its output will be. However, the parameter count must be balanced against the computing load required to tokenize input, apply rules, and interpret results. Llama 3 comes in two models: one with 8 billion parameters (8B), targeted at high-end edge AI such as phone processors, and one with 70 billion parameters (70B), targeted at larger data center systems.
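Some back-of-envelope arithmetic shows why the two sizes target different hardware. Assuming 16-bit weights (2 bytes per parameter; real deployments vary with quantization, activations, and runtime overhead), the weights alone require roughly:

```python
# Back-of-envelope memory estimate for model weights alone, assuming
# 2 bytes per parameter (16-bit precision). Actual requirements vary
# with quantization and runtime overhead.
def weight_memory_gb(params_billions: float, bytes_per_param: float = 2) -> float:
    # 1e9 params * bytes-per-param / 1e9 bytes-per-GB cancels out.
    return params_billions * bytes_per_param

for size in (8, 70):
    print(f"Llama 3 {size}B weights: ~{weight_memory_gb(size):.0f} GB")
# ~16 GB fits high-end edge hardware; ~140 GB pushes the 70B model
# toward multi-accelerator data center systems.
```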

Llama 3 uses a vocabulary of 128K tokens for efficient encoding, and both the 8B and 70B models use grouped query attention (GQA). The models were trained on sequences of 8,192 tokens, with a mask to prevent self-attention from crossing document boundaries.
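The boundary mask can be pictured as a causal (lower-triangular) attention mask further restricted to positions within the same document. The NumPy sketch below is a simplified illustration under that reading, not Meta’s training code:

```python
import numpy as np

# Simplified sketch of a causal attention mask that also blocks
# attention across document boundaries when multiple documents are
# packed into one training sequence. doc_ids labels which document
# each token position belongs to.
def boundary_causal_mask(doc_ids: np.ndarray) -> np.ndarray:
    n = len(doc_ids)
    causal = np.tril(np.ones((n, n), dtype=bool))    # token i attends to j <= i...
    same_doc = doc_ids[:, None] == doc_ids[None, :]  # ...and only within its document
    return causal & same_doc

# Two short documents packed into one six-token sequence.
print(boundary_causal_mask(np.array([0, 0, 0, 1, 1, 1])).astype(int))
```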

 


Llama 3 8B and 70B performance measures. Image used courtesy of Meta
 

In zero-shot learning (0-shot) tests, the AI model is not specifically trained on the data used in the question. For example, the model might be asked to identify a duck even though it has never been trained on examples of ducks; instead, it must infer a result from semantic relationships.

In n-shot learning (n > 0) tests, the model is given at least n worked examples of the task, typically in the prompt itself, before answering. Chain-of-thought (CoT) tests probe AI reasoning on complex tasks like math and physics. Meta tested Llama 3 with a new human-evaluation set containing 1,800 prompts covering 12 common use cases.
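The distinction is easiest to see in the prompts themselves. The examples below are invented for illustration and are not from Meta’s evaluation set:

```python
# Hypothetical prompts contrasting the evaluation styles.
# Zero-shot: the task is stated with no worked examples.
zero_shot = "Classify the sentiment of this review: 'The battery dies in an hour.'"

# Two-shot: two worked examples precede the actual question.
two_shot = """Classify the sentiment of each review.
Review: 'Fantastic screen.' -> positive
Review: 'It broke in a week.' -> negative
Review: 'The battery dies in an hour.' ->"""

# Chain-of-thought: the prompt invites intermediate reasoning steps.
chain_of_thought = ("A train covers 60 km in 45 minutes. What is its speed "
                    "in km/h? Reason through the steps before answering.")
```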

 

Meta Considers AI Safety

With more than a year of broad AI use by the general public, issues of safety, accuracy, and reliability have risen to the forefront. Meta has taken these concerns into consideration, allowing AI developers to fine-tune the models for safety in each application.

 


Llama 3 system-level safety model. Image used courtesy of Meta
 

Meta maintains separation between the test cases and the model developers to prevent unintentional overfitting. LLM overfitting occurs when a complex model essentially memorizes the training data rather than learning the underlying patterns. With overfitting, an LLM performs very well on training data but has limited ability to work with new or different data. A badly overfitted LLM will mimic well but won’t think on its own.
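In practice, overfitting shows up as a widening gap between training loss and loss on held-out data. A minimal sketch, with purely illustrative numbers (not from Meta’s training runs):

```python
# Illustrative numbers only: training loss keeps falling while
# held-out loss bottoms out and climbs -- the signature of a model
# memorizing its training data rather than generalizing.
train_loss = [2.9, 2.1, 1.6, 1.2, 0.9, 0.6]
heldout_loss = [3.0, 2.3, 1.9, 1.8, 1.9, 2.1]

for epoch, (t, h) in enumerate(zip(train_loss, heldout_loss), start=1):
    gap = h - t
    note = "  <- overfitting" if gap > 1.0 else ""
    print(f"epoch {epoch}: train={t:.1f}  heldout={h:.1f}  gap={gap:.1f}{note}")
```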

 

Llama 3’s Next Steps

AI is a work in progress and will be for quite some time, and Meta’s still-developing Llama 3 family is no exception.

 


Preview of future Llama 3 performance. Image used courtesy of Meta
 

Though Meta has made the 8B and 70B models public, the company is still training a 400B-parameter version. And while the 400B+ model shows accuracy improvements from the larger parameter set, comparing its numbers against the 8B and 70B performance chart reveals non-linear returns on some of the benchmarks. It’s easy to infer, then, that the climb in AI hardware requirements isn’t going to slow anytime soon.

 

 



