Open Source AI Versus Proprietary AI Models: Key Differences in Contract Terms and IP Risks - Part 2, Legaltech News
Part 1 of this article covered the evolving paradigms for releasing AI models, including open source AI and open weights AI. In open source AI, according to the Open Source Initiative, the AI provider makes available detailed information about the training data, the source code used to train and run the AI model, and the weights and parameters that are refined during training and used in operation when the AI model generates its output. See https://opensource.org/ai/open-source-ai-definition. In open weights AI, the AI provider releases the weights and parameters needed to run the AI model, but typically does not release the training data, detailed information about the training data or the training algorithms. Both of these licensing models enable the user to fine-tune and customize the AI model and avoid paying licensing fees. But choosing between an open AI model and a proprietary AI model raises additional considerations relating to contract terms and IP risks. So, what are the key considerations?
Protection of User Data
One of the biggest advantages of using an open AI model rather than a proprietary AI model is that, in most cases, an open AI model allows the user to prevent its data (e.g., trade secrets, technical information, proprietary business information) from being transmitted to a third party. The user (e.g., a business or other organization) stores and runs the entire AI model on servers it controls, rather than sending its data to a proprietary AI model hosted by a third-party AI provider. By containing the entire open AI model within the user’s IT environment, the user can ensure that the AI provider cannot access its valuable data.
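To make the self-hosting point concrete, the following is a minimal sketch, assuming a Python environment with the Hugging Face transformers library installed; the model identifier is illustrative, not a recommendation. The weights are downloaded once and cached locally, after which inference can run with no outside network access at all:

```python
# Minimal sketch: running an open weights model entirely on self-hosted
# hardware, so prompts and proprietary data never leave the user's environment.
# Assumes the Hugging Face transformers library; the model identifier below
# is illustrative -- substitute the open weights model your organization uses.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative open weights model

# Weights are fetched once and cached locally; after that, inference can run
# with no network connection (e.g., on an air-gapped internal server).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Proprietary business data stays in-process, on servers the user controls.
prompt = "Summarize the attached internal sales figures: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because nothing in this workflow calls out to a third-party AI provider, the confidentiality of the input data depends on the user’s own security controls rather than on contract terms.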
Of course, with a proprietary AI model, the user will do its best to negotiate robust confidentiality obligations and use restrictions in the services agreement with the AI provider. However, obtaining these terms can be harder than expected. For example, the standard contract terms offered by some AI providers may not be a model of clarity. The user will generally seek a contractual use restriction that prevents all use of its data except to provide the services to the user. But the standard use restrictions offered by the AI provider may be more limited, e.g., the restrictions may apply to only certain specified AI models, or may contain ambiguities such as conflicts between different sets of applicable terms. And the terms may change over time with updates to the AI provider’s online terms of service, which may occur relatively frequently in the AI space. As a result, and depending on the user’s negotiating leverage, it may be difficult to secure a straightforward, categorical use restriction.
The AI provider may assure the user that it will keep the user data confidential. However, confidentiality obligations alone are generally not adequate, because an AI provider can arguably keep user data confidential while using it to train its AI models.
In addition, AI providers generally are eager to access quality data sources because the quality of the training data has a significant effect on the quality of the trained model (see, e.g., the June 23, 2025 Bartz v. Anthropic order on fair use, recognizing this relationship). If the user is not focused on protecting its data, it may inadvertently agree to terms allowing some use of its data to train third-party AI models.
All of these factors demonstrate that there may be risks involved in sending your company’s proprietary data to a third-party AI model. Some companies conclude that these risks are not particularly concerning. For a company considering inputting highly sensitive information into a third-party AI model, however, it is critical to evaluate these risks carefully.
Infringement Risk
The second major risk is infringement liability arising from use of AI models. On this point, proprietary AI models have a distinct advantage, because the AI services agreement will typically provide some protection against potential liability for infringing third-party IP. Many AI providers now offer an IP indemnification that covers not only use of the AI system, but also use and distribution of AI-generated output. As with any indemnification, however, the details of the clause matter. For example, in addition to the normal exclusions for combinations and modifications, the AI provider may impose a list of requirements for the indemnity to apply, such as not disabling any content filters, not using the output in a manner that the user knows or should know is likely to infringe, having sufficient rights to use the input data, and implementing other required mitigations, such as including specific metaprompts directing the AI model to prevent copyright infringement in its output.
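For illustration, here is a hypothetical sketch of the kind of metaprompt mitigation an indemnity might require. The client library (the OpenAI Python client), the model name and the prompt wording are all assumptions for illustration; the actual required mitigations and language come from the AI provider’s own terms, not from this example.

```python
# Hypothetical sketch of a copyright-mitigation "metaprompt": a system-level
# instruction directing the model to avoid reproducing copyrighted material.
# The OpenAI Python client and model name are used purely for illustration;
# any required wording comes from the applicable indemnity terms.
from openai import OpenAI

client = OpenAI()  # assumes the API key is set in the environment

COPYRIGHT_METAPROMPT = (
    "Do not reproduce copyrighted material verbatim. If asked for song "
    "lyrics, book excerpts, or other protected text, summarize instead."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system", "content": COPYRIGHT_METAPROMPT},
        {"role": "user", "content": "Draft a product description for our new app."},
    ],
)
print(response.choices[0].message.content)
```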
But even with all of these requirements, an IP indemnity that covers use of the AI system and use and distribution of the AI-generated output is still significantly better than what comes with an open AI model. Several of the better-known open AI models have been released under permissive open source licenses that include a disclaimer of warranty, a limitation of liability and no indemnity. For example, the DeepSeek R1 model was released under the permissive MIT license, which provides, in part: “The software is provided ‘as is,’ without warranty of any kind, express or implied, including but not limited to the warranties of...noninfringement.” A number of other AI models, such as xAI’s Grok-1, Alibaba’s Qwen-3 and several Mistral AI models, were released under the permissive Apache 2.0 open source license, which has a similar disclaimer of warranty and limitation of liability.
While the indemnities offered by providers of proprietary AI models may have their own limitations and exclusions, the fact that these providers offer indemnities at scale gives them a strong incentive to avoid infringement arising from use of their AI models. The same may not be true for open AI models, where the AI provider typically disclaims all liability and therefore faces considerably less exposure to infringement claims. In addition, with an open weights AI model, the user may have little or no visibility into what training data was used and whether it included copyrighted works used without authorization. In this scenario, the user would be in the dark as to the risk of infringement from use of the open AI system or its output.
At the end of the day, the best option will depend on the circumstances. A company with its own valuable trade secrets and data, IT resources and technical expertise that is looking to use a customized AI model for internal functions may be a good candidate for an open AI model. On the other hand, a company with less technical expertise and fewer internal IT resources that plans to use AI-generated output externally, where IP infringement risks may be significant, could be a better fit for a proprietary AI model. Wherever your organization falls, it is likely that open source AI will continue to expand its reach. As open AI licensing norms become more standardized and organizations gain experience with the models, risks and terms, open AI models may proliferate just as open source software did.
Reprinted with permission from the July 9, 2025 issue of Legaltech News. Further duplication without permission is prohibited. All rights reserved.