Alium Open RAN IP Analytics Training Methodology
Alium’s Open RAN Patent Portfolio license program uses specifically trained machine learning (ML) algorithms as part of the Open RAN IP Analytics to statistically predict the potential essentiality of granted patents to the Open RAN Standard for purposes of royalty allocation among Licensors. The predictions are strictly used only for allocating royalties among Licensors; they do not determine patent essentiality or license coverage. Whether or not a patent is identified by the Open RAN IP Analytics, a Licensee is covered under all Open RAN Essential Patents that Licensors or their Affiliates have the right to license presently or in the future. This paper describes the basis for using ML to generate these predictions and the methodology used to train and develop Alium’s Open RAN IP Analytics.
Summary
The data scientists at Unified Patents trained and continue to develop for Alium separate ML algorithms, known as the Open RAN IP Analytics, using large sets of expert reviewed RAN infrastructure patents to objectively score the statistical essentiality of any granted patent publication worldwide covering the LTE or 5G functionality implemented in Open RAN Radio Units. These landscapes have been developed for a number of different standards by Unified under the OPAL (Objective PAtent Landscape) tools it provides to members of SEP Zones. For a full list of standards and example landscapes go here.
Universe of Patents Subject to Analysis - 4.5 M applications and granted patents relevant to LTE and 5G RAN.
Patents Evaluated Manually by Experts - 6.6 K families self-declared in ETSI for 3GPP evaluated manually by the 3GPP experts at Concur IP, who performed analysis for the 2015 TCL vs Ericsson case in the US and the 2017 Unwired Planet vs Huawei case in the UK. Additional reviewed patent families will be added to the training of the Open RAN IP Analytics each year.
AI Training - ML Algorithms were trained using FastText vectorization of the title, abstract, and claims of each patent to (1) score their statistical essentiality to the LTE and 5G RANs based on semantic similarity and (2) filter out patents not covering Open RAN functionality.
Objective Result - Trained Open RAN IP Analytics generated an Open RAN Statistically Essential Landscape of 17.4 K families with active grants.
ML Performance - Training complied with good ML training practices and the results earned high F-1 scores of 0.999 for 5G and 0.998 for LTE.
Development - Annual development of the Open RAN IP Analytics with results ready for Licensor review each year during 3Q - each Licensor may submit during 4Q of each year any of their claim charted patents independently reviewed to be essential to be added to training sets.
Standards Scope and Purpose
The Open RAN Standard is defined for the purposes of the Alium pool license to be the infrastructure functionality specified in 3GPP releases 15, 16, and 17 for the LTE and 5G NR radio access network solely to the extent that such functionality is required to be in the physical or software implementation of a radio unit as specified by the O-RAN Alliance in its reference hardware designs for indoor pico cells and outdoor microcells with the 7-2x architecture split option. The 7-2 architecture split option is defined in 3GPP technical report 38.801 but the O-RAN Alliance has added to the 7-2 split option the functions of precoding and resource element mapping and refers to this as the 7-2x architecture split option. The Alium pool license covers this 7-2x architecture split option. In effect, the Alium pool license covers lower physical layer functionality including precoding, resource element mapping, beamforming, PRACH filtering and PRACH preamble responses, fast Fourier transforms and their inverse, cyclic prefix insertion and extraction, digital to analog conversions, analog to digital conversions, and signal filtering. Much of this functionality is specified in 3GPP technical specifications starting with 36.1xx and 38.1xx but also in parts of 3GPP technical specifications starting with 36.2xx and 38.2xx.
Identifying patents that are essential to the 7-2x architecture split option is a demanding and expensive task, for the following reasons:
First, there is no repository of patents objectively identified to be essential to the Open RAN Standard. While the standards development organizations (SDOs) partnering with 3GPP to build the LTE and 5G standards do require their participants to promptly self-declare any potentially essential applications and patents, there are no objective criteria as to what is essential nor any requirements on the level of evidence required. It is also likely that patents initially declared to be essential could figuratively fall out of essentiality due to the development of the standard or the prosecution of the patent. The SDOs do not conduct an independent review of declared patents nor require the declarants to conduct an independent review. The 5G repository maintained by ETSI currently contains close to 40 K families that are only subjectively viewed to be essential by the patent holders.
Second, a large number of 3GPP participants under a standard essential patent (SEP) declaration obligation declare only with respect to the technical specification (TS) and not to the specific section number of that TS. This is an important distinction because many of the 3GPP TSs for the physical layer cover not just the low physical layer but also the high physical layer as well as layer 3 network and higher functionality, which principally are not performed in radio units but at higher levels of the baseband performed in central units and distributed units. Relying on a SEP declaration made to, for instance, just TS 38.211 without more is insufficient for the purposes of identifying SEPs to the Open RAN Standard.
Third, the number of declarations alone made to TSs that pertain to the Open RAN Standard makes it impractical from a time and cost perspective to evaluate their essentiality. As of October 1, 2021, over 27 K active granted patents (in over 16 K active families) have been declared to 5G TS 38.1xx and TS 38.2xx. This number does not include patents that are essential to 5G TS 38.1xx and TS 38.2xx but have not been declared, nor does it account for patents essential to LTE TS 36.1xx or TS 36.2xx, which are integral to Alium’s pool for Open RAN.
Fourth, a large number of companies and institutions have contributed through 3GPP to the development of the Open RAN Standards and this requires broad searches for potentially essential patents. Over 5 K technical contributions by roughly 100 companies and institutions have been made since the mid 2000s to the development of the physical layer of the LTE and 5G NR radio access network implemented in network infrastructure. While 80% of these contributions involved one or a combination of the traditional manufacturers of RAN equipment such as Ericsson, Huawei, Nokia, ZTE, or Samsung, close to 1 K technical contributions were made by one or a combination of close to 100 other entities.
For the above outlined reasons, it is prohibitively expensive and time consuming to manually review and claim chart each patent that potentially could be essential to the Open RAN Standard. An independent manual review and claim charting of each patent can cost more than $5 K and take as much as 4 to 10 hours. The costs to analyze a portfolio of just 100 declared SEP families could amount to $500 K and take 10 work weeks for a single patent attorney.
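The portfolio figures above follow from simple multiplication. A minimal sketch, assuming the low end of 4 hours per family and a 40-hour work week (both are assumptions chosen here because they reproduce the 10-week figure; the function name is illustrative):

```python
def manual_review_estimate(families, cost_per_family_usd=5_000,
                           hours_per_family=4, hours_per_week=40):
    """Return (total cost in USD, work weeks for one reviewer)."""
    total_cost = families * cost_per_family_usd
    work_weeks = families * hours_per_family / hours_per_week
    return total_cost, work_weeks


cost, weeks = manual_review_estimate(100)
print(f"${cost:,} over {weeks:.0f} work weeks")
```

At the upper end of 10 hours per family, the same portfolio would take 25 work weeks under the same assumptions.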
Unfortunately, even spending the money and time on manual reviews does not guarantee a true view of essentiality or the size of the essential landscape. Reviewers do make mistakes or may even be biased. Regardless of the qualifications of the reviewers and the quality of their analysis, the results will invariably be disputed. For example, the percentage of self-declared patents estimated by courts and experts to be actually essential ranges from 50% to lower than 20%. In 2005, Goodman and Myers found 21% of self-declared patents in 3G were essential. In 2010, Fairfield Resources estimated 50% of self-declared patents in LTE were essential. In the remanded US TCL vs Ericsson case, the court’s calculation resulted in about 40% of all 2G, 3G, and LTE self-declared patents being essential. Finally, in the UK Unwired Planet vs Huawei case, the court used 16.6% to calculate the number of essential LTE patents from all self-declared ones.
Recognizing this, Alium turned to Unified Patents to build analytics to predict the essentiality of tens of thousands of patents to the Open RAN Standard for the purpose of equitably distributing royalties to licensors. The criteria for the analytics were unwavering objectivity, transparency, cost-efficiency, consistency, and sufficient reliability. Further, to improve their reliability, the analytics were to be developed year over year. To meet these criteria, Unified Patents developed, using best practices, a highly trainable ML platform to quickly analyze and predict the potential essentiality of millions of patents around the world.
Sampling, Manual Reviews, and Training Sets
In short, Unified Patents trained two ML algorithms, one each for LTE RAN and 5G NR RAN, to predict essentiality based on the vectorized semantics of thousands of patents manually reviewed by independent experts to be essential. Unified Patents trained an additional ML algorithm to distinguish between, on the one hand, LTE RAN and 5G NR RAN patents that apply to the functionalities covered by the Open RAN Standard and, on the other hand, LTE RAN and 5G NR RAN patents that apply to functionalities that are outside the Open RAN Standard. In both cases, the training requires large sets of patents that meet the relevant criterion (being essential, or applying to Open RAN Standard functionalities) as well as large sets of patents that do not. Patents that meet the criterion are referred to as positive training cases or labels and, conversely, patents that do not meet it are referred to as negative training cases or labels.
To collect its positive and negative training cases, Unified Patents contracted with Concur IP, the Indian IP technical services consultancy that reviewed the GSM, WCDMA, and LTE SEP landscapes for TCL in its FRAND licensing case against Ericsson decided in 2017 (appealed and remanded). Concur IP’s technical experts were also used by Huawei to review the LTE SEP landscape in its FRAND dispute in the UK against Unwired Planet decided in 2017 (upheld on appeal). The accuracy of Concur IP’s reviews of the LTE landscape in the TCL case was found to be around 90% when independently validated by a telecommunications professor at a renowned US university.
Concur IP created a list of patents to be manually reviewed by randomly selecting 6.6 K English language patent families with infrastructure claims declared to the LTE and 5G RAN standards. The random selection was conducted to ensure that the distribution of families was proportional to each declarant’s share of the overall number of families declared to ETSI specifically to the LTE and 5G NR RAN standards.
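The proportional random selection described above can be sketched as a quota per declarant followed by a random draw within each declarant. This is only an illustration under stated assumptions: the largest-remainder rounding rule, the function names, and the declarant counts are hypothetical and are not taken from Concur IP's actual procedure.

```python
import random


def proportional_quota(declared_counts, sample_size):
    """Split sample_size across declarants in proportion to declared families."""
    total = sum(declared_counts.values())
    exact = {d: sample_size * n / total for d, n in declared_counts.items()}
    quota = {d: int(x) for d, x in exact.items()}  # round down first
    leftover = sample_size - sum(quota.values())
    # Give the remaining slots to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda d: exact[d] - quota[d], reverse=True)
    for d in by_remainder[:leftover]:
        quota[d] += 1
    return quota


def sample_families(families_by_declarant, sample_size, seed=0):
    """Randomly draw each declarant's quota from its own declared families."""
    rng = random.Random(seed)
    counts = {d: len(f) for d, f in families_by_declarant.items()}
    quota = proportional_quota(counts, sample_size)
    return {d: rng.sample(families_by_declarant[d], quota[d]) for d in quota}


# Hypothetical declarants and family counts, for illustration only.
print(proportional_quota({"A": 500, "B": 300, "C": 200}, 66))
```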
Concur IP’s independent technical experts then manually reviewed the claims of each family member without knowledge of the purpose of the review until evidence was identified in the relevant technical specifications for each element of a claim for essentiality. The priority of review was for independent claims followed by dependent claims and geographically in the order of US, EP, CN, JP, SK, and WO publications. Once the essentiality of a family member was evidenced, the entire family was deemed for the purposes of the training to be essential. If no claim was found to be essential, then the entire family was deemed not to be essential. The reviews resulted in four sets of families: two Positive Essentiality Sets of 9.9 K publications for LTE RAN and 7.6 K publications for 5G RAN and two Negative Essentiality Sets of 48.6 K publications for LTE RAN and 56.3 K publications for 5G RAN. The Negative Essentiality Sets were augmented with clearly unrelated and non-3GPP publications.
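The family-level labeling rule above (one essential claim in any family member makes the whole family a positive case, otherwise the family is negative) can be expressed compactly. A minimal sketch; the data layout, identifiers, and function name are hypothetical:

```python
def label_publications(families):
    """families: family_id -> {publication_number: [bool per reviewed claim]}.

    Returns publication_number -> True (positive case) or False (negative case).
    """
    labels = {}
    for members in families.values():
        # One essential claim anywhere in the family is enough.
        family_is_essential = any(any(claims) for claims in members.values())
        for publication in members:
            labels[publication] = family_is_essential
    return labels


# Hypothetical review results, for illustration only.
reviewed = {
    "FAM-1": {"US-1": [False, True], "EP-1": [False]},
    "FAM-2": {"US-2": [False, False]},
}
print(label_publications(reviewed))
```

Because the label is propagated to every family member, the training sets are counted in publications rather than families.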
Unified Patents’ team of data scientists and patent attorneys used the Positive Essentiality Sets and Negative Essentiality Sets to train separate ML algorithms for LTE RAN and 5G NR RAN to predict potential essentiality based on semantic similarity. The training was conducted using good ML training practices.
The title, abstract, and claims of each patent in the Essentiality Sets were vectorized using the FastText (word2vec) text vectors and top 1,000 CPC embeddings. Initially, 400 dimensions were used to distinguish the vectors but this was reduced to 40 to reduce the risk of overfitting. The training model used was an ensemble of XGBoost and shallow extra-randomized forest from Scikit-Learn. The class weights of the Positive Essentiality Sets and Negative Essentiality Sets were also balanced.
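To illustrate what balancing class weights means here, the sketch below applies the common "balanced" heuristic (weight = total samples / (number of classes × class size)) to the LTE set sizes quoted earlier. Whether Unified Patents used this exact formula is an assumption; the source only states that the class weights were balanced.

```python
def balanced_class_weights(class_counts):
    """Weight each class inversely to its size so all classes carry equal total weight."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * n) for label, n in class_counts.items()}


# LTE RAN set sizes from the text: 9.9 K positive and 48.6 K negative publications.
lte_weights = balanced_class_weights({"positive": 9_900, "negative": 48_600})
print(lte_weights)
```

With these weights each positive example counts roughly five times as much as each negative example, so the much larger negative set does not dominate training.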
In training the ML algorithms, a stratified K-fold cross-validation process was deployed. This stratified resampling was used to correct any optimistic errors resulting from imbalanced data sets as well as to preserve the proportionality among the cross-validation testing and training sets.
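A stratified K-fold split keeps the positive-to-negative ratio roughly constant in every fold. The following pure-Python sketch shows the idea; the number of folds (k=5), the seed, and the toy labels are assumptions, since the source does not state them, and in practice a library routine such as scikit-learn's StratifiedKFold would normally be used.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Toy data: 20 positive and 80 negative cases.
toy_labels = [1] * 20 + [0] * 80
for train, test in stratified_k_fold(toy_labels):
    print(len(train), len(test), sum(toy_labels[i] for i in test))
```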
Unified Patents generated a relevant universe of 4.5 M vectorized publications that were (i) declared to 3GPP, (ii) contained one of the top 1,000 CPCs used in LTE and 5G RAN declared patents, or (iii) referenced 3GPP. Note that the relevant universe includes a lot more publications than just publications declared to 3GPP. Excluded from this universe were clearly unrelated publications such as those covering pharmaceuticals and materials engineering. The ML algorithms were then tasked with predicting the semantic similarity of each vectorized publication in the relevant universe with their trained view of what constitutes an essential patent. The resulting scoring for each publication was on a scale of 0-100, with 100 being the highest with respect to semantic similarity. The distributions of the scores for the positive (blue) and negative (red) LTE Essentiality Sets and positive (blue) and negative (red) 5G Essentiality Sets are shown below against the total scored population (grey).
The performance of the ML algorithms resulting from the training earned very high F-1 scores of 0.999 for 5G NR RAN and 0.998 for LTE RAN. The F-1 score is the harmonic mean of precision and recall, where precision equals the number of true positives divided by the number of all positive results, and recall equals the number of true positives divided by the number of all samples that should have been identified as positive.
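The definitions above translate directly into a few lines of code. A minimal sketch; the confusion-matrix counts in the example are hypothetical and are not the actual validation results:

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and their harmonic mean (F-1)."""
    precision = true_pos / (true_pos + false_pos)  # share of positive results that are correct
    recall = true_pos / (true_pos + false_neg)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={precision:.3f} recall={recall:.3f} F-1={f1:.3f}")
```

Because it is a harmonic mean, the F-1 score is pulled toward the lower of the two values, so a high score requires both precision and recall to be high.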
Open RAN Functionality Filter Training
Unified Patents then trained a ML algorithm called the Open RAN Functionality Filter to identify patents that cover the LTE and 5G NR RAN functionality implemented in Open RAN Radio Units. Two training sets of publications were used for this purpose:
Positive Open RAN Set comprising 19.5 K publications declared to over 100 3GPP technical specification sections specifically covering Open RAN Standard functionality, and
Negative Open RAN Set comprising 61.6 K publications declared to non-Open RAN parts of the LTE and 5G NR RAN but excluding ones containing over 50 Open RAN keywords.
The same FastText word vectors and good ML training practices were used to train the Open RAN Functionality Filter that were used to train the LTE and 5G NR RAN essentiality algorithms. The resulting Open RAN Functionality Filter identified potentially 650 K publications from the 3GPP relevant universe of 4.5 M publications (or about 14.4%) as earning a semantic similarity score of 50 or higher using a neutral binary classification. Such publications qualify as Open RAN Publications for the purposes of the Open RAN IP Analytics. The distributions of the Positive Open RAN Set (blue) and Negative Open RAN Set (red) are shown below against the total scored population (grey).
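The neutral binary classification above amounts to a fixed cut-off at a score of 50. A minimal sketch, assuming the cut-off is inclusive ("50 or higher"); the publication identifiers and scores are hypothetical:

```python
OPEN_RAN_THRESHOLD = 50  # cut-off stated in the text


def open_ran_publications(filter_scores, threshold=OPEN_RAN_THRESHOLD):
    """Return the publications whose filter score is at or above the threshold."""
    return {pub for pub, score in filter_scores.items() if score >= threshold}


# Hypothetical scores, for illustration only.
example_scores = {"PUB-A": 87.2, "PUB-B": 50.0, "PUB-C": 12.9}
print(sorted(open_ran_publications(example_scores)))

# Share of the relevant universe that passes the filter, from the figures in the text.
share = 650_000 / 4_500_000
print(f"{share:.1%}")
```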
Open RAN Statistically Essential Patents
Each granted patent exceeding the semantic similarity score that captures a landscape equal to the projected US active declared essential patent universe of LTE RAN and 5G RAN qualifies as an Open RAN Statistically Essential Patent provided that such patent qualifies as an Open RAN Publication.
The threshold semantic similarity scores for LTE RAN and 5G NR RAN were set based on a projected size of the LTE RAN and 5G NR RAN universe that uses as its base the size of the LTE RAN SEP portfolio in 2015 established by the court in the TCL vs. Ericsson FRAND case. This approach to use LTE RAN SEP portfolio growth statistics to project 5G RAN SEP portfolio growth is justified as shown below.
First, there is a great deal of overlap between families declared to 5G NR RAN and LTE RAN. ETSI declaration records show that as of October 1, 2021, 46.45% of 5G RAN declared families were also declared to LTE RAN (6,907 / (6,907 + 7,967)) and, relatedly, 52.38% of LTE RAN declared families were also declared to 5G NR RAN (6,907 / (6,907 + 6,280)). The similarity in LTE RAN and 5G NR RAN technologies has prompted many standardization participants to simply declare the same families for both LTE RAN and 5G NR RAN.
Second, there is a high correlation between the development arcs of the LTE RAN and 5G NR RAN declared family portfolios. These two portfolios do in fact share the same technologies such as Massive MIMO, Carrier Aggregation, Beam Forming, and Sidelink.
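The two overlap shares quoted above can be recomputed from the family counts given in parentheses. This sketch reads 7,967 and 6,280 as the families declared only to 5G NR RAN and only to LTE RAN respectively, which is an interpretation rather than something the text states; the results agree with the quoted percentages to within rounding.

```python
def overlap_share(declared_to_both, declared_to_this_only):
    """Share of one standard's declared families that are also declared to the other."""
    return declared_to_both / (declared_to_both + declared_to_this_only)


BOTH = 6_907      # families declared to both LTE RAN and 5G NR RAN
ONLY_5G = 7_967   # assumed: families declared to 5G NR RAN only
ONLY_LTE = 6_280  # assumed: families declared to LTE RAN only

share_5g = overlap_share(BOTH, ONLY_5G)
share_lte = overlap_share(BOTH, ONLY_LTE)
print(f"5G families also declared to LTE: {share_5g:.2%}")
print(f"LTE families also declared to 5G: {share_lte:.2%}")
```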
The TCL vs. Ericsson court set an essentiality rate for LTE in 2015 of 26.12% based on analysis from both TCL and Ericsson that there were ≈ 6.4 K US active families declared essential to LTE and only ≈ 1.6 K of those families were found likely essential. Using this essentiality rate of 26.12%, the current population of US active declared families that are likely essential is projected to be 3.4 K for LTE RAN and 3.9 K for 5G RAN. This is based on the number of US active families declared to ETSI as of October 1, 2021, which were 13 K for LTE RAN and 15 K for 5G NR RAN. As can be seen from the chart below, the cumulative number of US active families declared to 5G NR RAN only eclipsed the cumulative number of US active families declared to LTE RAN only in 2020.
With these calculations, the semantic similarity scores used to qualify a granted LTE RAN or 5G NR RAN Statistically Essential Patent were adjusted to capture statistically essential landscapes of ≈ 3.3 K declared US active families for LTE RAN and ≈ 3.9 K declared US active families for 5G RAN. The adjusted semantic similarity score for LTE RAN is thus 46.1 and the adjusted score for 5G NR RAN is 34.0795. The 5G NR RAN score reduces the 650 K universe of potentially relevant active 5G NR Open RAN families by 93.7% to 41 K.
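The projection above is a single multiplication of the declared family counts by the court-derived essentiality rate. A short sketch using only the rounded figures quoted in the text (the function name is illustrative):

```python
ESSENTIALITY_RATE = 0.2612  # LTE rate from TCL vs. Ericsson, as quoted in the text


def projected_essential_families(declared_us_active_families, rate=ESSENTIALITY_RATE):
    """Project how many declared US active families are likely essential."""
    return declared_us_active_families * rate


lte = projected_essential_families(13_000)      # LTE RAN, declared as of October 1, 2021
five_g = projected_essential_families(15_000)   # 5G NR RAN, declared as of October 1, 2021
print(f"LTE RAN: {lte / 1000:.1f} K, 5G NR RAN: {five_g / 1000:.1f} K")
```

Since the declared counts are themselves rounded to the nearest thousand, the projected figures are approximate.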
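One way to picture the threshold adjustment is to rank the scored families and read off the score of the last one that fits within the target landscape size. The sketch below is an assumption about the mechanics, not a description of Unified Patents' implementation; the scores are hypothetical and the comparison is treated as inclusive.

```python
def threshold_for_target(scores, target_count):
    """Return the score of the target_count-th highest item.

    Every item scoring at or above this value is captured; tied scores
    at the cut-off can push the captured count slightly above the target.
    """
    if target_count <= 0 or not scores:
        raise ValueError("need a positive target and at least one score")
    ranked = sorted(scores, reverse=True)
    return ranked[min(target_count, len(ranked)) - 1]


def is_statistically_essential(essentiality_score, threshold, is_open_ran_publication):
    """A patent qualifies only if it clears the threshold and passes the Open RAN filter."""
    return essentiality_score >= threshold and is_open_ran_publication


# Hypothetical scores, for illustration only.
scores = [90.5, 80.0, 46.1, 46.0, 12.3]
cutoff = threshold_for_target(scores, target_count=3)
print(cutoff, is_statistically_essential(80.0, cutoff, True))
```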
Continued Development of Open RAN IP Analytics
Alium is committed to developing the Open RAN IP Analytics to provide Licensors the opportunity to address perceived errors regarding their own potential SEPs and to improve the reliability of the predictive abilities of the analytics. To that end, Alium will instruct Unified Patents to retrain the ML algorithms annually using good ML training practices for use in determining royalty distributions for the following year.
Retraining will consist of Unified Patents conducting SEP reviews through qualified independent experts during the first half of each year and then retraining the ML algorithms using the new SEP reviews in the training sets together with the reviews from the previous years. The results of the retraining (i.e., which patents qualify to be Open RAN Statistically Essential Patents) will be made available, subject to confidentiality, to Alium Licensors for review by the third calendar quarter of each year.
Each Alium Licensor may submit patents of its own that it feels should be included as an Open RAN Statistically Essential Patent but for whatever reason did not qualify as a result of the most recent retraining. In order for any such patents to be included in the positive training sets they must be claim-charted and identified to be essential to the Open RAN Standard through a manual review conducted by an independent expert from a list of such selected by Alium. The costs for such claim-charting and manual review are to be borne by the Alium Licensor requesting such consideration.
Alium will instruct Unified Patents to retrain the Open RAN IP Analytics, adding to the Positive Essentiality Sets any qualifying patents submitted to Alium on or before December 1 of the retraining year. The newly retrained Open RAN IP Analytics will be disclosed, subject to confidentiality, to Alium Licensors by January 31 of the following year and frozen for use in determining royalty distributions for that entire year.