Creatively Misinformed: Mining Social Media to Capture Internet Creators and Users’ Misunderstanding of Intellectual Property Registration System

Xiaoren Wang, Paul Heald, Weihao Ge

Research output: Contribution to conference, Paper, peer-review


Research question and significance:
Intellectual property (IP) law is complicated, but engaging formal legal help is costly. The internet has therefore become a major source of cheap regulatory information for the creative industries. In the absence of formal counsel, social media platforms like Twitter and Reddit have become important sources of both information and misinformation about the law. For example, law firm posts suggest that misconceptions about the need for and benefits of copyright, patent, and trademark registration abound.
Misconceptions about the copyright registration system may contribute to the proliferation of online scams targeting new authors. The failure to understand the patent registration system may result in the loss of protection for an invention.
Our research aims to capture the patterns of misconceptions about IP registration by empirically analyzing thousands of posts on social media. We expect to find a gap between the objectives of the IP registration system and how it is understood “on the ground.” We hypothesize that the gap is partially due to public misconceptions about the registration system. Without a more precise picture of IP misconceptions, regulators cannot easily respond. We hope to inform regulators and advise them on how to reduce the most troublesome IP misconceptions. Ultimately, our research could be used to improve the quality of media campaigns and public strategies. Finally, we hope our data will help IP scholars increase the relevance of their interventions.
Methodology and expected outcomes:
Our approach involves collecting thousands of posts from Reddit, Twitter, and Facebook, as they are mainstream social media platforms that contain a significant amount of IP-related content. We will use the keywords “copyright registration”, “trademark registration” and “patent registration” to extract relevant posts from the three platforms.
Table 1
                         Reddit        Twitter   Facebook
Copyright registration   1521 posts    --        --
Trademark registration   --            --        --
Patent registration      --            --        --

We will analyze the collected posts to identify patterns in misconceptions related to copyright, trademark and patent registration. So far, we have collected 1521 posts (see Table 1) on “copyright registration” from Reddit, covering the period from January to October 2022. In the initial stage, we will analyze these 1521 Reddit posts focusing on copyright registration.
Subsequently, we will mine and analyze Reddit posts concerning “trademark registration” and “patent registration”. Following this, we will conduct a similar data collection and analysis on Twitter and Facebook.
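As a minimal sketch of the keyword-based extraction described above: the three search phrases come from the methodology, while the `Post` record, the function names, and the case-insensitive substring match are assumptions made for illustration. The abstract does not specify how posts are actually retrieved or matched.

```python
from dataclasses import dataclass

# Search phrases quoted in the methodology.
KEYWORDS = (
    "copyright registration",
    "trademark registration",
    "patent registration",
)


@dataclass
class Post:
    """Hypothetical minimal record for a collected post."""
    platform: str
    text: str


def matched_keywords(text: str) -> list[str]:
    """Return the search phrases that occur in the text, ignoring case."""
    lowered = text.lower()
    return [kw for kw in KEYWORDS if kw in lowered]


def filter_posts(posts: list[Post]) -> dict[str, list[Post]]:
    """Group posts under every search phrase they contain.

    A post mentioning two phrases is listed under both; posts that
    mention none are dropped.
    """
    grouped: dict[str, list[Post]] = {kw: [] for kw in KEYWORDS}
    for post in posts:
        for kw in matched_keywords(post.text):
            grouped[kw].append(post)
    return grouped
```

Grouping by phrase mirrors the rows of Table 1; grouping by `platform` as well would give its columns.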
The expected outcomes of our research are as follows:
(1) The most misunderstood issues in IP registration.
(2) Whether specific misconceptions are associated with jurisdiction and IP types (e.g. trademarks/patents/copyrights). Regarding copyright, we will particularly check whether certain misconceptions are associated with certain subcategories of copyright, such as music works, visual works, literary works, etc.
(3) Suggestions to IP registration offices and legislators on how to address misconceptions based on the findings from outcomes (1) and (2).
Pilot results on copyright registration
For the initial stage, we focus on misconceptions surrounding copyright registration based on the 1521 posts collected on Reddit (see Table 1). We conducted a pilot analysis of the first 100 posts to validate our methodology and gain initial insights into the prevalent misconceptions. This abstract presents the results of this pilot analysis and discusses its potential implications.
The initial manual analysis of the first 100 posts reveals that 52 posts contained misconceptions and unclear knowledge. Among these 52 posts, 60% exhibited misconceptions regarding copyright registration (see Figure 1), which is not surprising given that the keywords used were “copyright registration”. Furthermore, 21% of the misconceptions pertained to copyright infringement, 4% were related to copyright purposes, and 15% fell under other categories such as licenses (see Figure 1). The following paragraphs will report the misconceptions on copyright registration and copyright infringement.
Figure 1
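The category shares reported above can be reproduced from manually assigned labels with a simple tally. The sketch below is illustrative only: the per-category counts (31, 11, 2, 8) are hypothetical values chosen because they sum to 52 and round to the reported percentages. The abstract itself reports only the percentages.

```python
from collections import Counter


def category_shares(labels: list[str]) -> dict[str, int]:
    """Return each label's share of the total, rounded to a whole percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total) for label, n in counts.items()}


# Hypothetical counts, NOT taken from the abstract: one label per post
# among the 52 posts that contained a misconception.
labels = (
    ["registration"] * 31
    + ["infringement"] * 11
    + ["purposes"] * 2
    + ["other"] * 8
)

shares = category_shares(labels)
```

With these assumed counts, `shares` comes out as 60, 21, 4 and 15 percent, matching the figures in the text.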

1. Misconception on copyright registration:
Our analysis indicated that the most common misconception about copyright registration is related to the subject matter eligible for registration, namely “what can be registered?”. This type of misconception accounted for 39% of the misconceptions regarding copyright registration (see Figure 2). Here, people mistakenly believe they can register copyright for ideas, business names, or song names. For instance, one post stated, “I had to register my idea which was a good move….”
The second most prevalent misconception involves the necessity of registration, constituting 29% of the misconceptions surrounding copyright registration. In this case, individuals incorrectly believe that registration is mandatory for copyright protection or publication. For example, one post stated, “I didn’t send my original music just in case he was gonna steal it or anything because I haven’t registered it with copyright stuff.” Other misconceptions about registration relate to the registration authority, formality, function, jurisdiction, and consequences (see Figure 2).
Figure 2

2. Misconception on copyright infringement
Our pilot results indicate that 21% (see Figure 1) of the misconceptions involved copyright infringement. Among these infringement misconceptions, the majority (82%) concerned what behaviors constitute copyright infringement. Some individuals believe that good intentions justify the unauthorized use of others' work. One Redditor commented, “….the small profits we make from our shop we use it to help and to feed poor people…(so we are not infringement)”.
A smaller portion (28%) of the infringement misconceptions revolves around whether an earlier work has a valid copyright. Typical misunderstandings exist within this category. For instance, some Redditors mistakenly think that a work is free to copy as long as it is publicly available online. Others wrongly believe that a work lacks copyright if there is no copyright symbol (©) attached.
3. Jurisdictions:
Identifying the jurisdictions of the posts proved challenging, as this information is not disclosed on Reddit. To address this problem, we use clues within the posts, such as country names, cities (e.g. New York), or institutions (e.g. the USPTO), to infer the countries involved. Out of the 100 posts, 29 were identifiable with specific countries, while the rest remained unidentifiable. Among the 29 identifiable posts, 22 were from the US, 4 from Canada, 2 from the UK, 2 from Australia, and 2 from China (some posts involve more than one country, so the sum of these numbers exceeds 29). The remaining posts originated from Japan, India, Mexico, and Taiwan.
Figure 3
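The clue-based inference of jurisdictions described above could be sketched as follows. The clue lists, the function names, and the whole-word matching rule are hypothetical. The abstract names only the kinds of clue used (country names, cities such as New York, institutions such as the USPTO) and does not say whether the inference was manual or automated.

```python
import re
from collections import Counter

# Hypothetical clue lists; a real list would be far longer. Ambiguous
# tokens such as "us" are left out because they collide with ordinary words.
CLUES = {
    "US": ["united states", "usa", "new york", "uspto"],
    "Canada": ["canada", "toronto"],
    "UK": ["united kingdom", "london"],
}


def infer_countries(text: str) -> set[str]:
    """Return every country for which at least one clue appears as a whole word."""
    lowered = text.lower()
    found = set()
    for country, clues in CLUES.items():
        if any(re.search(rf"\b{re.escape(clue)}\b", lowered) for clue in clues):
            found.add(country)
    return found


def tally(posts: list[str]) -> tuple[int, Counter]:
    """Count identifiable posts and per-country mentions.

    A post that points to several countries is counted once as
    identifiable but once per country in the tally, so the per-country
    counts can sum to more than the number of identifiable posts.
    """
    identified = 0
    counts: Counter = Counter()
    for text in posts:
        countries = infer_countries(text)
        if countries:
            identified += 1
            counts.update(countries)
    return identified, counts
```

This counting rule is why the per-country figures in the text add up to more than 29.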

Due to the limited sample size in the pilot analysis, significant associations between misconceptions and jurisdictions could not be established. Such associations may emerge once all 1521 posts are analyzed.
4. Copyright subcategories
Out of the 100 posts analyzed, 47 were identified with specific subcategories of copyright, while the remaining posts could not be identified (see Figure 4). The primary subcategories observed were literary works (15 posts), artistic works (13 posts), and music works (13 posts). The remaining posts encompassed films, games, and mixed works. Mixed works are those that can fall into two or more subcategories. For example, cloth books for kids would be classified as both literary and artistic works.
Figure 4

Due to the limited sample size in the pilot analysis, no significant associations were found between misconceptions and copyright subcategories. Further analysis of all 1521 posts may reveal such associations.
5. Inconclusive conclusions
Our pilot analysis indicates a significant misunderstanding of registration subject matter and registration necessity. Regarding copyright infringement, the misconceptions primarily revolve around determining what kinds of behavior constitute infringement. However, these patterns are far from definitive conclusions because the sample analyzed so far is very limited. For the same reason, no significant associations have been discovered between misconceptions and jurisdiction or between misconceptions and copyright subcategories.
Regarding registration misconceptions, supposing a similar pattern is observed in all 1521 posts, we would suggest that legislators and regulators (e.g. IP registration offices) make efforts to reduce misconceptions regarding registration subject matter and registration necessity. Misunderstandings regarding registration subject matter may lead to incorrect registrations for unqualified items, such as song names or business names, resulting in the waste of administrative resources and applicants' time and money. Misunderstanding the necessity of registration, such as the mistaken belief that registration is required to obtain copyright protection, could lead to excessive registrations, again wasting administrative resources and potentially delaying the publication or commercialization of works. These situations are undesirable for IP legislators, regulators, and creators.
Regarding infringement misconceptions, if the analysis of all 1521 posts confirms the same pattern observed in the pilot study (i.e., the main misconception revolves around determining what kinds of behavior constitute copyright infringement), these misconceptions may lead to two consequences. One is unconscious infringement, where infringers believe they are not infringing when, in fact, they are. For example, the belief that a righteous intention justifies the unauthorized use of others' work would result in unconscious infringement. The other is overcautiousness, where users mistakenly believe certain behaviors constitute infringement when they do not. Overcautiousness unnecessarily restricts access to and use of works, stifles future innovations, and hinders the maximization of value for existing works. For instance, if someone believes they are infringing by mentioning the title of another person’s song to make a critical comment, they might unnecessarily spend time seeking consent from the songwriter or limit their critical comments to songs in the public domain. Neither situation aligns with the goals of copyright law. Therefore, we recommend that regulators clarify these commonly misunderstood behaviors related to infringement. After analyzing the 1521 Reddit posts, we will provide specific suggestions on the most misunderstood behaviors and how to reduce these misconceptions.
Next step:
Our next step involves analyzing all 1521 posts to capture the main patterns of misconceptions based on this larger dataset. Additionally, we will investigate whether specific misconceptions are associated with particular jurisdictions and subcategories of works. Following the analysis of copyright misconceptions, we will expand our research to include trademark and patent misconceptions. We also plan to conduct similar research on Twitter and Facebook.
We hope to share our pilot results at EPIP2023 and collect feedback to improve the methodology.
Original language: English
Publication status: Published - 13 Sept 2023
Event: 18th European Policy for Intellectual Property (Epip2023) - Kraków, Poland
Duration: 11 Sept 2023 to 13 Sept 2023




