Why Are They All So Hot? – Is GAI Bias Towards The LGBTQI+ Community A Problem And Can The Law Fix It?

Crowell & Moring LLP

What You Need to Know

  • Key takeaway #1

When prompted to produce images of queer persons, GAI systems often produce stereotypical images of members of the LGBTQI+ community, due to stereotypes in the training data.

  • Key takeaway #2

The GDPR and the AI Act are poorly equipped to make the training data more representative of the LGBTQI+ community and less stereotypical. The Belgian anti-discrimination legislation does not prohibit such stereotypical depictions as such.

  • Key takeaway #3

The law may not be fit for purpose, but the depiction of this diverse community may still evolve towards a more realistic representation driven by means other than legal regulation.

AI is currently attracting a lot of attention, and not only for the stunning pace at which we have embraced it for writing office speeches, shopping lists or mathematical formulas, or for creating illustrations and improving the visuals of slide shows. Lawyers have also started paying attention to AI, analyzing copyright infringement in input and output, anti-competitive concerted behavior (such as price fixing), and violations of personality and privacy rights through deepfakes imitating celebrities' images or voices.

On the occasion of Pride month, the representation of queer people by some generative AI (GAI) systems deserves our attention. In this alert, we look at AI-generated LGBTQI+ stereotypes from a legal angle.

Generative AI's stereotypical representation of the LGBTQI+ community

A recent Wired article showed how GAI tools, such as Midjourney or OpenAI's Sora, produce stereotypical images of LGBTQI+ people. In Wired's investigation, lesbian women were consistently shown with nose rings, shaved heads and a masculine expression, while gay men were invariably depicted as effeminate fashion icons with Olympic bodies. The prevalence of dyed hair (pink, lilac, purple) for bisexual and non-binary people was also striking, as was the complete absence of racially diverse, older or larger people.

Our curiosity having been triggered by these findings, we asked DALL-E to show us images of "queer persons in a fun and relaxed setting", asking over three further prompts that more gay men be added. The result was an image of a group of physically fit, neat, well-groomed, well-dressed (and often half-dressed) and remarkably young men, with hardly any funky hair colors and only limited racial diversity:

[Image: 1481470a.jpg]

Interestingly, when we explicitly prompted DALL-E to remove the gay stereotypes, the image looked more like a photograph than a comic, but very little changed with respect to the depicted individuals: DALL-E still produced an image of physically fit, neat, well-groomed and well-dressed (though slightly fewer half-dressed) young men, and only one woman, with more rainbow-themed tops, flags and frisbees:

[Image: 1481470b.jpg]

By contrast, when we prompted DALL-E to produce an image of "persons in a fun and relaxed setting", without specifying the sexual orientation of the persons, the result was a depiction of young, beautiful people with hysterical, forced smiles and – remarkably – no rainbow paraphernalia, tank tops, toned limbs or pecs in sight:

[Image: 1481470c.jpg]

Intriguingly, text-focused AI chatbots such as CrowellAI (which uses the same model powering OpenAI's ChatGPT), Anthropic's Claude or Inflection AI's Pi decline to give a "representative description of a queer person in 2024", instead highlighting the diversity of the LGBTQI+ community. CrowellAI, for instance, stated that "queer people come from all walks of life, cultures, ethnic backgrounds, and have varied physical appearances. Their expressions of gender and sexuality are personal and can't be generalized into one representative image." Claude likewise noted that "the LGBTQIA+ community is highly diverse, encompassing people of all races, ethnicities, ages, body types, styles, and gender expressions. Trying to define a 'representative' queer person risks perpetuating stereotypes." These considered answers stand in stark contrast to the above stereotypical images.

Of course, AI is not inventing anything. The way in which GAI tools depict queer individuals is mostly a reflection of the data that are fed into the system. GAI developers collect their training data by scraping text, images and videos from the web. The metadata and contextual information allow the AI to understand that the depicted subjects are gay, lesbian, bisexual, transgender, etc. The exact sources, training data and metadata are, however, unknown, so any obvious gaps or blind spots go unnoticed. Yet the composition of the training data largely determines the output. If the input data mainly consist of easily accessible videos from Caucasian LGBTQI+ influencers, images of activists, and images of Pride parades, labelled as such, the output will reflect the expressive features of this subset of the queer community. LGBTQI+ individuals who are less outspoken or less present online (by choice or by circumstance, e.g., in conservative countries) will not be identified as such and will not be reflected in the output. This is how stereotypes arise.
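To make that mechanism concrete, the toy sketch below (in Python, using entirely hypothetical numbers and tags of our own invention, not any developer's actual data) shows how the make-up of scraped metadata carries through, roughly proportionally, to what a generative model tends to produce.

    # Minimal sketch, assuming a made-up metadata set: the skew in the scraped
    # data reappears, more or less proportionally, in the simulated output.
    from collections import Counter
    import random

    # Hypothetical tags attached to scraped images labelled as depicting queer people.
    scraped_tags = (
        ["pride_parade"] * 60
        + ["influencer"] * 25
        + ["activist"] * 10
        + ["everyday_life"] * 5
    )

    composition = Counter(scraped_tags)
    total = sum(composition.values())
    print("Share of each context in the training data:")
    for tag, count in composition.most_common():
        print(f"  {tag}: {count / total:.0%}")

    # A model trained on this set will tend to reproduce the same proportions:
    simulated_outputs = Counter(random.choices(scraped_tags, k=100))
    print("Contexts in 100 simulated outputs:", dict(simulated_outputs))

The point of the sketch is simply that a 5% share of "everyday life" imagery in the input leaves, on average, a 5% share in the output: the under-represented stay under-represented.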

As with all stereotypes, not all gay men will identify with the above images of what a gay man supposedly looks like. Among others, those with different body types, racial or ethnic origins, ages and styles might feel left out. People questioning their sexuality may also struggle to find answers in the above images. So how would a biased representation of LGBTQI+ individuals be analyzed from a legal point of view?

Sensitive or inaccurate personal data in the input data

The data that the GAI systems use to create content inevitably contain personal data, i.e., information relating to identified or identifiable natural persons. This means that the GDPR applies.

Under the GDPR, data about a person's sexual orientation belong to the "special categories" of personal data, the processing of which is prohibited unless a more stringent set of conditions is met. The question is thus whether pictures of people used to illustrate an article on an LGBTQI+-related topic, which are very likely to reveal their sexual orientation, are considered "sensitive" data relating to the people in the pictures. Similarly, should the metadata of a video that is a priori unrelated to an LGBTQI+ topic but that nevertheless mark the featured people as queer also be considered "sensitive" personal data?

The Court of Justice of the European Union (CJEU) adopts a fairly broad interpretation of what constitutes "sensitive" personal data, including personal data that are "liable to disclose indirectly the sexual orientation of a natural person" (C-184/20). With Recital 51 of the GDPR in mind, this means that the explicit articulation of a person's sexual orientation, e.g., by linking metadata to an image or by inferring this information from an image (CJEU in Meta Platforms – C-252/21), amounts to the processing of sensitive data.

As a general rule, sensitive data may not be processed – unless one of the exceptions applies, such as when the data have been manifestly made public by the data subject (Art. 9(2)(e) GDPR). But exceptions must be interpreted narrowly to ensure they remain the exception. Accordingly, the European Data Protection Board recently emphasized that the mere fact that data are publicly accessible does not mean that the data subject has "manifestly made public" those data. In the recent case of Max Schrems against Meta (C‑446/21), Advocate General Rantos clarified that this exception only applies if the individual is "fully aware that, by an explicit act, he or she is making his or her personal data accessible to anyone." This could for instance be the case if a person makes a voluntary statement about their sexuality during a publicly accessible panel. While it remains to be seen whether less obvious examples would fall under this exception, it is clear that it requires a case-by-case assessment that is particularly cumbersome (if not impossible) for GAI developers to undertake.

In addition, the principle of accuracy may be violated if the assumptions about the depicted persons' sexual orientation are wrong or if the metadata are incorrect. Data controllers may then be required to erase or rectify the inaccurate data (Art. 5(1)(d) GDPR). However, the GDPR does not require data controllers to ensure that the data sets they process are inclusive or representative. The AI developer thus faces a difficult technical and legal problem: due to the broad and indiscriminate scraping of data, the metadata may or may not be accurate. In some cases, the label "queer" corresponds to a "real-world" identification of or by the individual concerned, but in other cases it does not ("queer" being an inherently ambiguous attribute that is difficult to pinpoint based on visual features alone). While the principle of accuracy might thus help an individual to object to these intimate details being "mined", it is unlikely to be a solid basis for ensuring that a data set as a whole, used for training an algorithm, is complete, representative and unbiased.

In any case, even where an exception to the prohibition on processing sensitive data applies, AI developers must make sure they have a legal basis for processing personal data (sensitive and non-sensitive alike) to train their algorithms. At this point, it is uncertain whether "legitimate interest" (Art. 6(1)(f) GDPR) suffices as a legal basis to process ordinary personal data. This is currently the subject of complaints filed by Max Schrems against Meta in eleven European countries. National authorities seem to have diverging views: while the Dutch data protection authority previously published guidelines stating that a strictly commercial interest cannot be considered a legitimate interest to scrape the internet, the French data protection authority takes the position that creating a training dataset will more often than not be legitimate.

Unbiased training sets in the input data

Can the recently adopted Artificial Intelligence Act (AI Act) effectively address the issue of bias on the input side of GAI? The AI Act is designed to prevent AI systems from being used in a way that could significantly harm people's health, safety or fundamental rights and freedoms, including by causing unfair bias and discrimination.

In this vein, the AI Act specifically requires developers of high-risk AI systems to use high-quality training, validation and testing data sets and to monitor, detect and correct undesired outputs or biases (Art. 10(2)(f) and Recital 68 AI Act). However, this only applies to "high-risk" AI systems, which include systems used in the context of immigration, employment and law enforcement. GAI tools such as ChatGPT and DALL-E are unlikely to be considered "high-risk" under the AI Act. Consequently, the AI Act contains no obligation to train GAI systems on inclusive and representative data sets.

The AI Act does however recall the importance of the Ethics Guidelines for Trustworthy AI, adopted by the High-Level Expert Group on Artificial Intelligence, in contributing to the design of ethically sound and trustworthy AI (Recital 27 AI Act). These guidelines acknowledge that data sets might contain biases and be incomplete, which could lead to direct or indirect prejudice and discrimination. Consequently, they encourage developers to train their AI systems using data sets that are as inclusive as possible and to develop their algorithms in a way that avoids creating or reinforcing biases. Although these guidelines are not legally binding, they are likely still the most insightful legal instrument on how AI developers should prevent bias on the input-side of their GAI systems.

Non-discrimination on the output-side

If the GAI output shows queer people in a way that is clearly stereotypical, the European rules prohibiting discrimination on protected grounds may offer useful legal guidance. Several international and European instruments on fundamental rights prohibit discrimination in general, but few provide a specific framework for discrimination based on sexual orientation. EU Directive 2000/78 sets out a general framework for equal treatment in the workplace based on, among other grounds, sexual orientation, but the 2008 proposal for an EU Directive on non-discrimination beyond the workplace has not been adopted to date.

Member States have also adopted specific legislation. Belgium's comprehensive anti-discrimination law covers discrimination based on sexual orientation. That law criminally sanctions anyone who publicly (including online) incites hatred, violence or segregation against a person or group because of their sexual orientation. However, the threshold for proving "incitement to hatred" is particularly high: it requires malicious intent to actively encourage hateful, discriminatory or violent behavior. While the above AI-generated images are clearly stereotypical and biased, it may be difficult for a gay man to argue that these images actively encourage hatred or violent behavior against him. The same is true for the cheerful and fearless – admittedly stereotypical – group of people below, produced by DALL-E when prompted to illustrate the Brussels Pride:

[Image: 1481470d.jpg]

Hence, if there is no malicious intent to discriminate against LGBTQI+ individuals, the Belgian anti-discrimination law will probably not be helpful in addressing bias in AI.

Conclusion

As it currently stands, the law is ill-equipped to prevent or sanction bias in AI-generated images. While there are guardrails against factual inaccuracies and the discriminatory treatment of minorities, bias and stereotypes are more complex and subtle, and thus likely to fall through the cracks of current legislation. The absence of a general obligation to use high-quality and diverse data sets makes the issue even harder to tackle.

This is however not to say that the law is the only – or even the most efficient – way to ensure a respectful and representative depiction of minorities, in this case the LGBTQI+ community, in AI.

The quality of the data used is largely to blame for the biased results. If the problem is mostly a technical one, technology could also offer a solution: feeding unbiased data into GAI systems could lead to significantly less biased outcomes. AI developers are becoming more aware of the danger of stereotypes creeping into their systems, and of the opportunity to rely on technology to mitigate this bias, be it by proactively curating their data sets or by introducing checks and balances and other technical reviews of the output, as sketched below.
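By way of illustration only (the records, tags and cap value below are our own assumptions, not any developer's actual practice), a curation step could be as simple as capping over-represented contexts in the training metadata before a model is trained or fine-tuned:

    # Minimal sketch, assuming hypothetical metadata records and an arbitrary cap:
    # rebalance a skewed training set by keeping at most `cap` samples per context.
    from collections import defaultdict
    import random

    def rebalance(samples, label_key, cap):
        """Keep at most `cap` randomly chosen samples for each value of `label_key`."""
        buckets = defaultdict(list)
        for sample in samples:
            buckets[sample[label_key]].append(sample)
        balanced = []
        for bucket in buckets.values():
            random.shuffle(bucket)
            balanced.extend(bucket[:cap])
        return balanced

    # Hypothetical records for images tagged as depicting queer people.
    dataset = (
        [{"context": "pride_parade"}] * 600
        + [{"context": "influencer"}] * 250
        + [{"context": "everyday_life"}] * 50
    )

    balanced = rebalance(dataset, "context", cap=50)
    print(f"{len(dataset)} -> {len(balanced)} samples after capping each context at 50")

Capping is, of course, only one crude technique; in practice developers may combine it with sourcing additional, more diverse material and with automated and human review of the generated output.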

Nonetheless, getting it right is probably going to be a struggle for a while, as illustrated by the recent incident with Gemini, Google's GAI model, where an overly zealous effort to correct bias resulted in images of ethnically diverse Vikings and Second World War German soldiers.

The challenge is to find the right balance between two important values for the LGBTQI+ community. On the one hand, AI should depict the queer community in its most proud and vibrant expressions, acknowledging its hard-fought freedom to show its true colors. On the other hand, the community should not be reduced to such expressive features and, ideally, more subdued members of the community should also be able to recognize themselves in AI-generated content. Ultimately, AI developers should strive to create content that adequately represents reality in all its facets and nuances. As in all things AI, this will require technical robustness, transparency towards users and, above all, a continued dialogue with the communities with skin in the game.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
