101translations

Categories
Interesting facts

Interesting History | Cyprus

Cyprus has a famously complex history and set of identities, but not everyone knows that some parts of Cyprus are still under British governance. The small areas of Akrotiri and Dhekelia have been British Overseas Territories since Cyprus became independent in 1960. Cyprus has been asking for the return of these territories ever since, but so far Britain has resisted giving them back because of their strategic location. To maintain the situation, Britain forbids the permanent settlement of these areas by local residents. 

Categories
Culture Interesting facts

LLMs are opaque search engines – change my mind

I am going to start with stating the obvious, and continue in a bit of a roundabout way, so please bear with me here…

When someone enters text into a search engine, their purpose is not normally to see how many or which web pages contain those words, but to find something out, say for example how to clean the filter on their dishwasher, or how to buy a widget. So far, so obvious.

So, in a way, Large Language Models (LLMs) are very much like search engines: they distil the content they find and present it in a way that, hopefully, will result in the user finding the answer they seek. The problem is that LLMs are much less transparent in how that occurs.

Now, I may be showing my age, but I remember when the first search engines came online. Initially, they were simple systems that used something akin to SQL queries to find pages containing certain keywords and present them to the user. If I wanted to find out how to clean the filter on my dishwasher, for example, I would type “+clean +filter +dishwasher” into Altavista and it showed me the pages containing those three words in the hope that at least one would contain the relevant information.
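To make the idea concrete, here is a minimal sketch of that kind of "all keywords must be present" lookup. It is not how Altavista was actually implemented; the page contents, the URLs and the function names (`parse_query`, `search`) are all invented for illustration, and the matching is a plain whole-word check.

```python
def parse_query(query: str) -> list[str]:
    """Extract the required terms from a query such as '+clean +filter +dishwasher'."""
    return [
        token[1:].lower()
        for token in query.split()
        if token.startswith("+") and len(token) > 1
    ]


def search(pages: dict[str, str], query: str) -> list[str]:
    """Return the URLs of pages whose text contains every required term."""
    required = parse_query(query)
    results = []
    for url, text in pages.items():
        words = set(text.lower().split())
        # A page matches only if all required terms appear as whole words.
        if all(term in words for term in required):
            results.append(url)
    return results


# Hypothetical pages, invented for this illustration.
PAGES = {
    "example.com/manual": "How to clean the filter on your dishwasher",
    "example.com/recipes": "A clean and simple recipe for filter coffee",
    "example.com/repairs": "Dishwasher repair and pump replacement",
}

matches = search(PAGES, "+clean +filter +dishwasher")
print(matches)
```

Only the first page contains all three words, so it is the only one returned. Note that nothing here judges whether the page is any good: it simply trusts that a page containing the words is about the topic.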

This system relied on the assumption that the authors of the indexed pages wrote them without considering that they would be indexed by, and accessed through, a search engine. Once more people started using search engines to access information, the authors of web pages realised that their visibility, and therefore their revenue, depended on the results delivered by search engines. This changed how content was written and presented – one stopped writing for readers and started writing for search engines. It was the birth of Search Engine Optimisation. At that point, Internet users started encountering pages created specifically to take advantage of the search engine, with titles like: “Clean the filter on your dishwasher” followed by a deluge of spam and virus links, which the search engine was not equipped to filter out.
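A toy example shows why that assumption was so fragile. The sketch below assumes a deliberately naive engine that ranks pages by how often the query words appear, which is a simplification for illustration and not a description of any real engine; the pages and the names `score` and `rank` are made up.

```python
def score(text: str, terms: list[str]) -> int:
    """Count how many times the query terms occur in the page text."""
    words = text.lower().split()
    return sum(words.count(term) for term in terms)


def rank(pages: dict[str, str], terms: list[str]) -> list[str]:
    """Order page URLs from highest to lowest raw term count."""
    return sorted(pages, key=lambda url: score(pages[url], terms), reverse=True)


# Hypothetical pages: one written for readers, one written for the engine.
PAGES = {
    "example.com/honest-guide": (
        "Remove the lower rack then twist the filter out and clean it "
        "under running water before refitting it in the dishwasher"
    ),
    "example.com/stuffed": (
        "clean filter dishwasher clean filter dishwasher "
        "clean filter dishwasher buy cheap widgets here"
    ),
}

terms = ["clean", "filter", "dishwasher"]
ranking = rank(PAGES, terms)
print(ranking)
```

The page that actually explains the job mentions each word once, while the stuffed page repeats them three times each and therefore comes out on top. Any ranking signal that authors can see and control invites this kind of gaming.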

Search evolved with “smarter” engines like Google, which devised search technologies to outsmart SEO techniques and keep the results of the search relevant to the users.

As everyone here knows, an online search has three actors with often divergent goals:

  • Users want to find the information they are looking for.
  • Search engines want to keep users coming back but also to direct them to paid advertising.
  • Content strategists want either to “trick” the engines into sending users to their advertising pages as if they were informative, or to accept the advertising model – in other words, to adopt an SEO or a PPC strategy, respectively. Even here, it is in the engines’ interest to maximise the amount of money that advertisers pay for each actual lead, which is against the advertisers’ interest.

 

This divergence of interest, especially between search engines and page optimisers, has created an “arms race” of techniques. SEO tries to create content so that search engines will present it as relevant, while search engines try to filter out that content so that people will either click on paid content or find actual information.

Again, nothing new here.

Back in the day, search engines like Altavista worked on the assumption that the web pages they indexed had not been created specifically to game the system. Today, Google works on the assumption that it can always stay a few steps ahead of the pages that actually are trying to game it (with varying degrees of success). Similar to Altavista, today’s LLMs rely on the fact that the content they process was not designed with them in mind. In other words, they are using a “naïve” dataset. This won’t last long, however.

Right now, only a minority of people use LLMs like ChatGPT to get answers. But as LLMs are used by more people, the commercial potential of nudging those tools will become greater. Because LLMs are less transparent in how they distil the content of their training sets, this is more complicated than stuffing a page with keywords. If there is an “LLM-optimisation” industry, it’s still in its infancy, but it’s likely that the conflict of interest between the tools used to digest the Internet’s content and the makers of that content will create the same problems of relevance in the LLM world as it does for search engines today.

It is also possible that there will be insurmountable technical obstacles to influencing the output of LLMs – more than there are with Google results today. After all, SEO was a lot easier with Altavista. Today, it is much more difficult, largely because of the relative lack of transparency in how Google produces its results. LLMs are very opaque regarding how the knowledge they collect from their datasets is distilled into an answer. It would be quite unprecedented in the history of internet searches for it to become impossible to trick search engines, but ultimately it may come down to the difficulty of creating sufficiently large “biased” datasets.

As LLMs become mainstream tools, we will see the same scenarios that we saw with search engines play out: conflicts of interest between users, LLM providers and creators of content, as long as the latter can shape their output to sway the results of LLMs in their favour. And lurking above all this is the eye of regulators, who will certainly have plenty to say if LLMs start to substantially influence what people think about the world.

Categories
Interesting facts

Nunchi: Korean Intuitive Insight

“Nunchi” (pronounced: noon-chee) is a Korean concept highly valued in Korean culture. It refers to the ability to intuitively understand others’ emotions and intentions, without the need for explicit communication. People with good ‘nunchi’ are adept at sensing social cues and adapting to various situations gracefully. 



Categories
Interesting facts

The History of Mime

Mime, an ancient and globally practiced art form, traces its origins back to civilizations across history. From the Greek tradition of pantomime to the Italian Commedia dell’arte of the Renaissance era, mime has evolved and flourished in diverse cultural contexts. Through expressive movements and gestures, mime artists convey narratives, emotions, and ideas without the need for spoken language. 

Categories
Interesting facts

From Octothorpe to Hashtag: The Evolution of a Symbol

The symbol “#” is commonly known as the hashtag today, but it has long gone by other names, including the number sign, the pound sign and the octothorpe. The “octo-” prefix refers to its eight points, while the origin of “thorpe” is uncertain and is often described as a playful addition. The name “octothorpe” is usually attributed to Bell Labs engineers, who needed a word for the symbol when it was added to push-button telephone keypads. The term “hashtag” gained prominence with the rise of social media.

Categories
Interesting facts

The origin of the word ‘Robot’

The term “robot” was introduced by Czech writer Karel Čapek (1890-1938) in his 1920 play “R.U.R.”; Čapek himself credited his brother Josef with suggesting the word. It is derived from the Czech word “robota,” meaning forced labour. The play explores the consequences of creating artificial life to serve humans, and the word “robot” has since become ubiquitous in science fiction and technology discussions.

Categories
Interesting facts

The ampersand symbol

The ampersand symbol “&” originated as a ligature of the letters “e” and “t,” representing the Latin word “et,” meaning “and.” The symbol evolved from the handwritten combination of these letters into a distinct symbol over time. The term “ampersand” is a corruption of the phrase “and (&) per se and,” historically used when reciting the alphabet to acknowledge that “&” represented the word “and.” This linguistic evolution reflects the dynamic nature of symbols and their adaptation throughout history.

Categories
Interesting facts

Sign Language Diversity

Did you know there’s no universal sign language? Most countries have their own, such as ASL in the U.S. and BSL in the U.K. In fact, there are an estimated 300 sign languages currently in use worldwide.

There is ‘International Sign’, which is not a full language but a cross-cultural communication method and is used globally at events like the Deaflympics. While more Western-friendly, it’s not as clear to those from Africa and Asia. Still, it shows how sign languages adapt for shared understanding.



Categories
Interesting facts

Rotokas and English: Phonemic Contrasts

Rotokas, spoken in the mountains of Bougainville, Papua New Guinea, boasts one of the world’s smallest phonemic inventories with just 11 phonemes. 

 

For comparison, English has around 44 phonemes, depending on the accent. A phoneme is generally regarded as “a set of speech sounds that are seen as equivalent to each other in a given language”. So even if the “k” sounds in the English words “kill” and “skill” are not exactly identical, we see them as equivalent within the language, and that’s why we treat them as a single phoneme.



Categories
Interesting facts

The document available in 500 languages

The Guinness World Record for the most translations of a single document goes to the Universal Declaration of Human Rights. Available in over 500 languages, this foundational document has been translated more widely than any other text, emphasising its global significance in promoting human rights and dignity.