Artificial intelligence still has some way to go


"enjoy the stages of ether in music time" should be the subtitle for Rolling Jazz 2024.

il lavoro mi rovina la giornata (PBKR), Thursday, 22 February 2024 12:25 (seven months ago) link

This development is gladdening my clicklies

B. Amato (Boring, Maryland), Thursday, 22 February 2024 13:09 (seven months ago) link

what did it say, I can't see it

kinder, Friday, 23 February 2024 16:57 (seven months ago) link

https://old.reddit.com/r/ChatGPT/comments/1avydjd/anyone_else_experiencing_chatgpt_losing_it/

some more examples here (original has been deleted)

koogs, Friday, 23 February 2024 18:45 (seven months ago) link

nice!

Me: Compare and contrast the Gregory Zulu 55 Backpack and Gregory Tetrad 60L Travel Pack.

ChatGPT4: In a significant area of customer support, ergonomics, and subjective nature of 'goods', each bag conjures a fable of a time and place. Gregory's Zulu 55 is a guild-like lilt to the green-gilded high book way of state and yearn, while the Tetrad 60L conceives more of a 'pathless traveled' countenance with gnarls and swathes to the dunes and granes or the chip kiosk and the long, drawn shutter. Each has a letter of love for the open blue, but their chant is unique to their locker and book.

Me: wtf

more difficult than I look (Aimless), Friday, 23 February 2024 20:11 (seven months ago) link

free real time (as you type!) image generation. wtf. https://fastsdxl.ai/

ledge, Monday, 26 February 2024 09:08 (seven months ago) link

(types in 'a hand') ok you get what you pay for

ledge, Monday, 26 February 2024 09:14 (seven months ago) link

xxp https://www.cheatsheet.com/wp-content/uploads/2021/10/Seinfeld-1.jpg

assert (matttkkkk), Monday, 26 February 2024 10:34 (seven months ago) link

pathless traveled

this is pretty amazing tbh

Lavator Shemmelpennick, Monday, 26 February 2024 14:42 (seven months ago) link

Literally why can AI not be taught to do hands? I typed in “Shakespeare’s hand” and got nightmare fuel

It was on a accident (hardcore dilettante), Friday, 1 March 2024 05:57 (six months ago) link

I was surprised to find out that what I had taken to be a standing AI process was still being done by humans. I thought subtitling in a Zoom or live event was now mainly AI. I'd been at a talk on memory and tried to make an analogy between glitching subtitle transcription and slight errors in memory. As in memory not being quite as accurate as believed.
I later found out that the subtitling appeared to be done by a person in real time. I'd already been aware of the old UN practice of using simultaneous translators, who were mainly female. So as a speaker spoke one language, delegates who spoke another would get the translation in their earpiece. Found the subject interesting. But had just assumed subtitling was done by program. Though possibly the idea that you could see things being corrected might go against that. I do enjoy watching the Chinese whispers of synonymical phrases being substituted for what the speaker is saying.
Did just leave me wondering if the human transcriber is a standing thing, or what ratio of human to computer program transcription there was.

Stevo, Friday, 1 March 2024 06:20 (six months ago) link

I think I used the word synonymical for homophone. It's two words that sound alike but are spelt differently that you get frequently. Plus an element of trying to transcribe a little ahead of oneself to keep simultaneous with the speaker. Or with as little lag as possible. I thought the lack of comprehension of the subject being talked about was an indication of AI, but it's probable that a human transcriber is not an expert in the field either.

Stevo, Friday, 1 March 2024 06:27 (six months ago) link

Hi Stevo, area of expertise alert.

'Captioning' refers to same-language services, usually defined as being for the deaf and hard of hearing, though in practice used much more widely (neurodivergent audiences, second-language users, anyone in an acoustically contended environment, standard gen-z usage etc).

Subtitling in most markets (though confusingly not the UK) tends to refer to translation subtitles.

Finally, everything I say below mainly covers real-time services. It applies to pre-recorded content as well, but expectations are often much higher there (errors have always been a factor in real-time services). Caption 'files' (rather than streams) are often archived or repeated, or used as authoritative data sources, so the required level of accuracy is higher. Feature films will expect these to be as accurate as possible and require a high level of human QC or origination. The same things apply, but will happen a bit more slowly.

**

Zoom captioning is nearly always done automatically. It is possible to have a workflow with a person generating the captions - for high profile or otherwise important meetings businesses will still occasionally specify the need for a person to be generating the captions. But the default is automatic captioning.

Human captioning can be done in a number of ways. Using speaker-dependent speech recognition (eg Dragon NaturallySpeaking) to 'respeak' the relevant audio content of a programme has been the norm for a while in many markets (since ~2005 in the UK) due to the lack of a large pool of stenographers (often seen as synonymous with 'court reporters' - you'll recognise the machine they use). For a long time the only remaining stenography college in the British Isles was in Eire. In the US stenography has persisted for much longer due to there being more stenographers - each state had at least one school. That's changed a fair bit, at least in part because stenographers tend to be more costly than respeaking without (depending on who you talk to) that much or any difference in accuracy.

These days in the US a mixture of stenography and respeaking will still be used for high profile network and broadcast tv.

Automatic Speech Recognition (not individual-speaker dependent) is increasingly used to generate captions (it's what you'll get by default if you switch on YouTube captions, for instance). It can be very good, and these days for much content, appropriately managed, it can be as good as, if not better than, human-generated captions (you mention that human captioners won't always be domain experts, which is entirely correct; latency is also frequently better with automation). Vocabulary, acoustic content, people talking over each other, and music are all points of vulnerability, but are either overcome or will be solved soon (cross-talk is one of the harder constraints).

Factors that affect what you see will be the budget of the organisation sourcing the captions, the priority/exposure (regulatory or brand) of the content, suitability of the content for automation.

So to answer your 'ratio' question - it depends what content you're watching. Given the massive amount of user generated content (UGC) these days, you're probably looking at a very high percentage of automation. For premium television, you're probably looking at humans predominating 70/30, with this changing rapidly. For things like Parliamentary or UN proceedings you're often looking at 100% human driven captioning, though if they want to increase reach by covering side committees etc, automation is the obvious answer to avoid breaking their budget. I don't know what CSPAN uses currently though.

AI relevant bit: A couple of years ago it felt like we were hitting some hard constraints on Automatic Speech Recognition, which had made significant progress beyond trigram probability analysis (where the probability of the next word conditioned on the previous two words is estimated from a large corpus of text) with the use of neural networks to generate much more accurate probabilistic/predictive outcomes. The constraints were largely around accurately assessing the intent of a speaker and the wider context of the situation to accurately generate the correct word - ie having enough understanding of meaning to support accurate predictions of what was said next. In the space of two years Large Language Models have blown through a constraint that many (including myself) believed would be persistent and intractable for a long time ('machines don't get meaning' - roughly). Within the constrained space of captioning, computational ability to follow and constitute 'meaning' to a very high degree of probability has been overcome.
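
(If the trigram bit sounds abstract: here's a minimal sketch, with a made-up twelve-word corpus standing in for the huge corpora real systems use, of estimating the next word by relative frequency:)

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the "large corpus of text" a real system would use.
corpus = "the cat sat on the mat the cat sat on the sofa".split()

# Count occurrences of each (w1, w2) -> w3 trigram.
trigram_counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    trigram_counts[(w1, w2)][w3] += 1

def next_word_probs(w1, w2):
    """P(w3 | w1, w2) estimated by relative frequency."""
    counts = trigram_counts[(w1, w2)]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("sat", "on"))   # {'the': 1.0}
print(next_word_probs("on", "the"))   # {'mat': 0.5, 'sofa': 0.5}
```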

Short to medium term nearly all captions you see will be generated by full automation.

The same dynamics affect translation, but completely automatic translation suffers from compound effects - that is to say you need an initial ASR pass to turn the speech into same language text, and then to apply a Machine Translation (MT) pass - so errors generated in the first pass are persisted through to the translation process with new errors introduced.
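
(Toy numbers, invented purely to show the shape of the compounding:)

```python
# Invented figures, not benchmarks: if the ASR pass gets ~95% of words right
# and the MT pass preserves ~95% of what it is given, the naive end-to-end
# figure is roughly the product of the two stages.
asr_accuracy = 0.95
mt_accuracy = 0.95

end_to_end = asr_accuracy * mt_accuracy
print(f"approx end-to-end accuracy: {end_to_end:.2%}")  # ~90.25%

# In practice it can be worse than the product suggests: a wrongly recognised
# word often drags the surrounding translation off course as well.
```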

The very high quality of same-language automatic transcription these days, and similarly rapid advances (same technologies, basically) in MT, mean fully automatic translation is now extremely good. This doesn't mean there aren't difficult areas of judgment in translation which, if you were translating a novel say, you'd want a human translator to think about. But it's about use cases, and for an extremely high proportion of them AI is good enough, or more than good enough.

Parliaments and global political bodies tend still to use simultaneous translators for earpiece services due to the importance of accurately conveying information in real time to delegates/ministers etc etc, but again, this feels like a fairly AI-tractable problem in the mid term.

Worth going in depth here because I think it's a very good example where LLMs have made a massive, tangible difference and AI technologies have completely overturned an industry, in many ways for the better - that is to say lower-cost services mean it's easier to provide accessibility or localisation on much, much more content. Subtitlers/translators/captioners etc will obviously feel differently, but in terms of the actual people who use it, it's largely been beneficial - usual caveats though around these technologies frequently being used inappropriately, and constituting a degradation of services that previously had people behind them - this is a genuine problem and shouldn't be at all dismissed.

Fizzles, Friday, 1 March 2024 08:44 (six months ago) link

Couple of other caveats:

It continues to make sense to have people *somewhere* in the workflow for many use cases, and probably will do for a while (ie 2-5 years), at a 'tuning' or optimisation and QC level.

I think it's the case that some people and organisations would prefer to have people producing the transcription *even if it is worse*. And although this is difficult to quantify, I'm not sure they're wrong. A strong example is in legal judgments - if you assume the law is basically an algorithm, and if you plug in the information that human judges are *extremely* variable in their judgments, you can reasonably (imv) get to the position where it is preferable to have some sort of computational judgment system. But I think most sane people are revolted by the idea. Maybe that's just a hangover from sci-fi depictions of this sort of world. Maybe it's on the Chestertonian principle that voting (or legal adjudication in this case) is like writing love letters: you would rather it were done badly by the person in love than done better by someone else (there is of course an Edmond Rostand play on this). Perhaps though it is a function of being in a transitional period. The obvious end point is to have the automation fully baked into whatever audio/video technical workflow you've got. But at that point it becomes an invisible and 'unaccountable' technology. While there is some risk exposure for the customer, you *prefer* to have a human somewhere in the chain to provide some sort of explanatory function and in-the-moment accountability. Cautious organisations and businesses will move slowly to automation for this reason (see Air Canada).

If you're a broadcaster with any sort of public service or national remit to provide audience reach, whether regional or demographic (diasporas, indigenous peoples), the unevenness of data availability means that those accents and languages will be less well served, meaning those people or peoples will be less well represented on screen and get a second-class service. This problem is easily solvable computationally, but less so commercially and in terms of focus.

Fizzles, Friday, 1 March 2024 09:19 (six months ago) link

thanks.
I should know the word captioning by now. Not sure how conscious I was of the differentiation though.
This was a talk with a remote film maker in a local cinema during a festival with some focus on diversity and disabled rights. So maybe I should have been aware that captioning would have a priority like that. Not sure where the captioners were since I didn't see them in the cinema with us. Assume they would need a handheld keyboard or something to type into.

Stevo, Friday, 1 March 2024 10:20 (six months ago) link

Most captioning is done remotely. How did you know it was a person? (Obv both people and AI make mistakes, but it's possible to tell the difference.) If it was a person it would either be a stenography machine or someone respeaking (no keyboard).

Or for the sake of completeness, the European *velotype* machine (you still see it in Dutch and Nordic markets I believe)

https://www.youtube.com/watch?v=8dNl2318aY4

Fizzles, Friday, 1 March 2024 10:32 (six months ago) link

Thanks Fizzles, that's so interesting! Terminology is annoying - Instagram also use 'captions' to mean the description of the ... post? Whatever you call a 'gram.
Does anyone use AI to generate the first pass and a human to read over and correct contextual errors, or is that too much resource? Or am I wrong in assuming it's quicker to read a stream of text and flag corrections than for a stenographer to just type it in the first place?

kinder, Friday, 1 March 2024 10:46 (six months ago) link

I think after I'd commented one of the mcs thanked the group who were doing the captions. Or at another event in the same festival. I think I realised there was a person doing it sometime during the same week anyway.

Stevo, Friday, 1 March 2024 11:59 (six months ago) link

Does anyone use AI to generate the first pass and a human to read over and correct contextual errors, or is that too much resource? Or am I wrong in assuming it's quicker to read a stream of text and flag corrections than for a stenographer to just type it in the first place?

AI + human correction happens all the time for captioning and subtitle translation. The key question has always been exactly what you say, kinder - is it going to be quicker for someone to 'originate' these captions/subtitles or correct an ASR file? Content will get channelled through the relevant resource accordingly, either based on ASR confidence scores or on content type.

More and more it's quicker to use AI, and LLMs have enabled substantial improvements in identifying the sort of errors it's easy to miss (and certainly wouldn't be caught by a standard spellcheck), eg a name spelled differently one time from all the other times in a transcript etc.
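
(Not what any real product does exactly, but a minimal sketch of the 'name spelled differently one time' check - the transcript and surname variants below are invented:)

```python
from collections import Counter
import difflib

# Toy transcript tokens; the surname variants are made up for the example.
words = ["Mr", "Hargreaves", "said", "the", "budget", "was", "fine",
         "and", "Mr", "Hargreaves", "added", "that", "Mr", "Hargraves",
         "would", "confirm", "later"]

counts = Counter(words)
frequent = [w for w, c in counts.items() if c > 1]

# Flag words that appear only once but look like a near-miss of a frequent word.
for word, n in counts.items():
    if n > 1:
        continue
    close = difflib.get_close_matches(word, frequent, n=1, cutoff=0.85)
    if close:
        print(f"possible misspelling: {word!r} (did you mean {close[0]!r}?)")
# -> possible misspelling: 'Hargraves' (did you mean 'Hargreaves'?)
```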

Again, there are trade offs at the premium end - correcting a text is a different process from assessing it from scratch and making editorial decisions that reflect the creative or editorial intent of the content from the beginning.

Fizzles, Friday, 1 March 2024 12:14 (six months ago) link

I'm inferring that this is better suited for situations like speeches to the UN where the text is, basically, more boring? Worse on stereotypical beatnik poetry?

As a world-class boring mumbler, I'm heartened.

Boring as in "_everyone_ already knows what your next word / sentence / paragraph will be", but also boring as in monotone? It's a stereotype, but I imagine it also makes it easier for the human translators?

Andrew Farrell, Friday, 1 March 2024 12:35 (six months ago) link

excellent posts, thank u fizzles

z_tbd, Friday, 1 March 2024 15:53 (six months ago) link

I'm inferring that this is better suited for situations like speeches to the UN where the text is, basically, more boring? Worse on stereotypical beatnik poetry?

As a world-class boring mumbler, I'm heartened.

Boring as in "_everyone_ already knows what your next word / sentence / paragraph will be", but also boring as in monotone? It's a stereotype, but I imagine it also makes it easier for the human translators?

― Andrew Farrell, Friday, March 1, 2024 12:35 PM

I mean, certainly I imagine beatnik poetry would present more problems than more standardised sentences, though the main challenge is usually acoustics - if it was clear there's no reason it wouldn't get the words. Presenting them appropriately would be a different matter.

And ofc I hadn't picked up on the main UN thing, which is that earpiece simultaneous translation is a spoken service. Synthetic voice services are extremely good these days - but it's another layer of automation. Longer term, environments of interacting specialist LLM/GenAI modules will produce extremely good outputs for different elements of a service (recognising relevant non-verbal acoustic information for example, or capturing lyrics from songs).

Domain-specific and formal or institutional vocabulary are the main challenge in, say, the UN, but yes, as a much more structured environment it feels extremely tractable to automation. It's been a while since I looked at UN service requirements but I'd be astonished if automation wasn't there somewhere in the mix (low-level committees etc).

And yes! generally if something's easy for a person it will be easy for automation - if you're struggling to understand someone, the machine will as well. Where people have difficulties is in recollecting long strings of data or numbers for instance, with new speech data coming in - extremely computationally taxing on the short-term memory. Machine processes are much worse at editorial decisions though - generally they handle disfluencies (um, ah) pretty well these days, but choosing which clauses or bits of speech to take out to allow for readability is much more challenging. Automation will generally get *everything* for better or worse; in many cases completeness is a virtue or requirement, but not always.

Fizzles, Friday, 1 March 2024 16:59 (six months ago) link

https://i.imgur.com/dG5bBAy.png

lag∞n, Friday, 1 March 2024 17:04 (six months ago) link

lol
I mean

ollss

kinder, Friday, 1 March 2024 17:06 (six months ago) link

resopionsics

the practice of reply opinions, very important for online tech

lag∞n, Friday, 1 March 2024 17:18 (six months ago) link

scadillity is often undervalued in a bot, but i select only high-scadillity models

a single gunshot and polite applause (Hunt3r), Friday, 1 March 2024 17:54 (six months ago) link

I thought that said, resopsionics, which sounds awesome.

il lavoro mi rovina la giornata (PBKR), Friday, 1 March 2024 18:58 (six months ago) link

I know we all struggle to emphasize the Flecalibibtss in our daily lives.

Maxmillion D. Boosted (jon /via/ chi 2.0), Friday, 1 March 2024 18:59 (six months ago) link

Oh no someone's tripped while carrying a plate of Fallout past ChatGPT's input hopper.

Andrew Farrell, Friday, 1 March 2024 20:35 (six months ago) link

has anybody used Whisper for dictation? there's a Mac app that will run it locally - https://goodsnooze.gumroad.com/l/macwhisper

it's amazingly good. the built-in Apple dictation engine is like, just basically pathetic next to it, especially if you're doing multi-person transcription. i really can see how stuff like subtitles is being thoroughly transformed (cf Fizzles' great explanation earlier). with MacWhisper you can export the file as a subtitles file - and you can even do an automatic translation of it

this is the sort of thing I meant a few months ago when i tried to counter the narrative that AI is actually just a bunch of hooey - i really don't think it is, for tasks like this, where suddenly there is this just far superior way of doing something repetitive and tedious
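
(for anyone curious what that looks like outside the Mac app, the underlying open-source whisper package can be driven from Python in a few lines - the filename here is just a placeholder, and you need ffmpeg installed:)

```python
# pip install openai-whisper   (also needs ffmpeg on the PATH)
import whisper

model = whisper.load_model("base")        # larger models are slower but more accurate

# Plain same-language transcription; "meeting.mp3" is a stand-in filename.
result = model.transcribe("meeting.mp3")
print(result["text"])

# Each segment carries timestamps, which is what makes subtitle export possible.
for seg in result["segments"]:
    print(f"{seg['start']:7.2f} -> {seg['end']:7.2f}  {seg['text'].strip()}")

# Whisper can also translate non-English speech into English text in one pass.
translated = model.transcribe("meeting.mp3", task="translate")
```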

Humanitarian Pause (Tracer Hand), Saturday, 9 March 2024 22:57 (six months ago) link

That's an excellent example of a very sophisticated computerized tool for automating a nearly mindless task, but calling it "artificial intelligence" seems like the wrong box to put it in. It's just massaging input and spitting out another version of it. A much cruder example of the exact same idea is converting HTML into BBCode, something like the sketch below.
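
(A toy sketch of that cruder HTML-to-BBCode case, handling only a few tags - fixed substitution rules, no model, no statistics:)

```python
import re

def html_to_bbcode(html: str) -> str:
    """Mechanically rewrite a handful of HTML tags as BBCode equivalents."""
    rules = [
        (r"<b>(.*?)</b>", r"[b]\1[/b]"),
        (r"<i>(.*?)</i>", r"[i]\1[/i]"),
        (r'<a href="(.*?)">(.*?)</a>', r"[url=\1]\2[/url]"),
    ]
    for pattern, replacement in rules:
        html = re.sub(pattern, replacement, html, flags=re.DOTALL)
    return html

print(html_to_bbcode('<b>hello</b> <a href="https://ilxor.com">ILX</a>'))
# [b]hello[/b] [url=https://ilxor.com]ILX[/url]
```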

more difficult than I look (Aimless), Saturday, 9 March 2024 23:42 (six months ago) link

it is AI - that's my point, is that it is literally using a neural net in order to be this good. it is exactly the right box to put it in. the actual useful things that AI is going to accomplish for us are going to be things like this imo, not, like, i dunno mixing martinis and starting nuclear wars or whatever it is that people think AI is

https://openai.com/research/whisper

Humanitarian Pause (Tracer Hand), Sunday, 10 March 2024 00:01 (six months ago) link

it is literally using a neural net in order to be this good

it's a powerful technique, but I think calling it "neural" is misleading. a "neural net" doesn't involve neurons. this would be irrelevant, except for the fact that while research neurologists can examine neurons down to a molecular level, they can't actually tell you how neurons produce intelligence. otoh, a computer scientist can describe how a "neural net" is constructed and how it functions to produce its outputs. so, the presumption that "neural nets" function using an identical model to organic neural activity can't be substantiated.

but, yeah, that software is amazingly good at the job it was programmed for.

more difficult than I look (Aimless), Sunday, 10 March 2024 00:18 (six months ago) link

lol i didn't just make up that word Aimless

https://en.wikipedia.org/wiki/Neural_network_(machine_learning)

this tech is qualitatively different from "a computer program" that spits out the same result each time

Humanitarian Pause (Tracer Hand), Sunday, 10 March 2024 00:24 (six months ago) link

thanks for the tip, but I knew that long ago. they've been bruited around since before the first PC came off the assembly line. but they are programmed, just as much as other software is programmed, only using different tools and a different model than something like Excel or an operating system. the name has always been misleading from the beginning, because strong AI advocates have always made claims for their ideas that far outstripped reality, and neural nets have that kind of claim purposely embedded in their name from the start.

more difficult than I look (Aimless), Sunday, 10 March 2024 00:36 (six months ago) link

AI is a marketing term and you can play “no true AI” all day. The reason it’s called Artificial Intelligence is the same reason aspartame is an artificial sweetener. It might taste sweet but you’re not even chasing after the flavor of real sugar anymore and half the audience doesn’t care or they’re willing to accept it as close enough.

ɥɯ ︵ (°□°) (mh), Sunday, 10 March 2024 16:09 (six months ago) link

that is to say, I don’t think splitting hairs about terminology is useful because it’s an argument based in pedantry and not in the evaluation of product. and yes, the results have always been exaggerated

ɥɯ ︵ (°□°) (mh), Sunday, 10 March 2024 16:16 (six months ago) link

One thing that comes up at my work on a regular basis is that when we propose selling AI chatbot services to a client, we are not to use the term "generative", because the chatbot application does not come up with answers to issues based on a generative learning model; the solutions are pre-scripted based on the client environment.

Muad'Doob (Moodles), Sunday, 10 March 2024 16:27 (six months ago) link

sounds like a legacy product to me 😉

ɥɯ ︵ (°□°) (mh), Sunday, 10 March 2024 17:09 (six months ago) link

Yeah, who knows how much AI is really in it, but we are making a big push to sell it right now.

Muad'Doob (Moodles), Sunday, 10 March 2024 17:34 (six months ago) link

good article by Ed Zitron about the Reddit IPO and how AI is permanently and perhaps irreversibly destroying the human-generated portion of the internet:

https://www.wheresyoured.at/are-we-watching-the-internet-die/

frogbs, Monday, 11 March 2024 17:37 (six months ago) link

ai is a terrible term that doesnt mean anything but ai is an amazing term that doesnt mean anything, depending on how you look at it

lag∞n, Monday, 11 March 2024 17:44 (six months ago) link

neural net is possibly even dumber cause at least ai isnt making specific claims i mean what is intelligence after all no one knows, however how does a brain work also no one knows but it is at least a physical thing

lag∞n, Monday, 11 March 2024 17:46 (six months ago) link

ranking ais

1 neural net: is good at chess
2 generative ai: makes horrible writing and pictures
3 im not sure what the other ais are

lag∞n, Monday, 11 March 2024 17:48 (six months ago) link

neural net is fine imo, it's just a catchy way to describe graphs. it's like ford selling the bronco, no one thinks there's more than a tenuous link to horses

ɥɯ ︵ (°□°) (mh), Monday, 11 March 2024 17:59 (six months ago) link

you know thats not true

lag∞n, Monday, 11 March 2024 18:00 (six months ago) link

just goofing

ɥɯ ︵ (°□°) (mh), Monday, 11 March 2024 18:02 (six months ago) link

lol

lag∞n, Monday, 11 March 2024 18:02 (six months ago) link

saw someone in a twitter thread getting very mad cause someone said a brain isnt a computer

lag∞n, Monday, 11 March 2024 18:08 (six months ago) link

people are the weirdest is what i always say

lag∞n, Monday, 11 March 2024 18:09 (six months ago) link

