This works pretty well. I tried it with this guidance prompt:
You are both pelicans who work as data journalist at a pelican news service. Discuss this from the perspective of pelican data journalists, being sure to inject as many pelican related anecdotes as possible
Against this article: https://simonwillison.net/2024/Oct/17/video-scraping/
You can listen to the resulting 7m40s MP4 here: https://simonwillison.net/2024/Oct/17/notebooklm-pelicans/
Example snippets:
You ever find yourself wading through mountains of data trying to pluck out the juicy bits? It's like hunting for a single shrimp in a whole kelp forest, am I right?
And: The future of data journalism is looking brighter than a school of silversides reflecting the morning sun. Until next time, keep those wings spread, those eyes sharp, and those minds open. There's a whole ocean of data out there just waiting to be explored.
NotebookLM is contributing to fake podcasts across the internet, with over 1,300 and counting:
https://github.com/ListenNotes/ai-generated-fake-podcasts/bl...
Google is taking a different approach this time, moving quickly. While NotebookLM is indeed a remarkable tool for personal productivity and learning, it also opens the door for spammers to mass-produce content that isn't meant for human consumption.
Amidst all the praise for this project, I’d like to offer a different perspective. I hope the NotebookLM team sees this and recognizes the seriousness of the spam issue, which will only grow if left unaddressed. If you know someone on the team, please bring this to their attention. Could you please provide a tool or some plain-English guidelines to help detect audio generated by NotebookLM? Is there a watermark or any other identifiable marker that can be used?
Just recently, a Hacker News post highlighted how nearly all Google image results for "baby peacock" are AI-generated: https://news.ycombinator.com/item?id=41767648
It won't be long before we see a similar trend with low-quality, AI-generated fake podcasts flooding the internet.
Where do you get the "low-quality" part from? My experience with NotebookLM is that it creates much higher quality, more informative, more fact-based, and more concise podcasts than 99% of the stuff I listen to. I've mostly switched entirely over to NotebookLM for my podcast listening. From my perspective, its podcasts generally offer a far higher-quality experience.
Maybe you have the problem backwards: what if we accidentally end up listening to non-NotebookLM podcasts?
It's an interesting assumption that, by virtue of being AI-generated, it's considered bad/fake. Twenty years ago, people hated how Photoshop changed the photo design industry; NotebookLM is knocking on the door now.
This doesn't strike me as much of a problem as it seems to be for you. What are the biggest issues you foresee?
I'm an avid podcast listener, but I already ignore 99.9% of podcasts out there. I'm not concerned that this is going to become 99.99%.
If these AI generated podcasts are all bad, I will just continue to ignore them. If some turn out to be good, it seems like a win to me.
If you're worried about an existential "what happens to the world if all media is machine generated", I guess I'm willing to hop on the ride and see what we find out.
99.9? There are roughly 3 million podcasts out there right now. I regularly listen to about 10 over a year (in any given week, maybe 3-4). I'm therefore ignoring 2,999,990, or 99.9997%, of podcasts. I definitely agree with you that this isn't a problem.
(Also, ironically, one of the 10 podcasts I listen to regularly is the Deep Dive on AI. A NotebookLM production!)
This is like saying: “Text based LLMs should do more to stop people from publishing the results of what they produce”
NotebookLM seems wonderful for digesting various content in an alternative way. It’s not a “fake podcast” either.
Nobody is saying that the audio output should or should not be published somewhere. That’s a user decision for both publishing and subscribing.
Indexes and discovery on the internet are where you should advocate policing, instead of nitpicking a useful tool.
> it also opens the door for spammers to mass-produce content that isn't meant for human consumption.
What's new? Every novel class of genAI product has brought a tidal wave of slop, spam and/or scams to the medium it generates. If anyone working on a product like this doesn't anticipate it being used to mass produce vapid white-noise "content" on an industrial scale then they haven't been paying attention.
This is definitely not a new issue.
What I’m aiming for is to ensure that the NotebookLM team is aware of the impact and actively considering it. Hopefully, they are already working on tools or mechanisms to address the problem, ideally before their colleagues at YouTube and Google Search come asking for help fighting NotebookLM-generated spam :)
It's certainly easier for the creators of genAI to build detection tools than for outsiders to do so. AI audio detection is a hard problem - https://www.npr.org/2024/04/05/1241446778/deepfake-audio-det...
> What I’m aiming for is to ensure that the NotebookLM team is aware of the impact and actively considering it.
What is the impact? Have any of them attracted an audience of any meaningful size? If a month from now there are 1.3 million generated podcasts, what do you anticipate the fallout to be?
Podcasts (episodic radio shows hosted on Apple Music and Spotify) haven't been around for very long. They haven't existed long enough for kids to be tutored in making podcasts and then grow into adults with that sentimental hobby, as with playing violin or oil painting. If you believe that a "Human Authenticity Badge" is meaningful for podcasts, it's complicated: tradition plays the biggest role in the outrage you are trying to spin, not slop and spam. After all, there is already a ton of low-quality podcasts, music and art made by real people for no nefarious purpose whatsoever. Like many posts of this kind, which are really common on HN, this one suggests no sensible remedy beyond pointing fingers at some giant corporation and asking it to do something impossible.
If you care a lot about podcast quality, go and make your own podcast service with better discovery. Once you realize the antagonist was collaborative filtering, made possible by non-negative matrix factorization dating from the year 2000, and not AI, you will at least have learned something from the comment, instead of just feeling better. And then, how do you propose to curate by hand, and why would someone choose your curation over the New Yorker's? And maybe those very purists, trying to make everything sentimental, accusing everyone of slop and spam - well, why do so many creators thrive and ignore the New Yorker's opinion about them entirely? Perhaps curation is not only not scalable, but also wrong. Difficult questions for listeners and podcast authors alike.
Only 1,300? I'd imagine there would be so many more.
It’s definitely more than that.
The 1,300+ shows are just the ones recently removed from Listen Notes.
Give it a few days, and I’m sure the number will double, quadruple, and continue to grow. :(
So what do you propose Google do to prevent this from happening?
The comment's default remedy is tribal: "The only moral content is my content." We sort of used to live in that world under the studio and TV network system. Most consumers would say it was not so bad, maybe even better.
Of course, the commenter never says this while living in today's world, where the writing he likes would never be published by the New York Times the way it is on Twitter, the TV he likes would never be offered for free the way it is on YouTube, and the music he likes would never be offered for pennies the way it is on Spotify. Some meaningful creators will lose from every remedy you could think of where Google "something somethings" AI. Maybe the root problem is generalizing.
I created a “podcast episode” (???) of my personal blog (not trying to get traffic to it. It’s more of a journal) using NotebookLM. It sounded just as bland and overproduced as a “professional” podcast by NPR like “Planet Money” and “The Indicator”.
Whether that says more about how high-quality NotebookLM is or how low-quality NPR's podcasts are is an exercise for the reader.
The only reason “Stuff You Should Know” is better is the random off-topic discussions they go into, and that’s not a complaint about SYSK.
Geez, I hope there aren't people like you working at Google
Counterpoint: Most podcasts were utterly worthless before AI too. The world will do fine losing a few mattress ad vehicles.
Like other data, provenance suddenly matters a lot. From my POV, that's good. Not all data sources are created equal, and this is putting it into stark enough relief it might actually change the landscape. (In case it isn't obvious, I strongly believe most of the Internet was garbage well before LLMs. We just called it "SEO". Still garbage)
Nice. I've only scratched the surface of NotebookLM, mainly dumping lots of component reference material into it (datasheets, reference guides, application notes, etc.). The text querying works great, but the audio overview wasn't very useful when it stuck to the high level of the content. With some ability to steer the topic, it might be quite useful!
Surprised this was not there from the beginning. It can result in much better output. My problem with the default prompt is that it often is just two equally "knowledgeable hosts" kind of just bouncing information back and forth. With being able to customize the prompt you can create a kind of "explainer" and "listener" dynamic among the hosts that really helps the overall flow of the episode.
Something like this:
The two podcast hosts have very different levels of knowledge on the topic. The first host is the expert on the topic and explains the subject and the details to the second host. The second host has very little existing knowledge about the subject but will react to the information and ask follow up questions.
Google Illuminate recently also introduced a customization feature. I use this customization with it:
audience=technical, duration=long, tone=professional & engaging
Not an improvement for me. I've already been instructing NotebookLM for weeks by including the instructions in the sources. That way I have version control on my prompts and can easily drag them into the sources upload. The new feature requires finding my instructions and copying and pasting them, and there's also a 500-character limit, which is very small; my standard prompts run to over 2,000 characters.
I think it's an easy affordance for those users who are just interested in the basic functionality of the product.
However like you I cottoned on early that one could put a "Podcast Production Notes.txt" in each one of my Notebooks that allowed me to really put some horsepower behind the generated audio :D
Really wish there was an API so I can upload my content and connect it to my website to make it interactive for my potential clients.
I want the HN comment section as a podcast
That could actually be a top-quality podcast: well-moderated content from thoughtful people, many of them subject matter experts, with mostly thoughtful discussion. (I read HN for the comments.) Sounds good to me.
> With over 80,000 organizations already using NotebookLM
Really? "Using"? (As in, someone with an email address on an org-owned domain logged in to the NotebookLM page?)
In a sea of similar tools, Google seems to have struck on something semi-viral with NotebookLM. Output can be mediocre, but with the bar for many podcasts being set at "read pages from Wikipedia", that's not bad at all for zero effort.
https://trends.google.com/trends/explore?geo=US&q=NotebookLM...
The 100 baseline on that graph is the highest attention the term has received, and it correlated with a launch and has since decreased.
Google never has problems getting the first few million users for consumer-launched tools. Almost 100% of the time, it has problems making the first few million in net profit, and it shuts the tool down a few years later.
But I do agree this is a good play for Google, it plays to their strengths.
> The 100 baseline on that graph is the highest attention the term has received
Good point. I couldn't come up with a well-enough known competing tool to compare against.
I really wish it had more voices; notebooklm-guy and notebooklm-girl get tiring.
Now that NotebookLM has gone from "small experiment" to "moderate viral success", I expect all kinds of roadmaps are being drawn up to use it to hook users into the broader Google AI ecosystem (e.g. automatically adding images and illustrations made by Imagen 3, etc.).
Definitely hear what you are saying but I personally think it is for the best that they are instantly recognizable as NotebookLM podcasters. Especially as this makes the rounds on the internet. If you could manipulate the voices it would just make it more challenging to detect if a "Podcast" is using this tool.
I realize now that this is actually a clever way to collect training data. If it were any company other than Google, I'd be like, Awesome toy. With them, I am uneasy.
One day too late. ^-^
Is there an open source tool that copies NotebookLM yet, or did anyone dig a bit into how the prompting is done to generate output in this dialogue format?
Check out this prompt: https://github.com/souzatharsis/podcastfy/blob/6ad5734c3ffb5...
I’ve recently started using NotebookLM and I wish either it was from any other company besides Google or that Google would charge for it.
Google has the attention span and product focus of a crack-addled flea. I’m afraid the entire project will be killed.
NotebookLM is a great product. I just started using it this week to ingest artifacts for a new project and get an overview.
Ahh excellent! Podcast listings and Youtube weren't filling up with quite enough AI slop yet.