However, DeepThink's datasets are clearly based on the data and output of earlier LLMs. All the economic and environmental costs of those predecessors contribute to the models and datasets DeepSeek has today.
And DeepThink's datasets are closed-source. You can't inspect them, you can't compile them for yourself, and you have no certainty about what information is in there or the provenance of that information.