
How can I use a cursor forEach in MongoDB using Node.js?

Asked 5 months ago
Modified 3 months ago
Viewed 68 times

4 Answers


When working with MongoDB through Mongoose in Node.js, we often need to process large amounts of data. To handle this efficiently we can use a cursor, which lets us iterate over the query results without loading them all into memory at once. In Mongoose, we obtain a cursor by chaining cursor() onto find(), and we can then iterate over each document the cursor yields.

Here is an example of iterating a cursor with eachAsync:

javascript
const mongoose = require('mongoose');
const { Schema } = mongoose;

// First, define the schema for the model
const userSchema = new Schema({ name: String, age: Number });

// Then, create the model
const User = mongoose.model('User', userSchema);

// Connect to the MongoDB database
mongoose.connect('mongodb://localhost:27017/myapp');

// Once connected, we can process the data through a cursor
mongoose.connection.on('open', function() {
  const cursor = User.find().cursor();

  // Iterate the cursor with eachAsync
  cursor.eachAsync(function(user) {
    // Do something with each user document here
    console.log(user.name);
  })
    .then(() => {
      console.log('All users processed');
      mongoose.disconnect(); // Remember to disconnect when done
    })
    .catch(err => {
      console.error('Error during processing:', err);
    });
});

The code above shows how to use a cursor in Mongoose to iterate over the users collection. By calling eachAsync, we run an asynchronous operation for every document the cursor yields; in this example it is a simple console log. When iteration finishes, we close the database connection in the then handler, which guarantees all data has been processed first. If any error occurs during iteration, the catch block catches and logs it.

This approach is particularly useful when you need to process a large amount of data without loading it into memory all at once, and it effectively prevents out-of-memory problems.
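
If the work done per document is itself asynchronous (another query, an HTTP call), eachAsync can also process several documents concurrently. A minimal sketch, assuming Mongoose 4.12 or later, where eachAsync accepts a parallel option; processUser is a hypothetical async function:

javascript
// Sketch: process up to 10 documents concurrently (assumes Mongoose 4.12+,
// where eachAsync accepts a `parallel` option).
const cursor = User.find().cursor();

cursor.eachAsync(async function(user) {
  await processUser(user); // hypothetical async work per document
}, { parallel: 10 })
  .then(() => console.log('All users processed'));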

Answered Jun 29, 2024 at 12:07

The answer depends on the driver you're using. All MongoDB drivers I know have cursor.forEach() implemented one way or another.

Here are some examples:

node-mongodb-native

javascript
collection.find(query).forEach(function(doc) {
  // handle
}, function(err) {
  // done or error
});

mongojs

javascript
db.collection.find(query).forEach(function(err, doc) {
  // handle
});

monk

javascript
collection.find(query, { stream: true })
  .each(function(doc) {
    // handle doc
  })
  .error(function(err) {
    // handle error
  })
  .success(function() {
    // final callback
  });

mongoose

javascript
collection.find(query).stream()
  .on('data', function(doc) {
    // handle doc
  })
  .on('error', function(err) {
    // handle error
  })
  .on('end', function() {
    // final callback
  });
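
Note that Query#stream() shown in the mongoose example was removed in Mongoose 5 in favor of Query#cursor(), which returns a cursor that is also a Node.js readable stream, so the same event-based style still works. A sketch, assuming Mongoose 5+:

javascript
// Mongoose 5+: cursor() replaces the removed stream(); the returned
// QueryCursor is a readable stream, so the same events apply.
collection.find(query).cursor()
  .on('data', function(doc) {
    // handle doc
  })
  .on('error', function(err) {
    // handle error
  })
  .on('end', function() {
    // final callback
  });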

Updating documents inside of .forEach callback

The only problem with updating documents inside a .forEach callback is that you have no idea when all the documents have been updated.

To solve this problem you need some asynchronous control-flow solution. Here is an example using the async library's queue feature:

javascript
var q = async.queue(function (doc, callback) {
  // code for your update
  collection.update(
    { _id: doc._id },
    { $set: { hi: 'there' } },
    { w: 1 },
    callback
  );
}, Infinity);

var cursor = collection.find(query);
cursor.each(function(err, doc) {
  if (err) throw err;
  if (doc) q.push(doc); // dispatching doc to async.queue
});

q.drain = function() {
  if (cursor.isClosed()) {
    console.log('all items have been processed');
    db.close();
  }
};
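
One caveat if you run this against a current release of the async library: since async v3, drain is registered by calling it rather than assigning to it. A sketch of the last lines under that assumption:

javascript
// async 3.x: drain is a method, not an assignable property.
q.drain(function() {
  if (cursor.isClosed()) {
    console.log('all items have been processed');
    db.close();
  }
});
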
Answered Jun 29, 2024 at 12:07

Using the mongodb driver and modern Node.js with async/await, a good solution is to use next():

javascript
const collection = db.collection('things');
const cursor = collection.find({
  bla: 42 // find all things where bla is 42
});

let document;
while ((document = await cursor.next())) {
  await collection.findOneAndUpdate(
    { _id: document._id },
    { $set: { blu: 43 } }
  );
}

This results in only one document at a time being required in memory, as opposed to e.g. the accepted answer, where many documents get sucked into memory before processing starts. In cases of "huge collections" (as per the question) this may be important.
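
On driver versions where cursors implement the async iterator protocol (the 4.x drivers and later do), the same one-document-at-a-time loop can be written with for await...of. A sketch, assuming such a driver version and an enclosing async function:

javascript
// Equivalent loop using the cursor as an async iterable
// (assumes a driver version where cursors implement Symbol.asyncIterator).
const cursor = collection.find({ bla: 42 });

for await (const document of cursor) {
  await collection.findOneAndUpdate(
    { _id: document._id },
    { $set: { blu: 43 } }
  );
}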

If documents are large, this can be improved further by using a projection, so that only the required fields are fetched from the database.
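
For instance, the update above only needs each document's _id, so the find could fetch just that field. A sketch, assuming the 3.x+ driver API where the projection goes in the options object:

javascript
// Fetch only _id from each matching document.
const cursor = collection.find(
  { bla: 42 },
  { projection: { _id: 1 } }
);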

Answered Jun 29, 2024 at 12:07

javascript
var MongoClient = require('mongodb').MongoClient,
    assert = require('assert');

MongoClient.connect('mongodb://localhost:27017/crunchbase', function(err, db) {
  assert.equal(err, null);
  console.log("Successfully connected to MongoDB.");

  var query = { "category_code": "biotech" };

  db.collection('companies').find(query).toArray(function(err, docs) {
    assert.equal(err, null);
    assert.notEqual(docs.length, 0);

    docs.forEach(function(doc) {
      console.log(doc.name + " is a " + doc.category_code + " company.");
    });

    db.close();
  });
});

Notice that the .toArray call makes the application fetch the entire dataset.

javascript
var MongoClient = require('mongodb').MongoClient,
    assert = require('assert');

MongoClient.connect('mongodb://localhost:27017/crunchbase', function(err, db) {
  assert.equal(err, null);
  console.log("Successfully connected to MongoDB.");

  var query = { "category_code": "biotech" };
  var cursor = db.collection('companies').find(query);

  cursor.forEach(
    function(doc) {
      console.log(doc.name + " is a " + doc.category_code + " company.");
    },
    function(err) {
      assert.equal(err, null);
      return db.close();
    }
  );
});

Notice that the cursor returned by find() is assigned to var cursor. With this approach, instead of fetching all the data into memory and consuming it at once, we stream the data to our application. find() can create a cursor immediately because it doesn't actually make a request to the database until we try to use some of the documents it will provide. The point of the cursor is to describe our query. The second parameter to cursor.forEach shows what to do when the cursor is exhausted or an error occurs.

In the initial version of the code above, it was toArray() that forced the database call: it meant we needed ALL the documents and wanted them in an array.

Also, MongoDB returns data in batches. The image below shows requests from cursors (from the application) to MongoDB.

[Image: MongoDB cursor requests]

forEach is better than toArray because we can process documents as they come in, until we reach the end. Contrast this with toArray, where we wait for ALL the documents to be retrieved and the entire array to be built, which means we get no advantage from the fact that the driver and the database system work together to batch results to the application. Batching is meant to provide efficiency in terms of memory overhead and execution time. Take advantage of it in your application if you can.
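
If you want to tune how many documents each round trip to the server returns, the driver lets you set the batch size on the cursor before iterating. A minimal sketch, assuming the node-mongodb-native driver:

javascript
// Ask the server for up to 100 documents per getMore round trip;
// forEach still delivers them to the callback one at a time.
var cursor = db.collection('companies')
  .find(query)
  .batchSize(100);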

Answered Jun 29, 2024 at 12:07
