ScyllaDB Docs ScyllaDB Node.js Driver Fetching Large Result Sets

Fetching Large Result Sets¶

When dealing with a large number of rows, the driver breaks the result into pages, only requesting a limited number of rows each time (5000 being the default fetchSize). To retrieve the rows beyond this default size, use one of the following paging mechanisms. Paging is enabled by default and you can disable paging by setting QueryOptions.paged to false.

Caution

Issuing unpaged SELECTs can have significant performance consequences. If the result set is large (or there are many tombstones), the following problems may occur:

the cluster may experience high load,
queries may time out,
the driver may consume large amounts of RAM,
latency will likely increase.

Always page your SELECTs.

Automatic paging¶

Async iterators¶

The driver supports asynchronous iteration of the ResultSet using the built-in Async Iterator, fetching subsequent pages automatically as the previous one has been consumed.

Large result sets can be iterated using the for await ... of statement:

const result = await client.execute(query, params, { prepare: true });

for await (const row of result) {
  console.log(row[columnName]);
}

Under the hood, the driver will get all the rows of the query result using multiple requests. Initially, when calling execute() it will retrieve the first page of results according to the fetch size (defaults to 5000). If there are additional rows, the driver fetches the next page automatically once the async iterator has yielded rows from the previous page.

If needed, you can use isPaged() method of ResultSet instance to determine whether there are more pages of results than initially fetched.

Note

The following iteration methods are mutually exclusive — use only one per ResultSet instance:

rows property: contains the row instances of the first page of results,
sync iterator: yields all rows in the current page,
async iterator: yields all rows in the result regardless of the number of pages.

Each row callback¶

You can also iterate through pages by setting a per-page callback. eachRow() works in two modes, depending on the QueryOptions.autoPage configuration:

Automatic paging (default): keeps fetching subsequent pages until all rows are processed.
Manual paging: fetches one page at a time. This mode gives you access to result sets of intermediate pages. To fetch the next page, you need to call result.nextPage().

eachRow() calls:

rowCallback(rowIndex, row) for each row as soon as it is received,
callback(err, result):
- when autoPage is set to true: after all pages are fetched and rowCallback was called for each row,
- when autoPage is set to false: after the current page is fetched and rowCallback was called for each row,
- when an error occurs.

client.eachRow(
  query,
  parameters,
  { prepare: true, autoPage },
  (rowIndex, row) => {
    // process row
  },
  (err, result) => {
    if (err) {
      // handle error
      return;
    }
    if (result.nextPage) {
      // Handle intermediate page result set. This branch will be taken only when autoPage is disabled.
      // When autoPage is enabled you will not have access to those result sets.
      result.nextPage();
    } else {
      // Handle last page result.
    }
  }
);

Row streams¶

If you want to handle a large result set as a Stream of rows, you can use stream() method of the Client instance. The stream() method automatically fetches the following pages, yielding the rows as they come through the network and retrieving the following page only after the previous rows were read (throttling).

client.stream(query, parameters, options)
  .on('readable', function () {
    // readable is emitted as soon as a row is received and parsed
    let row;
    while (row = this.read()) {
      // process row
    }
  })
  .on('end', function () {
    // emitted when all rows have been retrieved and read
  });

Manual paging¶

Sometimes it is convenient to save the paging state in order to restore it later. For example, consider a stateless web service that displays a list of results with a link to the next page. When the user clicks that link, we want to run the exact same query, except that the iteration should start where we stopped on the previous page.

To do so, the driver exposes a pagingState object that represents where we were in the result set when the last page was fetched:

const options = { prepare: true, fetchSize: 1000 };
const result = await client.execute(query, parameters, options);

// Property 'rows' will contain only the amount of items of the first page (max 1000 in this case)
const rows = result.rows;

// Store the page state
let pageState = result.pageState;

In the next request, use the pageState to fetch the following rows.

// Use the pageState in the queryOptions to continue where you left it.
const options = { pageState, prepare: true, fetchSize: 1000 };
const result = await client.execute(query, parameters, options);

// Following rows up to fetch size (1000)
const rows = result.rows;

// Store the next paging state.
pageState = result.pageState;

Saving the paging state works well when you only let the user move from one page to the next. But it doesn’t allow arbitrary jumps (like “go directly to page 10”), because you can’t fetch a page unless you have the paging state of the previous one. Such a feature would require offset queries, which are not natively supported by ScyllaDB and Apache Cassandra.

Note: The page state token can be manipulated to retrieve other results within the same column family, so it is not safe to expose it to the users in plain text.

Best practices¶

Query result fetching	Unpaged	Paged manually	Paged automatically
Exposed Client API	`execute` with `QueryOptions.paged = false`	`execute` with `QueryOptions.pageState`	Async iterators, each row callbacks, streams
Working	get all results in a single CQL frame, into a single result set	get one page of results in a single CQL frame, into a single result set	upon high-level iteration, fetch consecutive CQL frames and transparently iterate over their rows
Cluster load	potentially HIGH for large results, beware!	normal	normal
Driver overhead	low - simple frame fetch	low - simple frame fetch	low - simple frame fetch
Driver memory footprint	potentially BIG - all results have to be stored at once!	small - only one page stored at a time	small - at most constant number of pages stored at a time
Latency	potentially BIG - all results have to be generated at once!	considerable on page boundary - new page needs to be fetched	considerable on page boundary - new page needs to be fetched
Suitable operations	- in general: operations with empty result set (non-SELECTs) - as possible optimisation: SELECTs with LIMIT clause	- for advanced users who prefer more control over paging	- in general: all SELECTs

Was this page helpful?