Hi Amanda,

Below is a crude prototype in Bash that fetches documents from Solr using
cursorMark: https://gist.github.com/eribeiro/de1588aaa1759c02ea40cc281e8aedc8

It should shed some light on your use case (I copied the code below, too):

Best,
Edward

----------------------------------------------------- fetcher.sh --------------------------------------------------
#!/bin/bash
## Usage:
##   $ chmod +x fetcher.sh
##   $ ./fetcher.sh <output file>

SOLR_URL="http://localhost:8983/solr"
COLLECTION="teste"
ROWS=10
CURSORMARK="*"
Q="*:*"
SORT="id desc"
NEXT_CURSORMARK=
FILENAME=$1

cat /dev/null > "$FILENAME"   ## truncate file content, if file exists
echo "[" >> "$FILENAME"       ## open bracket so the file content is valid JSON (a list of lists of records)

counter=0
while true; do
    ## -G + --data-urlencode lets curl URL-encode the params (cursor marks can contain '+' and '='),
    ## and wt=json asks for JSON explicitly (older Solr versions default to XML)
    resp=$(curl -s -G "$SOLR_URL/$COLLECTION/select" \
        --data-urlencode "q=$Q" \
        --data-urlencode "rows=$ROWS" \
        --data-urlencode "sort=$SORT" \
        --data-urlencode "wt=json" \
        --data-urlencode "cursorMark=$CURSORMARK")
    ## jq '.' <<< "$resp"

    NEXT_CURSORMARK=$(jq -r '.nextCursorMark' <<< "$resp")
    docs=$(jq '.response.docs' <<< "$resp")
    num_docs=$(jq 'length' <<< "$docs")

    echo "$docs"
    counter=$((counter + num_docs))
    echo "$docs" >> "$FILENAME"

    if [[ "$CURSORMARK" == "$NEXT_CURSORMARK" ]]; then
        echo "]" >> "$FILENAME"   ## close the list so the file is valid JSON
        # echo "Num docs: $counter"
        echo "Finished."
        exit
    else
        echo "," >> "$FILENAME"   ## separate pages so the file stays valid JSON
    fi

    CURSORMARK=$NEXT_CURSORMARK
    # sleep 1   ## optional, sleep a bit before fetching the next page
done
---------------------------------------------- end fetcher.sh ---------------------------------------------------
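The file it writes is a JSON list of per-page lists. If you would rather have a
single flat list of documents, you can post-process it with jq (which the script
already requires); "out.json" below is just a placeholder for whatever output
file you passed in:

    $ jq 'add' out.json > flat.json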
On Wed, Jan 29, 2020 at 1:12 PM Steve Ge <s...@yahoo.com.invalid> wrote:

> @Amanda
> You can try using curl and write the output to a file:
>     curl "http://localhost:8983/solr?q={theSolrQuery}" > out.json
> theSolrQuery - you need to specify all attrs you want exported, not just *
> If you are on Windows, there is a Windows curl tool you can download to use
>
> Steve
>
> On Wed, Jan 29, 2020 at 10:21 AM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
> Hi Amanda,
> I assume that you have all the fields stored, so you will be able to export
> full documents.
>
> Several thousand records should not be too much to use regular start+rows
> to paginate results, but the proper way of doing that would be to use
> cursors. Adjust the page size to avoid creating huge responses, and you can
> use curl or some similar tool to avoid using the admin console. I did a
> quick search and there are several blog posts with scripts that do what
> you need.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 29 Jan 2020, at 15:43, Amanda Shuman <amanda.shu...@gmail.com> wrote:
> >
> > Dear all:
> >
> > I've been asked to produce a JSON file of our index so it can be combined
> > and indexed with other records. (We run Solr 5.3.1 on this project; we're
> > not going to upgrade, in part because funding has ended.) The index has
> > several thousand rows, but nothing too drastic. Unfortunately, this is too
> > much to handle for a simple query dump from the admin console. I tried to
> > follow instructions related to running /export directly, but I guess the
> > export handler isn't installed. I tried to divide the query into rows, but
> > after a certain amount it freezes, and it also freezes when I try to limit
> > rows (e.g., rows 501-551 freezes the console). Is there any other way to
> > export the index short of having to install the export handler, considering
> > we're not working on this project anymore?
> >
> > Thanks,
> > Amanda
> >
> > ------
> > Dr. Amanda Shuman
> > Researcher and Lecturer, Institute of Chinese Studies, University of Freiburg
> > Coordinator for the MA program in Modern China Studies
> > Database Administrator, The Maoist Legacy <https://maoistlegacy.de/>
> > PhD, University of California, Santa Cruz
> > http://www.amandashuman.net/
> > http://www.prchistoryresources.org/
> > Office: +49 (0) 761 203 96748
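P.S. If the cursor script turns out to be overkill for a few thousand documents,
Steve's curl suggestion in a more concrete form might look like the line below;
the collection name, field list, and row count are placeholders you would adjust
for your index:

    $ curl "http://localhost:8983/solr/<collection>/select?q=*:*&fl=id,title&rows=10000&wt=json" > out.json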