Hi,

I did a simple indexing of a directory that contains a lot of PDF, text,
DOC, ZIP, and other files. The content of the files has no structure; I would
like to index them and later search for keywords within the files.

After creating the core, I indexed the files in the directory using the
following command:
bin/post -p 8983 -m 10g -c myCore /DATA_FOLDER > solr_indexing.log
The log file looks like this (the first and last few lines of the log
file):
java -classpath /solr/solr-8.3.0/dist/solr-core-8.3.0.jar -Dauto=yes
-Dport=8983 -Dm=15g -Dc=myCore -Ddata=files -Drecursive=yes
org.apache.solr.util.SimplePostTool /DATA_FOLDER
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/myCore/update...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
...
POSTing file Report.pdf (application/pdf) to [base]/extract
47256 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/myCore/update...
Time spent: 1:03:59.587

But when I look at the result in a browser, the core overview
(http://localhost:8983/solr/#/myCore/core-overview) shows:
Num Docs: 47648

Most of the indexed files have an id whose value is the full path of the
file, such as /DATA_FOLDER/20180321/Report.pdf.
But for about 400 of them, the id looks like this:
232d7bd6-c586-4726-8d2b-bc9b1febcff4
So my questions are:
(1) Why are the two numbers different (log file vs. overview)?
(2) For the ids that are not the full path of a file, how do I know which
original file they come from?
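
To make the numbers concrete, here is a minimal sketch that computes the gap between the two counts and tells the two kinds of id apart. The helper name looks_like_uuid is hypothetical, and the sample ids are the ones quoted in this post; nothing here talks to Solr.

```python
import uuid

FILES_INDEXED = 47256   # "47256 files indexed." from the post tool log
NUM_DOCS = 47648        # "Num Docs" from the core overview page

def looks_like_uuid(doc_id):
    """Return True if doc_id is a canonical UUID string rather than a file path."""
    try:
        return str(uuid.UUID(doc_id)) == doc_id.lower()
    except ValueError:
        return False

# Documents present in the core beyond the number of files the tool reported.
gap = NUM_DOCS - FILES_INDEXED

sample_ids = [
    "/DATA_FOLDER/20180321/Report.pdf",
    "232d7bd6-c586-4726-8d2b-bc9b1febcff4",
]
for doc_id in sample_ids:
    kind = "uuid" if looks_like_uuid(doc_id) else "path"
    print(kind, doc_id)
print("extra documents:", gap)
```

The gap works out to 392, which is close to the roughly 400 documents that carry a UUID-style id.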

Thanks for your help!
Nan

PS: A few examples of query results for those strange ids:

{ "bolt-small-online":["Test strip-north"], "3696714.008":[3702848.584],
"380614.564":[376900.143], "100.038":[111.074], "gpo-bolt":["teststrip"],
"id":"232d7bd6-c586-4726-8d2b-bc9b1febcff4", "_version_":1652839231413813252}

{ "Date":["8/24/2001"], "EXT31":[0], "EXT32":[0.12], "Aggregate":[0.12],
"Pounds_Vap":[37], "Gallons_Vap":[5.8], "Gallons_Liq":[0], "Gallons_Tot":[5.8],
"Avg_Rate":[1.8], "Gallons_Rec":[577], "Water":[577],
"id":"840c05af-caf0-4407-8753-dcc6957abcc5", "Well_s_":["EXT31;EXT32"],
"Time__hrs_":[3.25], "_version_":1652898731969740800}

{ "2":[4], "SFS1":["PLM1"], "1.00":[1.0], "69":[79],
"id":"e675a6f5-0a3e-41b1-b1fe-b3098d0be725", "_version_":1652825435791163395}

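A minimal sketch of one way to list the documents whose id is not a file path. It assumes the id field is a plain string field and that the standard query parser is in use, where "/" has to be escaped. The helper name non_path_id_query is hypothetical; the host, port, and core name are the ones from this post. The code only builds the request URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Select handler of the core named in this post.
SOLR_SELECT = "http://localhost:8983/solr/myCore/select"

def non_path_id_query(prefix="/DATA_FOLDER", rows=10):
    """Build a select URL matching documents whose id does not start with prefix."""
    # Escape "/" so the query parser does not read it as a regex delimiter.
    escaped = prefix.replace("/", "\\/")
    params = {
        "q": f"*:* -id:{escaped}*",  # everything except ids under the prefix
        "fl": "id",                  # return only the id field
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(non_path_id_query())
```

Opening the printed URL in a browser should return a JSON response whose numFound value can be compared with the gap between the two counts.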
