We have a similar date- and language-based collection setup.
We also ran into similar issues with a huge clusterstate.json file, which
took an eternity to load.
In our case the search cases were language-specific, so we moved to multiple
Solr clusters, each with a different ZooKeeper namespace per language.
One of the things to keep in mind with Grouping is that if you are relying on
an accurate group count (ngroups), then you will also have to co-locate
documents based on the grouping field.
The main advantage of the Collapsing qparser plugin is that it provides fast
field collapsing on a high-cardinality field.
Your findings are the expected behavior for the Collapsing qparser. The
Collapsing qparser requires records with the same collapse field value to be
located on the same shard. The typical approach for this is to use
composite ID routing to ensure that documents with the same collapse field
land on the same shard.
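For what it's worth, a minimal SolrJ sketch of the compositeId approach (the collection name, ZooKeeper address, and field names here are just illustrative, and it assumes SolrJ 5.x):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CompositeIdIndexer {
    public static void main(String[] args) throws Exception {
        // Assumes a SolrCloud collection using the default compositeId router.
        try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
            client.setDefaultCollection("products");

            List<SolrInputDocument> docs = new ArrayList<>();
            SolrInputDocument doc = new SolrInputDocument();
            // "routeKey!uniqueId": everything sharing the prefix before '!' lands on
            // the same shard, which is what keeps collapsing/ngroups accurate.
            doc.addField("id", "SKU42!doc-001");
            doc.addField("collapse_key_s", "SKU42");
            docs.add(doc);

            client.add(docs);
            client.commit();
        }
    }
}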
Seems simple enough that the source answers all the questions:
https://github.com/apache/lucene-solr/blob/lucene_solr_4_9/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishPossessiveFilter.java#L66
It just looks for a couple of variants of the apostrophe followed by s or S, and strips that suffix.
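If you want to see it in action, a throwaway sketch (written against Lucene 5.x-style constructors; on 4.9 the tokenizer takes a Reader and the filter may want a Version argument):

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishPossessiveFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PossessiveDemo {
    public static void main(String[] args) throws Exception {
        StandardTokenizer tokenizer = new StandardTokenizer();
        tokenizer.setReader(new StringReader("the dog's bone"));
        TokenStream ts = new EnglishPossessiveFilter(tokenizer);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString()); // prints: the, dog, bone
        }
        ts.end();
        ts.close();
    }
}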
Regards
Hi everyone,
I'm running into trouble building a query with DateRangeField. Web-based
queries work fine, but this code throws an NPE:
dateRangeQuery = dateRangeField.getRangeQuery(null,
SidxS.getSchema().getField("sku_history.date_range"), start_date_str,
end_date_str, true, true);
ERROR
So unloading a core doesn't delete the data? That is good to know.
On Mon, Aug 3, 2015 at 6:22 PM, Erick Erickson
wrote:
> This doesn't work in SolrCloud, but it really sounds like the "lots of
> cores" feature, which is designed
> to keep the most recent N cores loaded and auto-unload older ones, see:
> http://wiki.apache.org/solr/LotsOfCores
Some further information:
The main things using memory, as far as I can see from my heap dump, are:
1. Arrays of org.apache.lucene.util.fst.FST$Arc instances, which mostly seem
to hold nulls. The ones I've investigated have been held by
org.apache.lucene.util.fst.FST objects. I have 38 cores open and
This doesn't work in SolrCloud, but it really sounds like the "lots of
cores" feature, which is designed
to keep the most recent N cores loaded and auto-unload older ones, see:
http://wiki.apache.org/solr/LotsOfCores
Best,
Erick
On Mon, Aug 3, 2015 at 4:57 PM, Brian Hurt wrote:
> Is there an easy way for
Upayavira, manual commit isn't good advice, especially with small batches
or single documents, is it? I mostly see recommendations to use
autoCommit+autoSoftCommit instead of manual commits.
On Tue, Aug 4, 2015 at 1:00, Upayavira wrote:
> SolrJ is just a "SolrClient". In pseudocode, you say:
>
> SolrCli
SolrJ is just a "SolrClient". In pseudocode, you say:
SolrClient client = new
SolrClient("http://localhost:8983/solr/whatever";);
List docs = new ArrayList<>();
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "abc123");
doc.addField("some-text-field", "I like it when the sun s
Hi Everyone,
Does anyone know where I can find docs on EnglishPossessiveFilterFactory? The only one I found is the
API doc:
http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/en/EnglishPossessiveFilterFactory.html
but that's not what I'm looking for; I'm looking for one that describes in
detail how
Thanks for your response. With TermsComponent I am able to get a list of all
terms in a field that have a document frequency under a certain threshold, but
I was wondering if I could instead pass in a list of terms and get back only the
terms from that list that have a document frequency under a certain threshold.
Well,
If it is just file names, I'd probably use the SolrJ client, maybe with
Java 8. Read the file names, split each name into parts with regular
expressions, stuff the parts into different fields and send them to Solr.
Java 8 has FileSystem walkers, etc., to make this easier.
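A rough sketch of that approach (assuming SolrJ 5.x; the core name and field names are made up, the filename pattern is taken from the earlier mail):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FilenameIndexer {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/files");
             Stream<Path> paths = Files.walk(Paths.get("/data/pdfs"))) {
            paths.filter(Files::isRegularFile).forEach(path -> {
                // e.g. ARIA_SSN10_0007_LOCATION_129.pdf -> [ARIA, SSN10, 0007, LOCATION, 129]
                String name = path.getFileName().toString().replaceFirst("\\.pdf$", "");
                String[] parts = name.split("_");

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", path.toString());
                doc.addField("source_s", parts[0]);
                doc.addField("ssn_s", parts[1]);
                doc.addField("seq_s", parts[2]);
                doc.addField("location_s", parts[3]);
                doc.addField("location_id_s", parts[4]);
                try {
                    client.add(doc); // batch these in real code
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            client.commit();
        }
    }
}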
You could do it with DIH, but it wo
Is there an easy way for a client to tell Solr to close or release the
IndexSearcher and/or IndexWriter for a core?
I have a use case where we're creating a lot of cores with not that many
documents per zone (a few hundred to maybe tens of thousands). Writes come
in batches, and reads also te
Yeah, separating by month or year is good and can really help in this case.
Bill Bell
Sent from mobile
> On Aug 2, 2015, at 5:29 PM, Jay Potharaju wrote:
>
> Shawn,
> Thanks for the feedback. I agree that increasing timeout might alleviate
> the timeout issue. The main problem with increasing t
From my reading of the solr docs (e.g.
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
and https://cwiki.apache.org/confluence/display/solr/Result+Grouping), I've
been under the impression that these two methods (result grouping and
collapsing query parser) can
@Alexandre No, I don't need the content of the files. I am repeating my requirement:
I have 40 million files stored in a file system;
the filenames look like ARIA_SSN10_0007_LOCATION_129.pdf.
I only split values out of the filename; those values are what I have to index.
I am interested t
@Erik Hatcher You mean I have to use SolrJ for indexing, right?
Can SolrJ handle the large amount of data I mentioned in my previous post?
If I use DIH, then how will I split the values from the filename, etc.?
I want to start my development in the right direction; that is why I am a little
confused about
I found the issue. With GET, the legacy code I'm calling into was written
like so:
clientResponse =
resource.contentType("application/atom+xml").accept("application/atom+xml").get();
This is a bug, and should have been:
clientResponse = resource.accept("application/atom+xml").get();
Go
Just to reconfirm, are you indexing file content? Because if you are,
you need to be aware that most PDFs do not extract well, as they do
not have text flow preserved.
If you are indexing PDF files, I would run a sample through Tika
directly (that's what Solr uses under the covers anyway) and see
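If it helps, a quick way to eyeball what Tika extracts, using the Tika facade (the file path is just an example):

import java.io.File;

import org.apache.tika.Tika;

public class TikaCheck {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        // Parse one sample PDF and dump whatever text Tika can recover,
        // so you can judge extraction quality before indexing 40M of them.
        // Note that parseToString truncates very long documents by default.
        String text = tika.parseToString(new File("/data/pdfs/sample.pdf"));
        System.out.println(text);
    }
}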
Ahhh, listen to Hatcher if you're not indexing the _contents_ of the
files, just the filenames
Erick
On Mon, Aug 3, 2015 at 2:22 PM, Erik Hatcher wrote:
> Most definitely yes given your criteria below. If you don’t care for the
> text to be parsed and indexed within the files, a simple fil
Hmmm, one thing that will certainly help is the new per-collection
state.json that will replace clusterstate.json. That'll reduce a lot
of chatter.
You might also get a lot of mileage out of breaking the collections
into distinct sub-groups, thus reducing the number of
collections on each
Most definitely yes, given your criteria below. If you don't care for the text
to be parsed and indexed within the files, it sounds like a simple file system
crawler that just gets the directory listings and posts the file names, split
as you'd like, to Solr would suffice.
—
Erik Hatcher, Senior S
I'd go with SolrJ personally. For a terabyte of data that (I'm inferring)
is PDF files and the like (aka "semi-structured documents") you'll
need to have Tika parse out the data you need to index. Doing
that through posting or DIH puts all the analysis on the Solr servers,
which will work, but
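The shape of that would be something like the following sketch; it combines the Tika facade with SolrJ 5.x, and the core name and field names are made up:

import java.io.File;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.Tika;

public class ClientSideTikaIndexer {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/docs")) {
            File pdf = new File("/data/pdfs/sample.pdf");
            // Do the Tika work here on the client, instead of on the Solr servers.
            String body = tika.parseToString(pdf);

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", pdf.getAbsolutePath());
            doc.addField("content_txt", body);
            client.add(doc);
            client.commit();
        }
    }
}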
OK, I figured it out. The documentation is not updated. The default
components are as follows:
FacetModule.COMPONENT_NAME = "facet_module"
Thus, the following is the default with the new facet_module.
We need someone to update solrconfig.xml and the docs.
query
facet
face
My recommendation, start with the default configset
(data_driven_schema_configs) like this:
# grab an HTML page to use
curl http://lucene.apache.org/solr/index.html > index.html
bin/solr start
bin/solr create -c html_test
bin/post -c html_test index.html
$ curl "http://localhost:8983/
Yes, my application is in Java; no, I cannot switch to SolrJ because I'm
working off legacy code that I don't have the luxury to refactor.
If my application is sending the wrong Content-Type HTTP header, which part
is it, and why does the same header work for the other query paths,
such as: "/
On 8/3/2015 11:34 AM, Steven White wrote:
> Hi Everyone,
>
> I cannot figure out why I'm getting HTTP Error 500 off the following code:
> Ping query caused exception: Bad contentType for search handler
> :application/atom+xml
Your application is sending an incorrect Content-Type HTTP header tha
Thanks Erik,
I'm trying to index some HTML files in the same format, and I need to index
them according to classes and tags. I've tried data_driven_schema_configs,
but I can only get the title and id, not the other tags and classes I
wanted. So now I want to edit the schema in the basic_configs, but t
Hi Alexandre,
I have 40 million files stored in a file system;
the filenames look like ARIA_SSN10_0007_LOCATION_129.pdf.
1.) I have to split all the underscore-separated values out of the filename,
and these values have to be indexed into Solr.
2.) The file contents (text) do not need to be indexed.
You told me "
My hunch is that the basic_configs is *too* basic for your needs here.
basic_configs does not include /update/extract - it’s very basic - stripped of
all the “extra” components.
Try using the default, data_driven_schema_configs instead.
If you’re still having issues, please provide full detail
Hi everyone,
I created a core with the basic_configs config set and schema. When I use bin/post
to post one HTML file, I get this error:
SimplePostTool: WARNING: IOException while reading response:
java.io.FileNotFoundException..
HTTP ERROR 404
when I go to localhost:8983/solr/core/update, I got:
Hi Everyone,
I cannot figure out why I'm getting HTTP Error 500 off the following code:
// Using: org.apache.wink.client
String contentType = "application/atom+xml";
URI uri = new URI("http://localhost:8983" +
"/solr/db/admin/ping?wt=xml");
Resource resource = client.resource(uri
I tried using /select and this query does not work. I cannot understand why.
Passing Parameters via JSON
We can also pass normal request parameters in the JSON body within the
params block:
$ curl "http://localhost:8983/solr/query?fl=title,author"-d '
{
params:{
q:"title:hero",
rows:1
}
Yes that was it. Had no idea this was an issue!
On Monday, August 3, 2015, Roman Chyla wrote:
Hi,
inStockSkusBitSet.get(currentChildDocNumber)
Is that child a lucene id? If yes, does it include offset? Every index
segment starts at a different point, but docs are
That's still a VERY open question. The answer is yes, but the details
depend on the shape and source of your data, and on the searches you are
anticipating.
Is this a lot of entries with a small number of fields, or a
relatively small number of entries with huge field counts? Do you
need to store/ret
Hi,
I am new to Solr development and have the same requirement. With the help of
Googling I have already picked up some knowledge, such as how many shards have
to be created for that amount of data.
I want to get some suggestions: there are so many methods to do indexing, such
as DIH, Solr, and SolrJ.
Please sugges
Hi,
Thanks a lot Erick and Shawn for your answers.
I am aware that it is a very particular issue and not a common use of
Solr. I just wondered if people had a similar business case. For
information, we need a very large number of collections with the same
configuration because of legal reaso
I am trying to use the "langid.map.individual" setting to allow field "a" to
be detected as, say, English, and be mapped to "a_en", while in the same document,
field "b" is detected as, say, German and is mapped to "b_de".
What happens in my tests is that the global language is detected (for example,
Germa
See: https://issues.apache.org/jira/browse/SOLR-6719
It's not clear that we'll support this, so this may just be a doc
change. How would you properly support having more than one replica?
Or, for that matter, having more than one shard? Property.name would
have to do something to make the core nam
Hi,
inStockSkusBitSet.get(currentChildDocNumber)
Is that child a Lucene doc id? If yes, does it include the offset? Every index
segment starts at a different docBase, but docs are numbered from zero within
each segment. So to check them against the full-index bitset, I'd be doing
Bitset.exists(indexBase + docid)
Just one thin
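For reference, a small sketch of that docBase bookkeeping (assuming Lucene 5.x, where the leaves are LeafReaderContext; on 4.x it's AtomicReaderContext, and globalBits stands in for the full-index bitset mentioned in the question):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.util.FixedBitSet;

public class DocBaseExample {
    /**
     * Checks a segment-local doc id against a bitset built over the whole index:
     * each segment's docs are numbered from 0, so add the segment's docBase first.
     */
    static boolean isSet(FixedBitSet globalBits, LeafReaderContext leaf, int localDocId) {
        return globalBits.get(leaf.docBase + localDocId);
    }

    static void dumpBases(IndexReader reader) {
        for (LeafReaderContext leaf : reader.leaves()) {
            System.out.println("segment ord=" + leaf.ord + " docBase=" + leaf.docBase);
        }
    }
}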
I'm using Solr 4.10.2. I'm using the "id" field as the unique key; it is passed in
with the document when ingesting the documents into Solr. When querying, I get
duplicate documents with different "_version_" values. Out of approx. 25K unique
documents ingested into Solr, I see approx. 300 duplicates.
It
Hi fellow Solr devs / users,
I decided to resend the info on this opening, assuming most of you may have
been on vacation in July. I don't intend to keep resending it :)
Company: AlphaSense https://www.alpha-sense.com/
Position: Search Engineer
AlphaSense is a one-stop financial search engin
Hi everybody,
I have about 1300 collections, 3 shards, replicationFactor = 3,
maxShardsPerNode = 3.
I have 3 boxes with 64 GB of RAM (32 GB for the JVM).
When I want to reload all my collections I get a timeout error.
Is there a way to run the reload asynchronously, as when creating collections
(async=requestid)?
I saw on this is
There are two things that are likely to cause the timeouts you are
seeing, I'd say.
Firstly, your server is overloaded - that can be handled by adding
additional replicas.
However, it doesn't seem like this is the case, because the second query
works fine.
Secondly, you are hitting garbage colle
Hi, using Solr 5.2: after restarting the cluster, I get the exceptions below
org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes
published as DOWN in our cluster state.
followed by :
org.apache.solr.common.SolrException; org.apache.solr.common.SolrException:
No registered leader wa
Hello Chris,
This totally does the trick; I drastically improved relevancy. Thank you
very much for your advice!
- Ben