Solr and OpenNLP integration

2012-10-11 Thread ahmed
Hello,
I am a new user of Apache Solr and I have to integrate OpenNLP with Solr.
The problem is that I can't find a tutorial for this integration, so I am
asking if there is someone who can help me with it?
Thanks,





RE: Solr and OpenNLP integration

2012-10-11 Thread ahmed
Hi, thanks for the reply.
In fact I tried this tutorial, but when I execute 'ant compile' I get a
"class not found" problem even though the classes are there. I don't know
what the problem is.





Re: Solr and OpenNLP integration

2012-10-11 Thread ahmed
In fact:
I downloaded the Solr source using an SVN client,
then I applied the OpenNLP patch,
then I ran ant compile -lib /usr/share/ivy

and I got this error:

[javac]   public synchronized Span[] splitSentences(String line) {
[javac]   ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:36:
cannot find symbol
[javac] symbol  : class Tokenizer
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   private final Tokenizer tokenizer;
[javac] ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:38:
cannot find symbol
[javac] symbol  : class TokenizerModel
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   public NLPTokenizerOp(TokenizerModel model) {
[javac] ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:46:
cannot find symbol
[javac] symbol  : class Span
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   public synchronized Span[] getTerms(String sentence) {
[javac]   ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/OpenNLPTokenizerFactory.java:26:
package opennlp.tools.util does not exist
[javac] import opennlp.tools.util.InvalidFormatException;
[javac]  ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/OpenNLPOpsFactory.java:9:
package opennlp.tools.chunker does not exist
[javac] import opennlp.tools.chunker.ChunkerModel;
[javac] ^
[javac] 100 errors

BUILD FAILED
/home/pfe/Téléchargements/dev/trunk/build.xml:112: The following error
occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:419: The following
error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:410: The following
error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:418: The
following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:1482: Compile
failed; see the compiler error output for details.

I want to apply semantic analysis to the documents that will be indexed
using Solr, so Solr will index and then analyse the content using OpenNLP
instead of Tika.
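
The "cannot find symbol" errors for Tokenizer, TokenizerModel and Span all
point at the OpenNLP jars being absent from the compile classpath rather
than at the patched sources. A minimal sketch of a fix, assuming the
LUCENE-2899 patch layout; the jar versions and the contrib lib path are
assumptions, not verified against the patch:

# from the trunk checkout, make Ivy available to ant
ant ivy-bootstrap
# put the OpenNLP jars where the contrib build can find them
cp opennlp-tools-1.5.2.jar opennlp-maxent-3.0.2.jar solr/contrib/opennlp/lib/
# then retry
ant compile -lib /usr/share/ivy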







Re: Solr 6 Distributed Join

2015-12-16 Thread Akiel Ahmed
Hi Dennis,

Thank you for your help. I used your explanation to construct an innerJoin 
query; I think I am getting further but didn't get the results I expected. 
The following describes what I did – is there any chance you can tell 
where I am going wrong:

Solr 6 Developer Builds: #2738 and #2743

1. Modified server/solr/configsets/basic_configs/conf/managed-schema so it 
reads:



[managed-schema XML stripped by the archive; of its contents only the
uniqueKey value "id" survives]
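A plausible reconstruction of the stripped schema, inferred from the CSV
columns ingested in step 4 and from the docValues requirement of the
/export handler used in step 6; the field types are assumptions:

  <uniqueKey>id</uniqueKey>
  <field name="id"   type="string" indexed="true" stored="true" docValues="true"/>
  <field name="type" type="string" indexed="true" stored="true" docValues="true"/>
  <field name="e1"   type="string" indexed="true" stored="true" docValues="true"/>
  <field name="e2"   type="string" indexed="true" stored="true" docValues="true"/>
  <field name="text" type="text_general" indexed="true" stored="true"/>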

2. Modified server/solr/configsets/basic_configs/conf/solrconfig.xml, 
adding the following near the bottom of the file so it is the last request 
handler

   
 
[requestHandler XML stripped by the archive; surviving values: "json" and
"false"]
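
Only the values "json" and "false" survived here; they match the stock
/stream handler definition from the Solr 6 Streaming Expressions
documentation, which is presumably what was pasted:

  <requestHandler name="/stream" class="solr.StreamHandler">
    <lst name="invariants">
      <str name="wt">json</str>
      <str name="distrib">false</str>
    </lst>
  </requestHandler>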
 
  

3. Used solr -e cloud to setup a solr cloud instance, picking all the 
defaults except I chose basic_configs

4. After solr is running I ingested the following data via the Solr Web UI 
(/update handler, Document Type = CSV)
id,type,e1,e2,text
1,ABC,,,John Smith
2,ABC,,,Jane Smith
3,ABC,,,MiKe Smith
4,ABC,,,John Doe
5,ABC,,,Jane Doe
6,ABC,,,MiKe Doe
7,ABC,,,John Smith
8,DEF,,,Chicken Burger
9,DEF,,,Veggie Burger
10,DEF,,,Beef Burger
11,DEF,,,Chicken Donar
12,DEF,,,Chips
13,DEF,,,Drink
20,GHI,1,2,Friends
21,GHI,3,4,Friends
22,GHI,5,6,Friends
23,GHI,7,6,Friends
24,GHI,6,4,Friends
25,JKL,1,8,Order
26,JKL,2,9,Order
27,JKL,3,10,Order
28,JKL,4,11,Order
29,JKL,5,12,Order
30,JKL,6,13,Order

5. Navigating to the following URL in a browser returned an expected 
result:
http://localhost:8983/solr/gettingstarted/select?q={!join from=id 
to=e1}text:John&fl="id"


[XML response stripped by the archive; it listed three joined documents:
id=20 (e1=1, e2=2), id=28 (e1=4, e2=11), id=23 (e1=7, e2=6)]


6. Navigating to the following URL in a browser does NOT return what I 
expected:
http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted
, fl="id", q=text:John, sort="id 
asc",zkHost="localhost:9983",qt="/export"), search(gettingstarted, 
fl="id", q=text:Friends, sort="id 
asc",zkHost="localhost:9983",qt="/export"), on="id=e1")

{"result-set":{"docs":[
{"EOF":true,"RESPONSE_TIME":124}]}}


I also have a join related question. Is there any chance I can specify a 
query and join for more than 2 things. For example:

innerJoin(search(gettingstarted, fl="id", q=text:John, ...) as s1, 
  search(gettingstarted, fl="id", q=text:Chicken, ...) as s2
  search(gettingstarted, fl="id", q=text:Friends, ...) as s3)
  on="s1.id=s3.e1", 
  on="s2.id=s3.e2")
 
Sorry if the query does not make sense, but given the data above my 
intention is to find a single result made up of 3 documents: 
s1.id=1,s2.id=8,s3.id=25
Is that possible? If yes, will Solr 6 support an arbitrary number of 
queries and associated joins?

Cheers

Akiel



From:   Dennis Gove 
To: Akiel Ahmed/UK/IBM@IBMGB, solr-user@lucene.apache.org
Date:   11/12/2015 15:34
Subject:Re: Solr 6 Distributed Join



Akiel,

Without seeing your full url I assume that you're missing the
stream=innerJoin(...) part of it. A full sample url would look like this
http://localhost:8983/solr/careers/stream?stream=innerJoin(search(careers,
fl="personId,companyId,title", q=companyId:*, sort="companyId
asc",zkHost="localhost:2181",qt="/export"),search(companies,
fl="id,companyName", q=*:*, sort="id
asc",zkHost="localhost:2181",qt="/export"),on="companyId=id")

This example will return a join of career records with the company name 
for
all career records with a non-null companyId.

And the pieces have the following meaning:
http://localhost:8983/solr/careers/stream?  - you have a collection called
careers available on localhost:8983 and you're hitting its stream handler
?stream=  - you are passing the stream parameter to the stream handler
zkHost="localhost:2181"  - there is a zk instance running on 
localhost:2181
where solr can get clusterstate information. Note, that since you're
sending the request to the careers collection this param is not required 
in
the search(careers) part but is required in the search(companies)
part. For simplicity I usually just provide it for all.
qt="/export"  - tells solr to use the export handler. this assumes all 
your
fields are in docValues. if you'd rather not use the export handler then
you probably want to provide the rows=# param to tell solr to return a
large # of rows for each underlying search. Without it solr will default
to, I believe, 10 rows.
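
In other words, a non-export variant of the first search above would
presumably look something like this (the rows value is just an arbitrarily
large number, not a magic constant):

search(careers, fl="personId,companyId,title", q=companyId:*,
sort="companyId asc", zkHost="localhost:2181", rows="100000")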

CCing the user list so others can see this as well.

We're working on additional documentation for Streaming Aggregation and
Expressions. The page can be found at
https://cwiki.apache.org/confluence/displa

Re: Solr 6 Distributed Join

2015-12-17 Thread Akiel Ahmed
Hi again,

I got the join to work. A team mate pointed out that one of the search 
functions in the innerJoin query was missing a field in the join - adding 
the e1 field to the fl parameter of the second search function gave the 
result I expected:

http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted
, fl="id", q=text:John, sort="id 
asc",zkHost="localhost:9983",qt="/export"), search(gettingstarted, 
fl="id,e1", q=text:Friends, sort="id 
asc",zkHost="localhost:9983",qt="/export"), on="id=e1")

I am still interested in whether we can specify a join, using an arbitrary 
number of searches.

Cheers

Akiel




Re: Solr 6 Distributed Join

2015-12-21 Thread Akiel Ahmed
Thank you for the help. 

I am working through what I want to do with the join - will let you know 
if I hit any issues.



From:   Joel Bernstein 
To: solr-user@lucene.apache.org
Date:   17/12/2015 15:40
Subject:Re: Solr 6 Distributed Join



One thing to note about the hashJoin is that it requires the search 
results
from the hashed query to fit entirely in memory.

The innerJoin does not have this requirement as it performs a streaming
merge join.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Dec 17, 2015 at 10:33 AM, Joel Bernstein  
wrote:

> Below is an example of nested joins where the innerJoin is done in
> parallel using the parallel function. The partitionKeys parameter needs 
to
> be added to the searches when the parallel function is used to partition
> the results across worker nodes.
>
> hashJoin(
> parallel(workerCollection,
> innerJoin(
> search(users, q="*:*",
> fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345",
> qt="/export" partitionKeys="userId"),
> search(reviews, q="*:*",
> fl="userId, review, score", sort="userId asc", zkHost="zk1:2345",
> qt="/export" partitionKeys="userId"),
> on="userId"
> ),
>  workers="20",
>  zkHost="zk1:2345",
>  sort="userId asc"
>  ),
>hashed=search(restaurants, q="city:nyc", 
fl="restaurantId, restaurantName",
> sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
>on="restaurantId"
> )
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Dec 17, 2015 at 10:29 AM, Joel Bernstein 
> wrote:
>
>> The innerJoin joins two streams sorted by the same join keys (merge
>> join). If the third stream has the same join keys you can nest innerJoins.
>> But all three tables need to be sorted by the same join keys to nest
>> innerJoins (merge joins).
>>
>> innerJoin(innerJoin(...),
>> search(...),
>> on...)
>>
>> If the third stream is joined on a different key you can nest inside a
>> hashJoin which doesn't require streams to be sorted on the join key. 
For
>> example:
>>
>> hashJoin(innerJoin(...),
>> hashed=search(...),
>> on..)
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>

RE: Solr 6 Distributed Join

2015-12-22 Thread Akiel Ahmed
Hi,

I tried a straightforward join against something that is connected to 
many things but didn't get the results I expected - I wanted to check 
whether my expectations are off, and whether I can do anything in Solr to 
do what I want. So given the data:

id,type,e1,e2,text
1,ABC,,,John Smith
2,ABC,,,Jane Doe
3,DEF,1,2,1
4,DEF,1,2,2
5,DEF,1,2,4
6,DEF,1,2,8

and the query

http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted
, fl="id", q=text:John, sort="id 
asc",zkHost="localhost:9983",qt="/export"), search(gettingstarted, 
fl="id,e1", q=type:DEF, sort="id 
asc",zkHost="localhost:9983",qt="/export"), on="id=e1")

I expected

{"result-set":{"docs":[
{"e1":"1","id":"3"},
{"e1":"1","id":"4"},
{"e1":"1","id":"5"},
{"e1":"1","id":"6"},
{"EOF":true,"RESPONSE_TIME":56}]}}

but instead I got 

{"result-set":{"docs":[
{"e1":"1","id":"3"},
{"EOF":true,"RESPONSE_TIME":58}]}}

Deleting the document with id 3, and rerunning the query (see above) 
returned 

{"result-set":{"docs":[
{"e1":"1","id":"4"},
{"EOF":true,"RESPONSE_TIME":56}]}}

So it looks like the join finds only the first thing to join on. Is this 
expected behaviour? If so, is there anything I can do to convince Solr to 
return all the things it is connected to?

Cheers

Akiel

Re: Solr 6 Distributed Join

2015-12-24 Thread Akiel Ahmed
Hi

Did you get a chance to check whether one-to-many joins were covered in 
your tests? If yes, can you make any suggestions for what I could be doing 
wrong?

Cheers

Akiel



From:   Joel Bernstein 
To: solr-user@lucene.apache.org
Date:   22/12/2015 13:03
Subject:Re: Solr 6 Distributed Join



Just did a quick review of the InnerJoinStream and it appears that it
should handle one-to-one, one-to-many, many-to-one and many-to-many joins.
It will take a closer review of the tests to see if all these cases are
covered. So the innerJoin is designed to handle the case you describe. If
it doesn't work properly it makes sense to file a bug report.

Joel Bernstein
http://joelsolr.blogspot.com/


Re: Solr 6 Distributed Join

2016-01-05 Thread Akiel Ahmed
Hi Joel,

Sorry, there was an error between my chair and keyboard; there isn't a bug 
- the right-hand stream was not ordered by the joined-on field. So the 
following query does what I expected:

http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted
,fl="id",q=text:John,sort="id asc",zkHost="localhost:9983",qt="/export"), 
search(gettingstarted,fl="id,e1",q=type:DEF,sort="e1 
asc",zkHost="localhost:9983",qt="/export"), on="id=e1")

Do you know if, on the release of Solr 6, the stream handler will contain 
validation code which does a syntax check as well as checking whether 
appropriate fields have been used in the fl and sort properties? For 
example, for the above query, I am joining the id field on the e1 field, 
so the id field needs to be in the fl and sort property of the left-hand 
stream, and e1 needs to be in the fl and sort property of the right-hand 
stream for the join to work.

Cheers

Akiel



From:   Joel Bernstein 
To: solr-user@lucene.apache.org
Date:   24/12/2015 15:51
Subject:Re: Solr 6 Distributed Join



I haven't had a chance to review. If you have a reproducible failure on a
one-to-many join go ahead and create a jira ticket.

Joel Bernstein
http://joelsolr.blogspot.com/


Re: Solr 6 Distributed Join

2016-01-06 Thread Akiel Ahmed
Hi Dennis/Joel,

Thank you for your help to date - I must say this user group is very 
responsive :-)

Cheers

Akiel



From:   Dennis Gove 
To: solr-user@lucene.apache.org
Date:   05/01/2016 13:22
Subject:Re: Solr 6 Distributed Join



Akiel,

https://issues.apache.org/jira/browse/SOLR-7554 added checks on the sort
with streams, where required. If a particular stream requires that 
incoming
streams be ordered in a compatible way then that check will be performed
during creation of the stream and an error will be thrown if that check
fails. This is only a check on the sorts of the incoming streams and
doesn't do any checks related to if expected fields are included in the
incoming streams. In your case, it'd have found the error and told you 
that
the streams aren't sorted in a compatible way.

- Dennis


Exporting Score value from export handler

2016-01-29 Thread Akiel Ahmed
Hi,

I would like to issue a query and get the ID and Score for each matching 
document. There may be lots of results so I wanted to use the export 
handler, but unfortunately the current version of Solr doesn't seem to 
export the Score - I read the comments on 
https://issues.apache.org/jira/browse/SOLR-5244 (Exporting Full Sorted 
Result Sets) but am not sure what happened with the idea of exporting the 
Score. Does anybody know of an existing or future version where this can 
be done?

I compared exporting 100,000 IDs via the export handler with getting 
100,000 ID,Score pairs using the cursor mark - exporting 100,000 IDs was 
an order of magnitude faster on my laptop. Does anybody know of a faster 
way to retrieve the ID,Score pairs for a query on a SolrCloud deployment 
and/or have an idea of the possible performance characteristics of 
exporting ID,Score (without ranking) if it were to be implemented?
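
For reference, the cursor-mark run being compared presumably looked
something like this (collection and query are placeholders); note that
cursorMark requires the sort to end on the uniqueKey as a tiebreaker:

curl 'http://localhost:8983/solr/collection1/select?q=text:John&fl=id,score&sort=score+desc,id+asc&rows=1000&cursorMark=*&wt=json'
# then resend with cursorMark set to the returned nextCursorMark
# until it stops changing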

Cheers

Akiel


Re: Exporting Score value from export handler

2016-02-09 Thread Akiel Ahmed
Hi Joel,

I saw your response this morning, and have created an issue, SOLR-8664, 
and linked it to SOLR-8125. As context, I included my original question 
and your answer, as a comment.

Cheers

Akiel



From:   Joel Bernstein 
To: solr-user@lucene.apache.org
Date:   29/01/2016 13:46
Subject:Re: Exporting Score value from export handler



Exporting scores would be a great feature to have. I don't believe it will
add too much complexity to export and sort by score. The main consideration
has been memory consumption for very large export sets. The export feature
powers SQL queries that are unlimited in Solr 6. So adding scores to export
would support queries like:

select id, title, score from tableX where a = '(a query)'

Where currently you can only do this:

select id, title, score from tableX where a = '(a query)' limit 1000

Can you create a jira for this and link it to SOLR-8125.




Joel Bernstein
http://joelsolr.blogspot.com/



Facet By Distance

2015-02-25 Thread Ahmed Adel
Hello,

I'm trying to get Facet By Distance working on an index with LatLonType
fields. The schema is as follows:


[schema XML stripped by the archive; it defined the start_station location
field and its coordinate subfields]
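A plausible shape for the stripped schema, assuming the stock LatLonType
setup from the Solr 4.x example schema; only the start_station field name
comes from the query below, the rest is an assumption:

  <field name="start_station" type="location" indexed="true" stored="true"/>
  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true"/>
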

And the query I'm running is:

q=*:*&sfield=start_station&pt=40.71754834,-74.01322069&facet.query={!frange
l=0.0 u=0.1}geodist()&facet.query={!frange l=0.10001 u=0.2}geodist()


But it returns all the documents in the index so it seems something is
missing. I'm using Solr 4.9.0.

--

A. Adel


Re: Facet By Distance

2015-02-25 Thread Ahmed Adel
Hi,
Thank you for your reply. I added a filter query to the query in two ways
as follows:

fq={!geofilt}&sfield=start_station&pt=40.71754834,-74.01322069&facet.query={!frange
l=0.0 u=0.1}geodist()&facet.query={!frange l=0.10001 u=0.2}geodist()&d=0.2
--> returns 0 docs

q=*:*&fq={!geofilt}&sfield=start_station&pt=40.71754834,-74.01322069&d=0.2
--> returns 1484 docs

Not sure why the first query returns 0 documents.

On Wed, Feb 25, 2015 at 8:46 PM, david.w.smi...@gmail.com <
david.w.smi...@gmail.com> wrote:

> Hi,
> This will "return all the documents in the index" because you did nothing
> to filter them out.  Your query is *:* (everything) and there are no filter
> queries.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>

A. Adel


Re: Facet By Distance

2015-02-26 Thread Ahmed Adel
Thank you for your replies - I added q and it works! I agree the examples
are a bit confusing. It also turned out that the points are clustered
around the center, so I had to increase d as well.
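
Putting the thread together, the working request presumably ended up along
these lines (the larger d is illustrative; note that facet=true is required
for the facet.query parameters to be evaluated at all):

q=*:*&fq={!geofilt}&sfield=start_station&pt=40.71754834,-74.01322069&d=5
&facet=true&facet.query={!frange l=0.0 u=0.1}geodist()
&facet.query={!frange l=0.10001 u=0.2}geodist()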

On Wed, Feb 25, 2015 at 11:46 PM, Alexandre Rafalovitch 
wrote:

> In the examples it used to default to *:* with default params, which
> caused even more confusion.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 25 February 2015 at 15:21, david.w.smi...@gmail.com
>  wrote:
> > If 'q' is absent, then you always match nothing (there may be
> exceptions?);
> > so it's sort of required, in effect.  I wish it defaulted to *:*.
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
> >



-- 
A. Adel


Re: Log numfound, qtime, ...

2015-03-04 Thread Ahmed Adel
Hi, I believe a better approach than Solarium is to create a custom search
component that extends the SearchComponent class and overrides the process()
method to store the query, QTime, and numFound in a database for further
analysis. This approach would collapse steps 2 through 6 into one step.
Analysis can then be done using the Banana analytics and search dashboard
(https://github.com/LucidWorks/banana).
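
A minimal sketch of such a component; the class name and the logging target
are mine, and it assumes registration under last-components so that
QueryComponent has already filled in the results:

import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueryStatsComponent extends SearchComponent {
  private static final Logger log = LoggerFactory.getLogger(QueryStatsComponent.class);

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to do before the query runs
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    String q = rb.req.getParams().get(CommonParams.Q);
    // docList is populated by QueryComponent, hence last-components
    long numFound = rb.getResults().docList.matches();
    long elapsedMs = System.currentTimeMillis() - rb.req.getStartTime();
    // swap this log call for a DB write to feed Banana or similar
    log.info("q={} qtimeMs={} numFound={}", q, elapsedMs, numFound);
  }

  @Override
  public String getDescription() {
    return "Logs query, elapsed time and numFound for analytics";
  }
  // depending on the Solr version, SolrInfoMBean may also require getSource()
}

Registered in solrconfig.xml with something like:

  <searchComponent name="querystats" class="com.example.QueryStatsComponent"/>
  <requestHandler name="/select" class="solr.SearchHandler">
    <arr name="last-components"><str>querystats</str></arr>
  </requestHandler>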

On Fri, Feb 27, 2015 at 1:36 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Did you check Kibana/Banana ?
>
> On Fri, Feb 27, 2015 at 2:07 PM, bengates  wrote:
>
> > Hello everyone,
> >
> > Here's my need: I'd like to log Solr responses so as to produce some
> > business statistics.
> > I'd like to report, on a daily/weekly/yearly/whateverly basis, the
> > following KPIs:
> > - Most popular requests (hits)
> > - Average numfound for each request
> > - Average response time for each request
> > - Requests that have returned an error
> > - Request that have a numfound of 0.
> >
> > The idea is to give the searchandizer the keys to analyze and enhance in
> > real-time the relevancy of his data. I think it's not the job of a
> > developer
> > to detect that the keyword TV never has results because "Television" is
> the
> > referring word in the whole catalog, for instance. The searchandizer
> should
> > analyze this at anytime and provide the correct synonyms to improve
> > relevance.
> >
> > I'm using Solr with PHP and the Solarium library.
> > Actually the only way I found to manage this, is the following way :
> >
> > 1. The user sends the request
> > 2. Nginx intercepts the request, and forwards it to a PHP app
> > 3. The PHP app loads the Solarium library and forwards the request to
> > Solr/Jetty
> > 4. Solr replies a JSON and Solarium turns it into a PHP Solarium Response
> > Object
> > 5. The PHP app sends the user the raw JSON through NGINX (as if it were
> > Jetty)
> > 6. The PHP app stores the query, the QTime and the numfound in a database
> >
> > I think I'll soon get into performance issues, as you guess.
> > Do you know a better approach ?
> >
> > Thanks,
> > Ben
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Log-numfound-qtime-tp4189561.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
A. Adel


clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Ahmed Adel
What I really meant is getting the cluster status directly through the ZK
API. Your approach is a bit different from what I meant, but it's a nice
one, as it seems it will work across versions 4 and 5.

On Thursday, April 9, 2015, Shalin Shekhar Mangar wrote:

> I don't quite follow. Are you saying that you intend to use the ZK REST API
> to fetch live_nodes and then send the 'clusterstatus' API call to one of
> the live nodes?


-- 
Sent from my iPhone


Re: clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Ahmed Adel
In fact, the advantage I see of using ZK is that we don't have to iterate
over nodes in case the first node receiving that request is down, whereas,
by using ZK REST API, we can do that in a single request as I assume we can
check live_nodes (in case this approach is guaranteed when using Solr 4.x)
and send the request directly to a live node. Let me know if this makes
sense.

On Thu, Apr 9, 2015 at 2:31 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Yes, you can use the 'clusterstatus' API which will return an aggregation
> of all states. See
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18



-- 
*Ahmed Adel*
www.badrit.com


clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Ahmed Adel
Hi All,

On Solr 5.0 and ZK 3.4.6, clusterstate.json sometimes does not reflect the
aggregation of the states of the collections; the latter are always correct.
I could verify this from the admin panel (under the Tree view) and from
ZkCli. Is there something I'm missing that could generate this issue?

-- 

A. Adel


Re: clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Ahmed Adel
Hi Shalin,

Thanks for your response. I'm actually looking inside ZooKeeper in order to
obtain highest availability. What I expected is that clusterstate.json
contains the aggregation of all state.json children nodes of each
collection. But your second paragraph explains the behavior I see in Solr
5.0 while others using prior versions of Solr don't see.

By the way, is there one method to retrieve state across 4.x and 5.x? It
seems that there are different methods depending on Solr version.

On Thu, Apr 9, 2015 at 12:23 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Ahmed,
>
> Can you give more details? What did you expect and what was the actual?
> Also, are you looking directly at the clusterstate.json inside ZooKeeper or
> are you using the 'clusterstatus' Collection API?
>
> You shouldn't look at the clusterstate.json directly because 1) things like
> live-ness is not stored in clusterstate.json and 2) collections created
> with Solr 5.0 have their own individual state.json inside
> /collections/collection_name/state.json
>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
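
For the record, the two ways of looking at the state discussed above (the
collection name is a placeholder; CLUSTERSTATUS exists from Solr 4.8 on):

# aggregated view via the Collections API:
curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json'
# raw per-collection state as written by Solr 5:
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd get /collections/collection_name/state.json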



--
A. Adel


json facet API response size

2017-02-13 Thread ahmed darweesh
I tried migrating our facet search from the old facet method to the new
JSON Facet API, but there is one problem with the size of the returned
response. For example, one query's response is around 1.2 MB, while the
same query using the old facet method produces a response of around 160 KB.

Is there any way to reduce the size of the response from the JSON Facet
API?

I'm using version 5.2 BTW.
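
For context, most of the size difference comes from the bucket format: the
old facet component returns flat name/count arrays, while the JSON Facet API
wraps every bucket in an object. A sketch of the two shapes for the same
facet (field and values made up):

  old:  "facet_fields":{"cat":["electronics",12,"memory",5]}
  new:  "facets":{"cat":{"buckets":[{"val":"electronics","count":12},
                                    {"val":"memory","count":5}]}}

Short of that, lowering the limit on each terms facet and dropping unneeded
sub-facets are the main levers on response size.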


Solr Search proposal

2014-03-29 Thread ahmed shawki



Hi All,

My name is Ahmed. I am from Egypt. I have spent the last two months 
developing a custom web interface for Solr using HTML and JavaScript.
I have called it "Solr Search" (until now).
Actually, the basic idea was inspired by "AJAX Solr".
But "Solr Search" provides a different approach in terms of usability and 
the options available to users.
(I know that) this might not be very interesting information to you, 
because you already have better search interfaces.
But I am sending this email in the hope of sharing "Solr Search" with the 
community.
I don't know the exact steps for how to share it, so kindly guide me if 
appropriate.
Also, please find attached a "quick overview" document about "Solr Search".
Thanks,
Ahmed

Re: Solr Search proposal

2014-04-01 Thread ahmed shawki



Hi All, Hi Furkan and Ahmet,

Thanks for your reply to my last email about the "Solr Search" proposal 
(sent last Sunday, 30-Mar-2014).
This is just to announce "Solr Search", a simple HTML interface for 
searching documents indexed by Apache Solr (TM).
It was developed during the last two months (in spare time), so this small 
HTML interface for Solr is far from a complete or mature project.
But its features and options might be found useful by some users of Solr, 
which is why I am glad to share it here.
The code is now hosted at: https://code.google.com/p/solr-search-html/
On that page, a "quick overview" link about "Solr Search" can be found, as 
well as a link for downloading it.
Thanks
Best Regards,
Ahmed Shawki
asha...@hotmail.com

Export feature issue in Solr 4.10

2014-10-02 Thread Ahmed Adel
Hi All,

I'm trying to use the Solr 4.10 export feature, but I'm getting an error.
Maybe I missed something.

Here's the scenario:


   1. Download Solr 4.10.0
   2. Use collection1 schema out of the box
   3. Add docValues="true" to price and pages fields in schema.xml
   4. Index books.json using command line:
   curl http://localhost:8984/solr/collection1/update -H
"Content-Type: text/json" --data-binary
@example/exampledocs/books.json
   5. Try running this query:
   http://localhost:8984/solr/collection1/export?q=*:*&sort=price%20asc&fl=price
   6. Here's the error I get:

   java.lang.IllegalArgumentException: docID must be >= 0 and <
maxDoc=4 (got docID=4)
at 
org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182)
at 
org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109)
at 
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:700)
at 
org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:213)
at 
org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:623)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:507)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
...


Any ideas what could be missing?

Thanks,
A. Adel


Re: Export feature issue in Solr 4.10

2014-10-04 Thread Ahmed Adel
Thanks Joel, I changed enableLazyFieldLoading to false and it worked just
fine.

However, for some reason, I was expecting it to return
"Content-Disposition: attachment" in the response, perhaps because the
response of this request would most probably be huge: if returned to the
browser, it makes sense for it to be downloaded, as the browser won't be
able to handle it efficiently - at least when a request parameter asks for
that. What do you think?
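
For what it's worth, fetching with a streaming client instead of a browser
sidesteps that; e.g. with the URL from the original message:

curl 'http://localhost:8984/solr/collection1/export?q=*:*&sort=price+asc&fl=price' -o export.json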

A. Adel
On Oct 2, 2014 11:06 PM, "Joel Bernstein"  wrote:

> There is bug in how the export handler is working when you have very few
> documents in the index and the solrconfig.xml is configured to enable lazy
> document loading:
>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
>
> The tests didn't catch this because lazy loading was set to the default
> which is false in the tests. The manual testing I did, didn't catch this
> because I tested with a large number of documents in the index.
>
> Your example will work if you change:
>
> <enableLazyFieldLoading>false</enableLazyFieldLoading>
>
> And if you load a typical index with lots of documents you should have no
> problems running with lazy loading enabled.
>
> I'll create jira to fix this issue.
>
>
>
>
>
>
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
> On Thu, Oct 2, 2014 at 4:10 PM, Joel Bernstein  wrote:
>
> > Yep getting the same error. Investigating...
> >
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >


Indexed epoch time in Solr

2015-01-25 Thread Ahmed Adel
Hi All,

Is there a way to convert a unix time field that is already indexed to
ISO-8601 format in the query response? If this is not possible at the query
level, what is the best way to copy this field to a new standard Solr date
field?

Thanks,

-- 
*Ahmed Adel*
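One client-side way to do the conversion, sketched in Java; the epoch value
is hypothetical:

import java.time.Instant;

public class EpochToIso {
    public static void main(String[] args) {
        long epochSeconds = 1422144000L;  // hypothetical indexed unix time
        // Instant.toString() is ISO-8601 (e.g. 2015-01-25T00:00:00Z),
        // which is also the format Solr date fields expect.
        System.out.println(Instant.ofEpochSecond(epochSeconds).toString());
    }
}

At index time, the same conversion could feed a copy of the value into a
standard Solr date field.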


Re: highlight on prefix query

2011-08-31 Thread Ahmed Boubaker
Well, that's one use case; there are others where you need to highlight only
what is matching.

For now, I solved the problem by writing an additional procedure to correct
the highlighting.  Not nice, but it works!

On Sat, Aug 6, 2011 at 11:10 AM, Kissue Kissue  wrote:

> I think this is correct behaviour. If you go to google and search for
> "Tel",
> you will see that telephone is highlighted.
>
> On Fri, Aug 5, 2011 at 5:42 PM, Ahmed Boubaker
> wrote:
>
> > Hi,
> >
> > I am using solr 3 and highlighting is working fine.  However when using
> > prefix query like tel*, the highlighter highlights the whole matching
> words
> > (i.e. television, telephone, ...).  I am highlighting a very short field
> > (3~5 words length).
> >
> > How can I prevent the highlighter from doing so?  I want to get only the
> > prefix of these words highlighted (i.e. television,
> > telephone, ...), any solution or idea ?
> >
> > Many thanks for your help,
> >
> > Boubaker
> >
>


DataImportHandler Transformer and environment property

2011-08-31 Thread Ahmed Boubaker
Hello,

Does anyone know how you can access an environment property from a custom
Transformer I defined?
Also, I am wondering where "solrcore.properties" should be located in a
multicore setup, and how I can access the properties defined inside it from
various Solr plugins?

Many thanks for your help,

Boubaker
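A minimal sketch of reading environment and system properties from a custom
transformer, assuming nothing beyond the standard DIH Transformer contract
and plain java.lang.System calls; the property names and target column are
made up:

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class EnvAwareTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // OS environment variable and JVM system property, respectively;
        // "DATA_ROOT" and "data.root" are hypothetical names.
        String envValue = System.getenv("DATA_ROOT");
        String sysProp = System.getProperty("data.root");
        row.put("dataRoot", envValue != null ? envValue : sysProp);
        return row;
    }
}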


advice on creating a solr index when data source is from many unrelated db tables

2010-07-29 Thread S Ahmed
I understand (and it's straightforward) when you want to create an index for
something simple like Products.

But how do you go about creating a Solr index when you have data coming from
10-15 database tables, and the tables have unrelated data?

The issue is that you would then have many 'columns' in your index, and they
will be NULL for much of the data, since you are trying to shove 15 db tables
into a single Solr/Lucene index.


This must be a common problem, what are the potential solutions?


Re: advice on creating a solr index when data source is from many unrelated db tables

2010-07-30 Thread S Ahmed
So I have tables like this:

Users
UserSales
UserHistory
UserAddresses
UserNotes
ClientAddress
CalenderEvent
Articles
Blogs

Just seems odd to me, jamming all these tables into a single index.  But
using a 'type' field to qualify exactly what I am searching is a good idea,
in case I need to filter for only 'articles' or blogs or contacts etc. (a
sketch of this follows the message).

But there might be 50 fields if I do this, no?



On Fri, Jul 30, 2010 at 4:01 AM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:

> Hi Ahmed,
>
> fields that are empty do not impact the index. It's different from a
> database.
> I have text fields for different languages and per document there is
> always only one of the languages set (the text fields for the other
> languages are empty/not set). It works all very well and fast.
>
> I wonder more about what you describe as "unrelated data" - why would
> you want to put unrelated data into a single index? If you want to
> search on all the data and return mixed results there surely must be
> some kind of relation between the documents?
>
> Chantal
>
> On Thu, 2010-07-29 at 21:33 +0200, S Ahmed wrote:
> > I understand (and its straightforward) when you want to create a index
> for
> > something simple like Products.
> >
> > But how do you go about creating a Solr index when you have data coming
> from
> > 10-15 database tables, and the tables have unrelated data?
> >
> > The issue is then you would have many 'columns' in your index, and they
> will
> > be NULL for much of the data since you are trying to shove 15 db tables
> into
> > a single Solr/Lucense index.
> >
> >
> > This must be a common problem, what are the potential solutions?
>
>
>
>
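A sketch of the 'type'-field idea from this thread; all field names are
illustrative. Each row from any of the tables becomes one document carrying
only the fields it actually has:

<field name="id"    type="string" indexed="true" stored="true" required="true"/>
<field name="type"  type="string" indexed="true" stored="true"/>
<field name="title" type="text"   indexed="true" stored="true"/>
<field name="body"  type="text"   indexed="true" stored="true"/>

A search can then be restricted per source with a filter query, e.g.:

http://localhost:8983/solr/select?q=budget&fq=type:article

Since empty fields cost nothing in Lucene, 50 mostly-NULL fields are far
cheaper than they would be in a relational table.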


MoreLikeThis to extract relevant terms to the query from the index

2010-11-07 Thread farag ahmed
Hi All,

I am using MoreLikeThis.java in Lucene to expand the query with related
terms. It works fine and I can retrieve the documents relevant to the query,
but I couldn't work out how to extract the terms related to the query from
the index.

My task is:

For example, if the query is "bank", related terms could be "money",
"credit", and so on, i.e. terms that appear frequently with "bank" in the
index.
What should I write in main so that I get the interesting terms for my query?

I tried

BooleanQuery result = (BooleanQuery) mlt.like(docNum);

result.add(query, BooleanClause.Occur.MUST_NOT);

System.out.println(result.getClauses().toString());

but it doesn't help.

Any ideas?
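MoreLikeThis can hand the interesting terms back directly, without digging
them out of the generated query. A sketch, assuming the reader and docNum
from your surrounding code; the field name is a guess:

MoreLikeThis mlt = new MoreLikeThis(reader);
mlt.setFieldNames(new String[] { "contents" });  // hypothetical field
mlt.setMinTermFreq(1);   // loosen thresholds for small test indexes
mlt.setMinDocFreq(1);
String[] interesting = mlt.retrieveInterestingTerms(docNum);
for (String term : interesting) {
    System.out.println(term);  // e.g. "money", "credit" for "bank"
}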






Dispatching a query to multiple different cores

2011-08-03 Thread Ahmed Boubaker
Hello there!

I have a multicore Solr with 6 different "simple" cores with somewhat
different schemas, and I defined another "meta" core which I would like to be
a dispatcher: requests are sent to the "simple" cores and the results are
aggregated before being sent back to the user.

Any ideas or hints on how I can achieve this?
I am wondering whether a custom SearchComponent or a custom SearchHandler
would be a good entry point?
Is it possible to access other SolrCores which are in the same container as
the "meta" core?

Many thanks for your help.

Boubaker
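One standard route for this kind of fan-out is Solr's distributed search: a
request to the "meta" core can list the sibling cores in the shards
parameter, and Solr merges the results. A sketch, with host, port, and core
names invented, and with the caveat that distributed search expects the cores
to agree on the unique key field and on any field you request or sort on:

http://localhost:8983/solr/meta/select?q=foo&shards=localhost:8983/solr/core1,localhost:8983/solr/core2,localhost:8983/solr/core3

For aggregation logic beyond what the standard merge does, a custom
SearchComponent on the meta core is indeed the usual entry point.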


highlight on prefix query

2011-08-05 Thread Ahmed Boubaker
Hi,

I am using Solr 3 and highlighting is working fine.  However, when using a
prefix query like tel*, the highlighter highlights the whole matching words
(i.e. <em>television</em>, <em>telephone</em>, ...).  I am highlighting a
very short field (3~5 words long).

How can I prevent the highlighter from doing so?  I want to get only the
prefix of these words highlighted (i.e. <em>tel</em>evision,
<em>tel</em>ephone, ...). Any solution or idea?

Many thanks for your help,

Boubaker


Regex Transformer Error

2008-11-05 Thread Ahmed Hammad
Hi,

I am using the Solr 1.3 data import handler. One of my table fields contains
HTML tags that I want to strip out of the field text, so obviously I need the
Regex Transformer.

I added a transformer="RegexTransformer" attribute to my entity and a new
field with:



Everything works fine; the text is replaced without any problem. The problem
happened with my regular expression to strip HTML tags: regex="<(.|\n)*?>".
Of course the characters '<' and '>' are not allowed in
XML. I tried a couple of variants, but I get the
following error:

The value of attribute "regex" associated with an element type "field" must
not contain the '<' character. at
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
...

The full stack trace is following:

*FATAL: Could not create importer. DataImporter config invalid
org.apache.solr.common.SolrException: FATAL: Could not create importer.
DataImporter config invalid at
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
at
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
at java.lang.Thread.run(Unknown Source) Caused by:
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
occurred while initializing context Processing Document # at
org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
at
org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:93)
at
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
... 17 more Caused by: org.xml.sax.SAXParseException: The value of attribute
"regex" associated with an element type "field" must not contain the '<'
character. at
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown
Source) at
org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:166)
... 19 more *

*description* *The server encountered an internal error (FATAL: Could not
create importer. DataImporter config invalid
org.apache.solr.common.SolrException: FATAL: Could not create importer.
DataImporter config invalid at
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
at
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
at java.lang.

Re: Regex Transformer Error

2008-11-05 Thread Ahmed Hammad
Hi,

It works with the attribute regex="&lt;(.|\n)*?&gt;"

Sorry for the disturbance.

Regards,

ahmd


On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am using Solr 1.3 data import handler. One of my table fields has html
> tags, I want to strip it of the field text. So obviously I need the Regex
> Transformer.
>
> I added transformer="RegexTransformer" attribute to my entity and a new
> field with:
>
>  replaceWith="X"/>
>
> Every thing works fine. The text is replace without any problem. The
> provlem happend with my regular experession to strip html tags. So I use
> regex="<(.|\n)*?>". Of course the charecters '<' and '>' are not allowed in
> XML. I tried the following
> regex="<(.|\n)*?>" and regex="C;(.|\n)*?E;" but I get the
> following error:
>
> The value of attribute "regex" associated with an element type "field" must
> not contain the '<' character. at
> com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
> ...
>
> The full stack trace is following:
>
> *FATAL: Could not create importer. DataImporter config invalid
> org.apache.solr.common.SolrException: FATAL: Could not create importer.
> DataImporter config invalid at
> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> at
> org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
> at
> org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
> at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
> at java.lang.Thread.run(Unknown Source) Caused by:
> org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
> occurred while initializing context Processing Document # at
> org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
> at
> org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:93)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
> ... 17 more Caused by: org.xml.sax.SAXParseException: The value of attribute
> "regex" associated with an element type "field" must not contain the '<'
> character. at
> com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
> at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown
> Source) at
> org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:166)
> ... 19 more *
>
> *description* *The server encountered an internal error (FATAL: Could not
> create importer. DataImporter config invalid
> org.apache.solr.common.SolrException: FATAL: Could not create importer.
> DataImporter config invalid at
> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>

Re: Regex Transformer Error

2008-11-06 Thread Ahmed Hammad
It worked by replacing < with &lt; and > with &gt;.

Thank you for your support,
ahmd

On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote:

> There is a nice HTML stripper inside Solr.
> "solr.HTMLStripStandardTokenizerFactory"
>



>
> -Original Message-
> From: Ahmed Hammad [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 05, 2008 10:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Regex Transformer Error
>
> Hi,
>
> It works with the attribute regex="&lt;(.|\n)*?&gt;"
>
> Sorry for the disturbance.
>
> Regards,
>
> ahmd
>
>
> On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I am using Solr 1.3 data import handler. One of my table fields has
> > html tags, I want to strip it of the field text. So obviously I need
> > the Regex Transformer.
> >
> > I added transformer="RegexTransformer" attribute to my entity and a
> > new field with:
> >
> >  > replaceWith="X"/>
> >
> > Every thing works fine. The text is replace without any problem. The
> > provlem happend with my regular experession to strip html tags. So I
> > use regex="<(.|\n)*?>". Of course the charecters '<' and '>' are not
> > allowed in XML. I tried the following regex="<(.|\n)*?>" and
> > regex="C;(.|\n)*?E;" but I get the following error:
> >
> > The value of attribute "regex" associated with an element type "field"
>
> > must not contain the '<' character. at
> > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
> > Source) ...
> >
> > The full stack trace is following:
> >
> > *FATAL: Could not create importer. DataImporter config invalid
> > org.apache.solr.common.SolrException: FATAL: Could not create
> importer.
> > DataImporter config invalid at
> > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
> > Handler.java:114)
> > at
> > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody
> > (DataImportHandler.java:206)
> > at
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> > rBase.java:131) at
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
> > java:303)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> > .java:232)
> > at
> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
> > cationFilterChain.java:235)
> > at
> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
> > lterChain.java:206)
> > at
> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
> > lve.java:233)
> > at
> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
> > lve.java:191)
> > at
> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
> > va:128)
> > at
> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja
> > va:102)
> > at
> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv
> > e.java:109)
> > at
> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
> > :286)
> > at
> > org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor
> > .java:857)
> > at
> > org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.pro
> > cess(Http11AprProtocol.java:565) at
> > org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:150
> > 9) at java.lang.Thread.run(Unknown Source) Caused by:
> > org.apache.solr.handler.dataimport.DataImportHandlerException:
> > Exception occurred while initializing context Processing Document # at
> > org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
> > orter.java:176)
> > at
> > org.apache.solr.handler.dataimport.DataImporter.(DataImporter.ja
> > va:93)
> > at
> > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
> > Handler.java:106) ... 17 more Caused by:
> > org.xml.sax.SAXParseException: The value of attribute "regex"
> > associated with an element type "field" must not contain the '<'
> > character. at
> > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
> > Source) at
> > com.sun.org.a
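For reference, a sketch of the working, XML-escaped attribute in
data-config.xml; the column names are illustrative:

<field column="description" sourceColName="description"
       regex="&lt;(.|\n)*?&gt;" replaceWith=" "/>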

Re: DataImportHandler not indexing all the records

2008-11-15 Thread Ahmed Hammad
I had a problem similar to Giri's. I have 17,000 records in one table and DIH
imports only 12,464.

After some investigation, I found my problem.

I have a regular expression to strip HTML tags from input text, as
follows:



The DIH regex has a stack overflow around record 17,000 due to an error in
the content, and then DIH exits without any error in the log or in the status
command. Here is the status:


0:0:31.657
1
12464
12464
0
2008-11-15 20:40:58


I found the error in the Eclipse Console window while debugging; it was a
stack overflow in the regex library.

The problem is that DIH does not show any problem in the log file or in the
status message.
What I think is important is to log whatever error happens.

I noticed also that, when there is no error, a log message shows completion:

Nov 15, 2008 8:57:34 PM org.apache.solr.handler.dataimport.DocBuilder
execute
INFO: Time taken = 0:0:40.656

In the case of the regex stack overflow, this log message does not appear.

I am researching how to catch such errors in DIH. Any ideas?


Regards,
ahmd

On Sat, Nov 15, 2008 at 6:32 AM, Noble Paul നോബിള്‍ नोब्ळ् <
[EMAIL PROTECTED]> wrote:

> There is no obvious problem
>
> I can be reasonably sure that
> the query
>
> select * from climatedata.ws_record limit 100
>
> would have fetched only  615360 rows.
> This is a very reliable pice of information
> 615360
>
> On Sat, Nov 15, 2008 at 12:41 AM, Giri <[EMAIL PROTECTED]> wrote:
> > Hi Noble,
> > thanks for the help, here are the details: the field "id" is unique, when
> I
> > did a select distinct(id), it returned 1 million rows.
> >
> > ---
> > db-data-config.xml
> > note: I limit the resultset to 1 million in the select query
> > ---
> > 
> > > url="jdbc:mysql://localhost:3306/climatedata" user="user" password="pw"
> > batchSize ="-1"/>
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 
> >
> > 
> >
> > -
> > in the solr Schema.xml:
> > 
> > 
> >> multiValued="false"/>
> > > multiValued="true" required="false"/>
> > > multiValued="true" required="false"/>
> > > multiValued="true" required="false"/>
> > > indexed="true" stored="true"  required="false"/>
> > > indexed="true" stored="true"  required="false"/>
> > > multiValued="true"/>
> > > multiValued="true"/>
> > > multiValued="true"/>
> >
> >   
> >> multiValued="true" required="false"/>
> >
> >   
> >> required="false"/>
> >
> >
> >   
> > stored="true"/>
> > stored="true"/>
> > stored="true"/>
> > stored="true"/>
> > stored="true"/>
> > stored="true"/>
> > stored="true"/>
> > stored="true"/>
> >  
> >
> > 
> > I run the index via  firefox browser using
> > http://localhost:8080/solr/dataimport?command=full-import
> > I checked the status using
> > http://localhost:8080/solr/dataimport?command=status
> > initially the status increased steadily, but after reaching 613071, the
> > status stayed for a while (as below), and then it displayed the completed
> > message :
> > 
> > 
> > -
> > 
> > 0
> > 1
> > 
> > -
> > 
> > -
> > 
> > db-data-config.xml
> > 
> > 
> > status
> > busy
> > A command is still running...
> > -
> > 
> > 0:3:24.266
> > 1
> > 613071
> > 613070
> > 0
> > 2008-11-14 12:12:16
> > 
> > -
> > 
> > This response format is experimental.  It is likely to change in the
> future.
> > 
> > 
> >
> > ---
> >
> >>>NOTE: this is the status result after it completed
> > ---
> >
> > 
> > -
> > 
> > 0
> > 1
> > 
> > -
> > 
> > -
> > 
> > db-data-config.xml
> > 
> > 
> > status
> > idle
> > 
> > -
> > 
> > 1
> > 615360
> > 0
> > 2008-11-14 12:12:16
> > -
> > 
> > Indexing completed. Added/Updated: 615360 documents. Deleted 0 documents.
> > 
> > 2008-11-14 12:16:32
> > 2008-11-14 12:16:32
> > 0:4:16.154
> > 
> > -
> > 
> > This response format is experimental.  It is likely to change in the
> future.
> > 
> > 
> >
> > -
> >
> > here is the full solr scehma.xml content:
> > 
> > 
> > 
> >
> > 
> >  
> >
> >
> >
> > sortMissingLast="true"/>
> >
> >
> > > sortMissingLast="true"/>
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > > sortMissingLast="true"/>
> > > sortMissingLast="true"/>
> > > sortMissingLast="true"/>
> > > sortMissingLast="true"/>
> >
> >
> >
> > 

Re: DataImportHandler not indexing all the records

2008-11-15 Thread Ahmed Hammad
Thanks Shalin,

I have added a new field type in my schema, as follows:



  

  


and added my field

   

After restarting Solr and running a full import, everything works just fine.
Many thanks.
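A sketch of such a field type and field, reconstructed from context; the
names are guesses:

<fieldType name="html_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="content" type="html_text" indexed="true" stored="true"/>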


Regards,
Ahmed


My best wishes,

Regards,
Ahmed Hammad



On Sat, Nov 15, 2008 at 9:21 PM, Shalin Shekhar Mangar <
[EMAIL PROTECTED]> wrote:

> I think the problem is that DIH catches Exception but not Error so a
> StackOverFlowError will slip past it. Normally, the SolrDispatchFilter will
> log such errors but the import is performed in a new thread, so the error
> is
> not logged anywhere. However, DIH will not commit documents in this case
> (and there is no mention of a commit in your DIH status).
>
> We should change the catch clause to catch Throwable so that this is not
> repeated. I'll open an issue and give a patch.
>
> Btw, Ahmed, Solr has a Tokenizer which is much better at striping html --
> HTMLStripWhitespaceTokenizerFactory which you can use for such tasks.
>
> On Sun, Nov 16, 2008 at 12:30 AM, Ahmed Hammad <[EMAIL PROTECTED]> wrote:
>
> > I had a similar problem like Giri. I have 17,000 record in one table and
> > DIH
> > can import only 12464.
> >
> > After some investigation, I found my problem.
> >
> > I have a regular expression to strip off html tags form input text, as
> > following:
> >
> >  > replaceWith=" "/>
> >
> > The DIH RegEx have stack overflow on the record 17,000 due to error in
> the
> > content and then DIH exit without any error in the log on in the status
> > command. Here is the status:
> >
> > 
> > 0:0:31.657
> > 1
> > 12464
> > 12464
> > 0
> > 2008-11-15 20:40:58
> > 
> >
> > I found the error in Eclipse Console window while debugging; it was a
> stack
> > overflow in the RegEx library.
> >
> > The problem is that, DIH does not show any problem in log file on in
> status
> > message.
> > What I think is important is to show whatever error happen in the log
> file.
> >
> > I noticed also that, in case of no error a log message show completness:
> >
> > Nov 15, 2008 8:57:34 PM org.apache.solr.handler.dataimport.DocBuilder
> > execute
> > INFO: Time taken = 0:0:40.656
> >
> > In case of RegEx stack overflow error, this log message does not appear.
> >
> > I am researching on how to catch such error in DIH. Any ideas?
> >
> >
> > Regards,
> > ahmd
> >
> > On Sat, Nov 15, 2008 at 6:32 AM, Noble Paul നോബിള്‍ नोब्ळ् <
> > [EMAIL PROTECTED]> wrote:
> >
> > > There is no obvious problem
> > >
> > > I can be reasonably sure that
> > > the query
> > >
> > > select * from climatedata.ws_record limit 100
> > >
> > > would have fetched only  615360 rows.
> > > This is a very reliable pice of information
> > > 615360
> > >
> > > On Sat, Nov 15, 2008 at 12:41 AM, Giri <[EMAIL PROTECTED]> wrote:
> > > > Hi Noble,
> > > > thanks for the help, here are the details: the field "id" is unique,
> > when
> > > I
> > > > did a select distinct(id), it returned 1 million rows.
> > > >
> > > > ---
> > > > db-data-config.xml
> > > > note: I limit the resultset to 1 million in the select query
> > > > ---
> > > > 
> > > > > > > url="jdbc:mysql://localhost:3306/climatedata" user="user"
> password="pw"
> > > > batchSize ="-1"/>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > 
> > > >
> > > > 
> > > >
> > > > -
> > > > in the solr Schema.xml:
> > > > 
> > > > 
> > > >> > > multiValued="false"/>
> > > > > > > multiValued="true" required="false"/>
> > > > > > > multiValued="true" req

Re: Regex Transformer Error

2008-11-17 Thread Ahmed Hammad
Hi All,

Although the HTMLStripStandardTokenizerFactory removes HTML tags at analysis
time, the tags are still stored in the index and need to be removed when
displaying search results. In my case the HTML tags are not needed at all, so
I created an HTMLStripTransformer for the DIH to remove the HTML tags and
save space in the index. I used the HTML parser included with Lucene
(org.apache.lucene.demo.html). It performs well and worked for me (while
working with Lucene, before moving to Solr).

What do you think? Is it worth contributing?

My best wishes,

Regards,
Ahmed

On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote:

> There is a nice HTML stripper inside Solr.
> "solr.HTMLStripStandardTokenizerFactory"
>
> -Original Message-
> From: Ahmed Hammad [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 05, 2008 10:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Regex Transformer Error
>
> Hi,
>
> It works with the attribute regex="&lt;(.|\n)*?&gt;"
>
> Sorry for the disturbance.
>
> Regards,
>
> ahmd
>
>
> On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I am using Solr 1.3 data import handler. One of my table fields has
> > html tags, I want to strip it of the field text. So obviously I need
> > the Regex Transformer.
> >
> > I added transformer="RegexTransformer" attribute to my entity and a
> > new field with:
> >
> >  > replaceWith="X"/>
> >
> > Every thing works fine. The text is replace without any problem. The
> > provlem happend with my regular experession to strip html tags. So I
> > use regex="<(.|\n)*?>". Of course the charecters '<' and '>' are not
> > allowed in XML. I tried the following regex="<(.|\n)*?>" and
> > regex="C;(.|\n)*?E;" but I get the following error:
> >
> > The value of attribute "regex" associated with an element type "field"
>
> > must not contain the '<' character. at
> > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
> > Source) ...
> >
> > The full stack trace is following:
> >
> > *FATAL: Could not create importer. DataImporter config invalid
> > org.apache.solr.common.SolrException: FATAL: Could not create
> importer.
> > DataImporter config invalid at
> > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
> > Handler.java:114)
> > at
> > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody
> > (DataImportHandler.java:206)
> > at
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> > rBase.java:131) at
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
> > java:303)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> > .java:232)
> > at
> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
> > cationFilterChain.java:235)
> > at
> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
> > lterChain.java:206)
> > at
> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
> > lve.java:233)
> > at
> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
> > lve.java:191)
> > at
> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
> > va:128)
> > at
> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja
> > va:102)
> > at
> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv
> > e.java:109)
> > at
> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
> > :286)
> > at
> > org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor
> > .java:857)
> > at
> > org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.pro
> > cess(Http11AprProtocol.java:565) at
> > org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:150
> > 9) at java.lang.Thread.run(Unknown Source) Caused by:
> > org.apache.solr.handler.dataimport.DataImportHandlerException:
> > Exception occurred while initializing context Processing Document # at
> > org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
> > orter.java:176)
> > at
> > org.apache.solr.handler.dataimport.DataImporter.(DataImporter.ja
> > va:93)
> > at
> > org.apache.solr

Re: Regex Transformer Error

2008-11-29 Thread Ahmed Hammad
OK, I contributed it at:
https://issues.apache.org/jira/browse/SOLR-887

I changed it to use the Solr class org.apache.solr.analysis.HTMLStripReader.

Thank you all.

Ahmed



On Tue, Nov 18, 2008 at 5:49 AM, Noble Paul നോബിള്‍ नोब्ळ् <
[EMAIL PROTECTED]> wrote:

> On Tue, Nov 18, 2008 at 2:49 AM, Ahmed Hammad <[EMAIL PROTECTED]> wrote:
> > Hi All,
> >
> > Although the HTMLStripStandardTokenizerFactory will remove HTML tags, it
> > will be stored in the index and needed to be removed while searching. In
> my
> > case the HTML tags has no need at all. So I created HTMLStripTransformer
> for
> > the DIH to remove the HTML tags and save space on the index. I have used
> the
> > HTML parser included with Lucene ( org.apache.lucene.demo.html). It is
> well
> > performing and worked with me (while working with Lucene before moving to
> > Solr)
> >
> > What do you think? Does it worth contribution?
> Yes. You can contribute this new transformer as an enhancement .
> >
> > My best wishes,
> >
> > Regards,
> > Ahmed
> >
> > On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote:
> >
> >> There is a nice HTML stripper inside Solr.
> >> "solr.HTMLStripStandardTokenizerFactory"
> >>
> >> -Original Message-
> >> From: Ahmed Hammad [mailto:[EMAIL PROTECTED]
> >> Sent: Wednesday, November 05, 2008 10:43 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Regex Transformer Error
> >>
> >> Hi,
> >>
> >> It works with the attribute regex="&lt;(.|\n)*?&gt;"
> >>
> >> Sorry for the disturbance.
> >>
> >> Regards,
> >>
> >> ahmd
> >>
> >>
> >> On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad <[EMAIL PROTECTED]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am using Solr 1.3 data import handler. One of my table fields has
> >> > html tags, I want to strip it of the field text. So obviously I need
> >> > the Regex Transformer.
> >> >
> >> > I added transformer="RegexTransformer" attribute to my entity and a
> >> > new field with:
> >> >
> >> >  >> > replaceWith="X"/>
> >> >
> >> > Every thing works fine. The text is replace without any problem. The
> >> > provlem happend with my regular experession to strip html tags. So I
> >> > use regex="<(.|\n)*?>". Of course the charecters '<' and '>' are not
> >> > allowed in XML. I tried the following regex="<(.|\n)*?>" and
> >> > regex="C;(.|\n)*?E;" but I get the following error:
> >> >
> >> > The value of attribute "regex" associated with an element type "field"
> >>
> >> > must not contain the '<' character. at
> >> > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
> >> > Source) ...
> >> >
> >> > The full stack trace is following:
> >> >
> >> > *FATAL: Could not create importer. DataImporter config invalid
> >> > org.apache.solr.common.SolrException: FATAL: Could not create
> >> importer.
> >> > DataImporter config invalid at
> >> > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
> >> > Handler.java:114)
> >> > at
> >> > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody
> >> > (DataImportHandler.java:206)
> >> > at
> >> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> >> > rBase.java:131) at
> >> > org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
> >> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
> >> > java:303)
> >> > at
> >> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> >> > .java:232)
> >> > at
> >> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
> >> > cationFilterChain.java:235)
> >> > at
> >> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
> >> > lterChain.java:206)
> >> > at
> >> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
> >> > lve.java:233)
> >> > at
>

DIH Admin Page Commands

2008-12-13 Thread Ahmed Hammad
Hi,

I would like to add a few utility commands to the DIH admin page; I
frequently need these commands to manage the index. The commands are:
full-import, delta-import, status, reload-config, ... in addition to a
"Return to Admin Page" link.

It will be a set of simple forms at the end of debug.jsp, as follows:


























Return to Admin Page



What do you think? I would like to know if there are better ways.


Regards,
Ahmed
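A sketch of what those forms might look like, assuming DIH is registered at
/dataimport; each submit button sends its value as the command parameter:

<form method="GET" action="/solr/dataimport">
  <input type="submit" name="command" value="full-import">
  <input type="submit" name="command" value="delta-import">
  <input type="submit" name="command" value="status">
  <input type="submit" name="command" value="reload-config">
</form>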


Re: DIH Admin Page Commands

2008-12-13 Thread Ahmed Hammad
Thanks for your feedback :-)

Sure, I will open an issue and attach a patch.

Best wishes,
Ahmed


On Sat, Dec 13, 2008 at 7:44 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Thanks for the suggestions. I'm sure these will be useful to a lot of
> users.
> Do you mind opening an issue in Jira? If you have a patch, that'd be
> awesome
> :-)
>
> On Sat, Dec 13, 2008 at 8:58 PM, Ahmed Hammad  wrote:
>
> > Hi,
> >
> > I would like to add a few utility commands to the DIH admin page. I
> > frequently need these commands to manage the index. The commands are:
> full
> > Import, delta Import, status, reload config, ... In addition to "Return
> to
> > Admin Page" link.
> >
> > It will be a simple forms at the end of debug.jsp as following:
> >
> > 
> >
> >
> > > value="full-import">
> > > value="delta-import">
> >
> > > value="reload-config">
> >
> > 
> >
> > 
> >    
> >
> >
> >
> > 
> >
> > 
> >
> >
> >
> >
> >
> > 
> >
> > Return to Admin Page
> >
> > 
> >
> > What do you think? I would like to know if there exist better ways.
> >
> >
> > Regards,
> > Ahmed
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


How to indexing non-english html text in unicode with Solr?

2009-04-23 Thread ahmed baseet
Hi All,
I'm trying to index some regional/non-English HTML pages with Solr. I thought
of indexing the corresponding Unicode text for the page, as Solr supports
Unicode indexing, right?
But I'm not able to extract XML from the HTML page, and for posting to Solr
we need XML. Can anyone tell me a good method of extracting XML from HTML, or
just let me know how to index non-English HTML pages with Solr in a way that
lets me search with Unicode queries (for the corresponding regional query)?
Thanks in advance.

--Ahmed.


Re: How to indexing non-english html text in unicode with Solr?

2009-04-24 Thread ahmed baseet
Grant, thanks for your quick response.

In the meantime I did a bit of googling and found that there are Java Swing
HTML parsers that can extract the plain text from an HTML page. I tried
running sample examples with non-English pages and found that it works fine
(a sketch follows this message). Then I thought of putting the whole
extracted text (the Unicode text, obviously) under one field, say
"PageContent", adding the basic XML tags like <add>, <doc>, etc. to form my
XML, and then pushing that off to Solr for indexing. Since it is just a
single page, I don't know if the size will be supported by Solr, because page
sizes can be quite large sometimes. What is the maximum field length
supported by Solr [is it 10000 by default? I think so]? Will this make sense
during searching?

Requesting all Solr users to give me their valuable advice. Thanks.

--Ahmed.

On Fri, Apr 24, 2009 at 4:32 PM, Grant Ingersoll wrote:

> See the Solr Cell contrib:
> http://wiki.apache.org/solr/ExtractingRequestHandler.  Note, it's 1.4-dev
> only.  If you want it for 1.3, you'll have to use Tika on the client side.
>
> Solr does support Unicode indexing.
>
>
> On Apr 24, 2009, at 2:22 AM, ahmed baseet wrote:
>
>  Hi All,
>> I'm trying to index some regional/non-eng html pages with Solr. I thought
>> of
>> indexing the corresponding unicode text for that page as Solr supports
>> Unicode indexing, right?
>> But I'm not able to extract Xml from the html page, because for posting to
>> Solr we require Xml. Can anyone tell me any good method of extracting Xml
>> from html or just let me know how to index non-english html pages with
>> Solr
>> that will enable me searching with unicode queries (for corresponding
>> regional query). Thanks in advance.
>>
>> --Ahmed.
>>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
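A sketch of the Swing-parser extraction mentioned above, using only standard
JDK classes; the callback collects the text nodes:

import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingTextExtractor {
    public static String extract(String html) throws Exception {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            public void handleText(char[] data, int pos) {
                out.append(data).append(' ');  // accumulate visible text
            }
        };
        Reader reader = new StringReader(html);
        new ParserDelegator().parse(reader, callback, true);
        return out.toString();
    }
}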


How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
Hi All,
I'm trying to post some files to the Solr server. I've done this using
post.jar for posting XML files residing on my local disk [I tried posting all
the XML files from the example directory]. Now I'm trying to generate XML
files on the fly, with the text to be indexed included therein, and I want to
post these files to Solr. As per the examples, we've used "SimplePostTool"
for posting locally residing files, but can someone give me direction on
indexing in-memory XML files [files generated on the fly]? Actually I want to
automate this process in a loop, so that I'll extract some information, put
it into an XML file, and push it off to Solr for indexing.
Thanks in appreciation.

--Ahmed.


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
Shalin, thanks for your quick response.

Actually I'm trying to pull plain text from HTML pages and make an XML file
for each page. I went through the SolrJ wiki page and found that we have to
add all the fields and their contents anyway, right? But yes, it makes
adding/updating etc. quite a bit easier than using SimplePostTool.
I tried to use the SolrJ client but it does not seem to be working. I added
all the jar files mentioned in the SolrJ wiki to the classpath, but it still
gives me an error.

To be precise, it gives me the following error:
 .cannot find symbol:
symbol : class CommonsHttpSolrServer

I rechecked to make sure that "commons-httpclient-3.1.jar" is in the class
path. Can someone please point me to the issue?

I'm working on Windows and my classpath variable is this:

.;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar

Thank you very much.
Ahmed.


On Mon, Apr 27, 2009 at 3:55 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Apr 27, 2009 at 3:30 PM, ahmed baseet  >wrote:
>
> > Hi All,
> > I'm trying to post some files to Solr server. I've done this using the
> > post.jar files for posting xml files residing on my local disk[I tried
> > posting all those xml files from example directory]. Now I'm trying to
> > generate xml files on the fly, with required text to be indexed included
> > therein though, and want to post these files to solr. As per the examples
> > we've used "SimplePostTool" for posting locally resinding files but can
> > some
> > one give me direction on indexing in-memory xml files[files generated on
> > the
> > fly]. Actually I want to automate this process in a loop, so that I'll
> > extract some information and put that to xml file and push it off to Solr
> > for indexing.
> > Thanks in appreciation.
> >
>
>
> You can use the Solrj client to avoid building the intermediate XML
> yourself. Extract the information, use the Solrj api to add the extracted
> text to fields and send them to the solr server.
>
> http://wiki.apache.org/solr/Solrj
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
Hi,
After going through the SolrJ wiki I found that we have to set some
dependencies in pom.xml to use SolrJ, which I haven't done yet. So I googled
how to do that, but no help. I searched the Solr directory and found a bunch
of *-pom.template files [like solr-core-pom.xml, solr-solrj-pom.xml, etc.]
and I'm not able to figure out which one to use. Any help would be
appreciated.

Thanks,
Ahmed.

On Mon, Apr 27, 2009 at 4:53 PM, ahmed baseet wrote:

> Shalin, thanks for your quick response.
>
> Actually I'm trying to pull plaintext from html pages and trying to make
> xml files for each page. I went through the SolrJ webpage and found that the
> we've to add all the field and its contents anyway, right? but yes it makes
> adding/updating etc quite easier than using that SimplePostTool.
>  I tried to use SolrJ client but it doesnot seem to be working. I added all
> the jar files mentioned in SolrJ wiki to classpath but still its giving me
> some error.
>
> To be precise it gives me the following error,
>  .cannot find symbol:
> symbol : class CommonsHttpSolrServer
>
> I rechecked to make sure that "commons-httpclient-3.1.jar" is in the class
> path. Can someone please point me what is the issue?
>
> I'm working on Windows and my classpath variable is this:
>
> .;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar
>
> Thank you very much.
> Ahmed.
>
>
>
> On Mon, Apr 27, 2009 at 3:55 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> On Mon, Apr 27, 2009 at 3:30 PM, ahmed baseet > >wrote:
>>
>> > Hi All,
>> > I'm trying to post some files to Solr server. I've done this using the
>> > post.jar files for posting xml files residing on my local disk[I tried
>> > posting all those xml files from example directory]. Now I'm trying to
>> > generate xml files on the fly, with required text to be indexed included
>> > therein though, and want to post these files to solr. As per the
>> examples
>> > we've used "SimplePostTool" for posting locally resinding files but can
>> > some
>> > one give me direction on indexing in-memory xml files[files generated on
>> > the
>> > fly]. Actually I want to automate this process in a loop, so that I'll
>> > extract some information and put that to xml file and push it off to
>> Solr
>> > for indexing.
>> > Thanks in appreciation.
>> >
>>
>>
>> You can use the Solrj client to avoid building the intermediate XML
>> yourself. Extract the information, use the Solrj api to add the extracted
>> text to fields and send them to the solr server.
>>
>> http://wiki.apache.org/solr/Solrj
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
Can anyone help me select the proper pom.xml file out of the bunch of
*-pom.xml.template files available?
I got the following when I searched for pom.xml files:
solr-common-csv-pom.xml
solr-lucene-analyzers-pom.xml
solr-lucene-contrib-pom.xml
solr-lucene-*-pom.xml [a lot of solr-lucene-... pom files are available,
shortened here to avoid typing them all]
solr-dataimporthandler-pom.xml
solr-common-pom.xml
solr-core-pom.xml
solr-parent-pom.xml
solr-solr-pom.xml

Thanks,
Ahmed.

On Mon, Apr 27, 2009 at 5:38 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Apr 27, 2009 at 4:53 PM, ahmed baseet  >wrote:
>
> >
> > To be precise it gives me the following error,
> >  .cannot find symbol:
> > symbol : class CommonsHttpSolrServer
> >
> > I rechecked to make sure that "commons-httpclient-3.1.jar" is in the
> class
> > path. Can someone please point me what is the issue?
> >
> > I'm working on Windows and my classpath variable is this:
> >
> > .;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox
> >
> >
> download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar
> >
>
> The jars look right. It is likely a problem with your classpath.
> CommonsHttpSolrServer is in the solr-solrj jar.
>
> If you are using Maven, then you'd need to change your pom.xml
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-27 Thread ahmed baseet
As far as I know, Maven is a build/management tool for Java projects, quite
similar to Ant, right? No, I'm not using it, so I think I don't need to worry
about those pom files.
But I'm still not able to figure out the error with the classpath/jar files I
mentioned in my previous mails. Shall I try getting those jar files,
specifically the solr-solrj jar that contains the CommonsHttpSolrServer
class? If yes, can you tell me where to get those jar files on the web? Has
anyone ever faced similar problems? Please help me fix these silly issues.

Thanks,
Ahmed.
On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet  >wrote:
>
> > Can anyone help me selecting the proper pom.xml file out of the bunch of
> > *-pom.xml.templates available.
> >
>
> Ahmed, are you using Maven? If not, then you do not need these pom files.
> If
> you are using Maven, then you need to add a dependency to solrj.
>
>
> http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?

2009-04-28 Thread ahmed baseet
Thank you very much. Now it's working fine; I fixed those minor classpath
issues.

Thanks,
Ahmed.

2009/4/28 Noble Paul നോബിള്‍ नोब्ळ् 

> the Solr distro contains all the jar files. you can take either the
> latest release (1.3) or a nightly
>
> On Tue, Apr 28, 2009 at 11:34 AM, ahmed baseet 
> wrote:
> > As far as I know, Maven is a build/mgmt tool for java projects quite
> similar
> > to Ant, right? No I'm not using this , then I think I don't need to worry
> > about those pom files.
> > But  I'm still not able to figure out the error with classpath/jar files
> I
> > mentioned in my previous mails. Shall I try getting those jar files,
> > specifically that solr-solrj jar that contains commons-http-solr-server
> > class files? If yes then can you tell me where to get those jar files
> from,
> > on the web?  Has anyone ever faced similar problems? Please help me
> fixing
> > these silly issues?
> >
> > Thanks,
> > Ahmed.
> > On Mon, Apr 27, 2009 at 6:59 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >> On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet  >> >wrote:
> >>
> >> > Can anyone help me selecting the proper pom.xml file out of the bunch
> of
> >> > *-pom.xml.templates available.
> >> >
> >>
> >> Ahmed, are you using Maven? If not, then you do not need these pom
> files.
> >> If
> >> you are using Maven, then you need to add a dependency to solrj.
> >>
> >>
> >>
> http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
> >>
> >
>
>
>
> --
> --Noble Paul
>


Addition of new field to Solr schema.xml not getting reflected properly

2009-04-28 Thread ahmed baseet
Hi All,
I'm trying to add a new field to Solr, so I stopped Tomcat [I'm working on
Windows] using the "Configure Tomcat" menu, then added the following
field:

After restarting Tomcat I couldn't see the changes, so I restarted a couple
of times, and then the schema showed the changes. Next I tried to change the
type from the current "string" to "text". I did that and restarted Tomcat
many times, but the changes are still not getting reflected. I used Solr 1.2
earlier on Linux, and every time I just had to touch web.xml in the webapp
directory, forcing Tomcat to reload the application, for changes in
schema.xml to take effect; but on Windows I'm not able to figure out what the
issue is. Am I supposed to restart Tomcat in some other way instead of using
the "Configure Tomcat" menu? Has anyone faced similar issues with Solr 1.3 on
Windows? Any suggestion would be appreciated.

Thanks,
Ahmed.


Re: Addition of new field to Solr schema.xml not getting reflected properly

2009-04-29 Thread ahmed baseet
I added some new documents, and for these docs I can use the new field,
right? Though to reflect the changes for all docs, I need to delete the old
index and build a new one.
As I mentioned earlier, after a couple of restarts it worked. Still don't
know what the issue was. :-)

Thanks,
Ahmed.

On Wed, Apr 29, 2009 at 4:13 PM, Erik Hatcher wrote:

> Did you reindex your documents after making changes and restarting?  The
> types of changes you're making require reindexing.
>
>Erik
>
>
> On Apr 29, 2009, at 2:13 AM, ahmed baseet wrote:
>
>  Hi All,
>> I'm trying to add a new field to Solr, so I stopped the tomcat[I'm working
>> on Windows] using the "Configure Tomcat" menu of Tomcat, then added the
>> following field
>> 
>> After restarting Tomcat, I couldn't see the changes, so I did the restart
>> couple of times, and then the schema showed me the changes. Now I tried to
>> change the type to
>> "text" from the current "string". I did that and restarted tomcat many
>> times
>> but the changes are still not getting reflected. I've used Solr1.2 earlier
>> on linux, and every time I just had to touch the web.xml in webapp
>> directory
>> thereby forcing tomcat to restart itself for changes in schema.xml to take
>> effect, but in windows I'm not able to figure out whats the issue. Is
>> there
>> anything wrong, I'm I supposed to restart tomcat in some other way instead
>> of using the "Configure tomcat" menu. Has anyone faced similar issues with
>> solr1.3 on windows? Any suggestion would be appreciated.
>>
>> Thanks,
>> Ahmed.
>>
>
>


Problem adding unicoded docs to Solr through SolrJ

2009-04-29 Thread ahmed baseet
Hi All,
I'm trying to automate the process of posting XMLs to Solr using SolrJ.
Essentially I'm extracting the text from a given URL, then creating a
SolrInputDocument and posting it using the following function:

public void postToSolrUsingSolrj(String rawText, String pageId) {
    String url = "http://localhost:8983/solr";
    CommonsHttpSolrServer server;

    try {
        // Get a connection to the Solr server
        server = new CommonsHttpSolrServer(url);

        // Set XMLResponseParser: required for older versions of Solr 1.3
        server.setParser(new XMLResponseParser());

        server.setSoTimeout(1000);               // socket read timeout
        server.setConnectionTimeout(100);
        server.setDefaultMaxConnectionsPerHost(100);
        server.setMaxTotalConnections(100);
        server.setFollowRedirects(false);        // defaults to false
        // allowCompression defaults to false.
        // Server side must support gzip or deflate for this to have any effect.
        server.setAllowCompression(true);
        server.setMaxRetries(1);                 // defaults to 0; > 1 not recommended

        // WARNING: this would delete the entire pre-existing Solr index
        //server.deleteByQuery("*:*");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", pageId);
        doc.addField("features", rawText);

        // Add the doc to the Solr server
        server.add(doc);

        // Commit the changes
        server.commit();

    } catch (Exception e) {
        // don't swallow indexing failures silently
        e.printStackTrace();
    }
}

In the above, the param rawText is just the HTML stripped of all its
tags, JS, CSS, etc., and pageId is the URL of that page. When I use this
for English pages it works perfectly fine, but the problem comes up when
I try to index some non-English pages. For those, say pages in Tamil,
the Unicode/UTF-8 encoding seems to create some problem: after indexing
some non-English pages, when I search for them from the Solr admin
search interface, I get the result but the content is not shown in that
language, i.e. Tamil; it just displays some characters, I think the raw
Unicode. The same thing worked fine for pages in English.

Now what I did is just extract the raw text from that HTML page and
manually create an XML document like this

<add>
  <doc>
    <field name="id">UTF2TEST</field>
    <field name="name">Test with some UTF-8 encoded characters</field>
    <field name="features">*some tamil unicode text here*</field>
  </doc>
</add>

and posted it from the command line using the post.jar file. Now searching
gives me the result, but unlike last time the browser shows the indexed
text in Tamil itself and not the raw Unicode. So this clearly shows that
the string I'm using to create the Solr document has some encoding issue,
right? Or something else? I tried doing something like this also,

// Encode in Unicode UTF-8
utfEncodedText = new String(rawText.getBytes("UTF-8"));

but even this didn't help either.
It seems like some silly problem somewhere that I'm not able to catch. :-)

I'd appreciate it if someone can point me to the bug...

Thanks,
Ahmed.


Re: Problem adding unicoded docs to Solr through SolrJ

2009-04-29 Thread ahmed baseet
Thanks a lot for your quick and detailed response.
I got the point. But as I mentioned earlier, I have a string of raw text
[default encoding] that needs to be encoded in UTF-8, so I tried
something stupid but working: I first converted the whole string to a
byte array and then used that byte array to create a new UTF-8 decoded
string, like this,

// Encode in Unicode UTF-8
byte[] utfEncodeByteArray = textOnly.getBytes();
String utfString = new String(utfEncodeByteArray,
Charset.forName("UTF-8"));

then passed the utfString to the function for posting to Solr, and it
works perfectly.
But is there any intelligent way of doing all this, going straight from
a default-encoded string to a UTF-8 string without the byte-array round
trip?
Thank you very much.

--Ahmed.
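
For what it's worth, a minimal sketch of the more direct route, assuming
the raw bytes of the fetched page are still available (the method and
variable names here are illustrative): decode the bytes once with an
explicit charset, instead of round-tripping an already-built String
through getBytes().

import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// A sketch: decode the fetched bytes with an explicit charset up front,
// rather than relying on the platform default encoding anywhere.
public static String readAsUtf8(InputStream in) throws Exception {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    for (int n; (n = in.read(chunk)) != -1; ) {
        buf.write(chunk, 0, n);
    }
    // Java Strings are Unicode internally; the only encoding decision
    // that matters is the one made here, at byte-decoding time.
    return new String(buf.toByteArray(), "UTF-8");
}

If the page was not actually served as UTF-8, the charset from the HTTP
Content-Type header should be passed instead of the literal "UTF-8".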



On Wed, Apr 29, 2009 at 6:45 PM, Michael Ludwig  wrote:

> ahmed baseet schrieb:
>
>  public void postToSolrUsingSolrj(String rawText, String pageId) {
>>
>
> doc.addField("features", rawText );
>>
>
>  In the above the param rawText is just the html stripped off of all
>> its tags, js, css etc and pageId is the Url for that page. When I'm
>> using this for English pages its working perfectly fine but the
>> problem comes up when I'm trying to index some non-english pages.
>>
>
> Maybe you're constructing a string without specifying the encoding, so
> Java uses your default platform encoding?
>
> String(byte[] bytes)
>  Constructs a new String by decoding the specified array of
>  bytes using the platform's default charset.
>
> String(byte[] bytes, Charset charset)
>  Constructs a new String by decoding the specified array of bytes using
>  the specified charset.
>
>  Now what I did is just extracted the raw text from that html page and
>> manually created an xml page like this
>>
>> 
>> 
>>  
>>UTF2TEST
>>Test with some UTF-8 encoded characters
>>*some tamil unicode text here*
>>   
>> 
>>
>> and posted this from command line using the post.jar file. Now searching
>> gives me the result but unlike last time browser shows the indexed text in
>> tamil itself and not the raw unicode.
>>
>
> Now that's perfect, isn't it?
>
>  I tried doing something like this also,
>>
>
>  // Encode in Unicode UTF-8
>>  utfEncodedText = new String(rawText.getBytes("UTF-8"));
>>
>> but even this didn't help either.
>>
>
> No encoding specified, so the default platform encoding is used, which
> is likely not what you want. Consider the following example:
>
> package milu;
> import java.nio.charset.Charset;
> public class StringAndCharset {
>  public static void main(String[] args) {
>byte[] bytes = { 'K', (byte) 195, (byte) 164, 's', 'e' };
>System.out.println(Charset.defaultCharset().displayName());
>System.out.println(new String(bytes));
>System.out.println(new String(bytes,  Charset.forName("UTF-8")));
>  }
> }
>
> Output:
>
> windows-1252
> Käse (bad)
> Käse (good)
>
> Michael Ludwig
>


How to iterate the solrdocumentlist result

2009-05-03 Thread ahmed baseet
Hi All,
I'm able to get the whole result bundle by using the following method,

   QueryResponse qr = server.query(query);

SolrDocumentList sdl = qr.getResults();

but I'm not able to iterate over the results. I converted this to a
string and displayed it, and it is the full result bundle; I think it's
in XML. Actually I want to display the results in a browser, each one
separately, not as a bundle. There must be some standard methods for
this, right? Can someone give me some pointers in this regard? I'm
trying to integrate the Java method calls within the HTML code itself
[the Solr server is on my box, and I want to do the testing on my box,
so I want to access the indexer from my local box's browser only]. Any
good ideas on this?

Thanks,
Ahmed.
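
For reference, a minimal sketch of iterating the results with SolrJ:
SolrDocumentList is a List of SolrDocument, so a plain for-each works
(the field names below are illustrative and depend on your schema).

SolrDocumentList sdl = qr.getResults();
System.out.println("Found: " + sdl.getNumFound());
for (SolrDocument doc : sdl) {
    // each SolrDocument maps field names to field values
    Object id = doc.getFieldValue("id");
    Object features = doc.getFieldValue("features");
    System.out.println(id + " : " + features);
}

From there the values can be put into whatever list/map structure the
page-rendering code expects.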


org.apache.*.*.... class not found exception in Internet Explorer

2009-05-03 Thread ahmed baseet
Hi,
I'm trying to query the Solr indexer through a web page and display the
result. I have the following class to query Solr [I'm using the
query(String) method],

import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrDocument;
import java.util.Map;
import java.util.Iterator;
import java.util.List;
import java.util.ArrayList;
import java.util.HashMap;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;


public class SolrjTest
{
public String query(String q)
{
CommonsHttpSolrServer server = null;

try
{
            server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
}
catch(Exception e)
{
e.printStackTrace();
}

SolrQuery query = new SolrQuery();
query.setQuery(q);
query.setQueryType("dismax");
query.setIncludeScore(true);

try
{
QueryResponse qr = server.query(query);

SolrDocumentList sdl = qr.getResults();

//System.out.println("Found: " + sdl.getNumFound());
//System.out.println("Start: " + sdl.getStart());
//System.out.println("Max Score: " + sdl.getMaxScore());
//System.out.println("");
//System.out.println("Result doc : " + sdl);

return sdl.toString();

}
catch (SolrServerException e)
{
e.printStackTrace();
return null;
}


}

  }


and the following to pass the queries to solr, get results and display it on
the browser,

import java.applet.Applet;
import java.awt.Graphics;
import java.awt.Font;

public class ControlJava extends Applet {
Font f = new Font("TimesRoman", Font.BOLD, 20);
String Message;

public void init() {
  Message = new String("ubuntu");
}

public void SetMessage(String MsgText) {
SolrjTest solr = new SolrjTest();
Message = solr.query(MsgText);

   repaint();
}

public void paint(Graphics g) {
  g.setFont(f);
  g.drawString(Message, 15, 50);
  }
}

and finally the HTML page is this,

<html>
<head>
<title>Control a Java Applet</title>
</head>
<body>
<h1>Control a Java Applet</h1>

<p>The Java applet below displays text in a large font. You can enter
new text to display in the form below, and JavaScript will call the
Java applet to change the text.</p>

<form name="form1">
<input type="text" name="text1">
<input type="button" value="Change"
 onClick="document.ControlJava.SetMessage(document.form1.text1.value);">
</form>

<applet name="ControlJava" code="ControlJava.class" width="300" height="100">
</applet>

<p>End of page.</p>
</body>
</html>



When I access this page and put the query in the box this HTML shows,
the browser [IE] gives some error, and after checking I found that the
error is a class-not-found exception: it's not able to find the
org.apache.*.* classes, hence the errors. Instead of calling that, I
wrote a simple class not using any apache.solr classes and called the
method therein [it just returns a string], and it worked fine. I added
both classes [.class files] given above to the same location where this
web page resides.
The problem is that the browser is not able to find those org.apache.***
classes, which creates the mess. Can anyone help this newbie fix the
problem? Thanks a lot.
Do let me know if some information is missing or you want some extra
information on this issue.

--Ahmed.


Re: org.apache.*.*.... class not found exception in Internet Explorer

2009-05-03 Thread ahmed baseet
Missed some information.
I'm working on Windows XP and my class path is this,

.;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox
download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar


Thanks,
Ahmed.
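
A likely cause, sketched here as an untested suggestion: applets do not
read the system CLASSPATH at all; they load classes only from the page's
codebase and from jars listed in the applet tag's archive attribute. So
copying the SolrJ jars from the classpath above next to the HTML page
and listing them in the tag should make the org.apache.* classes visible
to the applet:

<applet name="ControlJava" code="ControlJava.class"
        width="300" height="100"
        archive="solr-solrj-1.3.0.jar,apache-solr-common.jar,commons-httpclient-3.1.jar,commons-io-1.3.1.jar,commons-codec-1.3.jar,commons-logging-1.0.4.jar">
</applet>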

On Mon, May 4, 2009 at 12:10 PM, ahmed baseet wrote:

> Hi,
> I'm trying to query solr indexer thru a web page and trying to display the
> result. I've the following class to query solr [I'm using the query(string)
> method],
>
> import org.apache.solr.common.SolrDocumentList;
> import org.apache.solr.common.SolrDocument;
> import java.util.Map;
> import java.util.Iterator;
> import java.util.List;
> import java.util.ArrayList;
> import java.util.HashMap;
>
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.SolrQuery;
> import org.apache.solr.client.solrj.response.QueryResponse;
> import org.apache.solr.client.solrj.response.FacetField;
> import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>
>
> public class SolrjTest
> {
> public String query(String q)
> {
> CommonsHttpSolrServer server = null;
>
> try
> {
> server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
> }
> catch(Exception e)
> {
> e.printStackTrace();
> }
>
> SolrQuery query = new SolrQuery();
> query.setQuery(q);
> query.setQueryType("dismax");
> query.setIncludeScore(true);
>
> try
> {
> QueryResponse qr = server.query(query);
>
> SolrDocumentList sdl = qr.getResults();
>
> //System.out.println("Found: " + sdl.getNumFound());
> //System.out.println("Start: " + sdl.getStart());
> //System.out.println("Max Score: " + sdl.getMaxScore());
> //System.out.println("");
> //System.out.println("Result doc : " + sdl);
>
> return sdl.toString();
>
> }
> catch (SolrServerException e)
> {
> e.printStackTrace();
> return null;
> }
>
>
> }
>
>   }
>
>
> and the following to pass the queries to solr, get results and display it
> on the browser,
>
> import java.applet.Applet;
> import java.awt.Graphics;
> import java.awt.Font;
>
> public class ControlJava extends Applet {
> Font f = new Font("TimesRoman", Font.BOLD, 20);
> String Message;
>
> public void init() {
>   Message = new String("ubuntu");
> }
>
> public void SetMessage(String MsgText) {
> SolrjTest solr = new SolrjTest();
> Message = solr.query(MsgText);
>
>repaint();
> }
>
> public void paint(Graphics g) {
>   g.setFont(f);
>   g.drawString(Message, 15, 50);
>   }
> }
>
> and finally the HTML page is this,
>
> <html>
> <head>
> <title>Control a Java Applet</title>
> </head>
> <body>
> <h1>Control a Java Applet</h1>
>
> <p>The Java applet below displays text in a large font. You can enter
> new text to display in the form below, and JavaScript will call the
> Java applet to change the text.</p>
>
> <form name="form1">
> <input type="text" name="text1">
> <input type="button" value="Change"
>  onClick="document.ControlJava.SetMessage(document.form1.text1.value);">
> </form>
>
> <applet name="ControlJava" code="ControlJava.class" width="300" height="100">
> </applet>
>
> <p>End of page.</p>
> </body>
> </html>
>
>
> When I access this page and put the query in the box this HTML shows,
> the browser [IE] gives some error, and after checking I found that the
> error is a class-not-found exception: it's not able to find the
> org.apache.*.* classes, hence the errors. Instead of calling that, I
> wrote a simple class not using any apache.solr classes and called the
> method therein [it just returns a string], and it worked fine. I added
> both classes [.class files] given above to the same location where this
> web page resides.
> The problem is that the browser is not able to find those org.apache.***
> classes, which creates the mess. Can anyone help this newbie fix the
> problem? Thanks a lot.
> Do let me know if some information is missing or you want some extra
> information on this issue.
>
> --Ahmed.
>
>
>


Does solrj return result in XML format? If not then how to make it do that.

2009-05-04 Thread ahmed baseet
Can we get the results as received by SolrJ in XML format? If yes, how?
I think there must be some way to make SolrJ return results in XML
format. I need some pointers in this direction. As far as I know, SolrJ
returns the result as a SolrDocument list that we have to iterate to
extract the fields. Thank you.

--Ahmed.
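
For reference, one way to get the raw XML is to bypass the SolrJ binding
and hit the HTTP endpoint directly with wt=xml; a minimal sketch (the
URL and query are illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class RawXmlQuery {
    public static void main(String[] args) throws Exception {
        // wt=xml asks Solr for its XML response format
        URL url = new URL("http://localhost:8983/solr/select?q=ubuntu&wt=xml");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line);  // raw XML, ready for any XML parser
        }
        in.close();
    }
}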


Re: Update an existing Solr Index

2009-05-04 Thread ahmed baseet
As far as I know, when you resend another index request with the same ID
and field but new content, the old document gets overwritten by the new
one.
@solr-users, views?

--Ahmed
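
For readers on current Solr versions: atomic updates (added in Solr 4.0)
cover exactly this case, provided all fields in the schema are stored; a
sketch, with the second field name being illustrative:

<add>
  <doc>
    <field name="id">123</field>
    <!-- "set" adds/replaces just this field; other stored fields are kept -->
    <field name="location" update="set">delhi</field>
  </doc>
</add>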


On Mon, May 4, 2009 at 5:26 PM, appleman1982 wrote:

>
> Hi All,
> I have a requirement wherein i want to update an existing index in solr.
> For example : I have issued an index command in solr as
> <add>
>   <doc>
>     <field name="id">123</field>
>     <field name="...">xxx</field>
>   </doc>
> </add>
>
> The id field is a unique key here.
>
> My requirement is that I should be able to update this index, i.e. add
> another field to it without the need to build the entire index again.
> For example
> if i issue the following solr command
> <add>
>   <doc>
>     <field name="id">123</field>
>     <field name="...">delhi</field>
>   </doc>
> </add>
>
> it should give me a merged index like
> <add>
>   <doc>
>     <field name="id">123</field>
>     <field name="...">xxx</field>
>     <field name="...">delhi</field>
>   </doc>
> </add>
>
>
> Any pointers or workarounds to achieve this in solr would be highly
> appreciated.
>
> Thanks, Jugesh
> --
> View this message in context:
> http://www.nabble.com/Update-an-existing-Solr-Index-tp23366705p23366705.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Does solrj return result in XML format? If not then how to make it do that.

2009-05-04 Thread ahmed baseet
As far as I know, when we query Solr from the Solr admin interface we
get back the results in XML format, so I thought there must be something
similar for SolrJ as well; I'd pass the XML through a parser at the
other end and display all the results in the browser. Otherwise I have
to iterate the SolrDocumentList and create a list [maybe] to put the
results in and return it to the browser, which will handle displaying
that list/map etc.

--Ahmed.



On Mon, May 4, 2009 at 5:52 PM, Erik Hatcher wrote:

> Just out of curiosity, what's the use case for getting the result back in
> XML from SolrJ?
>
>Erik
>
>
> On May 4, 2009, at 8:13 AM, ahmed baseet wrote:
>
>  Can we get the results as received by Solrj in XML format? If yes how to
>> do
>> that. I think there must be some way to make solrj returns results in XML
>> format.
>> I need some pointers in this direction. As I know solrs returns the result
>> in solrdocument format that we've to iterate to extract the fields. Thank
>> you.
>>
>> --Ahmed.
>>
>
>


run on reboot on windows

2010-05-01 Thread S Ahmed
Hi,

I'm trying to get Solr to run on Windows, such that if the machine
reboots, the Solr service will be running.

How can I do this?


Re: run on reboot on windows

2010-05-02 Thread S Ahmed
By default it uses Jetty, so you're saying Tomcat on Windows Server 2008
/ IIS7 runs as a native Windows service?

On Sun, May 2, 2010 at 12:46 AM, Dave Searle wrote:

> Set tomcat6 service to auto start on boot (if running tomat)
>
> Sent from my iPhone
>
> On 2 May 2010, at 02:31, "S Ahmed"  wrote:
>
> > Hi,
> >
> > I'm trying to get Solr to run on windows, such that if it reboots
> > the Solr
> > service will be running.
> >
> > How can I do this?
>


Re: run on reboot on windows

2010-05-02 Thread S Ahmed
It's not Tomcat/Jetty that's the issue; it's how to get things to restart
on a Windows server (Tomcat and Jetty don't run as native Windows
services), so I am a little confused... thanks.

On Sun, May 2, 2010 at 7:37 PM, caman wrote:

>
> Ahmed,
>
>
>
> Best is if you take a look at the documentation of jetty or tomcat. SOLR
> can
> run on any web container, it's up to you how you  configure your web
> container to run
>
>
>
> Thanks
>
> Aboxy
>
>
>
>
>
>
>
>
>
>
>
> From: S Ahmed [via Lucene]
> [mailto:ml-node+772174-2097041460-124...@n3.nabble.com
> ]
> Sent: Sunday, May 02, 2010 4:33 PM
> To: caman
> Subject: Re: run on reboot on windows
>
>
>
> By default it uses Jetty, so your saying Tomcat on windows server 2008/
> IIS7
>
> runs as a native windows service?
>
> On Sun, May 2, 2010 at 12:46 AM, Dave Searle <[hidden email]>wrote:
>
>
> > Set tomcat6 service to auto start on boot (if running tomat)
> >
> > Sent from my iPhone
> >
> > On 2 May 2010, at 02:31, "S Ahmed" <[hidden email]> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to get Solr to run on windows, such that if it reboots
> > > the Solr
> > > service will be running.
> > >
> > > How can I do this?
> >
>
>
>
> View message @
> http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772174.html
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772178.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: run on reboot on windows

2010-05-02 Thread S Ahmed
Thanks! For some reason I was looking for a solution outside of
Jetty/Tomcat, when that was the obvious way to get things restarted :)
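
For reference, the same setting can also be applied from an elevated
command prompt; a sketch, with the service name depending on the Tomcat
install:

sc config tomcat6 start= auto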

On Sun, May 2, 2010 at 7:53 PM, Dave Searle wrote:

> Tomcat is installed as a service on windows. Just go into service
> control panel and set startup type to automatic
>
> Sent from my iPhone
>
> On 3 May 2010, at 00:43, "S Ahmed"  wrote:
>
> > its not tomcat/jetty that's the issue, its how to get things to re-
> > start on
> > a windows server (tomcat and jetty don't run as native windows
> > services) so
> > I am a little confused..thanks.
> >
> > On Sun, May 2, 2010 at 7:37 PM, caman
> > wrote:
> >
> >>
> >> Ahmed,
> >>
> >>
> >>
> >> Best is if you take a look at the documentation of jetty or tomcat.
> >> SOLR
> >> can
> >> run on any web container, it's up to you how you  configure your web
> >> container to run
> >>
> >>
> >>
> >> Thanks
> >>
> >> Aboxy
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> From: S Ahmed [via Lucene]
> >> [mailto:ml-node+772174-2097041460-124...@n3.nabble.com
>  >> %2b772174-2097041460-124...@n3.nabble.com>
> >> ]
> >> Sent: Sunday, May 02, 2010 4:33 PM
> >> To: caman
> >> Subject: Re: run on reboot on windows
> >>
> >>
> >>
> >> By default it uses Jetty, so your saying Tomcat on windows server
> >> 2008/
> >> IIS7
> >>
> >> runs as a native windows service?
> >>
> >> On Sun, May 2, 2010 at 12:46 AM, Dave Searle <[hidden email]>wrote:
> >>
> >>
> >>> Set tomcat6 service to auto start on boot (if running tomat)
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On 2 May 2010, at 02:31, "S Ahmed" <[hidden email]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I'm trying to get Solr to run on windows, such that if it reboots
> >>>> the Solr
> >>>> service will be running.
> >>>>
> >>>> How can I do this?
> >>>
> >>
> >>
> >>
> >> View message @
> >> http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772174.html
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772178.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>


How to add solr admin ui

2018-08-22 Thread Ahmed Musallam
Hi,

I'd like to build a UI plugin for Solr. I can see all the UI-related
assets in `/server/solr-webapp`, but is there a way to add UI plugins
without modifying the assets under `/server/solr-webapp`?

By plugin, I mean some way I can add some form of UI to the admin UI,
and even better, make it specific to a certain core.

Any tutorials or documentation would be greatly appreciated!

Thanks!
Ahmed


Code review for SOLR related changes.

2019-03-01 Thread Fiz Ahmed
Hi Solr Experts,

Can you please suggest code review techniques for Solr-related changes
in a project?


Thanks
FIZ
AML Team.


Issue with solr.HTMLStripCharFilterFactory

2018-01-19 Thread Fiz Ahmed
Hi Solr Experts,

I am using the HTMLStripCharFilterFactory for removing HTML tags in the
Body element.

Body contains data like "Ipad" wrapped in HTML markup.

I made changes in the managed schema [the schema snippet itself was
stripped by the mailing-list archive].

I restarted the Solr and Indexed again.


But when I query in the Solr admin, I am still getting the search
results with HTML tags in them:

"body":"Practically everytime I log onto Mogran, suddenly I see it
running

*Please let me know what the issue could be... am I missing anything?*


Thanks

Fiz..
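
For context, a minimal sketch of a field type wired up with this char
filter (the factory class names are standard Solr; the type and field
names are illustrative). Note that char filters only change what gets
indexed, not the stored value, so the raw HTML still comes back in
search results unless the text is cleaned before indexing or at display
time.

<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- strips HTML/XML markup from the character stream before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>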


Faceting with Stats

2019-07-02 Thread Ahmed Adel
Hi,

How can stats field value be calculated for top facet values? In other
words, the following request parameters should return the stats.field
measures for facets sorted by count:

q: *
wt: json
stats: true
stats.facet: authors_s
stats.field: average_rating_f
facet.missing: true
f.authors_s.facet.sort: count

However, the response is not sorted by facet field count. Is there
something missing?

Best,
A.


Re: Faceting with Stats

2019-07-03 Thread Ahmed Adel
Hi,

As per the documentation's recommendation to use pivots with the stats
component instead (
https://lucene.apache.org/solr/guide/8_1/faceting.html#combining-stats-component-with-pivots),
I replaced the stats options that were previously used with the newer
pivot options as follows:

q: *
stats=true
stats.field={!tag=piv1 mean=true}average_rating_f
facet=true
facet.pivot={!stats=piv1}author_s

returns the following error:

Bad Message 400
reason: Illegal character SPACE=' '

This is a syntax issue rather than a logical one, however. Any thoughts
on what could be missing would be appreciated.

Thanks,
A. Adel

On Tue, Jul 2, 2019 at 4:38 PM Ahmed Adel  wrote:

> Hi,
>
> How can stats field value be calculated for top facet values? In other
> words, the following request parameters should return the stats.field
> measures for facets sorted by count:
>
> q: *
> wt: json
> stats: true
> stats.facet: authors_s
> stats.field: average_rating_f
> facet.missing: true
> f.authors_s.facet.sort: count
>
> However, the response is not sorted by facet field count. Is there
> something missing?
>
> Best,
> A.
>



Re: Faceting with Stats

2019-07-04 Thread Ahmed Adel
Thanks for your reply! Yes, it turned out to be an issue with the way
the request was being sent: it was cURL, which required special handling
and escaping of spaces and special characters. Using another client
cleared this issue, and the request below now works perfectly.

Best,
A.
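
For reference, a sketch of the same request built with SolrJ, which
takes care of the URL encoding itself (collection and field names as
used earlier in this thread):

SolrQuery query = new SolrQuery("*:*");
query.set("stats", true);
query.set("stats.field", "{!tag=piv1 mean=true}average_rating_f");
query.set("facet", true);
query.set("facet.pivot", "{!stats=piv1}author_s");
// QueryResponse rsp = client.query("books", query);  // collection name illustrative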

On Thu, Jul 4, 2019 at 4:53 PM Erick Erickson 
wrote:

> Might be a formatting error with my mail client, but the very first line
> is not well formed.
>
> q: * is incorrect
>
> q=*:*
>
>
>
> I do not see that example on the page either. Looks like you took the bit
> that starts with stats=true and mis-typed the q clause.
>
> Best,
> Erick
> > On Jul 3, 2019, at 5:08 AM, Ahmed Adel  wrote:
> >
> > Hi,
> >
> > As per the documentation recommendation of using pivot with stats
> component
> > instead (
> >
> https://lucene.apache.org/solr/guide/8_1/faceting.html#combining-stats-component-with-pivots
> ),
> > replacing the stats options that were previously used with the newer
> pivot
> > options as follows:
> >
> > q: *
> > stats=true
> > stats.field={!tag=piv1 mean=true}average_rating_f
> > facet=true
> > facet.pivot={!stats=piv1}author_s
> >
> > returns the following error:
> >
> > Bad Message 400
> > reason: Illegal character SPACE=' '
> >
> > This is a syntax issue rather than a logical one, however. Any thoughts
> of
> > what could be missing would be appreciated.
> >
> > Thanks,
> > A. Adel
> >
> > On Tue, Jul 2, 2019 at 4:38 PM Ahmed Adel  wrote:
> >
> >> Hi,
> >>
> >> How can stats field value be calculated for top facet values? In other
> >> words, the following request parameters should return the stats.field
> >> measures for facets sorted by count:
> >>
> >> q: *
> >> wt: json
> >> stats: true
> >> stats.facet: authors_s
> >> stats.field: average_rating_f
> >> facet.missing: true
> >> f.authors_s.facet.sort: count
> >>
> >> However, the response is not sorted by facet field count. Is there
> >> something missing?
> >>
> >> Best,
> >> A.
> >>
>
> --
Sent from my iPhone


Returning multiple fields in graph streaming expression response documents

2019-07-16 Thread Ahmed Adel
Hi,

How can multiple fields be returned in graph traversal streaming expression
response documents? For example, the following query:

nodes(emails,
  walk="john...@apache.org->from",
  gather="to")


returns these documents in the response:

{
  "result-set": {
"docs": [
  {
"node": "sl...@campbell.com",
"collection": "emails",
"field": "to",
"level": 1
  },
  {
"node": "catherine.per...@enron.com",
"collection": "emails",
"field": "to",
"level": 1
  },
  {
"node": "airam.arte...@enron.com",
"collection": "emails",
"field": "to",
"level": 1
  },
  {
"EOF": true,
"RESPONSE_TIME": 44
  }
]
  }
}

How can the query above be modified to return more document fields,
"subject" for example?

Best regards,

A.


Re: Returning multiple fields in graph streaming expression response documents

2019-07-17 Thread Ahmed Adel
Hi,

Thank you for your reply. Could you give more details on the „join“
operation, such as what the sides of the join and the joining condition
would be in this case?

Best regards,
A.

On Tue, Jul 16, 2019 at 2:02 PM markus kalkbrenner <
markus.kalkbren...@biologis.com> wrote:

>
>
> You have to perform a „join“ to get more fields.
>
> > Am 16.07.2019 um 13:52 schrieb Ahmed Adel :
> >
> > Hi,
> >
> > How can multiple fields be returned in graph traversal streaming
> expression
> > response documents? For example, the following query:
> >
> > nodes(emails,
> >  walk="john...@apache.org->from",
> >  gather="to")
> >
> >
> > returns these documents in the response:
> >
> > {
> >  "result-set": {
> >"docs": [
> >  {
> >"node": "sl...@campbell.com",
> >"collection": "emails",
> >"field": "to",
> >"level": 1
> >  },
> >  {
> >"node": "catherine.per...@enron.com",
> >"collection": "emails",
> >"field": "to",
> >"level": 1
> >  },
> >  {
> >"node": "airam.arte...@enron.com",
> >"collection": "emails",
> >"field": "to",
> >"level": 1
> >  },
> >  {
> >"EOF": true,
> >"RESPONSE_TIME": 44
> >  }
> >]
> >  }
> > }
> >
> > How can the query above be modified to return more document fields,
> > "subject" for example?
> >
> > Best regards,
> >
> > A.
>


Re: Returning multiple fields in graph streaming expression response documents

2019-07-19 Thread Ahmed Adel
Hi Joel,

Thank you for your thoughts. I tried the fetch function, however, the
response does not contain "fl" fields of the "fetch" expression. For the
above example, the modified query is as follows:

fetch(names, select(nodes(emails,
  walk="john...@apache.org->from",
  gather="to"), node as to_s), fl="name", on="email=to_s")


where "names" is a collection that contains two fields representing pairs
of name and email: ("name", "email")

The response returned is:

{
  "result-set": {
    "docs": [
      { "to_s": "john...@apache.org" },
      { "to_s": "johnsm...@apache.org" },
      ...
      { "EOF": true, "RESPONSE_TIME": 33 }
    ]
  }
}

The response should have an additional "name" field in each document
returned. Any additional thoughts are appreciated.

Best,
A.

On Thu, Jul 18, 2019 at 6:12 PM Joel Bernstein  wrote:

> Hi Ahmed,
>
> Take a look at the fetch
>
> https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch
>
> It probably makes sense to allow more field to be returned from a nodes
> expression as well.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Jul 17, 2019 at 3:12 AM Ahmed Adel  wrote:
>
> > Hi,
> >
> > Thank you for your reply. Could you give more details on the „join“
> > operation, such as what the sides of the join and the joining condition
> > would be in this case?
> >
> > Best regards,
> > A.
> >
> > On Tue, Jul 16, 2019 at 2:02 PM markus kalkbrenner <
> > markus.kalkbren...@biologis.com> wrote:
> >
> > >
> > >
> > > You have to perform a „join“ to get more fields.
> > >
> > > > Am 16.07.2019 um 13:52 schrieb Ahmed Adel :
> > > >
> > > > Hi,
> > > >
> > > > How can multiple fields be returned in graph traversal streaming
> > > expression
> > > > response documents? For example, the following query:
> > > >
> > > > nodes(emails,
> > > >  walk="john...@apache.org->from",
> > > >  gather="to")
> > > >
> > > >
> > > > returns these documents in the response:
> > > >
> > > > {
> > > >  "result-set": {
> > > >"docs": [
> > > >  {
> > > >"node": "sl...@campbell.com",
> > > >"collection": "emails",
> > > >"field": "to",
> > > >"level": 1
> > > >  },
> > > >  {
> > > >"node": "catherine.per...@enron.com",
> > > >"collection": "emails",
> > > >"field": "to",
> > > >"level": 1
> > > >  },
> > > >  {
> > > >"node": "airam.arte...@enron.com",
> > > >"collection": "emails",
> > > >"field": "to",
> > > >"level": 1
> > > >  },
> > > >  {
> > > >"EOF": true,
> > > >"RESPONSE_TIME": 44
> > > >  }
> > > >]
> > > >  }
> > > > }
> > > >
> > > > How can the query above be modified to return more document fields,
> > > > "subject" for example?
> > > >
> > > > Best regards,
> > > >
> > > > A.
> > >
> >
>


Re: Returning multiple fields in graph streaming expression response documents

2019-07-19 Thread Ahmed Adel
Hi - Tried swapping the equality sides but (surprisingly?) got the same
exact response. Any additional thoughts are appreciated.

Best,
A.
http://aadel.io

On Fri, Jul 19, 2019 at 5:27 PM Joel Bernstein  wrote:

> Try:
>
> fetch(names,
>  select(
>  nodes(emails,
>  walk="john...@apache.org->from",
>  gather="to"),
>  node as to_s),
>  fl="name",
> on="to_s=email")
>
>
> According to the docs it looks like you have the fields reversed on the
> fetch. If that doesn't work, I'll investigate further.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Jul 19, 2019 at 5:51 AM Ahmed Adel  wrote:
>
> > Hi Joel,
> >
> > Thank you for your thoughts. I tried the fetch function, however, the
> > response does not contain "fl" fields of the "fetch" expression. For the
> > above example, the modified query is as follows:
> >
> > fetch(names, select(nodes(emails,
> >   walk="john...@apache.org->from",
> >   gather="to"), node as to_s), fl="name", on="email=to_s")
> >
> >
> > where "names" is a collection that contains two fields representing pairs
> > of name and email: ("name", "email")
> >
> > The response returned is:
> >
> > { "result-set": { "docs": [ { "to_s": "john...@apache.org"
> > }, { "to_s": "johnsm...@apache.org"
> > },
> > ... { "EOF": true, "RESPONSE_TIME": 33 } ] } }
> >
> > The response should have an additional "name" field in each document
> > returned. Any additional thoughts are appreciated.
> >
> > Best,
> > A.
> >
> > On Thu, Jul 18, 2019 at 6:12 PM Joel Bernstein 
> wrote:
> >
> > > Hi Ahmed,
> > >
> > > Take a look at the fetch
> > >
> > >
> >
> https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch
> > >
> > > It probably makes sense to allow more field to be returned from a nodes
> > > expression as well.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Wed, Jul 17, 2019 at 3:12 AM Ahmed Adel  wrote:
> > >
> > > > Hi,
> > > >
> > > > Thank you for your reply. Could you give more details on the „join“
> > > > operation, such as what the sides of the join and the joining
> condition
> > > > would be in this case?
> > > >
> > > > Best regards,
> > > > A.
> > > >
> > > > On Tue, Jul 16, 2019 at 2:02 PM markus kalkbrenner <
> > > > markus.kalkbren...@biologis.com> wrote:
> > > >
> > > > >
> > > > >
> > > > > You have to perform a „join“ to get more fields.
> > > > >
> > > > > > Am 16.07.2019 um 13:52 schrieb Ahmed Adel :
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > How can multiple fields be returned in graph traversal streaming
> > > > > expression
> > > > > > response documents? For example, the following query:
> > > > > >
> > > > > > nodes(emails,
> > > > > >  walk="john...@apache.org->from",
> > > > > >  gather="to")
> > > > > >
> > > > > >
> > > > > > returns these documents in the response:
> > > > > >
> > > > > > {
> > > > > >  "result-set": {
> > > > > >"docs": [
> > > > > >  {
> > > > > >"node": "sl...@campbell.com",
> > > > > >"collection": "emails",
> > > > > >"field": "to",
> > > > > >"level": 1
> > > > > >  },
> > > > > >  {
> > > > > >"node": "catherine.per...@enron.com",
> > > > > >"collection": "emails",
> > > > > >"field": "to",
> > > > > >"level": 1
> > > > > >  },
> > > > > >  {
> > > > > >"node": "airam.arte...@enron.com",
> > > > > >"collection": "emails",
> > > > > >"field": "to",
> > > > > >"level": 1
> > > > > >  },
> > > > > >  {
> > > > > >"EOF": true,
> > > > > >"RESPONSE_TIME": 44
> > > > > >  }
> > > > > >]
> > > > > >  }
> > > > > > }
> > > > > >
> > > > > > How can the query above be modified to return more document
> fields,
> > > > > > "subject" for example?
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > A.
> > > > >
> > > >
> > >
> >
>
-- 
Sent from my iPhone


Re: Returning multiple fields in graph streaming expression response documents

2019-07-20 Thread Ahmed Adel
To validate this, I indexed the datasets and ran the same query in a
Solr 6.5.0 environment
(https://archive.apache.org/dist/lucene/solr/6.5.0/), from before the
cb9f15 commit made it into a release, but got the same response with no
additional fields as on Solr 8.1.1. I used the default managed schema
settings in both Solr versions, which I guess means the qparser is not
used for /select in this case, is it?

On Sat, Jul 20, 2019 at 2:02 AM Joel Bernstein  wrote:

> I suspect fetch is having problem due to this commit:
>
>
> https://github.com/apache/lucene-solr/commit/cb9f151db4b5ad5c5f581b6b8cf2e5916ddb0f35#diff-98abfc8855d347035205c6f3afc2cde3
>
> Later local params were turned off for anything but the lucene qparser.
> Which means this query doesn't work if /select is using edismax etc...
>
> This needs to be fixed.
> Can you check to see if the qparser is for the /select handler on your
> install?
>
> Anyway fetch needs to be reverted back to it's previous implementation
> before the above commit basically broke it.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Jul 19, 2019 at 2:20 PM Ahmed Adel  wrote:
>
> > Hi - Tried swapping the equality sides but (surprisingly?) got the same
> > exact response. Any additional thoughts are appreciated.
> >
> > Best,
> > A.
> > http://aadel.io
> >
> > On Fri, Jul 19, 2019 at 5:27 PM Joel Bernstein 
> wrote:
> >
> > > Try:
> > >
> > > fetch(names,
> > >  select(
> > >  nodes(emails,
> > >  walk="john...@apache.org->from",
> > >  gather="to"),
> > >  node as to_s),
> > >  fl="name",
> > > on="to_s=email")
> > >
> > >
> > > According to the docs it looks like you have the fields reversed on the
> > > fetch. If that doesn't work, I'll investigate further.
> > >
> > >
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Fri, Jul 19, 2019 at 5:51 AM Ahmed Adel  wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > Thank you for your thoughts. I tried the fetch function, however, the
> > > > response does not contain "fl" fields of the "fetch" expression. For
> > the
> > > > above example, the modified query is as follows:
> > > >
> > > > fetch(names, select(nodes(emails,
> > > >   walk="john...@apache.org->from",
> > > >   gather="to"), node as to_s), fl="name", on="email=to_s")
> > > >
> > > >
> > > > where "names" is a collection that contains two fields representing
> > pairs
> > > > of name and email: ("name", "email")
> > > >
> > > > The response returned is:
> > > >
> > > > { "result-set": { "docs": [ { "to_s": "john...@apache.org"
> > > > }, { "to_s": "johnsm...@apache.org"
> > > > },
> > > > ... { "EOF": true, "RESPONSE_TIME": 33 } ] } }
> > > >
> > > > The response should have an additional "name" field in each document
> > > > returned. Any additional thoughts are appreciated.
> > > >
> > > > Best,
> > > > A.
> > > >
> > > > On Thu, Jul 18, 2019 at 6:12 PM Joel Bernstein 
> > > wrote:
> > > >
> > > > > Hi Ahmed,
> > > > >
> > > > > Take a look at the fetch
> > > > >
> > > > >
> > > >
> > >
> >
> https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch
> > > > >
> > > > > It probably makes sense to allow more field to be returned from a
> > nodes
> > > > > expression as well.
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > >
> > > > > On Wed, Jul 17, 2019 at 3:12 AM Ahmed Adel 
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thank you for your reply. Could you give more details on the
> „join“
> > > > > > operation, such as what the sides of the join and the joining
> >

Re: Returning multiple fields in graph streaming expression response documents

2019-07-21 Thread Ahmed Adel
Yeah, it turned out to be related to the data. The "fetch" method works
fine as you described; it's just the data distribution that caused the
name field not to be fetched in a number of responses. I tested it with
two other collections and it worked as expected as well. Thank you for
your help getting this running.

Best,
A. Adel

On Sun, Jul 21, 2019 at 2:36 AM Joel Bernstein  wrote:

> Ok, then it sounds like a different issue. Let's look at the logs following
> a request and see what the issue is. There will be a log record that shows
> the query that is sent to Solr by the fetch expression. When we look at
> that log we'll be able to see what the query is, and if results are
> returned. It could be a bug in the code or it could be something related to
> the data that's being fetched.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Sat, Jul 20, 2019 at 5:21 PM Ahmed Adel  wrote:
>
> > To validate this, I indexed the datasets and ran the same query on Solr
> > 6.5.0 environment (https://archive.apache.org/dist/lucene/solr/6.5.0/)
> > before cb9f15 commit gets into release but got the same response, no
> > additional fields, as Solr 8.1.1. I have used the default managed schema
> > settings in both Solr versions, which I guess means qparser is not used
> for
> > /select in this case, is it?
> >
> > On Sat, Jul 20, 2019 at 2:02 AM Joel Bernstein 
> wrote:
> >
> > > I suspect fetch is having problem due to this commit:
> > >
> > >
> > >
> >
> https://github.com/apache/lucene-solr/commit/cb9f151db4b5ad5c5f581b6b8cf2e5916ddb0f35#diff-98abfc8855d347035205c6f3afc2cde3
> > >
> > > Later local params were turned off for anything but the lucene qparser.
> > > Which means this query doesn't work if /select is using edismax etc...
> > >
> > > This needs to be fixed.
> > > Can you check to see if the qparser is for the /select handler on your
> > > install?
> > >
> > > Anyway fetch needs to be reverted back to it's previous implementation
> > > before the above commit basically broke it.
> > >
> > >
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Fri, Jul 19, 2019 at 2:20 PM Ahmed Adel  wrote:
> > >
> > > > Hi - Tried swapping the equality sides but (surprisingly?) got the
> same
> > > > exact response. Any additional thoughts are appreciated.
> > > >
> > > > Best,
> > > > A.
> > > > http://aadel.io
> > > >
> > > > On Fri, Jul 19, 2019 at 5:27 PM Joel Bernstein 
> > > wrote:
> > > >
> > > > > Try:
> > > > >
> > > > > fetch(names,
> > > > >  select(
> > > > >  nodes(emails,
> > > > >  walk="john...@apache.org->from",
> > > > >  gather="to"),
> > > > >  node as to_s),
> > > > >  fl="name",
> > > > > on="to_s=email")
> > > > >
> > > > >
> > > > > According to the docs it looks like you have the fields reversed on
> > the
> > > > > fetch. If that doesn't work, I'll investigate further.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > >
> > > > > On Fri, Jul 19, 2019 at 5:51 AM Ahmed Adel 
> > wrote:
> > > > >
> > > > > > Hi Joel,
> > > > > >
> > > > > > Thank you for your thoughts. I tried the fetch function, however,
> > the
> > > > > > response does not contain "fl" fields of the "fetch" expression.
> > For
> > > > the
> > > > > > above example, the modified query is as follows:
> > > > > >
> > > > > > fetch(names, select(nodes(emails,
> > > > > >   walk="john...@apache.org->from",
> > > > > >   gather="to"), node as to_s), fl="name", on="email=to_s")
> > > > > >
> > > > > >
> > > > > > where "names" is a collection that contains two fields
> representing
> > > > pairs
> > > > > > of name and 

Returning multiple fields in /graph streaming expression response

2019-07-22 Thread Ahmed Adel
Hi,

Similar to this question (
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201907.mbox/browser),
how can additional fields be returned when using /graph request handler?

For example, from the documentation, for the request:

nodes(enron_emails,
      nodes(enron_emails,
            walk="kayne.coul...@enron.com->from",
            trackTraversal="true",
            gather="to"),
      walk="node->from",
      scatter="leaves,branches",
      trackTraversal="true",
      gather="to")


is there a way to add more fields to the response:

<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
                             http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="directed">
    <node id="kayne.coul...@enron.com">
      <data key="field">node</data>
      <data key="level">0</data>
      <data key="count(*)">0.0</data>
    </node>
    <node id="don.baugh...@enron.com">
      <data key="field">to</data>
      <data key="level">1</data>
      <data key="count(*)">1.0</data>
    </node>
    <edge id="1" source="kayne.coul...@enron.com" target="don.baugh...@enron.com"/>
    <node id="john.kin...@enron.com">
      <data key="field">to</data>
      <data key="level">1</data>
      <data key="count(*)">1.0</data>
    </node>
    <edge id="2" source="kayne.coul...@enron.com" target="john.kin...@enron.com"/>
    <node id="jay.wi...@enron.com">
      <data key="field">to</data>
      <data key="level">1</data>
      <data key="count(*)">1.0</data>
    </node>
    <edge id="3" source="kayne.coul...@enron.com" target="jay.wi...@enron.com"/>
  </graph>
</graphml>
Best,
A. Adel
http://aadel.io


Re: Returning multiple fields in /graph streaming expression response

2019-07-23 Thread Ahmed Adel
Wrapping the expression in a fetch function as follows works:

fetch(names,
      select(nodes(enron_emails,
                   nodes(enron_emails,
                         walk="kayne.coul...@enron.com->from",
                         trackTraversal="true",
                         gather="to"),
                   walk="node->from",
                   scatter="leaves,branches",
                   trackTraversal="true",
                   gather="to"),
             node as from),
      fl="name",
      on="from")


however, the response loses some of its structure and no edges are
returned, i.e. it becomes:



<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
                             http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <graph id="G" edgedefault="directed">
    <node id="kayne.coul...@enron.com">
      <data key="from">kayne.coul...@enron.com</data>
      <data key="name">Kayne Coulter</data>
    </node>
    <node id="randal.maff...@enron.com">
      <data key="from">randal.maff...@enron.com</data>
      <data key="name">Randal Maffett</data>
    </node>
    ...



which can be used as a subsequent request to the first in order to
retrieve additional fields, but it would be more efficient if there were
a way to retrieve the required fields in one request.

Best,
A. Adel

On Mon, Jul 22, 2019 at 4:00 PM Ahmed Adel  wrote:

> Hi,
>
> Similar to this question (
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201907.mbox/browser),
> how can additional fields be returned when using /graph request handler?
>
> For example, from the documentation, for the request:
>
> nodes(enron_emails,
>       nodes(enron_emails,
>             walk="kayne.coul...@enron.com->from",
>             trackTraversal="true",
>             gather="to"),
>       walk="node->from",
>       scatter="leaves,branches",
>       trackTraversal="true",
>       gather="to")
>
>
> is there a way to add more fields to the response:
>
> <graphml xmlns="http://graphml.graphdrawing.org/xmlns"
>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>          xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
>                              http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
>   <graph id="G" edgedefault="directed">
>     <node id="kayne.coul...@enron.com">
>       <data key="field">node</data>
>       <data key="level">0</data>
>       <data key="count(*)">0.0</data>
>     </node>
>     <node id="don.baugh...@enron.com">
>       <data key="field">to</data>
>       <data key="level">1</data>
>       <data key="count(*)">1.0</data>
>     </node>
>     <edge id="1" source="kayne.coul...@enron.com" target="don.baugh...@enron.com"/>
>     <node id="john.kin...@enron.com">
>       <data key="field">to</data>
>       <data key="level">1</data>
>       <data key="count(*)">1.0</data>
>     </node>
>     <edge id="2" source="kayne.coul...@enron.com" target="john.kin...@enron.com"/>
>     <node id="jay.wi...@enron.com">
>       <data key="field">to</data>
>       <data key="level">1</data>
>       <data key="count(*)">1.0</data>
>     </node>
>     <edge id="3" source="kayne.coul...@enron.com" target="jay.wi...@enron.com"/>
>   </graph>
> </graphml>
>
>
> Best,
> A. Adel
> http://aadel.io
>


SOLR 6.6 with MS-SQL.

2019-07-24 Thread Fiz Ahmed
Hi SOLR Experts,

We are using Apache Solr 6.6 stand-alone currently in a number of
locations. Most indexes hold 250,000 to 400,000 documents. Our data
comes from MS-SQL. We're using a front-end JavaScript solution to
communicate with Solr to perform queries.


   - Solr performance: We have machines that are running on limited
     resources. Our indexes (more so the deltas) are seemingly causing
     system slowdowns.

   - Deltas: Can we improve how our delta imports function? How frequent
     should they ideally be?

   - Garbage collection tuning: How can we tune our JVM garbage
     collection to improve overall performance?

   - Java heap memory settings: How can we tune these to ensure the best
     memory usage? (A starting-point sketch follows below.)



Thanks & Regards
Fiz  N Coleman
AML Team.
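
As a reference point for the GC and heap questions above: in Solr 6.x
the knobs live in solr.in.sh (solr.in.cmd on Windows). A sketch with
illustrative starting values; the right numbers depend on index size and
available RAM, so measure with GC logs before and after:

# solr.in.sh -- starting points only
SOLR_HEAP="4g"   # keep well below physical RAM so the OS can cache the index
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250 -XX:+ParallelRefProcEnabled"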


Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

2019-10-18 Thread Ahmed Adel
This could be because the Zookeeper ensemble is not properly configured.
Using a very similar setup, which consists of a ZK cluster of three
hosts and one Solr Cloud node (all containers), the system got running.
Each ZK host has the ZOO_MY_ID and ZOO_SERVERS environment variables set
before running ZK. In this case, the former variable's value would be 1
to 3 on each host, and the latter would be "server.1=z1:2888:3888;2181
server.2=z2:2888:3888;2181 server.3=z3:2888:3888;2181", the same on all
hosts (the double quotes may be needed for proper parsing). This
ZOO_SERVERS syntax is for ZK version 3.5; 3.4 is slightly different.

http://aadel.io
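
For reference, a minimal sketch of that ensemble configuration in
docker-compose form (the image tag and service names are illustrative;
repeat analogously for zk2 and zk3 with ZOO_MY_ID 2 and 3):

services:
  zk1:
    image: zookeeper:3.5
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: "server.1=zk1:2888:3888;2181 server.2=zk2:2888:3888;2181 server.3=zk3:2888:3888;2181"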

On Fri, Oct 18, 2019 at 5:28 PM Drew Kidder  wrote:

> Thank you all for your suggestions! I appreciate the fast turnaround.
>
> My setup is using Amazon ECS for our solr cloud installation. Each ZK is in
> its own container, using Route53 Service Discovery to provide the DNS name.
> The ZK nodes can all talk to each other, and I can communicate to each one
> of those nodes from my local machine and from within the solr container.
> Solr is one node per container, as Martijn correctly assumed. I am not
> using a zkRoot at present because my intention is to use ZK solely for Solr
> Cloud and nothing else.
>
> I have tried removing the "-z" option from the Dockerfile CMD and using the
> ZK_HOST environment variable (see below). I have even also modified the
> solr.in.sh and set the ZK_HOST variable there, all to no avail. I have
> tried both the Dockerfile command route, and have logged into the solr
> container and tried to run the CMD manually to see if there was a problem
> with the way I was using the CMD entry. All of those methods give me the
> same result output captured in the gist below.
>
> The gist for my solr.log output is here:
> https://gist.github.com/dkidder/2db9a6d393dedb97a39ed32e2be0c087
>
> My Dockerfile for the solr container looks like this:
>
>
> FROMsolr:8.2
>
> EXPOSE8983 8999 2181
>
> VOLUME/app/logs
> VOLUME/app/data
> VOLUME/app/conf
>
> ## add our jetty configuration (increased request size!)
> COPY   jetty.xml /opt/solr/server/etc/
>
> ## SolrCloud configuration
> ENV ZK_HOST zk1:2181,zk2:2181,zk3:2181
> ENV ZK_CLIENT_TIMEOUT 3
>
> USER   root
> RUNapt-get update
> RUNapt-get install -y netcat net-tools vim procps
> USER   solr
>
> # Copy over custom solr plugins
> COPYmyplugins/src/resources/* /opt/solr/server/solr/my-resources/
> COPYlib/*.jar /opt/solr/my-lib/
>
> # Copy over my configs
> COPYconf/ /app/conf
>
> #Start solr in cloud mode, connecting to zookeeper
> CMD   ["solr","start","-f","-c"]
>
> The docker command I use to execute this Dockerfile is `docker run -p
> 8983:8983 -p 2181:2181 --name $(APP_NAME) $(APP_NAME):latest`
>
> Output of `ps -eflww` from within the solr container (as root):
>
> root@fe0ad5b40b42:/opt/solr-8.2.0# ps -eflww
> F S UIDPID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY  TIME
> CMD
> 4 S solr 1 0  9  80   0 - 1043842 -14:36 ?00:00:07
> /usr/local/openjdk-11/bin/java -server -Xms512m -Xmx512m -XX:+UseG1GC
> -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
> -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch
>
> -Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983
> -Dcom.sun.management.jmxremote.rmi.port=18983 -DzkClientTimeout=3
> -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.log.dir=/var/solr/logs
> -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
> -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data
> -Dsolr.data.home= -Dsolr.install.dir=/opt/solr
> -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
> -Dlog4j.configurationFile=file:/var/solr/log4j2.xml -Xss256k
> -Dsolr.jetty.https.port=8983 -jar start.jar --module=http
> 4 S root90 0  0  80   0 -  4988 -  14:37 pts/000:00:00
> /bin/bash
> 0 R root9590  0  80   0 -  9595 -  14:37 pts/000:00:00
> ps -eflww
>
> Output of netstat from within the solr container (as root):
>
> root@fe0ad5b40b42:/opt/solr-8.2.0# netstat
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address   Foreign Address State
> tcp0  0 fe0ad5b40b42:43678  172.20.28.179:2181
>  TIME_WAIT
> tcp0  0 fe0ad5b40b42:60164  172.20.155.241:2181
> TIME_WAIT
> tcp0  0 fe0ad5b40b42:60500  172.20.60.138:2181
>  TIME_WAIT
> Active UNIX domain sockets (w/o servers)
> Proto RefCnt Flags   Type   State I-Node   Path
> unix  2  [ ] STREAM CONNECTED 129252
> unix  2  [ ] STREAM CONNECTED 129270
>
> I'm beginning to thin

Joins and text fields projection

2019-10-20 Thread Ahmed Adel
Hi,

Is there a way to select text fields in a query with a join clause in
Streaming Expressions or Parallel SQL? The following query:

SELECT field_s, field_t FROM t1 INNER JOIN t2 ON t1.a = t2.a LIMIT 10

requires that field_t, which is of type text, have docValues enabled, which
is not supported afaik:

java.io.IOException: --> http://172.31.34.56:8983/solr/t1:Failed to execute
sqlQuery 'SELECT field_s, field_t FROM t1 INNER JOIN t2 ON t1.a = t2.a
LIMIT 10' against JDBC connection 'jdbc:calcitesolr:'. Error while
executing SQL "SELECT field_s, field_t FROM t1 INNER JOIN t2 ON t1.a = t2.a
LIMIT 10": java.io.IOException: java.util.concurrent.ExecutionException:
java.io.IOException: -->
http://172.18.0.2:8983/solr/t1_shard1_replica_n1/:field_t{type=text_general,properties=indexed,tokenized,stored,useDocValuesAsStored,uninvertible}
must have DocValues to use this feature.

Its equivalent streaming expression clearly results in the same:

innerJoin(
  search(t1, q="*:*", qt="/export", fl="a1,field_t", sort="a asc"),
  search(t2, q="*:*", qt="/export", fl="a2,field_s", sort="a asc"),
  on="a"
)

java.io.IOException: -->
http://172.31.34.56:8983/solr/reviews:java.util.concurrent.ExecutionException:
java.io.IOException: -->
http://172.18.0.2:8983/solr/t1_shard1_replica_n1/:field_t{type=text_general,properties=indexed,tokenized,stored,useDocValuesAsStored,uninvertible}
must have DocValues to use this feature.

Thanks,
A.


Re: Clustering always return labels":["Other Topics"]

2019-12-26 Thread Ahmed Adel
Hi - adding the carrot.title field mapping should resolve this issue; a
config sketch follows below.
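
For context, a sketch of where that mapping lives in solrconfig.xml for
the Carrot2 clustering component (the title/body field names are
illustrative and must match the schema):

<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <bool name="clustering.results">true</bool>
    <!-- carrot.title/carrot.snippet tell Carrot2 which fields to cluster on -->
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">body</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>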

On Thu, Dec 19, 2019 at 2:22 AM Nehemia Litterat 
wrote:

> Hi,
> I am using stand alone solr 8.2 server.
> Used this guide to define Clustering
> https://carrot2.github.io/solr-integration-strategies/carrot2-3.6.3/index.html
>
>
> (Attached the config file)
>
> When running a query no real results are returned
> Included the file with the query and return results as seen in the admin
> GUI
>
> I will appreciate any suggestions.
>
> Thanks,
>
> *Nehemia Litterat*
>
> +972-54-6609351 | nlitte...@gmail.com
>
> Skype: nlitterat
>
-- 
Sent from my iPhone


Update schema.xml using solrj APIs

2011-12-21 Thread Ahmed Abdeen Hamed
Hello friend,

I am new to SolrJ and I am wondering if there is a way to update the
schema.xml file via the APIs.

I would appreciate any help.

Thanks very much,
-Ahmed


Re: Update schema.xml using solrj APIs

2011-12-22 Thread Ahmed Abdeen Hamed
Thanks everyone! That was very helpful.
-Ahmed
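
For the reload step mentioned in the quoted reply below, a minimal SolrJ
sketch (the core name is illustrative):

import org.apache.solr.client.solrj.request.CoreAdminRequest;

// After copying the edited schema.xml into the core's conf directory,
// ask Solr to reload that core so the new schema takes effect.
CoreAdminRequest.reloadCore("core1", server);  // server points at the Solr root URL

Re-indexing is still required for the schema change to apply to existing
documents.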

On Thu, Dec 22, 2011 at 5:15 AM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:

>
> Hi Ahmed,
>
> if you have a multi core setup, you could change the file
> programmatically (e.g. via XML parser), copy the new file to the
> existing one (programmatically, of course), then reload the core.
>
> I haven't reloaded the core programmatically, yet, but that should be
> doable via SolrJ. Or - if you are not using Java, then call the specific
> core admin URL in your programme.
>
> You will have to re-index after changing the schema.xml.
>
> Chantal
>
>
> On Thu, 2011-12-22 at 04:34 +0100, Otis Gospodnetic wrote:
> > Ahmed,
> >
> > At this point in time - no.  You need to edit it manually and restart
> Solr to see the changes.
> > This will change in the future.
> >
> > Otis
> > 
> > Performance Monitoring SaaS for Solr -
> http://sematext.com/spm/solr-performance-monitoring/index.html
> >
> >
> >
> > >
> > > From: Ahmed Abdeen Hamed 
> > >To: solr-user@lucene.apache.org
> > >Sent: Wednesday, December 21, 2011 4:12 PM
> > >Subject: Update schema.xml using solrj APIs
> > >
> > >Hello friend,
> > >
> > >I am new to SolrJ and I am wondering if there is a way to update
> the
> > >schema.xml file via the APIs.
> > >
> > >I would appreciate any help.
> > >
> > >Thanks very much,
> > >-Ahmed
> > >
> > >
> > >
>
>


Re: Using solr with the new TokenStream API

2009-12-16 Thread Ahmed El-dawy
I think the problem is that my jar file is added to the classpath at run
time. This causes Class.forName not to work correctly. Is there a way to add
this jar file to the classpath during Tomcat startup?

On Tue, Dec 15, 2009 at 8:42 PM, Ahmed El-dawy  wrote:

> Hi,
>  I'm using the new API provided with Lucene 2.9.1 for TokenStream. I mean
> the one that is using the decorator pattern. I made a new attribute called
> GlossAttribute with its implementation called GlossAttributeImpl. When I run
> it in a desktop application of mine it works correctly in both indexing and
> searching. However, when I used the same jar file with solr it throws an
> exception when trying to instantiate the attribute.
> I'm sure that the implementing class is in the jar file, but it seems
> to be looking somewhere else. Do you have any solution?
>
>
> Here's the exception stack trace
>
> HTTP Status 500 - Could not find implementing class for
> gpl.pierrick.brihaye.aramorph.lucene.GlossAttribute 
> java.lang.IllegalArgumentException:
> Could not find implementing class for
> gpl.pierrick.brihaye.aramorph.lucene.GlossAttribute at 
> org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:79)
> at
> org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:64)
> at
> org.apache.lucene.analysis.TokenStream$TokenWrapperAttributeFactory.createAttributeInstance(TokenStream.java:149)
> at
> org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:224)
> at
> gpl.pierrick.brihaye.aramorph.lucene.ArabicStemmerReplicator.<init>(ArabicStemmerReplicator.java:69)
> at
> gpl.pierrick.brihaye.aramorph.solr.ArabicStemmerReplicatorFactory.create(ArabicStemmerReplicatorFactory.java:18)
> at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:72)
> at
> org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74)
> at
> org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364)
> at
> org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:567)
> at
> org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:153)
> at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1449) at
> org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1337) at
> org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1265) at
> org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1254)
> at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:200) at
> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at
> org.apache.solr.search.QParser.getQuery(QParser.java:131) at
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> at java.lang.Thread.run(Thread.java:636)
>
> --
> regards,
> Ahmed Saad
>



-- 
regards,
Ahmed Saad
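
For readers hitting the same trap: the default attribute factory resolves an
attribute interface to its implementation by appending "Impl" to the interface
name and loading it reflectively, so the interface and its Impl class must be
visible to the same classloader (e.g. both inside a jar in Solr's lib
directory). A hedged reconstruction of the attribute pair described above,
against the Lucene 2.9-era API; the real aramorph classes may differ:

import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;

// The interface: what filters request via addAttribute(GlossAttribute.class).
interface GlossAttribute extends Attribute {
  String getGloss();
  void setGloss(String gloss);
}

// The implementation: must be named <Interface>Impl, or the factory's
// reflective lookup fails with "Could not find implementing class".
class GlossAttributeImpl extends AttributeImpl implements GlossAttribute {
  private String gloss;

  public String getGloss() { return gloss; }
  public void setGloss(String gloss) { this.gloss = gloss; }

  @Override
  public void clear() { gloss = null; }

  @Override
  public void copyTo(AttributeImpl target) {
    ((GlossAttribute) target).setGloss(gloss);
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof GlossAttributeImpl
        && (gloss == null
            ? ((GlossAttributeImpl) other).gloss == null
            : gloss.equals(((GlossAttributeImpl) other).gloss));
  }

  @Override
  public int hashCode() { return gloss == null ? 0 : gloss.hashCode(); }
}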


Re: Using the new tokenizer API from a jar file

2009-12-30 Thread Ahmed El-dawy
> > > jar in the "lib" directory under your solr home
> > > (or reference it using a <lib> directive in your solrconfig.xml). Solr's
> > > plugin loader will take care of the classloading for you.
> > >
> > > if you are confident you have your jar in the correct place, please
> > email
> > > solr-user with the ClassNotFound stack trace from your solr logs, as
> > well
> > > as the hierarchy of files from your solr home (ie: the output of "find .")
> > >
> > >
> > > -Hoss


-- 
regards,
Ahmed Saad


Master Slave Replication of Solr with Basic Authentication

2018-03-25 Thread Basheeruddin Ahmed (syedbahm)
Hello,
It seems that even when we use security.json with the Basic Authentication
plugin as documented here --
https://lucene.apache.org/solr/guide/7_2/basic-authentication-plugin.html
-- which nicely protects the user password as a salted SHA-256 hash,
configuring the slave in a master/slave index replication setup still requires
giving the Basic Authentication password in plain text. Did I get something
wrong? In my HA setup with master/slave replication it works in this manner.

https://lucene.apache.org/solr/guide/7_2/index-replication.html also
indicates the config is in plain text:



<lst name="slave">
  <str name="httpBasicAuthUser">username</str>
  <str name="httpBasicAuthPassword">password</str>
</lst>


Please let me know how I can use the same hashed password as in
security.json when setting up master/slave replication for Solr.

Thx
-Syed Ahmed.